Document assignment in multi-site search engines

Publication: Contribution to collected works, article in conference proceedings, research, peer-reviewed

Authors

Assigning documents accurately to sites is critical for the performance of multi-site Web search engines. In such settings, sites crawl only documents they index and forward queries to obtain best-matching documents from other sites. Inaccurate assignments may lead to inefficiencies when crawling Web pages or processing user queries. In this work, we propose a machine-learned document assignment strategy that uses the locality of document views in search results to decide upon assignments. We evaluate the performance of our strategy using various document features extracted from a large Web collection. Our experimental setup uses query logs from a number of search front-ends spread across different geographic locations and uses these logs to learn the document access patterns. We compare our technique against baselines such as region- and language-based document assignment and observe that our technique achieves substantial performance improvements with respect to recall. With our technique, we are able to obtain a small query forwarding rate (0.04) requiring roughly 45% less replication of documents compared to replicating all documents across all sites.
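The two quantities the abstract reports, the query forwarding rate and the replication saved relative to replicating every document at every site, can be illustrated with a toy calculation. The sketch below is not the paper's machine-learned strategy: it swaps in a simple threshold rule on observed view shares, and the function names, the `min_share` parameter, the site labels, and the view counts are all hypothetical. It also assumes that a view at a site that does not hold the document counts as one forwarded query.

```python
def assign_documents(views, min_share=0.3):
    """Assign each document to the sites where it is viewed most.

    views maps doc_id -> {site: view_count}. Every document is assumed to
    have at least one recorded view. A document is replicated to each site
    whose share of its views is at least min_share, and always to its
    top-viewing site. (Illustrative rule, not the paper's learned model.)
    """
    assignment = {}
    for doc, per_site in views.items():
        total = sum(per_site.values())
        top_site = max(per_site, key=per_site.get)
        sites = {s for s, c in per_site.items() if total and c / total >= min_share}
        sites.add(top_site)
        assignment[doc] = sites
    return assignment


def forwarding_rate(views, assignment):
    """Fraction of views that occur at a site not holding the document."""
    total = forwarded = 0
    for doc, per_site in views.items():
        for site, count in per_site.items():
            total += count
            if site not in assignment[doc]:
                forwarded += count
    return forwarded / total if total else 0.0


def replication_saving(assignment, num_sites):
    """Fraction of copies avoided compared with full replication."""
    stored = sum(len(sites) for sites in assignment.values())
    full = len(assignment) * num_sites
    return 1 - stored / full


# Hypothetical view counts per document and site.
views = {
    "d1": {"eu": 80, "us": 20},
    "d2": {"eu": 50, "us": 50},
    "d3": {"us": 30, "asia": 70},
    "d4": {"asia": 100},
}
assignment = assign_documents(views, min_share=0.3)
rate = forwarding_rate(views, assignment)
saving = replication_saving(assignment, num_sites=3)
print(assignment, rate, saving)
```

Raising `min_share` in this toy setting stores fewer copies but forwards more queries, which is the trade-off the abstract describes between forwarding rate and replication.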
Original language: English
Title: Proceedings of the fourth ACM international conference on Web search and data mining
Number of pages: 10
Place of publication: New York
Publisher: Association for Computing Machinery, Inc
Publication date: 2011
Pages: 575-584
ISBN (Print): 978-1-4503-0493-1
DOIs
Publication status: Published - 2011
Externally published: Yes
Event: 4th ACM International Conference on Web Search and Data Mining - WSDM '11 2011 - Hong Kong, China
Duration: 09.02.2011 to 12.02.2011
Conference number: 4
http://www.wsdm2011.org/wsdm2011/_media/wsdm2011-program-20110127.pdf
