Document assignment in multi-site search engines

Publication: Contributions to collected works › Articles in conference proceedings › Research › Peer-reviewed

Assigning documents accurately to sites is critical for the performance of multi-site Web search engines. In such settings, sites crawl only documents they index and forward queries to obtain best-matching documents from other sites. Inaccurate assignments may lead to inefficiencies when crawling Web pages or processing user queries. In this work, we propose a machine-learned document assignment strategy that uses the locality of document views in search results to decide upon assignments. We evaluate the performance of our strategy using various document features extracted from a large Web collection. Our experimental setup uses query logs from a number of search front-ends spread across different geographic locations and uses these logs to learn the document access patterns. We compare our technique against baselines such as region- and language-based document assignment and observe that our technique achieves substantial performance improvements with respect to recall. With our technique, we are able to obtain a small query forwarding rate (0.04) requiring roughly 45% less replication of documents compared to replicating all documents across all sites.
Original language: English
Title of host publication: Proceedings of the fourth ACM international conference on Web search and data mining
Number of pages: 10
Place of publication: New York
Publisher: Association for Computing Machinery, Inc
Publication date: 2011
Pages: 575-584
ISBN (Print): 978-1-4503-0493-1
Publication status: Published - 2011
Externally published: Yes
Event: 4th ACM International Conference on Web Search and Data Mining - WSDM '11 2011 - Hong Kong, China
Duration: 09.02.2011 to 12.02.2011
Conference number: 4
http://www.wsdm2011.org/wsdm2011/_media/wsdm2011-program-20110127.pdf
