Document assignment in multi-site search engines

Publication: Contributions to collected editions > Articles in conference proceedings > Research > peer-reviewed

Authors

Assigning documents accurately to sites is critical for the performance of multi-site Web search engines. In such settings, sites crawl only documents they index and forward queries to obtain best-matching documents from other sites. Inaccurate assignments may lead to inefficiencies when crawling Web pages or processing user queries. In this work, we propose a machine-learned document assignment strategy that uses the locality of document views in search results to decide upon assignments. We evaluate the performance of our strategy using various document features extracted from a large Web collection. Our experimental setup uses query logs from a number of search front-ends spread across different geographic locations and uses these logs to learn the document access patterns. We compare our technique against baselines such as region- and language-based document assignment and observe that our technique achieves substantial performance improvements with respect to recall. With our technique, we are able to obtain a small query forwarding rate (0.04) requiring roughly 45% less replication of documents compared to replicating all documents across all sites.
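The two figures quoted at the end of the abstract, the query forwarding rate and the replication saved relative to full replication, can be made concrete with a small sketch. It is illustrative only and is not the authors' method: the paper uses a machine-learned assignment strategy, while the sketch swaps in a simple threshold rule on each site's share of a document's views. The function names, the `threshold` parameter, and the toy data are all assumptions introduced here.

```python
from collections import Counter, defaultdict


def assign_documents(views, threshold=0.3):
    """Assign each document to the sites where it is viewed often enough.

    views: iterable of (site, doc_id) pairs, one per view of a document
    in that site's search results (hypothetical log format).
    Threshold heuristic, NOT the learned strategy from the paper.
    """
    per_doc = defaultdict(Counter)
    for site, doc in views:
        per_doc[doc][site] += 1

    assignment = {}
    for doc, counts in per_doc.items():
        total = sum(counts.values())
        sites = {s for s, c in counts.items() if c / total >= threshold}
        # Every document keeps at least one copy, at its most frequent site.
        sites.add(counts.most_common(1)[0][0])
        assignment[doc] = sites
    return assignment


def forwarding_rate(queries, assignment):
    """Fraction of queries with at least one result document not held locally.

    queries: iterable of (site, [doc_id, ...]) pairs (hypothetical format).
    """
    queries = list(queries)
    if not queries:
        return 0.0
    forwarded = sum(
        1
        for site, docs in queries
        if any(site not in assignment.get(d, set()) for d in docs)
    )
    return forwarded / len(queries)


def replication_saving(assignment, n_sites):
    """Fraction of document copies saved compared to copying everything everywhere."""
    copies = sum(len(sites) for sites in assignment.values())
    return 1 - copies / (len(assignment) * n_sites)


# Toy data, invented for illustration only.
views = [("eu", "a"), ("eu", "a"), ("us", "a"), ("us", "b"),
         ("asia", "c")] + [("eu", "c")] * 5
queries = [("eu", ["a", "c"]), ("us", ["a", "b"]),
           ("asia", ["c"]), ("us", ["c"])]

assignment = assign_documents(views)
print(forwarding_rate(queries, assignment))
print(replication_saving(assignment, n_sites=3))
```

Raising `threshold` in this sketch stores fewer copies but forwards more queries, which is the trade-off the abstract reports as a forwarding rate of 0.04 at roughly 45% less replication.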
Original language: English
Title of host publication: Proceedings of the fourth ACM international conference on Web search and data mining
Number of pages: 10
Place of publication: New York
Publisher: Association for Computing Machinery, Inc
Publication date: 2011
Pages: 575-584
ISBN (Print): 978-1-4503-0493-1
Publication status: Published - 2011
Externally published: Yes
Event: 4th ACM International Conference on Web Search and Data Mining - WSDM '11 2011 - Hong Kong, China
Duration: 09.02.2011 to 12.02.2011
Conference number: 4
http://www.wsdm2011.org/wsdm2011/_media/wsdm2011-program-20110127.pdf
