Document assignment in multi-site search engines

Publication: Contributions to collected works › Articles in conference proceedings › Research › peer-reviewed

Assigning documents accurately to sites is critical for the performance of multi-site Web search engines. In such settings, sites crawl only documents they index and forward queries to obtain best-matching documents from other sites. Inaccurate assignments may lead to inefficiencies when crawling Web pages or processing user queries. In this work, we propose a machine-learned document assignment strategy that uses the locality of document views in search results to decide upon assignments. We evaluate the performance of our strategy using various document features extracted from a large Web collection. Our experimental setup uses query logs from a number of search front-ends spread across different geographic locations and uses these logs to learn the document access patterns. We compare our technique against baselines such as region- and language-based document assignment and observe that our technique achieves substantial performance improvements with respect to recall. With our technique, we are able to obtain a small query forwarding rate (0.04) requiring roughly 45% less replication of documents compared to replicating all documents across all sites.
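
The two quantities reported in the abstract, query forwarding rate and replication saved relative to full replication, can be illustrated with a small sketch. It is not the authors' machine-learned strategy: it replaces the learned model with a simple share-of-views threshold, and the function names, the `threshold` parameter, and the toy data are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def assign_documents(views, threshold=0.2):
    """Assign each document to the sites where it is viewed most.

    views: iterable of (site, doc_id) pairs, one per search-result view.
    A document goes to every site holding at least `threshold` of its
    views, and always to its top site (hypothetical heuristic).
    """
    per_doc = defaultdict(Counter)
    for site, doc in views:
        per_doc[doc][site] += 1

    assignment = {}
    for doc, counts in per_doc.items():
        total = sum(counts.values())
        sites = {s for s, c in counts.items() if c / total >= threshold}
        sites.add(counts.most_common(1)[0][0])  # never leave a doc unassigned
        assignment[doc] = sites
    return assignment


def forwarding_rate(queries, assignment):
    """Fraction of queries with at least one result not held locally.

    queries: iterable of (site, [doc_id, ...]) pairs.
    """
    queries = list(queries)
    if not queries:
        return 0.0
    forwarded = sum(
        1
        for site, docs in queries
        if any(site not in assignment.get(d, set()) for d in docs)
    )
    return forwarded / len(queries)


def replication_saving(assignment, n_sites):
    """Fraction of document copies saved versus replicating everywhere."""
    copies = sum(len(sites) for sites in assignment.values())
    return 1 - copies / (len(assignment) * n_sites)


# Toy data: d1 is viewed almost only from "eu", d2 equally from both sites.
views = [("eu", "d1")] * 9 + [("us", "d1")] + [("us", "d2")] * 5 + [("eu", "d2")] * 5
assignment = assign_documents(views, threshold=0.2)
queries = [("eu", ["d1", "d2"]), ("us", ["d2"]), ("us", ["d1"]), ("eu", ["d1"])]
print(assignment)
print(forwarding_rate(queries, assignment))
print(replication_saving(assignment, n_sites=2))
```

Lowering `threshold` in this sketch replicates documents to more sites, which reduces forwarding at the cost of more copies; the abstract reports the trade-off the authors obtained with their learned approach.
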
Original language: English
Title of host publication: Proceedings of the fourth ACM international conference on Web search and data mining
Number of pages: 10
Place of publication: New York
Publisher: Association for Computing Machinery, Inc
Publication date: 2011
Pages: 575-584
ISBN (Print): 978-1-4503-0493-1
Publication status: Published - 2011
Published externally: Yes
Event: 4th ACM International Conference on Web Search and Data Mining - WSDM '11 2011 - Hong Kong, China
Duration: 09.02.2011 - 12.02.2011
Conference number: 4
http://www.wsdm2011.org/wsdm2011/_media/wsdm2011-program-20110127.pdf