Document assignment in multi-site search engines

Research output: Contributions to collected editions/works > Article in conference proceedings > Research > peer-review

Authors

Assigning documents accurately to sites is critical for the performance of multi-site Web search engines. In such settings, sites crawl only documents they index and forward queries to obtain best-matching documents from other sites. Inaccurate assignments may lead to inefficiencies when crawling Web pages or processing user queries. In this work, we propose a machine-learned document assignment strategy that uses the locality of document views in search results to decide upon assignments. We evaluate the performance of our strategy using various document features extracted from a large Web collection. Our experimental setup uses query logs from a number of search front-ends spread across different geographic locations and uses these logs to learn the document access patterns. We compare our technique against baselines such as region- and language-based document assignment and observe that our technique achieves substantial performance improvements with respect to recall. With our technique, we are able to obtain a small query forwarding rate (0.04) requiring roughly 45% less replication of documents compared to replicating all documents across all sites.
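The trade-off the abstract describes, between how often a site must forward a query and how many document copies are stored, can be illustrated with a small sketch. This is not the paper's machine-learned strategy: a fixed threshold on each site's share of a document's views stands in for the learned classifier. The function names, the `threshold` parameter, and the exact definitions of forwarding rate and replication saving used here are assumptions made for illustration only.

```python
def assign_documents(views, threshold=0.2):
    """Assign each document to the sites that account for a large share of its views.

    views: {doc_id: {site: view_count}} (hypothetical input format).
    Returns {doc_id: set of sites}. The threshold rule is an illustrative
    stand-in for a learned classifier, not the method from the paper.
    """
    assignment = {}
    for doc, per_site in views.items():
        total = sum(per_site.values())
        if total == 0:
            # No observed views: leave unassigned (a real system needs a fallback).
            assignment[doc] = set()
            continue
        sites = {s for s, count in per_site.items() if count / total >= threshold}
        # Always keep the document at the site that views it most.
        sites.add(max(per_site, key=per_site.get))
        assignment[doc] = sites
    return assignment


def forwarding_rate(queries, assignment):
    """Fraction of queries with at least one result document not held locally.

    queries: list of (issuing_site, [doc_ids in the result list]).
    This definition is an assumption; the paper may define the rate differently.
    """
    if not queries:
        return 0.0
    forwarded = sum(
        1
        for site, docs in queries
        if any(site not in assignment.get(d, set()) for d in docs)
    )
    return forwarded / len(queries)


def replication_saving(assignment, num_sites):
    """Fraction of copies saved relative to storing every document at every site."""
    if not assignment or num_sites == 0:
        return 0.0
    copies = sum(len(sites) for sites in assignment.values())
    return 1 - copies / (len(assignment) * num_sites)
```

Raising `threshold` in this sketch stores fewer copies but forces more queries to be forwarded, while lowering it does the opposite. That is the tension a document assignment strategy has to balance.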

Original language: English
Title of host publication: Proceedings of the fourth ACM international conference on Web search and data mining
Number of pages: 10
Place of Publication: New York
Publisher: Association for Computing Machinery, Inc
Publication date: 2011
Pages: 575-584
ISBN (print): 978-1-4503-0493-1
Publication status: Published - 2011
Externally published: Yes
Event: 4th ACM International Conference on Web Search and Data Mining - WSDM '11 2011 - Hong Kong, China
Duration: 09.02.2011 - 12.02.2011
Conference number: 4
http://www.wsdm2011.org/wsdm2011/_media/wsdm2011-program-20110127.pdf

    Research areas

  • Informatics - Assignment strategies, Classification, Document access, Document replication, Experimental setup, Geographic location, Multi-site, Multi-site web search engines, Performance improvements, Query forwarding, Query logs, Search results, User query, Web collections, Web page, Web search engines
  • Business informatics
