Document assignment in multi-site search engines

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Authors

Assigning documents accurately to sites is critical for the performance of multi-site Web search engines. In such settings, sites crawl only documents they index and forward queries to obtain best-matching documents from other sites. Inaccurate assignments may lead to inefficiencies when crawling Web pages or processing user queries. In this work, we propose a machine-learned document assignment strategy that uses the locality of document views in search results to decide upon assignments. We evaluate the performance of our strategy using various document features extracted from a large Web collection. Our experimental setup uses query logs from a number of search front-ends spread across different geographic locations and uses these logs to learn the document access patterns. We compare our technique against baselines such as region- and language-based document assignment and observe that our technique achieves substantial performance improvements with respect to recall. With our technique, we obtain a low query forwarding rate (0.04) while requiring roughly 45% less document replication than replicating all documents at all sites.
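The abstract describes a trade-off between query forwarding and document replication. The sketch below illustrates how those two quantities can be computed from a query log, assuming a simplified setup: a toy log of (front-end site, result document ids) pairs, and an assignment rule that places a document at every site accounting for at least `min_share` of its result views. This rule is a hypothetical stand-in, not the authors' method: in the paper, assignments are predicted by a machine-learned model from document features, whereas here the observed views are used directly. All names (`assign_by_view_locality`, `min_share`, the site labels, the log) are illustrative assumptions, and "forwarded" is taken to mean that at least one result document is not indexed at the receiving site.

```python
from collections import defaultdict

# Hypothetical toy query log: one entry per query, giving the front-end
# site that received it and the ids of the documents in its result list.
SITES = ["eu", "us", "asia"]
LOG = [
    ("eu", ["d1", "d2"]),
    ("eu", ["d1"]),
    ("us", ["d3"]),
    ("us", ["d1", "d3"]),
    ("asia", ["d4"]),
]


def assign_by_view_locality(log, min_share):
    """Assign each document to every site contributing >= min_share of its
    result views; the most-viewing site is always included.
    Illustrative stand-in for a learned classifier, not the paper's model."""
    views = defaultdict(lambda: defaultdict(int))
    for site, docs in log:
        for doc in docs:
            views[doc][site] += 1
    assignment = {}
    for doc, per_site in views.items():
        total = sum(per_site.values())
        chosen = {s for s, c in per_site.items() if c / total >= min_share}
        chosen.add(max(per_site, key=per_site.get))  # never leave a doc unassigned
        assignment[doc] = chosen
    return assignment


def query_forwarding_rate(log, assignment):
    """Fraction of queries whose results contain at least one document
    not indexed at the site that received the query."""
    forwarded = sum(
        1 for site, docs in log
        if any(site not in assignment.get(doc, set()) for doc in docs)
    )
    return forwarded / len(log)


def replication_savings(assignment, num_sites):
    """Relative reduction in stored document copies compared with
    replicating every document at every site."""
    copies = sum(len(sites) for sites in assignment.values())
    return 1 - copies / (num_sites * len(assignment))


for share in (0.4, 0.3):
    a = assign_by_view_locality(LOG, min_share=share)
    print(share, query_forwarding_rate(LOG, a), round(replication_savings(a, len(SITES)), 3))
```

Lowering `min_share` from 0.4 to 0.3 replicates d1 at a second site: in this toy log the forwarding rate drops from 0.2 to 0.0, while the savings over full replication fall from about 0.667 to 0.583, a small-scale view of the trade-off the abstract quantifies.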

Original language: English
Title of host publication: Proceedings of the fourth ACM international conference on Web search and data mining
Number of pages: 10
Place of publication: New York
Publisher: Association for Computing Machinery, Inc
Publication date: 2011
Pages: 575-584
ISBN (print): 978-1-4503-0493-1
Publication status: Published - 2011
Externally published: Yes
Event: 4th ACM International Conference on Web Search and Data Mining - WSDM '11 - Hong Kong, China
Duration: 09.02.2011 to 12.02.2011
Conference number: 4
http://www.wsdm2011.org/wsdm2011/_media/wsdm2011-program-20110127.pdf

    Research areas

  • Informatics - Assignment strategies, Classification, Document access, Document replication, Experimental setup, Geographic location, Multi-site, Multi-site web search engines, Performance improvements, Query forwarding, Query logs, Search results, User query, Web collections, Web page, Web search engines
  • Business informatics
