Document assignment in multi-site search engines

Research output: Contributions to collected editions/works, Article in conference proceedings, Research, peer-review

Authors

Assigning documents accurately to sites is critical for the performance of multi-site Web search engines. In such settings, sites crawl only documents they index and forward queries to obtain best-matching documents from other sites. Inaccurate assignments may lead to inefficiencies when crawling Web pages or processing user queries. In this work, we propose a machine-learned document assignment strategy that uses the locality of document views in search results to decide upon assignments. We evaluate the performance of our strategy using various document features extracted from a large Web collection. Our experimental setup uses query logs from a number of search front-ends spread across different geographic locations and uses these logs to learn the document access patterns. We compare our technique against baselines such as region- and language-based document assignment and observe that our technique achieves substantial performance improvements with respect to recall. With our technique, we are able to obtain a small query forwarding rate (0.04) requiring roughly 45% less replication of documents compared to replicating all documents across all sites.
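The following Python sketch makes the two reported quantities concrete: a query forwarding rate, and the replication saved compared with placing every document on every site. It is not the authors' machine-learned method. The assignment rule is a simple, hypothetical stand-in that thresholds each site's share of a document's result views. All names are illustrative assumptions: assign_by_view_locality, forwarding_rate, replication_saving, the site labels, and the 0.1 threshold. The sketch also assumes that a query is forwarded whenever any of its results is not indexed at the site that received it.

```python
from collections import Counter, defaultdict


def assign_by_view_locality(views, threshold=0.1):
    """Hypothetical stand-in for a learned classifier (not the paper's model).

    views maps doc_id -> Counter(site -> times the doc appeared in that site's results).
    A document is assigned to every site that accounts for at least `threshold`
    of its views.
    """
    assignment = {}
    for doc, per_site in views.items():
        total = sum(per_site.values())
        sites = {s for s, c in per_site.items() if total and c / total >= threshold}
        # Fallback: keep the document on its most-viewed site so it is indexed somewhere.
        if not sites and per_site:
            sites = {per_site.most_common(1)[0][0]}
        assignment[doc] = sites
    return assignment


def forwarding_rate(query_log, assignment):
    """Fraction of (site, results) queries with at least one result not stored locally."""
    if not query_log:
        return 0.0
    forwarded = sum(
        1
        for site, results in query_log
        if any(site not in assignment.get(doc, set()) for doc in results)
    )
    return forwarded / len(query_log)


def replication_saving(assignment, num_sites):
    """Share of copies saved relative to replicating every document on every site."""
    copies = sum(len(sites) for sites in assignment.values())
    full = len(assignment) * num_sites
    return 1 - copies / full if full else 0.0


# Toy usage with made-up logs: learn from one log, evaluate on another.
train_log = [("eu", ["d1", "d2"]), ("eu", ["d1"]), ("us", ["d2", "d3"]), ("asia", ["d3"])]
views = defaultdict(Counter)
for site, results in train_log:
    for doc in results:
        views[doc][site] += 1

assignment = assign_by_view_locality(views)
test_log = [("eu", ["d1", "d3"]), ("us", ["d2"]), ("asia", ["d1"]), ("asia", ["d3"])]
rate = forwarding_rate(test_log, assignment)      # 2 of 4 queries forwarded
saving = replication_saving(assignment, num_sites=3)  # 5 of 9 possible copies stored
```

The toy data trains on one log and evaluates on another deliberately. If the rule is measured on the same log it was derived from, it forwards no queries at all, which says nothing about how well it generalizes.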

Original language: English
Title of host publication: Proceedings of the fourth ACM international conference on Web search and data mining
Number of pages: 10
Place of publication: New York
Publisher: Association for Computing Machinery, Inc
Publication date: 2011
Pages: 575-584
ISBN (print): 978-1-4503-0493-1
Publication status: Published - 2011
Externally published: Yes
Event: 4th ACM International Conference on Web Search and Data Mining (WSDM '11), Hong Kong, China
Duration: 09.02.2011 to 12.02.2011
Conference number: 4
http://www.wsdm2011.org/wsdm2011/_media/wsdm2011-program-20110127.pdf

    Research areas

  • Informatics - Assignment strategies, Classification, Document access, Document replication, Experimental setup, Geographic location, Multi-site, Multi-site web search engines, Performance improvements, Query forwarding, Query logs, Search results, User query, Web collections, Web page, Web search engines
  • Business informatics
