Document assignment in multi-site search engines
Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review
Assigning documents accurately to sites is critical for the performance of multi-site Web search engines. In such settings, sites crawl only documents they index and forward queries to obtain best-matching documents from other sites. Inaccurate assignments may lead to inefficiencies when crawling Web pages or processing user queries. In this work, we propose a machine-learned document assignment strategy that uses the locality of document views in search results to decide upon assignments. We evaluate the performance of our strategy using various document features extracted from a large Web collection. Our experimental setup uses query logs from a number of search front-ends spread across different geographic locations and uses these logs to learn the document access patterns. We compare our technique against baselines such as region- and language-based document assignment and observe that our technique achieves substantial performance improvements with respect to recall. With our technique, we are able to obtain a small query forwarding rate (0.04) requiring roughly 45% less replication of documents compared to replicating all documents across all sites.
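
The idea of using view locality to drive assignment can be illustrated with a small sketch. This is not the machine-learned strategy from the paper: it swaps in a plain threshold heuristic, and the function names, the `threshold` parameter, the site labels, and the toy view counts are all hypothetical. It also assumes that every document has at least one recorded view, and it measures forwarding per view rather than per query, which is a simplification.

```python
from collections import Counter


def assign_documents(views, threshold=0.2):
    """Map each document to the set of sites that should index it.

    views: dict doc_id -> Counter(site -> number of result views).
    A site receives a copy when its share of the document's views is at
    least `threshold`; the site with the most views always receives one.
    Assumes every document has at least one view (hypothetical setup).
    """
    assignment = {}
    for doc, per_site in views.items():
        total = sum(per_site.values())
        top_site = max(per_site, key=per_site.get)
        sites = {s for s, n in per_site.items() if n / total >= threshold}
        sites.add(top_site)
        assignment[doc] = sites
    return assignment


def forwarding_rate(views, assignment):
    """Fraction of views arriving at a site that does not hold the document."""
    total = missed = 0
    for doc, per_site in views.items():
        for site, n in per_site.items():
            total += n
            if site not in assignment[doc]:
                missed += n
    return missed / total


def replication_saving(assignment, num_sites):
    """Fraction of copies avoided relative to replicating every document on every site."""
    copies = sum(len(sites) for sites in assignment.values())
    return 1 - copies / (len(assignment) * num_sites)


# Toy view counts per site, standing in for query-log data (made up).
views = {
    "doc_a": Counter({"eu": 90, "us": 10}),
    "doc_b": Counter({"us": 50, "asia": 50}),
    "doc_c": Counter({"asia": 100}),
}
assignment = assign_documents(views, threshold=0.2)
print(assignment)
print(forwarding_rate(views, assignment))
print(replication_saving(assignment, num_sites=3))
```

Raising `threshold` in this sketch places fewer copies and therefore forwards more views, which mirrors the trade-off between query forwarding rate and replication that the abstract reports.
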
Original language | English |
---|---|
Title of host publication | Proceedings of the fourth ACM international conference on Web search and data mining |
Number of pages | 10 |
Place of Publication | New York |
Publisher | Association for Computing Machinery, Inc |
Publication date | 2011 |
Pages | 575-584 |
ISBN (print) | 978-1-4503-0493-1 |
Publication status | Published - 2011 |
Externally published | Yes |
Event | 4th ACM International Conference on Web Search and Data Mining (WSDM '11), Hong Kong, China. Duration: 09.02.2011 → 12.02.2011. Conference number: 4. http://www.wsdm2011.org/wsdm2011/_media/wsdm2011-program-20110127.pdf |
- Informatics - Assignment strategies, Classification, Document access, Document replication, Experimental setup, Geographic location, Multi-site, Multi-site web search engines, Performance improvements, Query forwarding, Query logs, Search results, User query, Web collections, Web page, Web search engines
- Business informatics