Document assignment in multi-site search engines
Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review
Standard
Proceedings of the fourth ACM international conference on Web search and data mining. New York: Association for Computing Machinery, Inc, 2011. p. 575-584.
RIS
TY - CHAP
T1 - Document assignment in multi-site search engines
AU - Brefeld, Ulf
AU - Cambazoglu, B. Barla
AU - Junqueira, Flavio R.
N1 - Conference code: 4
PY - 2011
Y1 - 2011
N2 - Assigning documents accurately to sites is critical for the performance of multi-site Web search engines. In such settings, sites crawl only documents they index and forward queries to obtain best-matching documents from other sites. Inaccurate assignments may lead to inefficiencies when crawling Web pages or processing user queries. In this work, we propose a machine-learned document assignment strategy that uses the locality of document views in search results to decide upon assignments. We evaluate the performance of our strategy using various document features extracted from a large Web collection. Our experimental setup uses query logs from a number of search front-ends spread across different geographic locations and uses these logs to learn the document access patterns. We compare our technique against baselines such as region- and language-based document assignment and observe that our technique achieves substantial performance improvements with respect to recall. With our technique, we are able to obtain a small query forwarding rate (0.04) requiring roughly 45% less replication of documents compared to replicating all documents across all sites.
KW - Informatics
KW - Assignment strategies
KW - Classification
KW - Document access
KW - Document replication
KW - Experimental setup
KW - Geographic location
KW - Multi-site
KW - Multi-site web search engines
KW - Performance improvements
KW - Query forwarding
KW - Query logs
KW - Search results
KW - User query
KW - Web collections
KW - Web page
KW - Web search engines
KW - Business informatics
UR - http://www.scopus.com/inward/record.url?scp=79952426742&partnerID=8YFLogxK
U2 - 10.1145/1935826.1935907
DO - 10.1145/1935826.1935907
M3 - Article in conference proceedings
AN - SCOPUS:79952426742
SN - 978-1-4503-0493-1
SP - 575
EP - 584
BT - Proceedings of the fourth ACM international conference on Web search and data mining
PB - Association for Computing Machinery, Inc
CY - New York
T2 - 4th ACM International Conference on Web Search and Data Mining - WSDM '11 2011
Y2 - 9 February 2011 through 12 February 2011
ER -