Document assignment in multi-site search engines

Research output: Contributions to collected editions/works > Article in conference proceedings > Research > peer-review

Standard

Document assignment in multi-site search engines. / Brefeld, Ulf; Barla Cambazoglu, B.; Junqueira, Flavio R.
Proceedings of the fourth ACM international conference on Web search and data mining. New York: Association for Computing Machinery, Inc, 2011. p. 575-584.

Harvard

Brefeld, U, Barla Cambazoglu, B & Junqueira, FR 2011, Document assignment in multi-site search engines. in Proceedings of the fourth ACM international conference on Web search and data mining. Association for Computing Machinery, Inc, New York, pp. 575-584, 4th ACM International Conference on Web Search and Data Mining - WSDM '11 2011, Hong Kong, China, 09.02.11. https://doi.org/10.1145/1935826.1935907

APA

Brefeld, U., Barla Cambazoglu, B., & Junqueira, F. R. (2011). Document assignment in multi-site search engines. In Proceedings of the fourth ACM international conference on Web search and data mining (pp. 575-584). Association for Computing Machinery, Inc. https://doi.org/10.1145/1935826.1935907

Vancouver

Brefeld U, Barla Cambazoglu B, Junqueira FR. Document assignment in multi-site search engines. In Proceedings of the fourth ACM international conference on Web search and data mining. New York: Association for Computing Machinery, Inc. 2011. p. 575-584. doi: 10.1145/1935826.1935907

Bibtex

@inbook{38bdb36d4f4f46a3a37ace0416389659,
title = "Document assignment in multi-site search engines",
abstract = "Assigning documents accurately to sites is critical for the performance of multi-site Web search engines. In such settings, sites crawl only documents they index and forward queries to obtain best-matching documents from other sites. Inaccurate assignments may lead to inefficiencies when crawling Web pages or processing user queries. In this work, we propose a machine-learned document assignment strategy that uses the locality of document views in search results to decide upon assignments. We evaluate the performance of our strategy using various document features extracted from a large Web collection. Our experimental setup uses query logs from a number of search front-ends spread across different geographic locations and uses these logs to learn the document access patterns. We compare our technique against baselines such as region- and language-based document assignment and observe that our technique achieves substantial performance improvements with respect to recall. With our technique, we are able to obtain a small query forwarding rate (0.04) requiring roughly 45% less replication of documents compared to replicating all documents across all sites.",
keywords = "Informatics, Assignment strategies, Classification, Document access, Document replication, Experimental setup, Geographic location, Multi-site, Multi-site web search engines, Performance improvements, Query forwarding, Query logs, Search results, User query, Web collections, Web page, Web search engines, Business informatics",
author = "Ulf Brefeld and {Barla Cambazoglu}, B. and Junqueira, {Flavio R.}",
year = "2011",
doi = "10.1145/1935826.1935907",
language = "English",
isbn = "978-1-4503-0493-1",
pages = "575--584",
booktitle = "Proceedings of the fourth ACM international conference on Web search and data mining",
publisher = "Association for Computing Machinery, Inc",
address = "United States",
note = "4th ACM International Conference on Web Search and Data Mining - WSDM '11 2011; Conference date: 09-02-2011 through 12-02-2011",
url = "http://www.wsdm2011.org/wsdm2011/_media/wsdm2011-program-20110127.pdf",

}

RIS

TY - CHAP

T1 - Document assignment in multi-site search engines

AU - Brefeld, Ulf

AU - Barla Cambazoglu, B.

AU - Junqueira, Flavio R.

N1 - Conference code: 4

PY - 2011

Y1 - 2011

N2 - Assigning documents accurately to sites is critical for the performance of multi-site Web search engines. In such settings, sites crawl only documents they index and forward queries to obtain best-matching documents from other sites. Inaccurate assignments may lead to inefficiencies when crawling Web pages or processing user queries. In this work, we propose a machine-learned document assignment strategy that uses the locality of document views in search results to decide upon assignments. We evaluate the performance of our strategy using various document features extracted from a large Web collection. Our experimental setup uses query logs from a number of search front-ends spread across different geographic locations and uses these logs to learn the document access patterns. We compare our technique against baselines such as region- and language-based document assignment and observe that our technique achieves substantial performance improvements with respect to recall. With our technique, we are able to obtain a small query forwarding rate (0.04) requiring roughly 45% less replication of documents compared to replicating all documents across all sites.

KW - Informatics

KW - Assignment strategies

KW - Classification

KW - Document access

KW - Document replication

KW - Experimental setup

KW - Geographic location

KW - Multi-site

KW - Multi-site web search engines

KW - Performance improvements

KW - Query forwarding

KW - Query logs

KW - Search results

KW - User query

KW - Web collections

KW - Web page

KW - Web search engines

KW - Business informatics

UR - http://www.scopus.com/inward/record.url?scp=79952426742&partnerID=8YFLogxK

U2 - 10.1145/1935826.1935907

DO - 10.1145/1935826.1935907

M3 - Article in conference proceedings

AN - SCOPUS:79952426742

SN - 978-1-4503-0493-1

SP - 575

EP - 584

BT - Proceedings of the fourth ACM international conference on Web search and data mining

PB - Association for Computing Machinery, Inc

CY - New York

T2 - 4th ACM International Conference on Web Search and Data Mining - WSDM '11 2011

Y2 - 9 February 2011 through 12 February 2011

ER -
