Document assignment in multi-site search engines

Publication: Contributions to collected editions > Articles in conference proceedings > Research > peer-reviewed

Standard

Document assignment in multi-site search engines. / Brefeld, Ulf; Barla Cambazoglu, B.; Junqueira, Flavio R.
Proceedings of the fourth ACM international conference on Web search and data mining. New York: Association for Computing Machinery, Inc, 2011. pp. 575-584.


Harvard

Brefeld, U, Barla Cambazoglu, B & Junqueira, FR 2011, Document assignment in multi-site search engines. in Proceedings of the fourth ACM international conference on Web search and data mining. Association for Computing Machinery, Inc, New York, pp. 575-584, 4th ACM International Conference on Web Search and Data Mining - WSDM '11 2011, Hong Kong, China, 9 February 2011. https://doi.org/10.1145/1935826.1935907

APA

Brefeld, U., Barla Cambazoglu, B., & Junqueira, F. R. (2011). Document assignment in multi-site search engines. In Proceedings of the fourth ACM international conference on Web search and data mining (pp. 575-584). Association for Computing Machinery, Inc. https://doi.org/10.1145/1935826.1935907

Vancouver

Brefeld U, Barla Cambazoglu B, Junqueira FR. Document assignment in multi-site search engines. In Proceedings of the fourth ACM international conference on Web search and data mining. New York: Association for Computing Machinery, Inc. 2011. p. 575-584. doi: 10.1145/1935826.1935907

Bibtex

@inbook{38bdb36d4f4f46a3a37ace0416389659,
title = "Document assignment in multi-site search engines",
abstract = "Assigning documents accurately to sites is critical for the performance of multi-site Web search engines. In such settings, sites crawl only documents they index and forward queries to obtain best-matching documents from other sites. Inaccurate assignments may lead to inefficiencies when crawling Web pages or processing user queries. In this work, we propose a machine-learned document assignment strategy that uses the locality of document views in search results to decide upon assignments. We evaluate the performance of our strategy using various document features extracted from a large Web collection. Our experimental setup uses query logs from a number of search front-ends spread across different geographic locations and uses these logs to learn the document access patterns. We compare our technique against baselines such as region- and language-based document assignment and observe that our technique achieves substantial performance improvements with respect to recall. With our technique, we are able to obtain a small query forwarding rate (0.04) requiring roughly 45% less replication of documents compared to replicating all documents across all sites.",
keywords = "Informatics, Assignment strategies, Classification, Document access, Document replication, Experimental setup, Geographic location, Multi-site, Multi-site web search engines, Performance improvements, Query forwarding, Query logs, Search results, User query, Web collections, Web page, Web search engines, Business informatics",
author = "Ulf Brefeld and {Barla Cambazoglu}, B. and Junqueira, {Flavio R.}",
year = "2011",
doi = "10.1145/1935826.1935907",
language = "English",
isbn = "978-1-4503-0493-1",
pages = "575--584",
booktitle = "Proceedings of the fourth ACM international conference on Web search and data mining",
publisher = "Association for Computing Machinery, Inc",
address = "United States",
note = "4th ACM International Conference on Web Search and Data Mining - WSDM '11 2011 ; Conference date: 09-02-2011 Through 12-02-2011",
url = "http://www.wsdm2011.org/wsdm2011/_media/wsdm2011-program-20110127.pdf",

}

RIS

TY - CHAP

T1 - Document assignment in multi-site search engines

AU - Brefeld, Ulf

AU - Barla Cambazoglu, B.

AU - Junqueira, Flavio R.

N1 - Conference code: 4

PY - 2011

Y1 - 2011

N2 - Assigning documents accurately to sites is critical for the performance of multi-site Web search engines. In such settings, sites crawl only documents they index and forward queries to obtain best-matching documents from other sites. Inaccurate assignments may lead to inefficiencies when crawling Web pages or processing user queries. In this work, we propose a machine-learned document assignment strategy that uses the locality of document views in search results to decide upon assignments. We evaluate the performance of our strategy using various document features extracted from a large Web collection. Our experimental setup uses query logs from a number of search front-ends spread across different geographic locations and uses these logs to learn the document access patterns. We compare our technique against baselines such as region- and language-based document assignment and observe that our technique achieves substantial performance improvements with respect to recall. With our technique, we are able to obtain a small query forwarding rate (0.04) requiring roughly 45% less replication of documents compared to replicating all documents across all sites.

AB - Assigning documents accurately to sites is critical for the performance of multi-site Web search engines. In such settings, sites crawl only documents they index and forward queries to obtain best-matching documents from other sites. Inaccurate assignments may lead to inefficiencies when crawling Web pages or processing user queries. In this work, we propose a machine-learned document assignment strategy that uses the locality of document views in search results to decide upon assignments. We evaluate the performance of our strategy using various document features extracted from a large Web collection. Our experimental setup uses query logs from a number of search front-ends spread across different geographic locations and uses these logs to learn the document access patterns. We compare our technique against baselines such as region- and language-based document assignment and observe that our technique achieves substantial performance improvements with respect to recall. With our technique, we are able to obtain a small query forwarding rate (0.04) requiring roughly 45% less replication of documents compared to replicating all documents across all sites.

KW - Informatics

KW - Assignment strategies

KW - Classification

KW - Document access

KW - Document replication

KW - Experimental setup

KW - Geographic location

KW - Multi-site

KW - Multi-site web search engines

KW - Performance improvements

KW - Query forwarding

KW - Query logs

KW - Search results

KW - User query

KW - Web collections

KW - Web page

KW - Web search engines

KW - Business informatics

UR - http://www.scopus.com/inward/record.url?scp=79952426742&partnerID=8YFLogxK

U2 - 10.1145/1935826.1935907

DO - 10.1145/1935826.1935907

M3 - Article in conference proceedings

AN - SCOPUS:79952426742

SN - 978-1-4503-0493-1

SP - 575

EP - 584

BT - Proceedings of the fourth ACM international conference on Web search and data mining

PB - Association for Computing Machinery, Inc

CY - New York

T2 - 4th ACM International Conference on Web Search and Data Mining - WSDM '11 2011

Y2 - 9 February 2011 through 12 February 2011

ER -
