Document assignment in multi-site search engines

Research output: Contributions to collected editions/works > Article in conference proceedings > Research > peer-review

Assigning documents accurately to sites is critical for the performance of multi-site Web search engines. In such settings, sites crawl only documents they index and forward queries to obtain best-matching documents from other sites. Inaccurate assignments may lead to inefficiencies when crawling Web pages or processing user queries. In this work, we propose a machine-learned document assignment strategy that uses the locality of document views in search results to decide upon assignments. We evaluate the performance of our strategy using various document features extracted from a large Web collection. Our experimental setup uses query logs from a number of search front-ends spread across different geographic locations and uses these logs to learn the document access patterns. We compare our technique against baselines such as region- and language-based document assignment and observe that our technique achieves substantial performance improvements with respect to recall. With our technique, we are able to obtain a small query forwarding rate (0.04) requiring roughly 45% less replication of documents compared to replicating all documents across all sites.

Original language: English
Title of host publication: Proceedings of the fourth ACM international conference on Web search and data mining
Number of pages: 10
Place of Publication: New York
Publisher: Association for Computing Machinery, Inc
Publication date: 2011
Pages: 575-584
ISBN (print): 978-1-4503-0493-1
Publication status: Published - 2011
Externally published: Yes
Event: 4th ACM International Conference on Web Search and Data Mining - WSDM '11 2011 - Hong Kong, China
Duration: 09.02.2011 - 12.02.2011
Conference number: 4
http://www.wsdm2011.org/wsdm2011/_media/wsdm2011-program-20110127.pdf

    Research areas

  • Informatics - Assignment strategies, Classification, Document access, Document replication, Experimental setup, Geographic location, Multi-site, Multi-site web search engines, Performance improvements, Query forwarding, Query logs, Search results, User query, Web collections, Web page, Web search engines
  • Business informatics
