A preliminary study on similarity-preserving digital book identifiers

Publication: Contributions to collected editions > Article in conference proceedings > Research > Peer-reviewed

Authors

  • Klemo Vladimir
  • Marin Silic
  • Nenad Romic
  • Goran Delac
  • Sinisa Srbljic
Due to the proliferation of digital publishing, e-book catalogs are abundant but noisy and unstructured. Tools for the digital librarian rely on ISBNs, metadata embedded in digital files (for which there is no accepted standard), and cryptographic hash functions to identify coderivative or near-duplicate content. However, the unreliability of metadata and the sensitivity of hashing to even the smallest changes prevent efficient detection of coderivative or similar digital books. The study focuses on books that exist in many versions, which differ in the amount of OCR errors and contain a number of sentence-length variations. Similar books are identified using small fingerprints that can be easily shared and compared. We created synthetic datasets to evaluate fingerprinting accuracy and report standard precision and recall measurements.
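The contrast between cryptographic hashing and similarity-preserving fingerprints can be illustrated with a minimal sketch. The abstract does not say which fingerprinting scheme the study uses, so the example below substitutes a generic technique, MinHash over word shingles, purely as an illustration. The function names, the shingle size k=3, and the fingerprint length of 64 values are assumptions made for this sketch and do not come from the paper.

```python
import hashlib
import re

NUM_HASHES = 64  # fingerprint length; an arbitrary choice for this sketch


def shingles(text, k=3):
    """Return the set of k-word shingles of a lowercased, tokenised text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _seeded_hash(seed, shingle):
    """Deterministic 64-bit hash of a shingle; the seed selects the hash function."""
    data = f"{seed}:{shingle}".encode("utf-8")
    return int.from_bytes(hashlib.sha1(data).digest()[:8], "big")


def minhash_fingerprint(text, num_hashes=NUM_HASHES, k=3):
    """MinHash signature: for each seeded hash, keep the minimum over all shingles."""
    shingle_set = shingles(text, k)
    return [min(_seeded_hash(seed, s) for s in shingle_set)
            for seed in range(num_hashes)]


def estimated_similarity(fp_a, fp_b):
    """Fraction of matching positions, an estimate of the Jaccard similarity."""
    matches = sum(a == b for a, b in zip(fp_a, fp_b))
    return matches / len(fp_a)


def sha256_hex(text):
    """Cryptographic hash for comparison; any change yields an unrelated digest."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    original = ("it was the best of times it was the worst of times "
                "it was the age of wisdom it was the age of foolishness")
    ocr_variant = original.replace("wisdom", "wisdorn")  # simulated OCR error
    print(sha256_hex(original) == sha256_hex(ocr_variant))
    print(estimated_similarity(minhash_fingerprint(original),
                               minhash_fingerprint(ocr_variant)))
```

In this sketch a single simulated OCR error changes the SHA-256 digest completely, while most positions of the two MinHash signatures still agree, so the two versions remain recognisably similar. The signature is a short list of integers, which matches the abstract's requirement that fingerprints be small and easy to share and compare.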
Original language: English
Title: Proceedings of the 9th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities : LaTeCH 2015
Editors: Kalliopi A. Zervanou, Marieke van Erp, Beatrice Alex
Number of pages: 6
Place of publication: Beijing
Publisher: Association for Computational Linguistics (ACL)
Publication date: 2015
Pages: 78-83
ISBN (electronic): 978-1-941643-63-1
Publication status: Published - 2015
Event: 9th Socio-Economic Sciences and Humanities Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities - SIGHUM 2015 - Beijing, China
Duration: 26.07.2015 - 30.07.2015
Conference number: 9
https://aclanthology.info/volumes/proceedings-of-the-9th-sighum-workshop-on-language-technology-for-cultural-heritage-social-sciences-and-humanities-latech
https://sighum.wordpress.com/events/latech-2015/

Bibliographical note

Publisher Copyright:
© 2015 Proceedings of the Annual Meeting of the Association for Computational Linguistics.
