A preliminary study on similarity-preserving digital book identifiers

Publication: Contributions to collected editions › Articles in conference proceedings › Research › peer-reviewed

Authors

  • Klemo Vladimir
  • Marin Silic
  • Nenad Romic
  • Goran Delac
  • Sinisa Srbljic
Due to the proliferation of digital publishing, e-book catalogs are abundant but noisy and unstructured. Tools for the digital librarian rely on ISBNs, metadata embedded in digital files (for which there is no accepted standard), and cryptographic hash functions to identify coderivative or near-duplicate content. However, the unreliability of metadata and the sensitivity of hashing to even the smallest changes prevent efficient detection of coderivative or similar digital books. The study focuses on books with many versions that differ by a certain number of OCR errors and have a number of sentence-length variations. Similar books are identified using small fingerprints that can be easily shared and compared. We created synthetic datasets to evaluate fingerprinting accuracy, reporting standard precision and recall measurements.
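The abstract does not say which fingerprinting scheme the study uses. As a rough illustration of the general idea only, the sketch below uses MinHash over word shingles, a well-known similarity-preserving technique chosen here as an assumption, not as the authors' method. The function names (`shingles`, `fingerprint`, `similarity`) and the parameters `k` and `num_hashes` are hypothetical. Unlike a cryptographic hash of the whole file, the fraction of matching fingerprint slots estimates the Jaccard similarity of the two texts, so a few OCR errors lower the score slightly instead of changing the identifier completely.

```python
import hashlib
import re


def shingles(text, k=3):
    """Return the set of k-word shingles of a lowercased text (illustrative tokenization)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(seed, shingle):
    """64-bit hash of a shingle; the seed selects one of several hash functions."""
    digest = hashlib.blake2b(
        shingle.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def fingerprint(text, num_hashes=64, k=3):
    """MinHash fingerprint: the minimum hash value per seeded hash function."""
    grams = shingles(text, k)
    if not grams:
        raise ValueError("text contains no words")
    return [min(_hash(seed, g) for g in grams) for seed in range(num_hashes)]


def similarity(fp_a, fp_b):
    """Fraction of matching slots, an estimate of the Jaccard similarity of the shingle sets."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    return sum(a == b for a, b in zip(fp_a, fp_b)) / len(fp_a)


original = ("It was the best of times, it was the worst of times, "
            "it was the age of wisdom, it was the age of foolishness")
# Same passage with one simulated OCR error ("tirnes" for "times").
noisy = original.replace("worst of times", "worst of tirnes")
unrelated = ("Call me Ishmael. Some years ago, never mind how long precisely, "
             "having little or no money in my purse")

fp_original = fingerprint(original)
print(similarity(fp_original, fingerprint(noisy)))
print(similarity(fp_original, fingerprint(unrelated)))
```

With 64 slots of 8 bytes each, such a fingerprint occupies 512 bytes regardless of book length, which is what makes it cheap to share and compare. Precision and recall can then be measured by thresholding the similarity score against known pairs of coderivative books.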
Original language: English
Title: Proceedings of the 9th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities : LaTeCH 2015
Editors: Kalliopi A. Zervanou, Marieke van Erp, Beatrice Alex
Number of pages: 6
Place of publication: Beijing
Publisher: Association for Computational Linguistics (ACL)
Publication date: 2015
Pages: 78-83
ISBN (electronic): 978-1-941643-63-1
Publication status: Published - 2015
Event: 9th Socio-Economic Sciences and Humanities Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities - SIGHUM 2015 - Beijing, China
Duration: 26.07.2015 to 30.07.2015
Conference number: 9
https://aclanthology.info/volumes/proceedings-of-the-9th-sighum-workshop-on-language-technology-for-cultural-heritage-social-sciences-and-humanities-latech
https://sighum.wordpress.com/events/latech-2015/

Bibliographical note

Publisher Copyright:
© 2015 Proceedings of the Annual Meeting of the Association for Computational Linguistics.
