TraceSim: An Alignment Method for Computing Stack Trace Similarity

Publication: Contributions to journals › Journal articles › Research › peer-reviewed

Authors

  • Irving Muller Rodrigues
  • Aleksandr Khvorov
  • Daniel Aloise
  • Roman Vasiliev
  • Dmitrij Koznov
  • Eraldo Rezende Fernandes
  • George Chernishev
  • Dmitry Luciv
  • Nikita Povarov

Software systems can automatically submit crash reports to a repository for investigation when program failures occur. A significant portion of these crash reports are duplicates, i.e., they are caused by the same software issue. Therefore, if the volume of submitted reports is very large, automatic grouping of duplicate crash reports can significantly ease and speed up the analysis of software failures. This task is known as crash report deduplication. Given a huge volume of incoming reports, increasing the quality of deduplication is an important task. The majority of studies address it via information retrieval or sequence matching methods based on the similarity of stack traces from two crash reports. While information retrieval methods disregard the position of a frame in a stack trace, existing works based on sequence matching algorithms do not fully consider the global frequency of subroutines or unmatched frames. Besides, because data distributions differ among software projects, parameters learned with machine learning algorithms are necessary to make the methods more flexible. In this paper, we propose TraceSim, an approach for crash report deduplication that combines TF-IDF, optimum global alignment, and machine learning (ML) in a novel way. Moreover, we propose a new evaluation methodology for this task that is more comprehensive and robust than previously used evaluation approaches. TraceSim significantly outperforms seven baselines and state-of-the-art methods in the majority of the scenarios. It is the only approach that achieves competitive results on all datasets regarding all considered metrics. Moreover, we conduct an extensive ablation study that demonstrates the importance of each of TraceSim's elements to its final performance and robustness. Finally, we provide the source code for all considered methods and the evaluation methodology, as well as the created datasets.
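
To make the general idea in the abstract concrete, the following is a minimal sketch of how a global alignment score over stack frames could be weighted by frame rarity. It is not TraceSim's actual formulation: the function names (idf_weights, alignment_similarity), the smoothed IDF formula, the gap penalty, and the normalization are all assumptions chosen for illustration, and the weights here are fixed rather than learned with ML as the abstract describes.

```python
import math
from collections import Counter


def idf_weights(traces):
    """Hypothetical helper: IDF-style weight per frame (rare frames weigh more)."""
    n = len(traces)
    # Document frequency: in how many traces each frame appears at least once.
    df = Counter(frame for trace in traces for frame in set(trace))
    return {frame: math.log(n / count) + 1.0 for frame, count in df.items()}


def alignment_similarity(a, b, idf, default_weight=1.0):
    """Needleman-Wunsch-style global alignment of two stack traces.

    Assumed scoring (illustrative only): a matched frame adds twice its
    weight, an unmatched frame (gap) subtracts its weight. The result is
    divided by the total weight of both traces, so it lies in [-1, 1].
    """
    def w(frame):
        return idf.get(frame, default_weight)

    # dp[i][j] = best score for aligning a[:i] with b[:j].
    dp = [[0.0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        dp[i][0] = dp[i - 1][0] - w(a[i - 1])
    for j in range(1, len(b) + 1):
        dp[0][j] = dp[0][j - 1] - w(b[j - 1])

    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            # Either frame can be left unmatched and penalized by its weight.
            best = max(dp[i - 1][j] - w(a[i - 1]), dp[i][j - 1] - w(b[j - 1]))
            if a[i - 1] == b[j - 1]:
                best = max(best, dp[i - 1][j - 1] + 2.0 * w(a[i - 1]))
            dp[i][j] = best

    total = sum(w(f) for f in a) + sum(w(f) for f in b)
    return dp[-1][-1] / total if total else 0.0


# Toy data: each trace is a list of frame names (made up for this sketch).
traces = [
    ["main", "run", "parse"],
    ["main", "run", "render"],
    ["main", "init", "parse"],
]
idf = idf_weights(traces)
print(alignment_similarity(traces[0], traces[1], idf))
```

In this sketch, unmatched frames lower the score in proportion to their rarity, and frame order matters because the alignment is global. A learned variant would replace the fixed match and gap weights with parameters tuned per project.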

Original language: English
Article number: 53
Journal: Empirical Software Engineering
Volume: 27
Issue number: 2
Number of pages: 41
ISSN: 1382-3256
Publication status: Published - 01.03.2022
Externally published: Yes
