FaST: A linear time stack trace alignment heuristic for crash report deduplication

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Authors

In software projects, applications are often monitored by systems that automatically identify crashes, collect their information into reports, and submit them to developers. Especially for popular applications, such systems tend to generate a large number of crash reports, a significant portion of which are duplicates. Because of this high submission volume, crash report deduplication is in practice supported by automatic systems, for which efficiency is a critical constraint. In this paper, we focus on improving deduplication system throughput by speeding up stack trace comparison. In contrast to state-of-the-art techniques, we propose FaST, a novel sequence alignment method that computes the similarity score between two stack traces in linear time. Our method independently aligns identical frames in two stack traces by means of a simple alignment heuristic. We evaluate FaST and five competing methods on four datasets from open-source projects using ranking and binary metrics. Despite its simplicity, FaST consistently achieves state-of-the-art performance on all metrics considered. Moreover, our experiments confirm that FaST is substantially more efficient than methods based on optimal sequence alignment.
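To make the idea of aligning identical frames in linear time more concrete, the following is a minimal illustrative sketch in Python. It is not the scoring function from the paper: the function name `linear_alignment_similarity`, the position-based weighting, the penalty for position differences, and the normalization are all assumptions chosen for illustration. The only property it tries to mirror from the abstract is that the k-th occurrence of a frame in one trace is paired with the k-th occurrence of the same frame in the other, without any dynamic-programming table.

```python
from collections import defaultdict


def linear_alignment_similarity(trace_a, trace_b):
    """Toy similarity in [0, 1] between two stack traces (lists of frame names).

    Hypothetical sketch: weights and normalization are assumptions,
    not the scoring used by FaST.
    """
    if not trace_a and not trace_b:
        return 1.0  # assumption: two empty traces count as identical

    # Positions of every frame in trace_b, in order of appearance.
    positions_b = defaultdict(list)
    for j, frame in enumerate(trace_b):
        positions_b[frame].append(j)

    def weight(pos):
        # Assumed weighting: frames near the top of the stack matter more.
        return 1.0 / (1 + pos)

    next_b = defaultdict(int)  # per-frame cursor into positions_b
    matched_b = set()          # positions of trace_b that found a partner
    score = 0.0
    total = 0.0

    # Pair the k-th occurrence of a frame in trace_a with the k-th
    # occurrence of the same frame in trace_b (one pass, hash lookups only).
    for i, frame in enumerate(trace_a):
        k = next_b[frame]
        if k < len(positions_b[frame]):
            j = positions_b[frame][k]
            next_b[frame] = k + 1
            matched_b.add(j)
            w = max(weight(i), weight(j))
            score += w / (1 + abs(i - j))  # discount pairs that are far apart
            total += w
        else:
            total += weight(i)  # unmatched frame in trace_a

    # Unmatched frames in trace_b also lower the similarity.
    for j in range(len(trace_b)):
        if j not in matched_b:
            total += weight(j)

    return score / total
```

Each frame is visited a constant number of times, so the running time grows with the sum of the two trace lengths (given constant-time hash operations), whereas optimal sequence alignment fills a table whose size is the product of the lengths.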

Original language: English
Title of host publication: The 2022 Mining Software Repositories Conference : MSR 2022, Proceedings; 18-20 May 2022, Virtual; 23-24 May 2022, Pittsburgh, Pennsylvania
Number of pages: 12
Place of Publication: New York
Publisher: Institute of Electrical and Electronics Engineers Inc.
Publication date: 17.10.2022
Pages: 549-560
ISBN (print): 9781665452106
ISBN (electronic): 978-1-4503-9303-4
Publication status: Published - 17.10.2022
Event: 19th International Conference on Mining Software Repositories - MSR 2022 - Pittsburgh, United States
Duration: 23.05.2022 - 24.05.2022
Conference number: 19
https://conf.researchr.org/home/msr-2022

Bibliographical note

Title of the print edition: 2022 IEEE/ACM 19th International Conference on Mining Software Repositories (MSR 2022)

Funding Information:
We gratefully acknowledge the Natural Sciences and Engineering Research Council of Canada (NSERC), Ericsson, Ciena, and EfficiOS for funding this project. Moreover, this research was enabled in part by the support provided by WestGrid (https://www.westgrid.ca/) and Compute Canada (www.computecanada.ca).

Publisher Copyright:
© 2022 ACM.

    Research areas

  • Automatic Crash Reporting, Crash Report Deduplication, Duplicate Crash Report, Duplicate Crash Report Detection, Stack Trace Similarity
  • Business informatics
