FaST: A linear time stack trace alignment heuristic for crash report deduplication

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Authors

In software projects, applications are often monitored by systems that automatically identify crashes, collect their information into reports, and submit them to developers. Especially for popular applications, such systems tend to generate a large number of crash reports, a significant portion of which are duplicates. Because of this high submission volume, crash report deduplication is in practice handled by automatic systems for which efficiency is a critical constraint. In this paper, we focus on improving the throughput of deduplication systems by speeding up stack trace comparison. In contrast to state-of-the-art techniques, we propose FaST, a novel sequence alignment method that computes the similarity score between two stack traces in linear time. Our method independently aligns identical frames in two stack traces by means of a simple alignment heuristic. We evaluate FaST and five competing methods on four datasets from open-source projects using ranking and binary metrics. Despite its simplicity, FaST consistently achieves state-of-the-art performance on all metrics considered. Moreover, our experiments confirm that FaST is substantially more efficient than methods based on optimal sequence alignment.
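As a rough illustration of what "independently aligning identical frames" in linear time could look like, the sketch below pairs the k-th occurrence of each frame in one trace with the k-th occurrence of the same frame in the other. This is not the scoring function from the paper: the function names, the position-drift weight, and the normalization are all assumptions chosen for illustration only.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def frame_positions(trace: Sequence[str]) -> Dict[str, List[int]]:
    """Map each frame name to the ascending list of positions where it occurs."""
    positions: Dict[str, List[int]] = defaultdict(list)
    for index, frame in enumerate(trace):
        positions[frame].append(index)
    return positions


def heuristic_similarity(query: Sequence[str], candidate: Sequence[str]) -> float:
    """Toy similarity in [0, 1]; hypothetical weights, NOT the paper's formula."""
    if not query and not candidate:
        return 1.0
    query_pos = frame_positions(query)
    candidate_pos = frame_positions(candidate)
    matched = 0.0
    for frame, q_list in query_pos.items():
        c_list = candidate_pos.get(frame, [])
        # Align the k-th occurrence in one trace with the k-th in the other;
        # zip stops at the shorter list, so surplus occurrences stay unmatched.
        for q_index, c_index in zip(q_list, c_list):
            # Assumed weight: full credit when positions coincide,
            # decaying as the two occurrences drift apart.
            matched += 1.0 / (1.0 + abs(q_index - c_index))
    # Each aligned pair covers two frames; unmatched frames only enlarge
    # the denominator and therefore lower the score.
    return 2.0 * matched / (len(query) + len(candidate))
```

Each trace is scanned once and every frame takes part in at most one pair, so the expected running time is linear in the combined trace length, whereas optimal sequence alignment fills a table that is quadratic in the trace lengths.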

Original language: English
Title of host publication: The 2022 Mining Software Repositories Conference : MSR 2022, Proceedings; 18-20 May 2022, Virtual; 23-24 May 2022, Pittsburgh, Pennsylvania
Number of pages: 12
Place of Publication: New York
Publisher: Institute of Electrical and Electronics Engineers Inc.
Publication date: 17.10.2022
Pages: 549-560
ISBN (print): 9781665452106
ISBN (electronic): 978-1-4503-9303-4
Publication status: Published - 17.10.2022
Event: 19th International Conference on Mining Software Repositories - MSR 2022 - Pittsburgh, United States
Duration: 23.05.2022 - 24.05.2022
Conference number: 19
https://conf.researchr.org/home/msr-2022

Bibliographical note

Title of the print edition: 2022 IEEE/ACM 19th International Conference on Mining Software Repositories (MSR 2022)

Funding Information:
We gratefully acknowledge the Natural Sciences and Engineering Research Council of Canada (NSERC), Ericsson, Ciena, and EfficiOS for funding this project. Moreover, this research was enabled in part by the support provided by WestGrid (https://www.westgrid.ca/) and Compute Canada (www.computecanada.ca).

Publisher Copyright:
© 2022 ACM.

    Research areas

  • Automatic Crash Reporting, Crash Report Deduplication, Duplicate Crash Report, Duplicate Crash Report Detection, Stack Trace Similarity
  • Business informatics
