Predicting fault-prone software modules with rank sum classification

Cahill, Jaspar, Hogan, James M., & Thomas, Richard (2013) Predicting fault-prone software modules with rank sum classification. In Schneider, Jean-Guy & Grant, Doug (Eds.) Proceedings of the 22nd Australian Conference on Software Engineering (ASWEC 2013), IEEE, Melbourne, Victoria, Australia, pp. 211-219.


The detection and correction of defects remains among the most time-consuming and expensive aspects of software development. Extensive automated testing and code inspections may mitigate their effect, but some code fragments are necessarily more likely to be faulty than others, and automated identification of fault-prone modules helps to focus testing and inspections, thus limiting wasted effort and potentially improving detection rates. However, software metrics data is often extremely noisy, with enormous imbalances in the size of the positive and negative classes. In this work, we present a new approach to predictive modelling of fault proneness in software modules, introducing a new feature representation to overcome some of these issues. This rank sum representation offers improved, or at worst comparable, performance relative to earlier approaches on standard data sets, and readily allows the user to choose an appropriate trade-off between precision and recall to optimise inspection effort to suit different testing environments. The method is evaluated using the NASA Metrics Data Program (MDP) data sets, and performance is compared with existing studies based on the Support Vector Machine (SVM) and Naïve Bayes (NB) classifiers, and with our own comprehensive evaluation of these methods.
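
The abstract does not spell out how the rank sum representation is built, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes that each metric is ranked across modules, that a module's score is the sum of its per-metric ranks, and that a module is flagged as fault-prone when its score reaches a user-chosen threshold. The function names, the two toy metrics, and the labels are all hypothetical.

```python
from typing import List, Tuple


def ranks(values: List[float]) -> List[float]:
    """Return 1-based ascending ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def rank_sums(metrics: List[List[float]]) -> List[float]:
    """metrics[m][f] is metric f of module m; return the sum of per-metric ranks."""
    n_features = len(metrics[0])
    columns = [ranks([row[f] for row in metrics]) for f in range(n_features)]
    return [sum(col[m] for col in columns) for m in range(len(metrics))]


def precision_recall(
    scores: List[float], labels: List[bool], threshold: float
) -> Tuple[float, float]:
    """Flag modules with score >= threshold and compare against known labels."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y)
    fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical modules: [lines of code, cyclomatic complexity].
    toy_metrics = [[10, 1], [200, 15], [50, 4], [400, 30], [120, 9]]
    toy_labels = [False, True, False, True, False]  # True = known faulty
    toy_scores = rank_sums(toy_metrics)
    for threshold in (6, 8, 10):
        p, r = precision_recall(toy_scores, toy_labels, threshold)
        print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
```

Under these assumptions, lowering the threshold flags more modules for inspection, which tends to raise recall at the cost of precision; raising it does the opposite. This is the kind of trade-off the abstract describes the user choosing to suit a given testing environment.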

Impact and interest:

1 citation in Scopus
2 citations in Web of Science®

ID Code: 67093
Item Type: Conference Paper
Refereed: Yes
Keywords: Metrics, Fault proneness, Machine learning
DOI: 10.1109/ASWEC.2013.33
ISBN: 9780769549958
Divisions: Current > Schools > School of Electrical Engineering & Computer Science
Current > QUT Faculties and Divisions > Science & Engineering Faculty
Copyright Owner: Copyright 2013 by The Institute of Electrical and Electronics Engineers, Inc.
Deposited On: 10 Feb 2014 00:34
Last Modified: 11 Feb 2014 01:55
