Using Logistic Regression to Estimate the Number of Faulty Software Modules

Sandro Morasca

Abstract

Background. The accuracy of an estimation model for software fault-proneness is evaluated by using the model with data collected on a set of software modules and classifying each module in the set as either estimated faulty or estimated non-faulty. This classification usually involves setting a fault-proneness threshold: software modules whose fault-proneness is above the threshold are classified as estimated faulty, and the others as estimated non-faulty. The selection of the threshold value is to some extent subjective and arbitrary, and different threshold values may lead to very different classification accuracies.

Objective. We propose an approach that evaluates the accuracy of a fault-proneness model without fixing a threshold.

Method. We first derive a property of Binary Logistic Regression fault-proneness estimation models: the number of actually faulty software modules in the training set used to build a model equals the number of modules estimated faulty in that set, i.e., the estimate of the number of faulty modules is perfect on the training set. We then apply the model to a different set, the test set, and estimate the number of faulty modules in it. We also estimate the number of faulty modules in the test set with a more conventional approach, using five different fault-proneness thresholds, and compare those estimates with the ones obtained via our approach. We carried out the empirical validation on a NASA data set hosted in the PROMISE repository, using a technique similar to K-fold cross-validation.

Results. In our empirical validation, the proposed approach estimates the number of faulty modules in the test sets better than the threshold-based approaches, and the difference is statistically significant.

Conclusions. Our approach seems to have the potential to be used in practice to accurately estimate the number of faulty modules without having to set specific fault-proneness thresholds.
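The training-set property stated in the Method section follows from the maximum-likelihood score equations of Binary Logistic Regression, provided the model includes an intercept. A sketch in standard notation (the notation is ours, not the paper's):

$$
\frac{\partial \log L}{\partial \beta_0} = \sum_{i=1}^{n} \left( y_i - \hat{p}_i \right) = 0
\quad\Longrightarrow\quad
\sum_{i=1}^{n} \hat{p}_i = \sum_{i=1}^{n} y_i ,
$$

where $y_i \in \{0, 1\}$ records whether module $i$ is actually faulty and $\hat{p}_i$ is its estimated fault-proneness. At the maximum-likelihood solution, the estimated fault-proneness values summed over the training set therefore equal the number of actually faulty modules in it. This also suggests a threshold-free estimator for a test set: summing the estimated fault-proneness values over its modules. The abstract does not spell out the estimator, so this reading is our assumption.

The following code sketch illustrates both the property and the comparison with threshold-based counting. It is not the paper's implementation: the data are synthetic, statsmodels is an assumed choice of library, and the five threshold values are illustrative rather than the ones used in the paper.

```python
# Illustrative sketch only: synthetic data, not the NASA/PROMISE data set
# used in the paper, and statsmodels is an assumed choice of library.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)

# Hypothetical module metrics (e.g., size and complexity) and fault labels.
n = 200
X = rng.normal(size=(n, 2))
true_logit = -0.5 + 0.8 * X[:, 0] + 1.2 * X[:, 1]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_logit)))

# Split into a training set and a test set (the paper uses a K-fold-like scheme).
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

# Fit an unregularized Binary Logistic Regression model with an intercept.
model = sm.Logit(y_train, sm.add_constant(X_train)).fit(disp=0)
p_train = model.predict(sm.add_constant(X_train))
p_test = model.predict(sm.add_constant(X_test))

# Training-set property: the sum of estimated fault-proneness values
# equals the number of actually faulty modules (up to numerical tolerance).
print(f"training set: {p_train.sum():.2f} estimated vs {y_train.sum()} actual")

# Threshold-free estimate of the number of faulty modules in the test set.
print(f"test set, threshold-free estimate: {p_test.sum():.2f}")

# Conventional threshold-based estimates; these five cutoffs are
# illustrative, not the thresholds used in the paper.
for t in (0.3, 0.4, 0.5, 0.6, 0.7):
    print(f"test set, threshold {t}: {(p_test > t).sum()} estimated faulty")

print(f"test set, actual faulty modules: {y_test.sum()}")
```

With an unregularized fit and an intercept, the first printed pair coincides up to the optimizer's tolerance, whereas the threshold-based counts can swing widely as the cutoff moves, which is precisely the arbitrariness the proposed approach avoids.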
2014
EASE'14, 18th International Conference on Evaluation and Assessment in Software Engineering
ISBN: 9781450324762
London, 13-14 May 2014
Files in this record:
MorascaEASE2014.pdf: Post-print, Adobe PDF, 286.67 kB (not publicly available; a copy can be requested)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11383/2012121
Citations
  • Scopus: 1