Using Logistic Regression to Estimate the Number of Faulty Software Modules

Sandro Morasca

Abstract

Background. The accuracy of an estimation model for software fault-proneness is evaluated by using the model with data collected on a set of software modules and classifying each module in the set as either estimated faulty or estimated non-faulty. This classification usually involves setting a fault-proneness threshold: software modules whose fault-proneness is above the threshold are classified as estimated faulty, and the others as estimated non-faulty. The selection of the threshold value is to some extent subjective and arbitrary, and different threshold values may lead to very different classification accuracies.

Objective. We propose an approach that evaluates the accuracy of a fault-proneness model without fixing a threshold.

Method. We first derive a property of Binary Logistic Regression fault-proneness estimation models: the number of actually faulty software modules in the training set used to build a model equals the number of modules estimated faulty in that set, i.e., the estimate of the number of faulty modules is perfect on the training set. We then apply the model to a different set, the test set, and estimate the number of faulty modules in it. We also estimate the number of faulty modules in the test set with a more conventional approach, using five different fault-proneness thresholds, and compare those estimates with the ones obtained via our approach. We carried out the empirical validation on a NASA data set hosted in the PROMISE repository, using a technique similar to K-fold cross-validation.

Results. In our empirical validation, the proposed approach estimates the number of faulty modules in the test sets better than the threshold-based approaches, and the difference is statistically significant.

Conclusions. Our approach seems to have the potential to be used in practice to accurately estimate the number of faulty modules without having to set specific fault-proneness thresholds.
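The training-set property stated in the Method section follows from the maximum-likelihood score equations of Binary Logistic Regression, provided the model includes an intercept. A sketch in standard notation (the notation is ours, not the paper's):

$$
\frac{\partial \log L}{\partial \beta_0} = \sum_{i=1}^{n} \left( y_i - \hat{p}_i \right) = 0
\quad\Longrightarrow\quad
\sum_{i=1}^{n} \hat{p}_i = \sum_{i=1}^{n} y_i ,
$$

where $y_i \in \{0, 1\}$ records whether module $i$ is actually faulty and $\hat{p}_i$ is its estimated fault-proneness. At the maximum-likelihood solution, the estimated fault-proneness values summed over the training set therefore equal the number of actually faulty modules in it. This also suggests a threshold-free estimator for a test set: summing the estimated fault-proneness values over its modules. The abstract does not spell out the estimator, so this reading is our assumption.

The following code sketch illustrates both the property and the comparison with threshold-based counting. It is not the paper's implementation: the data are synthetic, statsmodels is an assumed choice of library, and the five threshold values are illustrative rather than the ones used in the paper.

```python
# Illustrative sketch only: synthetic data, not the NASA/PROMISE data set
# used in the paper, and statsmodels is an assumed choice of library.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)

# Hypothetical module metrics (e.g., size and complexity) and fault labels.
n = 200
X = rng.normal(size=(n, 2))
true_logit = -0.5 + 0.8 * X[:, 0] + 1.2 * X[:, 1]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_logit)))

# Split into a training set and a test set (the paper uses a K-fold-like scheme).
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

# Fit an unregularized Binary Logistic Regression model with an intercept.
model = sm.Logit(y_train, sm.add_constant(X_train)).fit(disp=0)
p_train = model.predict(sm.add_constant(X_train))
p_test = model.predict(sm.add_constant(X_test))

# Training-set property: the sum of estimated fault-proneness values
# equals the number of actually faulty modules (up to numerical tolerance).
print(f"training set: {p_train.sum():.2f} estimated vs {y_train.sum()} actual")

# Threshold-free estimate of the number of faulty modules in the test set.
print(f"test set, threshold-free estimate: {p_test.sum():.2f}")

# Conventional threshold-based estimates; these five cutoffs are
# illustrative, not the thresholds used in the paper.
for t in (0.3, 0.4, 0.5, 0.6, 0.7):
    print(f"test set, threshold {t}: {(p_test > t).sum()} estimated faulty")

print(f"test set, actual faulty modules: {y_test.sum()}")
```

With an unregularized fit and an intercept, the first printed pair coincides up to the optimizer's tolerance, whereas the threshold-based counts can swing widely as the cutoff moves, which is precisely the arbitrariness the proposed approach avoids.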
2014
EASE'14, 18th International Conference on Evaluation and Assessment in Software Engineering
ISBN: 9781450324762
London, 13-14 May 2014
Files in this record:
MorascaEASE2014.pdf: Post-print, Adobe PDF, 286.67 kB (not publicly available; a copy can be requested)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11383/2012121
Citations
  • Scopus: 1