An Empirical Study of Thresholds for Code Measures
L. Lavazza
S. Morasca
2020-01-01
Abstract
Background. Practical use of software code measures for faultiness estimation often requires setting thresholds on measures, to separate software modules that are likely to be faulty from modules that are likely to be non-faulty. Several threshold proposals exist in the literature. However, different proposals may recommend different threshold values for the same measure, so practitioners may be unsure which threshold value they should use. Objective. Our goal is to investigate whether it is possible to define a single threshold for a code measure, or at least a small range within which to select a threshold. Method. We carried out an empirical study based on two collections of datasets available in the SEACRAFT repository. For each dataset, we built all statistically significant univariate Binary Logistic Regression fault-proneness models, using a set of the most commonly used code measures as independent variables. We derived thresholds for single software measures by setting an acceptability threshold on fault-proneness. We then checked whether the distribution of the thresholds obtained for the same measure is concentrated enough that a single value of the code measure can be taken as a "universal" threshold for it. We repeated the same method with bivariate Binary Logistic Regression fault-proneness models. Results. The threshold distributions we obtained are quite dispersed, so it does not seem possible to define "universal" thresholds for the code measures we considered. Conclusions. According to the collected evidence, it appears hardly possible to effectively control software faultiness by recommending that the most commonly used code measures satisfy fixed thresholds.
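The threshold-derivation step described in the abstract (fixing an acceptability level for fault-proneness and inverting a univariate logistic regression model to obtain the corresponding measure value) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function name and the example coefficients are hypothetical, and only the standard closed-form inversion of the logistic function is assumed.

```python
import math

def measure_threshold(intercept: float, slope: float, p0: float = 0.5) -> float:
    """Invert a univariate Binary Logistic Regression fault-proneness model
        p(x) = 1 / (1 + exp(-(intercept + slope * x)))
    to find the measure value x* at which the estimated fault-proneness
    equals the acceptability level p0. Modules whose measure exceeds x*
    would be flagged as likely faulty (assuming slope > 0)."""
    if not 0.0 < p0 < 1.0:
        raise ValueError("p0 must be strictly between 0 and 1")
    logit_p0 = math.log(p0 / (1.0 - p0))  # inverse of the logistic function at p0
    return (logit_p0 - intercept) / slope

# Hypothetical coefficients for a size-like measure (e.g., LOC):
# with p0 = 0.5, logit(p0) = 0, so x* = -intercept / slope = 150.0
t = measure_threshold(intercept=-3.0, slope=0.02, p0=0.5)
print(t)  # → 150.0
```

The study repeats this derivation across many datasets and measures; the reported dispersion of the resulting x* values is what argues against a single "universal" threshold per measure.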