Diagnostic and Prognostic Research
A relationship between the incremental values of area under the ROC curve and of area under the precision-recall curve
Qian M. Zhou¹, Lu Zhe², Yan Yuan², Russell J. Brooke³, Melissa M. Hudson³
¹ Department of Mathematics and Statistics, Mississippi State University, Mississippi State, MS, USA
² School of Public Health, University of Alberta, Edmonton, AB, Canada
³ St. Jude Children’s Research Hospital, Memphis, TN, USA
Keywords: Prediction performance; AUC; Area under precision-recall curve; Brier score; Proper scoring rules; Rare outcome
DOI: 10.1186/s41512-021-00102-w
Source: Springer
Abstract
Background: Incremental value (IncV) evaluates the change in performance between an existing risk model and a new model. Different IncV metrics do not always agree with each other. For example, compared with a prescribed-dose model, an ovarian-dose model for predicting acute ovarian failure has a slightly lower area under the receiver operating characteristic curve (AUC) but increases the area under the precision-recall curve (AP) by 48%. Such disagreement is not uncommon and can create confusion when assessing whether the added information improves the model's prediction accuracy.

Methods: In this article, we examine the analytical connections and differences between the AUC IncV (ΔAUC) and the AP IncV (ΔAP). We also compare the true values of these two IncV metrics in a numerical study. Additionally, because both are semi-proper scoring rules, we compare them in the numerical study with a strictly proper scoring rule: the IncV of the scaled Brier score (ΔsBrS).

Results: We demonstrate that ΔAUC and ΔAP are both weighted averages of the changes (from the existing model to the new one) in the separation of the risk score distributions between events and non-events. However, ΔAP assigns heavier weights to changes in higher-risk regions, whereas ΔAUC weights the changes equally. Because of this difference, the two IncV metrics can disagree, and the numerical study shows that their disagreement becomes more pronounced as the event rate decreases. In the numerical study, we also find that ΔAP spans a wide range, from negative to positive, whereas the range of ΔAUC is much narrower. In addition, ΔAP and ΔsBrS are highly consistent, but ΔAUC is negatively correlated with ΔsBrS and ΔAP when the event rate is low.

Conclusions: ΔAUC treats the wins and losses of a new risk model equally across different risk regions. When neither the existing nor the new model is the true model, this equal weighting could attenuate the new model's superior performance in a sub-region. In contrast, ΔAP accentuates changes in prediction accuracy in higher-risk regions.
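To make the three incremental-value metrics concrete, here is a minimal sketch (not the authors' code) that computes ΔAUC, ΔAP and ΔsBrS for two hypothetical sets of predicted risks. It rests on assumptions. AP is computed as the step-wise sum of precision times recall increments over distinct thresholds, the convention used by scikit-learn's `average_precision_score`. The scaled Brier score is taken to be 1 − BrS / [ȳ(1 − ȳ)], i.e. scaled against a null model that predicts the overall event rate ȳ for everyone. The paper's exact estimators may differ. The function names and toy data are illustrative only.

```python
from typing import Callable, Dict, Sequence


def auc(y: Sequence[int], p: Sequence[float]) -> float:
    """Empirical AUC: share of (event, non-event) pairs ranked correctly, ties counted as 1/2."""
    events = [s for s, yi in zip(p, y) if yi == 1]
    nonevents = [s for s, yi in zip(p, y) if yi == 0]
    concordant = 0.0
    for e in events:
        for n in nonevents:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(nonevents))


def average_precision(y: Sequence[int], p: Sequence[float]) -> float:
    """Step-wise AP: sum over thresholds of (recall increment) x (precision at that threshold)."""
    n_events = sum(y)
    ap, tp, fp, prev_recall = 0.0, 0, 0, 0.0
    # Lower the threshold one distinct score at a time; tied scores enter together.
    for t in sorted(set(p), reverse=True):
        tp += sum(1 for s, yi in zip(p, y) if s == t and yi == 1)
        fp += sum(1 for s, yi in zip(p, y) if s == t and yi == 0)
        recall = tp / n_events
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def scaled_brier(y: Sequence[int], p: Sequence[float]) -> float:
    """Assumed definition: 1 - BrS / BrS_null, where the null model predicts the event rate."""
    n = len(y)
    rate = sum(y) / n
    brier = sum((yi - pi) ** 2 for yi, pi in zip(y, p)) / n
    # The Brier score of predicting `rate` for everyone equals rate * (1 - rate).
    return 1.0 - brier / (rate * (1.0 - rate))


METRICS: Dict[str, Callable[[Sequence[int], Sequence[float]], float]] = {
    "dAUC": auc,
    "dAP": average_precision,
    "dsBrS": scaled_brier,
}


def incremental_values(
    y: Sequence[int], p_old: Sequence[float], p_new: Sequence[float]
) -> Dict[str, float]:
    """IncV = metric(new model) - metric(existing model), for each metric."""
    return {name: f(y, p_new) - f(y, p_old) for name, f in METRICS.items()}


if __name__ == "__main__":
    # Hypothetical toy data: 1 = event, 0 = non-event.
    y = [1, 0, 1, 0]
    p_old = [0.9, 0.8, 0.3, 0.1]  # existing model's predicted risks
    p_new = [0.9, 0.1, 0.8, 0.2]  # new model's predicted risks
    for name, value in incremental_values(y, p_old, p_new).items():
        print(f"{name} = {value:+.3f}")
```

AUC and AP depend only on how the predicted risks rank events against non-events, whereas the scaled Brier score also depends on the risk values themselves, which is one reason the metrics can move in different directions. On this toy input the new model improves all three metrics. The pairwise AUC loop is quadratic and is written for readability rather than speed.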
License
CC BY