Summary
For some metrics, such as ROC-AUC or PR-AUC, wouldn't it be more accurate to concatenate the validation-set predictions from all folds and compute the metric once, globally, instead of averaging the per-fold results? The drawback is that the pooled approach produces a single value, so it does not allow reporting a standard deviation across folds.
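To make the question concrete, here is a minimal sketch comparing the two aggregation strategies. The fold data is made up for illustration, and `roc_auc` is a small hypothetical helper written for this example (a pairwise-ranking definition of ROC-AUC), not a function from any library.

```python
from statistics import mean, stdev


def roc_auc(y_true, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical out-of-fold (labels, scores) for three validation folds.
# The third fold's scores are shifted upward, as can happen when each
# fold's model is calibrated slightly differently.
folds = [
    ([1, 0, 1, 0], [0.9, 0.2, 0.8, 0.3]),
    ([1, 0, 0, 1], [0.6, 0.5, 0.1, 0.4]),
    ([0, 1, 0, 1], [0.7, 0.95, 0.65, 0.85]),
]

# Strategy 1: compute the metric per fold, then average.
# This also gives a spread across folds.
per_fold = [roc_auc(y, s) for y, s in folds]
mean_auc = mean(per_fold)
std_auc = stdev(per_fold)

# Strategy 2: concatenate all validation predictions and compute once.
# This gives a single number, so there is no per-fold standard deviation.
pooled_y = [y for ys, _ in folds for y in ys]
pooled_s = [s for _, ss in folds for s in ss]
pooled_auc = roc_auc(pooled_y, pooled_s)

print(f"per-fold AUCs: {per_fold}")
print(f"mean +/- std:  {mean_auc:.4f} +/- {std_auc:.4f}")
print(f"pooled AUC:    {pooled_auc:.4f}")
```

In this toy data the pooled value comes out lower than the per-fold mean, because pooling also compares scores produced by different fold models. Whether that makes the pooled number more accurate depends on whether scores are comparable across folds, which is an assumption rather than a given.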