Metrics

We need metrics to measure model performance. Accuracy does not work well on imbalanced data sets, which are typical when predicting students at risk. For example, if you have 95 negative and 5 positive examples, classifying all of them as negative yields an accuracy of 0.95. Balanced Accuracy (bACC) overcomes this problem by normalizing true positive and true negative predictions by the number of positive and negative samples, respectively, and dividing their sum by two.
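Concretely, balanced accuracy is the mean of the true positive rate and the true negative rate:

$$\mathrm{bACC} = \frac{1}{2}\left(\frac{TP}{TP + FN} + \frac{TN}{TN + FP}\right)$$

A minimal sketch of the 95/5 example above, assuming scikit-learn is available (the labels are the toy example from the text, not real data):

```python
from sklearn.metrics import accuracy_score, balanced_accuracy_score

# Toy data: 95 negative (0) and 5 positive (1) examples.
y_true = [0] * 95 + [1] * 5
# A degenerate classifier that predicts "negative" for everyone.
y_pred = [0] * 100

print(accuracy_score(y_true, y_pred))           # 0.95 -- looks strong, but is misleading
print(balanced_accuracy_score(y_true, y_pred))  # 0.5  -- no better than random guessing
```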

Whether Precision or Recall is more important depends on the cost of each kind of error. The cost of flagging a not-at-risk student as at-risk (a False Positive) may be low, while the cost of failing to flag an at-risk student (a False Negative) is high, as the student may fail and drop out. In this example, Recall is therefore more important than Precision.
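For reference, both measures are defined over the confusion-matrix counts:

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}$$

A False Negative lowers Recall, which is why Recall is the measure to watch in the at-risk setting.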

A measure that combines Precision and Recall is their harmonic mean, the traditional F-measure or balanced F-score:

$$F_1 = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
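A short sketch with hypothetical predictions, again assuming scikit-learn (the counts below are invented for illustration: the model catches 3 of the 5 at-risk students and wrongly flags 4 others):

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# Same toy labels as before: 95 not-at-risk (0), 5 at-risk (1).
y_true = [0] * 95 + [1] * 5
# Hypothetical predictions: 4 false positives, 3 true positives, 2 false negatives.
y_pred = [0] * 91 + [1] * 4 + [1] * 3 + [0] * 2

print(precision_score(y_true, y_pred))  # 3 / (3 + 4) ~ 0.43
print(recall_score(y_true, y_pred))     # 3 / (3 + 2) = 0.6
print(f1_score(y_true, y_pred))         # harmonic mean = 0.5
```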