Let us consider the cancer data sample again
true value is positive | true value is negative | |
---|---|---|
predicted positive | true positive TP | false positive FP |
predicted negative | false negative FN | True negative TN |
$\mbox{true positive rate}=\frac{TP}{TP+FN}$
and
$\mbox{false positive rate}=\frac{FP}{FP+TN}$
we can see how well the prediction works by plotting the true value as a function of $z$ for each data point in the training sample:
The different categories (TP, FP, TN, FN) can be visualised on this plot:
If we are more worried about false negative than about false positive, we can move the decision boundary to the left:
Of course if means more false positives...
If we are more worried about false positive than about false negative, we can move the decision boundary to the right:
Of course if means more false negatives...
The curve describing this trade-off is the ROC curve (Receiver Operating Characteristic). It is the collection of (FP rate, TP rate) values for all values of the decision boundary.
Move the threshold to the left:
Move the threshold to the right: