What is a confusion matrix?

Modified on: Tue, Jun 18, 2019 at 11:44 AM



WHAT IS THE PURPOSE

  • To compare the label that Luminoso’s classifier predicted with the true label (the label on the training data)

  • To see which labels overlap and may be causing ‘confusion’ in the classifier

  • To fix those areas of confusion so that the classifier is in better shape before it is pushed into production
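
As a rough illustration of the comparison described above, the sketch below counts how often each (true label, predicted label) pair occurs. It is not Luminoso's implementation: the function name `build_confusion_counts` and the toy data are hypothetical, and only the Python standard library is used.

```python
from collections import Counter

def build_confusion_counts(true_labels, predicted_labels):
    """Count how often each (true label, predicted label) pair occurs."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("both lists must have the same length")
    return Counter(zip(true_labels, predicted_labels))

# Hypothetical toy data, not taken from the screenshot.
true_labels = ["installation", "local host installs",
               "local host installs", "installation"]
predicted_labels = ["installation", "local host installs",
                    "installation", "installation"]

counts = build_confusion_counts(true_labels, predicted_labels)

# Pairs where the two labels differ are the 'confusion' cells.
confused = {pair: n for pair, n in counts.items() if pair[0] != pair[1]}
```

Each key in `counts` corresponds to one cell of the matrix; keys whose two labels match are the correct classifications.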


HOW TO READ A MATRIX

An example of how to read a column in the above screenshot of a confusion matrix:

  • In D2 you see ‘local host installs’.

  • Column D shows how the classifier performed on documents whose training-data label is ‘local host installs’.

  • You can see that out of 304 testing documents (B8), 268 (B3) were correctly classified.

  • 4 documents were classified with the label ‘installation’, which differs from their training-data label of ‘local host installs’.
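
The arithmetic behind that reading can be sketched as follows. The three numbers are the ones quoted above; the variable names are only illustrative, and the breakdown of the remaining misclassified documents is not given in the text, so it is left as a single remainder.

```python
# Numbers quoted in the walkthrough above for 'local host installs'.
total_documents = 304       # testing documents for this label
correctly_classified = 268  # predicted label matched the training label
as_installation = 4         # predicted 'installation' instead

# Everything that was not classified correctly.
misclassified = total_documents - correctly_classified

# Misclassified documents that went to labels other than 'installation'.
other_confusions = misclassified - as_installation

print(misclassified)     # 36
print(other_confusions)  # 32
```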


WHAT NEEDS CORRECTING

  • Generally, we recommend inspecting and potentially correcting a label if the accuracy is below 80%. 


    • Accuracy is the number of correctly classified documents divided by the total number of testing documents for that label. In the above screenshot, accuracy = B3/B8 = 268/304, or about 88%.

  • Also, if a single incorrect cell accounts for a large portion of the total incorrect classifications, investigate why that overlap/confusion exists. A ‘large portion’ is not an exact threshold, but typically 10-20% of the total classifications is a good rule of thumb.
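
The two rules of thumb above could be applied to a single column along these lines. This is a minimal sketch under stated assumptions: the helper names, the `billing`/`refunds` labels and their counts are hypothetical, and the 'large portion' check is read here as a cell's share of all classifications for that label, with 10% chosen as the cut-off from the 10-20% range.

```python
def label_accuracy(correct, total):
    """Accuracy for one label: correct classifications / total documents."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total

def review_label(true_label, column,
                 accuracy_threshold=0.80, confusion_share=0.10):
    """Return (accuracy, needs_review, large_confusions) for one column.

    `column` maps each predicted label to a document count, for documents
    whose training-data label is `true_label`.
    """
    total = sum(column.values())
    accuracy = label_accuracy(column.get(true_label, 0), total)
    # Incorrect cells that take up at least `confusion_share` of the column.
    large_confusions = {
        predicted: count
        for predicted, count in column.items()
        if predicted != true_label and count / total >= confusion_share
    }
    return accuracy, accuracy < accuracy_threshold, large_confusions

# Hypothetical column, not taken from the screenshot.
column = {"billing": 70, "refunds": 25, "installation": 5}
accuracy, needs_review, large_confusions = review_label("billing", column)
```

With these made-up counts, accuracy is 70/100, which is below 80%, and only the ‘refunds’ cell (25 of 100 documents) passes the 10% cut-off. For the ‘local host installs’ column in the screenshot, `label_accuracy(268, 304)` is above 0.80, so that label would not be flagged on accuracy alone.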


