Understanding Specificity and Sensitivity

How these two kinds of wrong can impact your computer vision project for object detection.


03 February 2021, by Boyang Xia


Introduction

2020 was a tough year. Alex and I founded Celantur last March, ten days before Austria entered its first lockdown. Mario boldly joined us in May when the pandemic was in full swing. Thanks to the digital nature of our service and our remote-first culture, we are weathering the crisis comparatively well.

Our core business is the protection of everyone’s privacy, and yet there is a crucial aspect to it that we share with the fight against COVID-19.

Test negative or test positive?

In an ideal world, an observation or a test will always tell you the truth, i.e. your COVID-19 test is positive if you suffer from the disease, and negative if you do not.

Reality deviates significantly from its ideal version. Sometimes you suffer from COVID-19 even if your test does not detect it, and sometimes you don’t have it even if your test shows the opposite. And sometimes, as Elon Musk will tell you, these mistakes happen too often:

[Image: Elon Musk tweet about COVID-19 tests]

False positives and false negatives

Let’s visualise the problem using a so-called confusion matrix:

Confusion matrix

                     Observation: Positive     Observation: Negative
Reality: Positive    😀 True positive (TP)     😡 False negative (FN)
Reality: Negative    😡 False positive (FP)    😀 True negative (TN)

If you observe something correctly, then it’s either a true positive, i.e. a correct positive test, or a true negative, i.e. a correct negative test.

If you observe something wrongly, then it’s either a false positive, e.g. a positive test despite the patient being coronavirus-free, or a false negative, e.g. a negative test despite the patient being infected.

Ideally, you minimise both forms of wrong observations, but in most cases, you have to make the cruel trade-off between fewer false negatives and fewer false positives.
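To make the four outcomes concrete, here is a minimal Python sketch that counts them from paired observations. The function name `confusion_counts` and the five-person example are made up for illustration; `True` stands for "positive".

```python
def confusion_counts(reality, observation):
    """Count TP, FN, FP and TN from paired booleans (True = positive)."""
    if len(reality) != len(observation):
        raise ValueError("reality and observation must have the same length")
    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for actual, observed in zip(reality, observation):
        if actual and observed:
            counts["TP"] += 1   # infected, test positive
        elif actual:
            counts["FN"] += 1   # infected, test negative
        elif observed:
            counts["FP"] += 1   # healthy, test positive
        else:
            counts["TN"] += 1   # healthy, test negative
    return counts


# Hypothetical example: five people, two of them actually infected.
reality = [True, True, False, False, False]
observation = [True, False, True, False, False]
print(confusion_counts(reality, observation))
# {'TP': 1, 'FN': 1, 'FP': 1, 'TN': 2}
```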

Confusion matrix applied to real life

Let’s suppose a population of 1000 people, of which 200 are infected with COVID-19. In the first scenario, we have a very sensitive test which detects all the COVID-19 cases. Unfortunately, it also misidentifies 300 healthy patients as infected:

1st scenario

                     Observation: Positive    Observation: Negative
Reality: Positive    TP = 200                 FN = 0
Reality: Negative    FP = 300                 TN = 500

In the second scenario, we have a very specific test which produces no false positives, but it yields a positive detection only for the 50 cases with the highest virus count, ignoring the remaining 150 infections:

2nd scenario

                     Observation: Positive    Observation: Negative
Reality: Positive    TP = 50                  FN = 150
Reality: Negative    FP = 0                   TN = 800

Mathematically speaking, sensitivity is the number of true positives divided by the sum of true positives and false negatives:

$$\text{sensitivity} = \frac{TP}{TP + FN}$$

And specificity is the number of true negatives divided by the sum of true negatives and false positives:

$$\text{specificity} = \frac{TN}{TN + FP}$$

In the first scenario, the sensitivity is 100% and specificity is only 62.5%, whereas in the second scenario the sensitivity is only 25%, but specificity is 100%.

                Sensitivity            Specificity
1st scenario    200 / 200 = 100%       500 / 800 = 62.5%
2nd scenario    50 / 200 = 25%         800 / 800 = 100%
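The figures for both scenarios can be re-derived with a few lines of Python. This is only a sketch: the helper names `sensitivity` and `specificity` are our own, and the counts are the ones from the two scenarios above.

```python
def sensitivity(tp, fn):
    """True positives divided by all actual positives."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negatives divided by all actual negatives."""
    return tn / (tn + fp)


# Counts taken from the two scenarios (1000 people, 200 infected).
scenarios = {
    "1st scenario": {"TP": 200, "FN": 0, "FP": 300, "TN": 500},
    "2nd scenario": {"TP": 50, "FN": 150, "FP": 0, "TN": 800},
}

for name, c in scenarios.items():
    sens = sensitivity(c["TP"], c["FN"])
    spec = specificity(c["TN"], c["FP"])
    print(f"{name}: sensitivity = {sens:.1%}, specificity = {spec:.1%}")
# 1st scenario: sensitivity = 100.0%, specificity = 62.5%
# 2nd scenario: sensitivity = 25.0%, specificity = 100.0%
```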

Summing it up, maximising sensitivity means reducing false negatives, and maximising specificity means reducing false positives.

sensitivity ↑ = ↓ false negatives
specificity ↑ = ↓ false positives

Different situations prioritise sensitivity and specificity differently.

For example, if donated blood is tested for sexually transmitted diseases, tests should have high sensitivity. Even though some blood samples will wrongly test positive, discarding them is better than letting a patient receive tainted blood.

In the case of COVID-19, a high number of false positives would severely disrupt the social and economic lives of many people by unnecessarily quarantining them. Thus, very specific tests that reliably detect infectious cases are preferable.

Celantur: specific and sensitive

To protect your privacy, we remove personal data (e.g. faces and license plates) from images and videos, which first requires detecting them. And here, we encounter the same fundamental problem as a medical testing facility: false negatives, e.g. missing a face, and false positives, e.g. mistaking a street sign for a license plate.

Figure: Section of the painting “The School of Athens” by Raphael.
A face detector could correctly identify a face (green TP), misidentify something as a face (FP), miss a face (red FN), or correctly ignore everything that is not a face, i.e. the rest of the image (TN).

Analogously, different applications demand different levels of sensitivity and specificity. Mapping a public square with many unaware passers-by requires the anonymisation to be very sensitive, even if it entails some false positives. On the other hand, mapping an industrial plant requires the anonymisation to be very specific to avoid machines and equipment being mistakenly blurred.
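One common way to tune this balance is a confidence threshold on the detector's output. The sketch below assumes a detector that assigns each candidate region a score between 0 and 1; that is an assumption for illustration, not a description of our pipeline. The scores are made up and the function `rates_at_threshold` is hypothetical.

```python
def rates_at_threshold(scores, is_face, threshold):
    """Return (sensitivity, specificity) when score >= threshold counts as a detection.

    Assumes at least one actual face and one non-face among the candidates.
    """
    tp = fn = fp = tn = 0
    for score, actual in zip(scores, is_face):
        detected = score >= threshold
        if actual and detected:
            tp += 1
        elif actual:
            fn += 1
        elif detected:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Made-up confidence scores for eight candidate regions (illustration only).
scores = [0.95, 0.80, 0.60, 0.40, 0.70, 0.30, 0.20, 0.10]
is_face = [True, True, True, True, False, False, False, False]

for threshold in (0.25, 0.50, 0.75):
    sens, spec = rates_at_threshold(scores, is_face, threshold)
    print(f"threshold {threshold:.2f}: sensitivity = {sens:.0%}, specificity = {spec:.0%}")
# threshold 0.25: sensitivity = 100%, specificity = 50%
# threshold 0.50: sensitivity = 75%, specificity = 75%
# threshold 0.75: sensitivity = 50%, specificity = 100%
```

In this toy example, a low threshold would suit the public square (no face is missed, at the cost of some unnecessary blurring), while a high threshold would suit the industrial plant.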

Summary

  • Understanding the trade-off between specificity and sensitivity is crucial in fields as diverse as epidemiology and data privacy.
  • You have to consider the trade-offs individually for each application, e.g. public square vs. industrial plant.
  • A confusion matrix can help you decide on the right trade-off for your project.

About Celantur

At Celantur, we use several distinct machine learning models and sets of parameterisations to tune sensitivity and specificity for the individual use cases our customers encounter.

✅ We anonymize all kinds of RGB-imagery: planar, panorama images and videos.

✅ Our cloud platform is capable of anonymizing around 200,000 panoramas and 24 hours of video per day.

✅ Industry-grade anonymization quality: detection rate up to 99%.
