Face and license plate anonymization: all you need to know

With the increasing demand for privacy and security, it's more important than ever to comply with data protection laws and protect sensitive information from potential breaches and unauthorized access.

Suppose you have concluded that your company needs an anonymization tool. Choosing the right solution for image and video anonymization can be a daunting task. In this article, we'll explore the key factors to consider when selecting one.

What does Image and Video Anonymization mean?

Anonymization is the process of removing personal data from images and videos to protect people's privacy.

For data to be truly anonymized, the anonymization must be irreversible. This means that it should not be possible to retrieve the original data.

Figure 1: Anonymized mobile mapping dataset from Budapest, Hungary. ©Shanghai Huace Navigation Technology Ltd.

Why does Image & Video Anonymization make sense?

Anonymized data falls outside the scope of the major data protection laws (see e.g. Recital 26, GDPR, or Cal. Civ. Code § 1798.140(h), CCPA). Therefore, anonymization preserves privacy while drastically reducing the operational costs that result from data transfer limitations and from the paperwork required to obtain authorization to use personal data.

Also, data breaches (from the outside) and leaks (from the inside) are constant threats. As past cases have shown, even the strongest cybersecurity measures might not prevent data leaks, with disastrous consequences when personal data is involved. While anonymization cannot prevent such events, it reduces the risk of exposing sensitive data and can spare you the notification costs that such breaches entail.

An Overview of Manual, In-house, and Third-party Software Solutions

There are different possibilities to perform image and video anonymization:

Manual
In-house semi-automated/automated solution
Third-party software solution

Manual

Manual anonymization means that people manually check whether personal data (e.g. people, vehicles, etc.) is present. Using an image manipulation tool (e.g. Photoshop), they then apply a filter to these objects.

Such a solution is very time-consuming (especially with large amounts of data or multiple objects). Consequently, it can become costly: in Austria, for instance, the average hourly labor cost is more than EUR 36.7.

Figure 2: Estimated hourly labor costs in the EU (2020). Source: Eurostat

To avoid such high labor costs, companies could outsource this process to lower-wage countries. However, the GDPR enforces strict rules when it comes to data transfers outside the EU. In general, a transfer is only possible to countries for which an ‘Adequacy Decision’ has been granted (Andorra, Argentina, Canada, Faroe Islands, Guernsey, Israel, Isle of Man, Japan, Jersey, New Zealand, Republic of Korea, Switzerland, the United Kingdom, and Uruguay).

In the absence of an Adequacy Decision, a series of legal, contractual, and security limitations apply. In addition, it could expose a company to higher risks of data breaches during these transfers (via the cloud or physical storage).

Furthermore, manual labor is not scalable for large volumes of data. For example, we estimate that a person could anonymize ca. 600-700 images per hour. For a dataset of 1 million images, this would take between 178 and 208 working days (at 8 hours per working day). In comparison, a cloud-based solution would anonymize 5,000-10,000 images per hour at the same cost as an outsourcing company in a lower-income country.
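The estimate above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the function names are our own, and the throughput figures are the estimates quoted in the text, not measurements.

```python
def working_days(num_images: int, images_per_hour: float,
                 hours_per_day: float = 8.0) -> float:
    """Working days needed to process num_images at a given hourly throughput."""
    return num_images / images_per_hour / hours_per_day


def processing_hours(num_images: int, images_per_hour: float) -> float:
    """Hours of continuous processing needed for num_images."""
    return num_images / images_per_hour


dataset_size = 1_000_000

# Manual anonymization: 600-700 images per hour, 8 hours per working day.
print(f"manual: {working_days(dataset_size, 700):.1f} "
      f"to {working_days(dataset_size, 600):.1f} working days")

# Cloud-based solution: 5,000-10,000 images per hour.
print(f"cloud:  {processing_hours(dataset_size, 10_000):.0f} "
      f"to {processing_hours(dataset_size, 5_000):.0f} hours")
```

Changing `dataset_size` or the throughput values lets you redo the comparison for your own data volume.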

In-house semi-automated/automated solution

To build an anonymization solution that meets regulatory requirements, you will need staff from Software Development (productive hourly rate: EUR 55) and Machine Learning (productive hourly rate: EUR 70).

Once that is established, the following tasks need to be considered and done:

Data strategy (data acquisition, labeling, etc.)
Machine learning (training, testing, model optimization, etc.)
Requirement engineering
Software development (backend, user interface)
Testing
Bug fixing
Documentation
Maintenance and Support

Figure 3: Cost table for developing a solution in-house. ©Celantur GmbH

With that being said, we estimate that the work of software developers and ML engineers can cost ca. EUR 66,000. Even then, there is no guarantee of achieving acceptable quality, speed, and computational efficiency, and you will still need to invest resources in maintaining and improving the model.

For these reasons, developing such a solution in-house is rarely worth the effort: it ties up financial and human resources and therefore distracts the company from its core business.

Third-party software solution

There are several reasons why a company may prefer third-party software for image and video anonymization instead of relying on manual or in-house solutions.

First, third-party software is specifically designed to detect and anonymize personal information in images and videos, whereas manual methods are prone to human error. This makes it more accurate and efficient.

Second, it can provide better encryption and data protection features. Moreover, external software is more likely to be updated regularly to stay current with new privacy regulations and security threats.

Finally, third-party software can be more cost-effective than building and maintaining in-house solutions. In fact, third-party software providers typically offer pay-per-use or subscription-based pricing models, which can be more budget-friendly than the costs associated with developing and maintaining in-house solutions.

In conclusion, the benefits of third-party software are:

Improved accuracy not susceptible to human error
Better security and privacy features
Cost-effectiveness compared to in-house solutions

What to consider for your Image & Video Anonymization Software

Quality of anonymization

The main aspect of any anonymization software is its ability to accurately detect and blur personal information. Look for a solution that uses state-of-the-art machine learning models to anonymize faces and license plates. These algorithms are designed to detect and recognize patterns and features in images and videos, making them highly effective even in complex and challenging scenarios.

Three important terms are used to evaluate the accuracy of these algorithms:

True positive: an object is correctly detected
False negative: an object is not detected
False positive: the background or an irrelevant object is wrongly marked as a relevant object

In most cases, a false negative is more severe than a false positive. Not detecting an identifiable person poses a larger problem than, say, mistaking a construction container for a car. However, mistaking a traffic sign for a number plate can be equally problematic.
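These counts are commonly condensed into two standard metrics: recall (the share of relevant objects that were detected) and precision (the share of detections that were actually relevant). Since false negatives are usually the more severe error, recall is typically the number to watch. The sketch below uses made-up counts purely for illustration; they are not results from any real system.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Share of relevant objects that were detected (and thus anonymized)."""
    total = true_positives + false_negatives
    return true_positives / total if total else 0.0


def precision(true_positives: int, false_positives: int) -> float:
    """Share of detections that correspond to a relevant object."""
    total = true_positives + false_positives
    return true_positives / total if total else 0.0


# Hypothetical evaluation counts, for illustration only.
tp, fn, fp = 990, 10, 25

print(f"recall:    {recall(tp, fn):.3f}")    # missed objects lower this value
print(f"precision: {precision(tp, fp):.3f}")  # over-blurring lowers this value
```

A low recall means identifiable people or plates remain visible, while a low precision means the tool blurs more of the image than necessary.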

Volume of data

If you need to anonymize a large volume of data or objects (e.g. crowded areas with lots of people), look for a solution that offers batch-processing capabilities to optimize the processing time.

Batch processing is a method of processing a large amount of data by dividing it into smaller chunks, or batches. Each batch is submitted and processed as a unit, so you don't have to handle files one by one, which saves you time and effort.
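As a minimal sketch of the idea, the following Python snippet splits a list of image paths into fixed-size batches and hands each batch to a processing function. The function `anonymize_batch` is a hypothetical placeholder, not the API of any particular product; in practice it would call whichever anonymization tool you use.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def anonymize_batch(paths: List[str]) -> List[str]:
    """Hypothetical placeholder: pretend to anonymize and return output paths."""
    return [f"anonymized/{path}" for path in paths]


image_paths = [f"frame_{i:04d}.jpg" for i in range(10)]

results: List[str] = []
for batch in batched(image_paths, batch_size=4):
    # Each batch is processed as one unit instead of file by file.
    results.extend(anonymize_batch(batch))

print(len(results), "images processed")
```

The batch size is a tuning knob: larger batches reduce per-request overhead, while smaller ones limit how much work has to be repeated if a batch fails.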

Data protection and security due diligence

If you or your customer is an EU-based company, the GDPR requires up-to-date documentation and safeguards, such as:

Technical and organizational measures (TOM)
Records of Processing Activities
Dedicated Data Protection Officer (DPO)
GDPR-compliant data centers
Data encryption

Make sure that the third-party provider you work with has all the necessary legal documentation, and review it with your Data Protection Officer, a role that the GDPR requires many companies and organizations to appoint to ensure the lawful processing of personal data (art. 37, GDPR).

Customization Options

Every use case is unique, so it's important to choose a solution that offers a wide range of customization options. This can include different blurring and pixelation techniques, as well as the ability to add custom masks and annotations.

Customization options allow you to tailor the anonymization process to your specific needs and use case, which is especially important if you're dealing with sensitive and confidential information.
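To make the difference between techniques more concrete, here is a minimal pure-Python sketch of pixelation applied to one rectangular region of a grayscale image. It is an illustration under simplifying assumptions (the image is a nested list of intensities, and the region's coordinates are assumed to come from some detector); it does not represent how any specific product implements pixelation.

```python
from typing import List

Image = List[List[int]]  # grayscale image: rows of pixel intensities (0-255)


def pixelate_region(image: Image, top: int, left: int,
                    height: int, width: int, block: int) -> Image:
    """Return a copy of image with the given box pixelated.

    Every block x block tile inside the box is replaced by the tile's
    mean intensity. Pixels outside the box are left unchanged.
    """
    if block < 1:
        raise ValueError("block must be at least 1")
    result = [row[:] for row in image]
    bottom = min(top + height, len(image))
    right = min(left + width, len(image[0]) if image else 0)
    for y0 in range(top, bottom, block):
        for x0 in range(left, right, block):
            ys = range(y0, min(y0 + block, bottom))
            xs = range(x0, min(x0 + block, right))
            values = [image[y][x] for y in ys for x in xs]
            mean = sum(values) // len(values)
            for y in ys:
                for x in xs:
                    result[y][x] = mean
    return result


image = [
    [0, 10, 20, 30],
    [40, 50, 60, 70],
    [80, 90, 100, 110],
    [120, 130, 140, 150],
]

# Pixelate a hypothetical 2x2 detection box in the top-left corner.
for row in pixelate_region(image, top=0, left=0, height=2, width=2, block=2):
    print(row)
```

The `block` parameter is the kind of setting a customizable tool might expose: larger blocks discard more detail inside the region.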

Documentation and Support

Finally, make sure to choose a solution that offers excellent customer support and comprehensive documentation. This will ensure that you have the resources you need to use the solution effectively.

Storage location and workflow integration

Ultimately, you should consider how and where you want to use or deploy your anonymization solution. Based on our experience, these are the most common deployment methods:

Cloud-based service
On-premise
Edge

Cloud-based service

Cloud-based software runs on a cloud infrastructure (e.g. AWS, GCP, Azure, etc.), and the anonymization algorithm is accessed online via a browser rather than bought and installed on an individual computer or your own infrastructure.

This allows for easy access to the data from anywhere with an internet connection, scalability as the data grows, and a low technical barrier to entry. On the other hand, internet throughput can become a bottleneck when uploading or downloading large volumes of data.

Figure 4: Cloud-based anonymization software. ©Celantur GmbH

On-premise

On-premise refers to storing the data on local machines, physical servers, or your own cloud infrastructure. This is often considered the preferred option for enterprises because it solves several internal security issues. However, it also requires a significant investment in hardware and maintenance.

In this context, an executable software or container (e.g. Docker) can be installed and run with no internet connection, overcoming potential concerns arising from data transfer to third parties or internet throughput limitations.

Edge

Lastly, edge refers to storing the data directly on the device, such as a camera or sensor. This can be useful for situations where the data needs to be processed in real time, or where internet connectivity is not available.

From a processing and security standpoint, edge anonymization might be the ideal option, as it minimizes the movement of data from and to different locations.

However, this also means that the data cannot be easily accessed from other locations, that the solution might not work with different camera models, and that storage is limited.

In summary:

Cloud is good for easy access, scalability, and cost savings.
On-premise is good for security, control, and performance, but comes at a high cost.
Edge is good for real-time processing and offline access, but has storage limitations.

How Celantur can help you

Choosing the right anonymization option for your business can be challenging, but Celantur is here to assist.

Our fully-automated solution for anonymizing images and videos uses industry-grade technology to blur faces, license plates, persons, and vehicles with a detection rate of up to 99%.

We offer two software solutions:

Celantur Cloud: user-friendly and pay-per-use option with fast data processing capabilities. Available as a cloud-based SaaS or Cloud API.
Celantur Container: highly scalable Docker container that can be deployed on your local machine, physical servers, or public/private cloud infrastructure. Seamless integration into your data workflows via input and output directories, NumPy arrays over a TCP socket connection, or a REST API.

Drawing on our in-house machine learning know-how in object detection and image segmentation, we can deliver new models faster or deploy the solution on the edge.

To facilitate the legal basis for processing image and video data, we have strong measures in place to comply with the GDPR and other data protection laws. Take a look at all our data protection measures here.
