Checklist: Video Anonymization for ADAS Datasets

All you need to know to anonymize autonomous vehicle datasets in compliance with the GDPR and data protection laws

06 February 2023, by Mario Sabatino Riontino

Figure 1: Photo by Chad Gray on Unsplash


Camera-only and multi-sensor approaches are two different ways to achieve autonomous driving.

A camera-only approach relies solely on cameras to capture visual data and make driving decisions. Cameras are used to detect and identify objects, such as other vehicles, pedestrians, and traffic signals. This approach is relatively cost-effective and can be integrated into existing vehicles with minimal modification. However, it may be less reliable in certain weather conditions, such as heavy rain or snow, which can affect visibility. Also, cameras can have difficulty in differentiating between objects with similar characteristics, such as a plastic bag and a piece of paper.

On the other hand, a multi-sensor approach uses a combination of sensors, such as cameras, lidar, and radar, to gather information about the vehicle's environment. Lidar uses laser beams to create a 3D map of the surrounding area and can detect objects at a greater distance and with more accuracy than cameras alone. Radar uses radio waves to detect objects and can also provide information on an object's speed and direction of movement. This approach can provide a more robust and reliable system, as it can handle a wider range of weather and lighting conditions and can differentiate between similar objects more easily. However, it is also more complex and costly to implement.

In this article, we will not analyze the advantages and disadvantages of both approaches in detail. Rather, we will focus on video data (which forms the basis of both) and the threats to individuals' privacy arising from the accumulation of large video datasets.

As we explained in this previous article, the general public is concerned about the privacy impact of the widespread use of autonomous vehicles. Thus, a variety of measures focused on protecting personal information should be applied to increase the general acceptance of autonomous vehicles.

Why anonymization for ADAS and Autonomous Vehicles

First of all, we need to differentiate between two types of personal data. The first type is primary data, which is recorded and collected while the car is in use and is held by the owner of the vehicle. An example is the kind of music you listen to, which can be used for personalized recommendations like those you get on Spotify.

The second type is secondary data, which is collected “indirectly”, such as footage of a pedestrian walking or a cyclist riding.

While consent for the first type of data is usually covered by contract clauses, the use of secondary data is regulated by the GDPR.

Art. 7 of the GDPR requires the controller to be able to demonstrate that the data subject has consented to the processing of their personal data. The data subject also has the right to withdraw consent at any time.

However, requesting consent from hundreds of thousands of pedestrians would be cumbersome, time-consuming, and costly.

For more information about the legal basis for ADAS/AV data collection, you can read our interview with Mag. Philipp Summereder.

Fortunately, the GDPR offers an alternative to obtaining consent: anonymization.

As specified in Recital 26 of the GDPR, “The principles of data protection should therefore not apply to anonymous information [...] personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable”.

What has to be anonymized

For many years, the automotive community has stated that anonymization is needed to ensure the privacy of the participants (e.g. when publishing results or to enable data sharing). This is true, but incredibly difficult to achieve as long as the original dataset is still accessible, since the anonymization then remains reversible and individuals remain re-identifiable.

ADAS/AV data can be classified under three main categories:

  • Owner and passenger information like comfort, driving, and entertainment settings.
  • Location data such as vehicle GPS location, speed, real-time traffic, etc.
  • Sensor data, including cameras or dash cams (front, rear, and side), radar, thermal imaging devices, and light detection and ranging (LiDAR) devices.

Owner and passenger information does not constitute a regulatory issue (since compliance is covered under terms and conditions between the user and the company) unless it is shared with third parties (e.g. data processors).

Location data is still an “unsolved problem”. Researchers have shown that only a few location points are enough to re-identify an individual with 95% accuracy. Moreover, current anonymization techniques have not proven effective against re-identification.
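To make the re-identification risk more concrete, the following sketch estimates how many people in a toy dataset are pinned down uniquely once an observer knows k of their location points. It is an illustration only, not the method used in the research mentioned above. The function name `unicity`, the trace format (sets of (cell, hour) pairs), and the sample data are all assumptions made for this example.

```python
import random

def unicity(traces, k, seed=0):
    """Fraction of users that are the only match for k randomly chosen
    points of their own trace. `traces` maps a user id to a set of
    (cell, hour) tuples (a hypothetical format for this sketch)."""
    rng = random.Random(seed)
    unique = 0
    for user, points in traces.items():
        # Points an observer is assumed to know about this user.
        known = set(rng.sample(sorted(points), min(k, len(points))))
        # Every user whose trace contains all of the known points.
        matches = [u for u, p in traces.items() if known <= p]
        if matches == [user]:
            unique += 1
    return unique / len(traces)

# Toy data: three people who all leave the same home cell at 8:00.
traces = {
    "a": {("home_cell", 8), ("office_cell", 9), ("gym_cell", 18)},
    "b": {("home_cell", 8), ("office_cell", 9), ("bar_cell", 21)},
    "c": {("home_cell", 8), ("school_cell", 9), ("gym_cell", 18)},
}

for k in (1, 2, 3):
    print(k, unicity(traces, k))
```

Even in this tiny example, knowing all three points of a trace singles out every person, although no single point of user "a" is unique on its own.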

Differential privacy provides a promising privacy definition for location data, but research is not yet mature enough for application at scale.
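As a rough illustration of the idea, the sketch below applies the Laplace mechanism, one standard building block of differential privacy, to per-cell visit counts. It assumes that each person contributes to at most one cell (so the sensitivity is 1) and that the list of cells is fixed in advance. The names `laplace_noise` and `private_cell_counts` and the sample counts are hypothetical, and this is not a production-ready approach to location privacy.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential variables with mean
    # `scale` follows a Laplace distribution centred at 0.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def private_cell_counts(cell_counts, epsilon, rng=None):
    """Return noisy per-cell counts using the Laplace mechanism.

    Assumption: each person appears in at most one cell, so adding or
    removing one person changes a single count by 1 (sensitivity 1).
    """
    rng = rng or random.Random()
    scale = 1.0 / epsilon  # sensitivity / epsilon
    return {cell: count + laplace_noise(scale, rng)
            for cell, count in cell_counts.items()}

# Hypothetical counts of vehicles observed per map cell.
counts = {"cell_a": 120, "cell_b": 3, "cell_c": 0}
print(private_cell_counts(counts, epsilon=0.5))
```

A smaller `epsilon` means more noise and stronger privacy, while a larger one keeps the counts closer to the true values.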

Lastly, sensor data (in particular imagery) has gained great attention from companies and regulators. Anonymization techniques such as blurring have been widely adopted due to their technological maturity and effectiveness in protecting personal data.

How to anonymize personal data from AV data

For most use cases, blurring offers the best trade-off between performance, anonymization, and reduced distortion, and it has emerged as the de facto standard anonymization method. In fact, companies like Google, Microsoft, and TomTom use it to protect personal data.
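To show what blurring a detected region means in practice, here is a minimal sketch that applies a box blur to one bounding box of a grayscale image stored as a list of rows. It assumes the box (for example around a face or license plate) has already been found by some detector, which is not shown. The function name `blur_region`, the box format, and the `radius` parameter are assumptions for this example; real pipelines typically rely on an image library and a stronger blur.

```python
def blur_region(image, box, radius=1):
    """Return a copy of `image` with the pixels inside `box` box-blurred.

    image: list of rows of integer gray values.
    box:   (top, left, bottom, right), bottom and right exclusive.
    """
    height, width = len(image), len(image[0])
    top, left = max(box[0], 0), max(box[1], 0)
    bottom, right = min(box[2], height), min(box[3], width)
    out = [row[:] for row in image]  # leave the input untouched
    for y in range(top, bottom):
        for x in range(left, right):
            # Average over a window that is clipped to the box.
            ys = range(max(y - radius, top), min(y + radius + 1, bottom))
            xs = range(max(x - radius, left), min(x + radius + 1, right))
            window = [image[j][i] for j in ys for i in xs]
            out[y][x] = sum(window) // len(window)
    return out

# A 4x4 image with one bright pixel inside the region to anonymize.
image = [
    [0, 0, 0, 0],
    [0, 90, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
blurred = blur_region(image, box=(0, 0, 3, 3), radius=1)
for row in blurred:
    print(row)
```

Only pixels inside the box change, so the rest of the frame keeps its original detail for downstream use.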

Currently, there are several approaches to video anonymization. Let’s break down each solution with advantages and disadvantages:

Current solution | Advantages | Disadvantages
In-house manual anonymization | The company has full control of the data. | Time-consuming and consequently costly due to high hourly rates.
Outsourcing to low-wage countries | The price per hour for manual redaction of faces and license plates is significantly lower. | Transferring EU data to countries outside the EU is strictly regulated and complex.
In-house AI solutions | The company has full control of the data. The process is partially or fully automated. | Requires ML and software engineering resources to develop and maintain the models. No guarantee of achieving acceptable quality, speed, and computational efficiency.

We have written a special article on the implications of these solutions in detail. You can find it here.

Celantur Approach

Celantur offers an enterprise-ready and scalable solution for anonymizing images and videos. It uses industry-grade technology to blur faces, license plates, persons, and vehicles with a detection rate of up to 99%.

We offer two software solutions:

  • Celantur Cloud: User-friendly, pay-per-use option with fast data processing capabilities. Available as a cloud-based SaaS or Cloud API.
  • Celantur Container: Highly scalable Docker container that can be deployed on your local machine, physical servers, or public/private cloud infrastructure. Seamless integration into your data workflows via input and output directories or REST API.

Using our internal machine learning know-how for object detection and image segmentation, we can deliver new models faster or deploy the solution on edge devices.

To facilitate the legal basis for processing image and video data, we have strong measures in place to comply with the GDPR and other data protection laws. Take a look at all our data protection measures here.
