'Privacy Issues' Interview: Trends and challenges of data anonymization

'Privacy Issues' talked with Celantur’s founders, Alexander Petkov and Boyang Xia, about the trends and challenges of data anonymization technology.


30 March 2020, by Alexander Petkov


What is the story behind Celantur?

Alex: Celantur grew out of a scratch-your-own-itch idea. In 2017, while I was working as a volunteer firefighter, fire departments regularly took pictures for incident documentation, and the faces and license plates in them had to be redacted due to privacy concerns. Doing this manually was always a hassle, especially after physically and emotionally exhausting incidents.

Later I noticed other use cases. For example, kindergarten and school group photos needed to be redacted to protect the privacy of children. It was time to prototype. In 2019, I quit my job and went all in to build Celantur with the idea of “automated data anonymisation for images and videos.” Together with my business partner Boyang Xia, in late summer of 2019, we built a working minimum viable product (MVP) for a specific use case of our first customer.

Boyang: We started our company without outside funding, but with initial and continuous support of TECHHOUSE, an innovation hub and startup network with AI and Cyber Security focus. This year we were accepted into tech2b, Upper Austria’s technology-oriented incubation program. The program provides us with training, mentoring and an extensive network, which is immensely valuable for us as first-time founders.

What use cases for data anonymisation technology do you see these days?

Alex: Celantur focuses on mobile mapping, which involves capturing large amounts of street-level imagery like you might see in Google Street View or similar applications. This data is then used to colourise grey-scale point clouds and 3D models, as well as for road asset management. In such cases, any images and videos that contain personal data captured without the consent of the data subjects could benefit from anonymisation.

Other use cases are process monitoring to oversee construction sites’ progress and analysing customer patterns in shopping centres to improve sales. Setting up cameras may prompt concerns from labour unions or raise challenges on compliance with data protection laws. Anonymising video feeds could become a useful tool in this regard.

One of the next features to become common, as we see it, will be creating artificial data that is capable of filling the gaps caused by anonymisation. The blurred region left by anonymising an image of a real person may be replaced with a digitally simulated person. We believe that a rise in privacy awareness and a better understanding of the need for data protection will drive rapid development of data anonymisation techniques. In turn, the use of this technology will become more frequent and disruptive.

What challenges does data and image anonymisation technology face today?

Alex: Automated data processing vastly reduces the number of people accessing your data and, thus, the risk of a “human-induced” data breach. With an automated and scalable solution for data anonymisation, there’s no reason to rely on manual labour in low-cost countries. Although automation solves problems induced by manual data processing, there are certain challenges that need to be solved.

One of them is “de-anonymisation.” In essence, data anonymisation may be viewed as partial erasure, i.e. the personal identifiers relevant for connecting the data item to a living person are erased. Theoretically, it is possible to reconstruct personal data after anonymisation by combining it with additional data. To remove this possibility, additional information needs to be removed from the dataset as well. In the case of images (video or static), you might have to blur the whole body, or even the surrounding environment, like a car or the room the person is in, impairing the integrity of the original image.
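The trade-off between removing more information and preserving the image can be made concrete with a small sketch. It is only an illustration, not Celantur's method: the grey-scale image is a plain list of rows, regions are overwritten with a fixed value rather than blurred, and the names `redact`, `expand` and `redacted_fraction` are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]          # grey-scale image as rows of 0-255 values
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive


def redact(image: Image, boxes: List[Box], fill: int = 0) -> Image:
    """Return a copy of `image` with every pixel inside `boxes` overwritten."""
    result = [row[:] for row in image]
    height = len(result)
    width = len(result[0]) if result else 0
    for top, left, bottom, right in boxes:
        # Clip each box to the image so oversized boxes are harmless.
        for y in range(max(top, 0), min(bottom, height)):
            for x in range(max(left, 0), min(right, width)):
                result[y][x] = fill
    return result


def expand(box: Box, margin: int) -> Box:
    """Grow a box by `margin` pixels on each side, e.g. from face to whole body."""
    top, left, bottom, right = box
    return (top - margin, left - margin, bottom + margin, right + margin)


def redacted_fraction(original: Image, redacted: Image) -> float:
    """Share of pixels that changed, a rough measure of lost image content."""
    total = sum(len(row) for row in original)
    changed = sum(
        a != b
        for row_a, row_b in zip(original, redacted)
        for a, b in zip(row_a, row_b)
    )
    return changed / total if total else 0.0


# A 10x10 test image with a hypothetical 2x2 "face" detection.
image = [[100] * 10 for _ in range(10)]
face = (4, 4, 6, 6)
face_only = redact(image, [face])
face_and_surroundings = redact(image, [expand(face, 2)])
```

In this toy case, widening the redacted region by two pixels on each side raises the share of overwritten pixels from 4% to 36%, which is the integrity cost the paragraph above describes.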

For example, a person whose face is blurred may be identified by pairing the anonymised data with additional data such as the GPS coordinates of their mobile phone or Wi-Fi access logs. Such data may be obtained from third parties or purchased on the darknet. The currently debated ban on facial recognition does not really address the problem of personal identification or privacy. In visual imagery, other information, such as gait, posture or gesture, could be equally revealing.
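The pairing described above can be sketched as a simple join on time and place. This is a toy illustration under stated assumptions: the record schemas, the `candidates` function and the matching thresholds are all hypothetical, and real auxiliary data would be far noisier.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sighting:
    """An anonymised image: face blurred, but capture time and place retained."""
    image_id: str
    timestamp: int  # seconds since an arbitrary epoch
    lat: float
    lon: float


@dataclass
class LocationRecord:
    """Auxiliary data, e.g. one point of a phone's GPS trace (hypothetical schema)."""
    person: str
    timestamp: int
    lat: float
    lon: float


def candidates(
    sighting: Sighting,
    records: List[LocationRecord],
    max_seconds: int = 60,
    max_degrees: float = 0.0005,
) -> List[str]:
    """Names of people whose records fall close to the sighting in time and space."""
    names = {
        r.person
        for r in records
        if abs(r.timestamp - sighting.timestamp) <= max_seconds
        and abs(r.lat - sighting.lat) <= max_degrees
        and abs(r.lon - sighting.lon) <= max_degrees
    }
    return sorted(names)


sighting = Sighting("img-001", timestamp=1000, lat=48.3069, lon=14.2858)
records = [
    LocationRecord("alice", 1010, 48.3070, 14.2859),  # right time, right place
    LocationRecord("bob", 1005, 48.4000, 14.3000),    # right time, wrong place
    LocationRecord("carol", 5000, 48.3069, 14.2858),  # right place, wrong time
]
matches = candidates(sighting, records)
```

If only one candidate remains, the blurred face no longer protects that person, even though no facial recognition was involved.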

Boyang: Many companies base their operations on cloud computing because it offers better parallelisation and can handle large amounts of data simultaneously. However, uploading terabytes of data into the cloud may become painful and time-consuming due to bandwidth limitations. Transferring data to third parties may also pose a challenging legal task. Thus, on-premise computing with software installed on the customer’s in-house infrastructure will remain key in image and video processing. It also reduces organisational and legal overhead to “merely” software licensing, whereas storing personal data in the cloud brings additional legal hassle such as technical and organisational measures (TOM), records of processing activities and data processing agreements.
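To get a feel for the bandwidth limitation, a back-of-the-envelope estimate helps. The sketch below assumes decimal units (1 TB = 10^12 bytes) and a constant uplink rate; the function name `upload_hours` and the example figures are illustrative assumptions, not numbers from the interview.

```python
def upload_hours(
    data_terabytes: float,
    uplink_mbit_per_s: float,
    efficiency: float = 1.0,
) -> float:
    """Estimated hours to upload `data_terabytes` over a link of the given speed.

    `efficiency` is the assumed fraction of the nominal rate actually achieved
    (1.0 means the full rate, which real connections rarely sustain).
    """
    bits = data_terabytes * 1e12 * 8                      # TB -> bytes -> bits
    seconds = bits / (uplink_mbit_per_s * 1e6 * efficiency)
    return seconds / 3600


one_tb_at_100_mbit = upload_hours(1, 100)        # roughly 22 hours
ten_tb_at_100_mbit = upload_hours(10, 100)       # roughly 9 days
one_tb_half_rate = upload_hours(1, 100, 0.5)     # roughly 44 hours
```

Under these assumptions, a single terabyte already takes close to a full day on a 100 Mbit/s uplink, so multi-terabyte mapping campaigns quickly run into days of transfer time.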

What trends do you see in the market for privacy-enhancing or protecting technology?

Boyang: In the first months of Celantur a customer told us: “We’d prefer to not use your service, but our existing tools do not provide an automated image anonymisation feature, and we have to do it somehow.” We totally understood them.

As a domain specialist, you don’t want to worry about data protection. Specialised tools and services that can get the job done become essential. The number one trend we expect to see in the next few years is a rise in the quantity and quality of data protection services, to the point where they become a commodity.

Alex: Another trend is the return to on-premise computing. While contractual agreements permit processing data in the cloud, some companies simply do not trust big cloud providers. Cloud processing can also be quite costly, or the cloud hosting itself may have technical limitations. In addition, transferring large amounts of data in a limited amount of time could fail due to bandwidth limitations, or because the data is captured at a location where an Internet connection is not always available.

We also expect digital twin technology, i.e. creating digital replicas of real-world objects, to spread as it becomes more affordable and easier to use. These replicas may represent complex machinery or the whole road system of a city (as seen, for example, on Google Street View). Feeding real-world data into digital twins can allow companies to run simulations, make predictions and consolidate information for better assessment. Needless to say, there’s a lot of personal data involved.

Where can our readers learn more about image and video anonymisation?

Boyang: Although image and video anonymisation has not made it into pop culture (yet), and movies about it are not really a thing for now, we can recommend a few good reads that continue to influence Alex and me in our work:

  • Deep Learning by John D. Kelleher is an introduction to artificial intelligence technology;
  • Prediction Machines by Ajay Agrawal, Joshua Gans and Avi Goldfarb gives a great overview of the economics behind artificial intelligence; and,
  • The public paper by the German Federal Commissioner for Data Protection and Freedom of Information provides a solid foundation for understanding the legal framework for data anonymisation in the European Union.

🙏 A big Thank You to the Privacy Issues team!

Privacy Issues is a free bi-monthly newsletter with the most relevant and up-to-date content covering how privacy legislation and ethics affect technology and innovation. Make sure to 📝 subscribe to their mailing list.

Originally published as Privacy Issues Newsletter Special Edition on March 25, 2020.
