How to copy XMP metadata between images

Copying XMP metadata between images isn't straightforward. Read how it's done correctly.


27 October 2021, by Cezary Zaboklicki


Introduction

This article is obsolete. Please read the new article on XMP.

XMP (Extensible Metadata Platform) is a metadata format introduced by Adobe Systems Inc. Sometimes we need to alter an image (resize it or change its pixels) but want to keep all of its metadata. It turns out that this isn't straightforward when dealing with XMP metadata. There are generally two ways to copy XMP data in Python using a dedicated framework:

  • py3exiv2, a Python binding to Exiv2, which is a C++ library and tool for managing image metadata, and
  • Python XMP Toolkit, which works with XMP image data.

The main problem with those frameworks is that after copying the XMP data to another image, the size of the XMP metadata changes.

For the following experiments we will use this test image, which we will refer to as original.jpg.

XMP hidden in the JPEG

Let us look more closely at this using ImageMagick to check the size:

> identify -verbose original.jpg
Profile-xmp: 28401 bytes

Another way to check is with ExifTool by Phil Harvey:

> exiftool -v original.jpg
JPEG APP1 (28430 bytes):
    + [XMP directory, 28401 bytes]
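The same two numbers can also be read with a few lines of plain Python. The sketch below is our own helper (xmp_sizes is a hypothetical name, not part of ImageMagick or ExifTool) and assumes a well-formed JPEG without padding bytes between segments. It walks the marker segments until the start of scan and, for the APP1 segment that begins with the XMP namespace, returns the segment size without its 2 length bytes (what ExifTool prints as JPEG APP1) and the packet size after the 29-byte namespace (the XMP directory).

```python
import struct

XMP_NS = b'http://ns.adobe.com/xap/1.0/\x00'  # 29-byte XMP namespace header


def xmp_sizes(data: bytes):
    """Return (app1_payload_size, xmp_packet_size) for the XMP APP1 segment, or None."""
    pos = 2  # skip the SOI marker (FF D8)
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            return None  # not positioned on a marker: malformed or unsupported file
        marker = data[pos + 1]
        if marker == 0xDA:  # start of scan: no more metadata segments follow
            return None
        # The big-endian length field counts its own 2 bytes plus the payload
        (length,) = struct.unpack('>H', data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(XMP_NS):
            return length - 2, length - 2 - len(XMP_NS)
        pos += 2 + length  # jump over marker, length field and payload
    return None
```

Called on the bytes of original.jpg, it should agree with the ExifTool output above.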

Vanilla copying

The destination image that will receive the XMP content is created on the fly. With the Python XMP Toolkit, we can copy the data as follows:

from PIL import Image
from libxmp import XMPFiles

source = 'original.jpg'
dest = 'new.jpg'

# Create a blank 200x200 destination image
new_image = Image.new(mode="RGB", size=(200, 200))
new_image.save(dest)

# Open the source read-only and the destination for update
xmpfile = XMPFiles(file_path=source)
xmpfile2 = XMPFiles(file_path=dest, open_forupdate=True)

xmp = xmpfile.get_xmp()

# Check that the destination can hold the packet before writing it
if xmpfile2.can_put_xmp(xmp):
    xmpfile2.put_xmp(xmp)

# Closing the destination flushes the XMP to disk
xmpfile.close_file()
xmpfile2.close_file()

The other method involves using the py3exiv2 library:

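import pyexiv2
from PIL import Image

# Create a blank 200x200 destination image
new_image = Image.new(mode="RGB", size=(200, 200))
new_image.save("new.jpg")

# Read the metadata of the source image
metadata_1 = pyexiv2.ImageMetadata('original.jpg')
metadata_1.read()

# Read the (empty) metadata of the destination image
metadata_2 = pyexiv2.ImageMetadata('new.jpg')
metadata_2.read()

# Copy the metadata (including XMP) from the source to the destination
metadata_1.copy(metadata_2, xmp=True)

# Write the copied metadata into new.jpg
metadata_2.write()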

Let us now check the size of the XMP section in the image we just created.

> identify -verbose new.jpg
Profile-xmp: 19658 bytes

Let's also check with ExifTool, just to be sure that ImageMagick is not misreading the metadata:

> exiftool -v new.jpg
JPEG APP1 (19687 bytes):
    + [XMP directory, 19658 bytes]

As you can see, there is a discrepancy of about 8,700 bytes (28401 vs. 19658), caused by a different XMP toolkit tag and some reformatting of the bytestring. Some custom XMP data might also not be copied. To be completely sure that no information is lost, changed or reformatted, we can instead copy the whole XMP part of the bytestring directly to the new image.
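The toolkit tag is the x:xmptk attribute of the x:xmpmeta element, which names the software that wrote the packet. To compare it between original.jpg and new.jpg, a small regex-based helper like the one below could be used. It is our own sketch (xmp_toolkit is a hypothetical name) and assumes the attribute is written as x:xmptk='...' or x:xmptk="...".

```python
import re


def xmp_toolkit(data: bytes):
    """Return the value of the x:xmptk attribute found in data, or None."""
    # Accept either quote style; the backreference \1 requires the closing quote to match
    match = re.search(rb"""x:xmptk=(['"])(.*?)\1""", data)
    return match.group(2).decode('utf-8') if match else None
```

Calling it on the raw bytes of each file, e.g. xmp_toolkit(open('new.jpg', 'rb').read()), returns the toolkit string written into that file.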

How XMP starts...

We run the following piece of code to see what the bytestring looks like:

filename = 'original.jpg'

# Read the whole JPEG file as raw bytes and print them
with open(filename, 'rb') as file:
    contents = file.read()

print(contents)

The output looks quite overwhelming. According to the XMP specification, the following bytes mark the beginning of the XMP section:

\xff\xe1o\x10http://ns.adobe.com/xap/1.0/\x00<?xpacket begin=\'\xef\xbb\xbf\' id=\'W5M0MpCehiHzreSzNTczkc9d\'?>\n<x:xmpmeta xmlns:x=\'adobe:ns:meta/\' x:xmptk=\'Image::ExifTool 10.96\'>

\xff\xe1 is the 2-byte marker of the APP1 segment. It is followed by 2 bytes that give the size of the segment and by the 29-byte namespace http://ns.adobe.com/xap/1.0/\x00. The rest of the segment is the XMP packet itself. In our image the size bytes are o\x10: Python displays the byte 0x6F as the ASCII character o, so the value is 6F10 in hexadecimal, i.e. 28432 bytes. This count includes the 2 size bytes themselves, so 31 bytes at the start of the XMP section are overhead: the namespace (29 bytes) and the size field (2 bytes). ExifTool reports 28430 bytes for the APP1 segment, which is 2 bytes less because it does not count the size field. Subtracting the 29-byte namespace from 28430 gives exactly the 28401 bytes of the XMP directory.
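This arithmetic can be checked directly in Python. The sketch only decodes the header bytes shown above (truncated right after the namespace) and relies on JPEG storing segment lengths as big-endian 16-bit integers.

```python
import struct

# First bytes of the XMP segment, copied from the output above
header = b'\xff\xe1o\x10http://ns.adobe.com/xap/1.0/\x00'

marker = header[:2]                                 # APP1 marker
(length,) = struct.unpack('>H', header[2:4])        # big-endian segment length
namespace = header[4:header.index(b'\x00', 4) + 1]  # namespace incl. terminating NUL

print(marker.hex(), hex(length), length)  # ffe1 0x6f10 28432
print(len(namespace))                     # 29
print(length - 2)                         # 28430: APP1 size reported by ExifTool
print(length - 2 - len(namespace))        # 28401: size of the XMP packet
```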

... and how it ends

The XMP packet ends where the next section starts, which is again indicated by a marker. According to the Exiv2 documentation, the marker is \xff\xdb, which introduces the DQT (Define Quantization Table) section. This holds for our image; in other JPEG files a different segment, for example another APPn segment, may follow the XMP data.
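Scanning for \xff\xdb works for this image, but the APP1 size field already tells us where the segment ends, whatever marker comes next. Below is a sketch of that alternative; it is our own variation (xmp_segment is a hypothetical helper), not the method used in this article. It returns the complete APP1 segment, including its marker and size bytes.

```python
import struct

XMP_NS = b'http://ns.adobe.com/xap/1.0/\x00'


def xmp_segment(data: bytes):
    """Return the full XMP APP1 segment (marker + size + payload), or None."""
    ns_pos = data.find(XMP_NS)
    # The namespace must be directly preceded by the APP1 marker and the 2 size bytes
    if ns_pos < 4 or data[ns_pos - 4:ns_pos - 2] != b'\xff\xe1':
        return None
    start = ns_pos - 4
    (length,) = struct.unpack('>H', data[start + 2:start + 4])
    # The size counts the 2 size bytes and the payload, but not the 2-byte marker
    end = start + 2 + length
    return data[start:end]
```

Because the end offset comes from the size field, the result does not depend on which segment follows the XMP data.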

Working script

Now we just need to copy the data between the start marker \xff\xe1o\x10http://ns.adobe.com/xap/1.0/\x00 and the end marker \xff\xdb, ending up with the following script:

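def add_xmp(source: str, dest: str):
    # Read the whole source JPEG
    with open(source, 'rb') as file_1:
        o_img = file_1.read()

    # Locate the XMP namespace header
    xmp_start = o_img.find(b'http://ns.adobe.com/xap/1.0/\0')
    if xmp_start == -1:
        return  # the source image has no XMP

    # The XMP section ends at the next DQT marker
    xmp_end = o_img.find(b'\xff\xdb', xmp_start)
    if xmp_end == -1:
        return  # no DQT marker after the XMP section

    # Include the 4 bytes before the namespace: APP1 marker and size field
    xmp_str = o_img[xmp_start - 4: xmp_end]

    with open(dest, 'r+b') as file_2:
        d_img = file_2.read()

        # Insert the XMP section right before the first DQT marker of the destination
        insert_pos = d_img.find(b'\xff\xdb')
        if insert_pos == -1:
            return  # destination has no DQT marker; leave it unchanged

        new_str = d_img[:insert_pos] + xmp_str + d_img[insert_pos:]

        # Overwrite the destination file with the new bytestring
        file_2.seek(0)
        file_2.truncate()
        file_2.write(new_str)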

Now the XMP size should be the same in the new image. I hope that was helpful and clarified the process of copying XMP data.
