How to copy XMP metadata between images
Copying XMP metadata between images isn't straightforward. Read how it's done correctly.
XMP (Extensible Metadata Platform) is a metadata format introduced by Adobe Systems Inc. Sometimes we need to alter an image (resize it or change pixels) but want to keep all of its metadata. It turns out that this isn't straightforward when dealing with XMP metadata. There are generally two ways to copy XMP data in Python using a dedicated framework. These are:
- py3exiv2, a Python binding to Exiv2, which is a tool for managing image metadata, and
- Python XMP Toolkit, which works with XMP image data.
The main problem with those frameworks is that after copying the XMP data to another image, the size of the XMP metadata changes.
For the following experiments we will use this test image, which we will refer to as original.jpg.
XMP hidden in the JPEG
Let us look more closely at this using ImageMagick to check the size:
> identify -verbose original.jpg
Profile-xmp: 28401 bytes
Another way to check is with the ExifTool by Phil Harvey:
> exiftool -v original.jpg
JPEG APP1 (28430 bytes):
  + [XMP directory, 28401 bytes]
The destination image that will receive the XMP content will be created on the fly. With the Python XMP Toolkit we can copy the data as follows.
    from PIL import Image
    from libxmp import XMPFiles

    source = 'original.jpg'
    dest = 'new.jpg'

    # Create a blank destination image on the fly
    new_image = Image.new(mode="RGB", size=(200, 200))
    new_image.save(dest)

    # The source is only read; the destination must be opened for update
    xmpfile = XMPFiles(file_path=source)
    xmpfile2 = XMPFiles(file_path=dest, open_forupdate=True)

    xmp = xmpfile.get_xmp()
    # Check that the destination can take the packet before writing it
    if xmpfile2.can_put_xmp(xmp):
        xmpfile2.put_xmp(xmp)

    xmpfile.close_file()
    xmpfile2.close_file()
The other method involves using the py3exiv2 library:
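    import pyexiv2
    from PIL import Image

    # Create a blank destination image on the fly
    new_image = Image.new(mode="RGB", size=(200, 200))
    new_image.save("new.jpg")

    # Read the metadata of the source image
    metadata_1 = pyexiv2.ImageMetadata('original.jpg')
    metadata_1.read()
    metadata_1.modified = True

    # Read the (still empty) metadata of the destination image
    metadata_2 = pyexiv2.ImageMetadata('new.jpg')
    metadata_2.read()

    # Copy the metadata, including XMP, into the destination and save it
    metadata_1.copy(metadata_2, xmp=True)
    metadata_2.write()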
Let us now check the size of the XMP section in the image we just created.
> identify -verbose new.jpg
Profile-xmp: 19658 bytes
Let's check with ExifTool as well, just to be sure that the difference is not caused by ImageMagick reading the metadata incorrectly.
> exiftool -v new.jpg
JPEG APP1 (19687 bytes):
  + [XMP directory, 19658 bytes]
As you can see, there is a discrepancy of around 9000 bytes (28401 versus 19658), caused by a different XMPToolkit tag and some reformatting of the bytestring. There might also be custom XMP data that is not copied. Thus, to be completely sure that no information is lost, changed or reformatted, we can directly copy the whole XMP part of the bytestring to the new image.
How XMP starts...
We run the following piece of code to see what the bytestring looks like.
    filename = 'original.jpg'
    with open(filename, 'rb') as file:
        contents = file.read()
    # Print the raw bytes of the whole file
    print(contents)
The output looks quite overwhelming. According to the XMP specification, the following line indicates the beginning of the XMP section.
\xff\xe1o\x10http://ns.adobe.com/xap/1.0/\x00<?xpacket begin=\'\xef\xbb\xbf\' id=\'W5M0MpCehiHzreSzNTczkc9d\'?>\n<x:xmpmeta xmlns:x=\'adobe:ns:meta/\' x:xmptk=\'Image::ExifTool 10.96\'>
\xff\xe1 is the APP1 marker, which is 2 bytes long. It is followed by 2 bytes that give the size of the APP1 segment and 29 bytes for the namespace. The rest of the segment is the XMP packet itself. In our image the size field is o\x10 (the byte 0x6F is printed as the ASCII letter o, so the value is 6F10 in hexadecimal, read big-endian), which corresponds to 28432 bytes. Thus we have 31 bytes at the start of the XMP section before the packet begins: the size field itself (2 bytes) and the namespace (29 bytes). Subtracting them gives 28432 - 31 = 28401 bytes, which matches the packet size reported above. ExifTool shows a size of 28430 bytes, which is 2 bytes less because a JPEG segment length counts the 2 bytes of the size field itself, while ExifTool reports only the data that follows it.
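The size arithmetic above can be checked in a few lines of Python. This is only a small sketch: the variable names are illustrative, and the bytes are the ones shown for original.jpg.

```python
# The two bytes that follow the APP1 marker in original.jpg
length_field = b'o\x10'

# JPEG segment lengths are big-endian and count the length field itself
segment_length = int.from_bytes(length_field, byteorder='big')

# The namespace string that identifies an XMP segment, including the NUL byte
namespace = b'http://ns.adobe.com/xap/1.0/\x00'

# Whatever remains after the length field and the namespace is the XMP packet
packet_size = segment_length - len(length_field) - len(namespace)

print(segment_length)  # 28432
print(len(namespace))  # 29
print(packet_size)     # 28401
```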
... and how it ends
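The XMP packet ends where the next section starts, which is again indicated by a marker. According to the exiv2 documentation, the marker is \xff\xdb, which introduces the DQT (Define Quantization Table) section. This holds when the XMP segment is directly followed by the quantization tables, as in our image; other files may contain further segments in between.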
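Instead of searching for the next marker, the end of a segment can also be derived from its size field. The following is a minimal sketch of that idea, not part of the method used in this article: `list_segments` and `find_xmp_segment` are hypothetical helper names, and the code assumes a well-formed JPEG header without fill bytes or standalone markers before the image data.

```python
import struct

XMP_HEADER = b'http://ns.adobe.com/xap/1.0/\x00'


def list_segments(data: bytes):
    """Return (marker, offset, length) for each header segment up to SOS."""
    if data[:2] != b'\xff\xd8':
        raise ValueError('not a JPEG file (missing SOI marker)')
    segments = []
    pos = 2  # skip the 2-byte SOI marker
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f'expected a marker at offset {pos}')
        marker = data[pos + 1]
        # Big-endian length; it counts the length field but not the marker
        (length,) = struct.unpack('>H', data[pos + 2:pos + 4])
        segments.append((marker, pos, length))
        if marker == 0xDA:  # SOS: compressed image data follows
            break
        pos += 2 + length
    return segments


def find_xmp_segment(data: bytes):
    """Return the whole XMP APP1 segment (marker included), or None."""
    for marker, offset, length in list_segments(data):
        # An XMP segment is an APP1 segment whose payload starts with the namespace
        if marker == 0xE1 and data.startswith(XMP_HEADER, offset + 4):
            return data[offset:offset + 2 + length]
    return None
```

Called on the bytes of original.jpg, `find_xmp_segment` would return the segment from \xff\xe1 up to the byte before the next marker, regardless of which segment comes next.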
Now we just need to copy the data between the start marker \xff\xe1o\x10http://ns.adobe.com/xap/1.0/\x00 and the end marker \xff\xdb, ending up with the following script:
    def add_xmp(source: str, dest: str) -> None:
        """Copy the raw XMP section from source into dest, byte for byte."""
        with open(source, 'rb') as file_1:
            o_img = file_1.read()

        # Locate the XMP namespace; stop if the source has no XMP section
        xmp_start = o_img.find(b'http://ns.adobe.com/xap/1.0/\0')
        if xmp_start == -1:
            return
        xmp_end = o_img.find(b'\xff\xdb', xmp_start)
        if xmp_end == -1:
            return

        # Step back 4 bytes to include the APP1 marker and the size field
        xmp_str = o_img[xmp_start - 4:xmp_end]

        with open(dest, 'r+b') as file_2:
            d_img = file_2.read()
            # Insert the section right before the destination's DQT marker
            dqt_pos = d_img.find(b'\xff\xdb')
            if dqt_pos == -1:
                return
            new_str = d_img[:dqt_pos] + xmp_str + d_img[dqt_pos:]
            file_2.seek(0)
            file_2.truncate()
            file_2.write(new_str)
Now the XMP size should be the same in the new image. I hope that was helpful and clarified the process of copying XMP data.
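To confirm the result without external tools, the packet can be read back from both files and compared byte for byte. The sketch below is one possible check under the same assumptions as the rest of this article (a single XMP section whose size field is correct); `xmp_packet_from_bytes`, `read_xmp_packet` and `same_xmp` are hypothetical helper names.

```python
XMP_HEADER = b'http://ns.adobe.com/xap/1.0/\x00'


def xmp_packet_from_bytes(data: bytes):
    """Return the raw XMP packet found in JPEG bytes, or None if absent."""
    start = data.find(XMP_HEADER)
    if start < 4:  # not found, or no room for the marker and size field
        return None
    # The 2 bytes before the namespace hold the big-endian segment length,
    # which counts the size field itself
    length = int.from_bytes(data[start - 2:start], 'big')
    end = start - 2 + length
    return data[start + len(XMP_HEADER):end]


def read_xmp_packet(path: str):
    """Read a file and return its raw XMP packet, or None if absent."""
    with open(path, 'rb') as f:
        return xmp_packet_from_bytes(f.read())


def same_xmp(source: str, dest: str) -> bool:
    """True if both files contain an identical XMP packet."""
    packet = read_xmp_packet(source)
    return packet is not None and packet == read_xmp_packet(dest)
```

For example, `same_xmp('original.jpg', 'new.jpg')` should return True after running the script above, and `len(read_xmp_packet('new.jpg'))` gives the packet size in bytes.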