Segmentation Masks and Metadata

The Celantur container can generate two different types of segmentation masks, as well as metadata, for each processed image:

  • Binary Segmentation

  • Instance Segmentation

Mask generation is activated with the --save-mask {all, instance, binary} parameter. The segmentation mask is saved as a PNG file.

Binary Segmentation

The binary segmentation mask consists of two colors:

  • Background is black

  • Anonymized segments are white

The file will be saved as image-name_bin_mask.png.
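
As a sketch of how the binary mask can be consumed (assuming the file name above; Pillow and NumPy are used here for illustration and are not part of the container), the snippet below computes the share of anonymized pixels:

import numpy as np
from PIL import Image

# Load the binary mask saved next to the processed image
# (file name pattern: <image-name>_bin_mask.png).
mask = np.array(Image.open("image-name_bin_mask.png").convert("L"))

# White pixels mark anonymized segments, black pixels are background.
anonymized = mask > 0
ratio = anonymized.mean()

print(f"Anonymized area: {ratio:.2%} of the image")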

Instance Segmentation

The instance segmentation mask consists of multiple colors. The RGB color values are used to differentiate individual instances/objects.

  • The R (red) channel encodes the object type:

    • License plate: 64

    • Person: 128

    • Face: 192

    • Vehicle: 255

  • The G (green) channel encodes individual instances/objects.

  • The B (blue) channel is 0.

E.g. [192, 85, 0] is a face.

The file will be saved as image-name_ins_mask.png.
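
A minimal sketch of decoding the instance mask, assuming the file name above: the R channel is mapped to the object type listed above, and each unique (R, G) pair is treated as one instance.

import numpy as np
from PIL import Image

# R-channel values and the object types they encode (see list above).
TYPE_BY_RED = {64: "license plate", 128: "person", 192: "face", 255: "vehicle"}

mask = np.array(Image.open("image-name_ins_mask.png").convert("RGB"))

# Unique (R, G) pairs; each non-background pair identifies one instance (B is always 0).
pairs = np.unique(mask[..., :2].reshape(-1, 2), axis=0)
for r, g in pairs:
    if r == 0:
        continue  # black background pixels
    print(f"{TYPE_BY_RED.get(int(r), 'unknown')} instance {int(g)}: color [{int(r)}, {int(g)}, 0]")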

Scale Down Mask Files

With the optional --mask-scale {0-100} parameter (CLI) or the /v1/file/1/instance-mask?mask-scale={0-100} query parameter (Container API), mask files are scaled down by the specified ratio.
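
As an illustrative sketch only, the request below fetches a scaled-down instance mask through the Container API endpoint mentioned above; the base URL and the file id 1 are assumptions that depend on your deployment:

import requests

# Hypothetical base URL; adjust host and port to your container deployment.
BASE_URL = "http://localhost:8000"

# Request the instance mask for file id 1, scaled down to 50.
response = requests.get(f"{BASE_URL}/v1/file/1/instance-mask", params={"mask-scale": 50})
response.raise_for_status()

with open("image-name_ins_mask_scaled.png", "wb") as f:
    f.write(response.content)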

Metadata

Image metadata

Metadata about detected instances/objects is stored in the corresponding image-name.json file.

Image metadata example

{
    "id": "image-name.jpg",
    "detections": [
    {
        "id": 0,
        "parent_image": "image-name.jpg",
        "offset": [
            1560,
            744
        ],
        "bbox": [
            1957,
            744,
            3003,
            1855
        ],
        "type": 103,
        "score": 0.9993754029273987,
        "is_anonymised": true,
        "type_label": "face",
        "color": null
    },
    "size": [
        3456,
        5184
    ],
    "duration": 1.8533296539999355,
    "filename": "image-name.jpg",
    "folder": null
}
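
A short sketch of consuming this metadata, assuming an image-name.json like the example above: it prints each detection's label, confidence score and bounding box.

import json

with open("image-name.json") as f:
    metadata = json.load(f)

print(f"Image {metadata['filename']} ({metadata['size'][0]}x{metadata['size'][1]})")
for detection in metadata["detections"]:
    x1, y1, x2, y2 = detection["bbox"]
    print(
        f"  #{detection['id']}: {detection['type_label']} "
        f"(score {detection['score']:.2f}), bbox {x1},{y1} -> {x2},{y2}"
    )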

Video metadata

Metadata is generated for individual frames and grouped into files, each covering a range of multiple frames.

  • Batch/Stream mode: The information is stored in filename-[startframe]-[endframe].json in the output directory. The maximum number of frames covered by a single file is 500.

Video metadata example

[
  {
    "id": 0,
    "detections": [
      {
        "id": 0,
        "parent_image": 0,
        "offset": [
          3,
          4
        ],
        "bbox": [
          3,
          4,
          201,
          171
        ],
        "type": 103,
        "score": 0.8267934918403625,
        "is_anonymised": true,
        "type_label": "face",
        "color": null
      }
    ],
    "size": [
      178,
      320
    ],
    "duration": 0.35388135999892256
  },
  {
    "id": 1,
    "detections": [
      {
        "id": 0,
        "parent_image": 1,
        "offset": [
          0,
          6
        ],
        "bbox": [
          0,
          20,
          137,
          175
        ],
        "type": 103,
        "score": 0.8945223689079285,
        "is_anonymised": true,
        "type_label": "face",
        "color": null
      }
    ],
    "size": [
      178,
      320
    ],
    "duration": 0.2979645720006374
  },
  ...
]
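
The sketch below shows one way to aggregate the per-range files of a video; the glob pattern is an assumption based on the filename-[startframe]-[endframe].json convention described above.

import glob
import json

# Collect all frame-range metadata files for one video (assumed naming pattern).
frames = []
for path in sorted(glob.glob("video-name-*-*.json")):
    with open(path) as f:
        frames.extend(json.load(f))

# Count detections per label across all frames.
counts = {}
for frame in frames:
    for detection in frame["detections"]:
        counts[detection["type_label"]] = counts.get(detection["type_label"], 0) + 1

print(f"{len(frames)} frames, detections per label: {counts}")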

Metadata attribute reference

Attributes of an image or video frame:

  • id: The id of the image (file name) or video frame (sequential number)

  • detections: List of detected instances/objects (see the attribute list below)

  • size: Size of the image or frame in [width, height]

  • duration: Duration for processing an image or video frame

  • filename: Name of the file

  • folder: Name of the folder (relative to the root input folder)

Detected instances/objects, provided as a list under the detections attribute:

  • id: The id of the detection, a sequential number starting from 0

  • parent_image: The name of the image or the id of the video frame the detected instance/object was found on

  • offset: The offset of the detection's bounding box from the upper left corner of the image (x/y coordinates in pixels)

  • bbox: The coordinates of the detection's bounding box (x1, y1, x2, y2)

  • score: The detection's confidence score between 0.0 and 1.0, stating how confident the model is that the detection matches its label (see type_label)

  • is_anonymised: Specifies whether the detection was anonymized (or only detected, with method = detect)

  • type_label: The detection's label assigned by the model, e.g. face, license plate, etc.

  • type: Numerical representation of the type_label

  • color: The detection's color (RGB) in the instance segmentation mask

  • duration: Processing duration for a video frame (only for videos)
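
When the color attribute is populated, it can be cross-referenced with the instance segmentation mask. The sketch below (reusing the hypothetical file names from the examples above) selects all pixels belonging to one detection:

import json

import numpy as np
from PIL import Image

with open("image-name.json") as f:
    metadata = json.load(f)

mask = np.array(Image.open("image-name_ins_mask.png").convert("RGB"))

for detection in metadata["detections"]:
    if detection["color"] is None:
        continue  # skip detections without a mask color (null in the examples above)
    # Boolean mask of all pixels that match this detection's RGB color.
    instance_pixels = np.all(mask == np.array(detection["color"]), axis=-1)
    print(f"Detection {detection['id']} ({detection['type_label']}): "
          f"{int(instance_pixels.sum())} pixels")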
