Training models with Panoptic Segmentation in Detectron2

Tutorial on how to train your own models with panoptic segmentation in Detectron2.

14 May 2020, by Boyang Xia

Panoptic Segmentation


A paper [1] published in April last year describes a method that combines semantic segmentation (assigning each pixel a class label) and instance segmentation (finding individual objects along with their shape and label). Detectron2 has supported panoptic segmentation since last October, and in this tutorial we'll show how easy it is to train your own panoptic segmentation model.

[1] Kirillov, Alexander et al. (2019). Panoptic Segmentation. arXiv:1801.00868v3


We tested this tutorial on Ubuntu 18.04, but it should also work on other systems. On those systems, the installation of the NVIDIA driver and the required dependencies may differ from the instructions below.


You need a CUDA-enabled graphics card with at least 11 GB of GPU memory, e.g. an NVIDIA GeForce RTX 2080 Ti, because instance segmentation is extremely memory hungry.


If the NVIDIA driver is not pre-installed, you can install it on Ubuntu with sudo apt install nvidia-XXX (XXX is the version; the newest one is 440). Alternatively, download the appropriate NVIDIA driver (for Linux) and execute the binary with sudo.


On Ubuntu 18.04, install CUDA 10.2 with the following script (from NVIDIA Developer):

sudo mv /etc/apt/preferences.d/cuda-repository-pin-600
sudo dpkg -i cuda-repo-ubuntu1804-10-2-local-10.2.89-440.33.01_1.0-1_amd64.deb
sudo apt-key add /var/cuda-repo-10-2-local-10.2.89-440.33.01/
sudo apt-get update
sudo apt-get -y install cuda

You can find setup instructions for other systems on the NVIDIA Developer website.

Install Detectron2


The current version of Detectron2 requires

  • Python ≥ 3.6
  • PyTorch ≥ 1.4
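
To check the two minimum versions above before installing anything else, a small script can help. This is a minimal sketch, not part of the original tutorial: the helper names parse_version and meets_minimum are our own, and the script only compares the leading numeric part of a version string (so a suffix such as "+cu101" is ignored).

```python
import re
import sys


def parse_version(text):
    """Turn a string like '1.5.0+cu101' into (1, 5, 0), ignoring any suffix."""
    match = re.match(r"(\d+(?:\.\d+)*)", text)
    if match is None:
        raise ValueError(f"cannot parse version: {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum` (hypothetical helper)."""
    have, need = parse_version(installed), parse_version(minimum)
    width = max(len(have), len(need))
    # Pad with zeros so that '1.4' and '1.4.0' compare as equal.
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


if __name__ == "__main__":
    print("Python OK:", sys.version_info >= (3, 6))
    try:
        import torch
        print("PyTorch OK:", meets_minimum(torch.__version__, "1.4"))
    except ImportError:
        print("PyTorch is not installed")
```

If either line reports False, upgrade that component before continuing.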

On Ubuntu, run the following lines in Bash (get pip with sudo apt install python3-pip):

# Install PyTorch and other dependencies
pip install --user torch torchvision tensorboard cython
# Install OpenCV (optional)
sudo apt install python3-opencv
pip install --user opencv-python
# Install fvcore
pip install --user 'git+'
# Install pycocotools
pip install --user 'git+'

Download and install Detectron2

In the newest version (0.1.2) of Detectron2, you need to set the environment variable CUDA_HOME to the location of the CUDA library. On Ubuntu, it is under /usr/local/cuda-XX.X/.

export FORCE_CUDA="1"
export CUDA_HOME="/usr/local/cuda-10.2/"
git clone
cd detectron2
pip install .
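
If the build fails, it can be worth checking that CUDA_HOME really points at a CUDA toolkit. The following is a rough sketch under one assumption: that the toolkit follows the usual layout with the compiler at bin/nvcc below CUDA_HOME. The function name find_nvcc is our own and not part of Detectron2 or PyTorch.

```python
import os
from pathlib import Path


def find_nvcc(environ=None):
    """Return the path to nvcc below CUDA_HOME, or None if it is not there."""
    if environ is None:
        environ = os.environ
    cuda_home = environ.get("CUDA_HOME")
    if not cuda_home:
        return None
    # Assumption: standard toolkit layout, i.e. <CUDA_HOME>/bin/nvcc
    nvcc = Path(cuda_home) / "bin" / "nvcc"
    return nvcc if nvcc.is_file() else None


if __name__ == "__main__":
    nvcc = find_nvcc()
    if nvcc:
        print(f"nvcc found at {nvcc}")
    else:
        print("CUDA_HOME is unset or does not contain bin/nvcc")
```

Run it in the same shell in which you exported CUDA_HOME, since the variable is only visible to processes started from that shell.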

If you still encounter problems, check out the official installation guide.

Training the model

We base the tutorial on the Detectron2 Beginner's Tutorial and train a balloon detector.

The setup for panoptic segmentation is very similar to instance segmentation. However, as in semantic segmentation, you have to tell Detectron2 the pixel-wise labelling of the whole image, e.g. using an image where the colours encode the labels.

    # ...
    record["height"] = height
    record["width"] = width
    # Pixel-wise segmentation
    record["sem_seg_file_name"] = os.path.join(img_dir, "segmentation", v["filename"])

    # ...

You can generate the mask images with the script provided for this demo.
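
To illustrate what such a mask contains, here is a minimal pure-Python sketch that rasterises polygon annotations into a grid of labels. It is not the demo script: the function names are our own, and the label values (0 for background, 1 for balloon) are assumptions that may differ from what the script writes. Each polygon is given as a pair of x and y coordinate lists, like the all_points_x and all_points_y fields in the balloon annotations. Saving the grid as an image file would need an image library and is left out here.

```python
def point_in_polygon(x, y, xs, ys):
    """Even-odd rule: count the polygon edges crossed by a ray going right."""
    inside = False
    n = len(xs)
    for i in range(n):
        x1, y1 = xs[i], ys[i]
        x2, y2 = xs[(i + 1) % n], ys[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge spans the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygons_to_mask(width, height, polygons, label=1, background=0):
    """Return a height x width nested list with `label` inside the polygons."""
    mask = [[background] * width for _ in range(height)]
    for xs, ys in polygons:
        # Only visit pixels inside the polygon's bounding box.
        for row in range(max(0, int(min(ys))), min(height, int(max(ys)) + 1)):
            for col in range(max(0, int(min(xs))), min(width, int(max(xs)) + 1)):
                # Test the pixel centre against the polygon.
                if point_in_polygon(col + 0.5, row + 0.5, xs, ys):
                    mask[row][col] = label
    return mask


if __name__ == "__main__":
    square = ([1, 4, 4, 1], [1, 1, 3, 3])
    for line in polygons_to_mask(6, 5, [square]):
        print("".join(str(v) for v in line))
```

For real images, a library routine for polygon filling will be much faster than this per-pixel loop; the sketch only shows the idea.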

If you want to visualise the dataset with Detectron2's Visualizer, add an empty list of stuff classes. "Things" are well-defined, countable objects, while "stuff" denotes amorphous regions that carry a label of their own, different from the background.

    # ...
    MetadataCatalog.get("balloon_" + d).set(thing_classes=["balloon"], stuff_classes=[])
    # ...

Otherwise, the Visualizer complains:

AttributeError: Attribute 'stuff_classes' does not exist in the metadata of 'balloon_train'. Available keys are dict_keys(['name', 'thing_classes']).


Training with the default settings takes a bit more than a minute on an NVIDIA Tesla V100 and requires about 9 GiB of GPU memory (instance segmentation training takes about 6 GiB). The resulting model does not necessarily perform any better than plain instance segmentation, which is no surprise given the dataset and task (balloon detection).

However, if you want to train a model that can both detect instances and distinguish between different backgrounds, e.g. sky, ocean and sand on a beach, or street, houses and vegetation in a cityscape, then panoptic segmentation may be the right choice for you.


Panoptic segmentation, like semantic segmentation, is very memory hungry, and you'll soon hit the limits, e.g. if you increase the batch size (SOLVER.IMS_PER_BATCH) from 2 to 8:

RuntimeError: CUDA out of memory. Tried to allocate x.xx GiB (GPU 0; xx.xx GiB total capacity; xx.xx GiB already allocated; x.xx GiB free; xx.xx GiB reserved in total by PyTorch)

If you have multiple GPUs, you can use the handy function launch provided by Detectron2 (in module detectron2.engine.launch) to distribute the training across them:

    from detectron2.engine import launch
    launch(
        train, # function to be parallelised across multiple GPUs
        4, # Number of GPUs per machine
        args=(cfg,), # arguments to the function `train`
    )
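
SOLVER.IMS_PER_BATCH counts the images across all GPUs, so a batch of 8 spread over 4 GPUs puts 2 images on each GPU, the same load as the single-GPU setting mentioned above. The sketch below shows this arithmetic. It also applies the linear scaling heuristic for the learning rate, which is a common rule of thumb and not something this tutorial prescribes. The helper names and the base learning rate of 0.00025 are illustrative assumptions.

```python
def per_gpu_batch(ims_per_batch, num_gpus):
    """Images each GPU processes per iteration for a given total batch size."""
    if ims_per_batch % num_gpus != 0:
        raise ValueError("the total batch size must be divisible by the number of GPUs")
    return ims_per_batch // num_gpus


def scaled_lr(base_lr, base_batch, new_batch):
    """Linear scaling heuristic: grow the learning rate with the batch size."""
    return base_lr * new_batch / base_batch


if __name__ == "__main__":
    # A total batch of 8 on 4 GPUs gives 2 images per GPU.
    print("images per GPU:", per_gpu_batch(8, 4))
    # Illustrative base learning rate for a batch of 2, scaled to a batch of 8.
    print("scaled learning rate:", scaled_lr(0.00025, 2, 8))
```

Treat the scaled learning rate as a starting point and check the training curves, since the heuristic does not hold for every model and dataset.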

📌 You can also find the scripts from this tutorial in our GitHub repo.
