Dataset preparation for an object detection training workflow can take a long time and is often frustrating. Label Studio, an open-source data annotation tool, can help by providing a simple way to annotate datasets. It supports a wide selection of annotation templates, including computer vision, natural language processing, and audio or speech processing. In this article, however, we'll focus specifically on the object detection workflow.
But what if you want to take advantage of pre-annotated open-source datasets, such as the Pascal VOC dataset? In this article, I'll show you how to easily import those tasks into Label Studio's format while setting up the entire stack, including a PostgreSQL database, MinIO object storage, an Nginx reverse proxy, and the Label Studio backend. MinIO is an S3-compatible object storage service: you can use cloud-native storage in production, but you can also run it locally for development and testing.
In this tutorial, we'll go through the following steps:
- Convert Pascal VOC annotations – transform bounding boxes from XML into Label Studio tasks in JSON format.
- Run the full stack – start Label Studio with PostgreSQL, MinIO, Nginx, and the backend using Docker Compose.
- Set up a Label Studio project – configure a new project inside the Label Studio interface.
- Upload images and tasks to MinIO – store your dataset in an S3-compatible bucket.
- Connect MinIO to Label Studio – add the cloud storage bucket to your project so Label Studio can fetch images and annotations directly.
Prerequisites
To follow this tutorial, make sure you have:
From VOC to Label Studio: Preparing Annotations
The Pascal VOC dataset has a folder structure where the train and test datasets are already split. The folder contains an annotation file for every image. In total, the training set includes 17,125 images, each with a corresponding annotation file.
.
└── VOC2012
    ├── Annotations  # 17125 annotations
    ├── ImageSets
    │   ├── Action
    │   ├── Layout
    │   ├── Main
    │   └── Segmentation
    ├── JPEGImages  # 17125 images
    ├── SegmentationClass
    └── SegmentationObject
The XML snippet below, taken from one of the annotations, defines a bounding box around an object labeled “person”. The box is specified using four pixel coordinates: xmin, ymin, xmax, and ymax.
The illustration below shows the inner rectangle as the annotated bounding box, defined by the top-left corner (xmin, ymin) and the bottom-right corner (xmax, ymax), within the outer rectangle representing the image.

Label Studio expects each bounding box to be defined by its width, height, and top-left corner, expressed as percentages of the image size. Below is a working example of the converted JSON format for the annotation shown above.
{
  "data": {
    "image": "s3:////2007_000027.jpg"
  },
  "annotations": [
    {
      "result": [
        {
          "from_name": "label",
          "to_name": "image",
          "type": "rectanglelabels",
          "value": {
            "x": 35.802,
            "y": 20.20,
            "width": 36.01,
            "height": 50.0,
            "rectanglelabels": ["person"]
          }
        }
      ]
    }
  ]
}
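The conversion from VOC pixel coordinates to Label Studio percentages can be sketched in a few lines of Python. This is a minimal illustration, not the exact script used for this article: the sample XML is an assumption whose pixel values were chosen to reproduce the percentages in the JSON above, and the function name `voc_to_label_studio` and the `s3://my-bucket/images/` prefix are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

# Illustrative VOC-style annotation. The pixel values are assumptions chosen
# to reproduce the percentages shown in the JSON example.
SAMPLE_XML = """
<annotation>
  <filename>2007_000027.jpg</filename>
  <size><width>486</width><height>500</height><depth>3</depth></size>
  <object>
    <name>person</name>
    <bndbox><xmin>174</xmin><ymin>101</ymin><xmax>349</xmax><ymax>351</ymax></bndbox>
  </object>
</annotation>
"""


def voc_to_label_studio(xml_text, image_prefix):
    """Convert one VOC annotation (XML string) into a Label Studio task dict."""
    root = ET.fromstring(xml_text)
    img_w = float(root.findtext("size/width"))
    img_h = float(root.findtext("size/height"))

    results = []
    for obj in root.findall("object"):  # top-level objects only
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            float(box.findtext(key)) for key in ("xmin", "ymin", "xmax", "ymax")
        )
        results.append({
            "from_name": "label",
            "to_name": "image",
            "type": "rectanglelabels",
            "value": {
                # Top-left corner and size, as percentages of the image size.
                "x": 100.0 * xmin / img_w,
                "y": 100.0 * ymin / img_h,
                "width": 100.0 * (xmax - xmin) / img_w,
                "height": 100.0 * (ymax - ymin) / img_h,
                "rectanglelabels": [obj.findtext("name")],
            },
        })

    return {
        "data": {"image": image_prefix + root.findtext("filename")},
        "annotations": [{"result": results}],
    }


# Hypothetical bucket and prefix; replace with your own storage location.
task = voc_to_label_studio(SAMPLE_XML, "s3://my-bucket/images/")
```

Running this over every file in the Annotations folder and writing each returned dict to its own JSON file would produce one task per image.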
As you can see in the JSON format, you also need to specify the location of the image file, for example a path in MinIO or an S3 bucket if you're using cloud storage.
While preprocessing the data, I merged the entire dataset, even though it was already divided into training and validation sets. This simulates a real-world scenario where you typically begin with a single dataset and split it into training and validation sets yourself before training.
Running the Full Stack with Docker Compose
I merged the docker-compose.yml and docker-compose.minio.yml files into a simplified single configuration so the entire stack can run on the same network. Both files were taken from the official Label Studio GitHub repository.
services:
  nginx:
    # Acts as a reverse proxy for Label Studio frontend/backend
    image: heartexlabs/label-studio:latest
    restart: unless-stopped
    ports:
      - "8080:8085"
      - "8081:8086"
    depends_on:
      - app
    environment:
      - LABEL_STUDIO_HOST=${LABEL_STUDIO_HOST:-}
    volumes:
      - ./mydata:/label-studio/data:rw # Stores Label Studio projects, configs, and uploaded files
    command: nginx
  app:
    stdin_open: true
    tty: true
    image: heartexlabs/label-studio:latest
    restart: unless-stopped
    expose:
      - "8000"
    depends_on:
      - db
    environment:
      - DJANGO_DB=default
      - POSTGRE_NAME=postgres
      - POSTGRE_USER=postgres
      - POSTGRE_PASSWORD=
      - POSTGRE_PORT=5432
      - POSTGRE_HOST=db
      - LABEL_STUDIO_HOST=${LABEL_STUDIO_HOST:-}
      - JSON_LOG=1
    volumes:
      - ./mydata:/label-studio/data:rw # Stores Label Studio projects, configs, and uploaded files
    command: label-studio-uwsgi
  db:
    image: pgautoupgrade/pgautoupgrade:13-alpine
    hostname: db
    restart: unless-stopped
    environment:
      - POSTGRES_HOST_AUTH_METHOD=trust
      - POSTGRES_USER=postgres
    volumes:
      - ${POSTGRES_DATA_DIR:-./postgres-data}:/var/lib/postgresql/data # Persistent storage for PostgreSQL database
  minio:
    image: "minio/minio:${MINIO_VERSION:-RELEASE.2025-04-22T22-12-26Z}"
    command: server /data --console-address ":9009"
    restart: unless-stopped
    ports:
      - "9000:9000"
      - "9009:9009"
    volumes:
      - minio-data:/data # Stores uploaded dataset objects (like images or JSON tasks)
    # configure env vars in .env file or your systems environment
    environment:
      - MINIO_ROOT_USER=${MINIO_ROOT_USER:-minio_admin_do_not_use_in_production}
      - MINIO_ROOT_PASSWORD=${MINIO_ROOT_PASSWORD:-minio_admin_do_not_use_in_production}
      - MINIO_PROMETHEUS_URL=${MINIO_PROMETHEUS_URL:-http://prometheus:9090}
      - MINIO_PROMETHEUS_AUTH_TYPE=${MINIO_PROMETHEUS_AUTH_TYPE:-public}
volumes:
  minio-data: # Named volume for MinIO object storage
This simplified Docker Compose file defines four core services with their volume mappings:
App – runs the Label Studio backend itself.
- Shares the mydata directory with Nginx, which stores projects, configurations, and uploaded files.
- Uses a bind mount: ./mydata:/label-studio/data:rw → maps a folder from your host into the container.
Nginx – acts as a reverse proxy for the Label Studio frontend and backend.
- Shares the mydata directory with the App service.
PostgreSQL (db) – manages metadata and project information.
- Stores persistent database files.
- Uses a bind mount: ${POSTGRES_DATA_DIR:-./postgres-data}:/var/lib/postgresql/data.
MinIO – an S3-compatible object storage service.
- Stores dataset objects such as images or JSON annotation tasks.
- Uses a named volume: minio-data:/data.
When you mount host folders such as ./mydata and ./postgres-data, you need to assign ownership on the host to the same user that runs inside the container. Label Studio doesn't run as root; it uses a non-root user with UID 1001. If the host directories are owned by a different user, the container won't have write access and you'll run into errors.
After creating these folders in your project directory, you can adjust their ownership with:
mkdir mydata
mkdir postgres-data
sudo chown -R 1001:1001 ./mydata ./postgres-data
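If you later run into permission errors, a quick check of the folder owners can help narrow things down. The snippet below is a small sketch for POSIX systems; the helper name `wrong_owner` is hypothetical, and the UID is the one stated above.

```python
import os

LABEL_STUDIO_UID = 1001  # UID of the non-root container user mentioned above


def wrong_owner(paths, uid=LABEL_STUDIO_UID):
    """Return the paths that exist but are not owned by the given UID."""
    return [p for p in paths if os.path.exists(p) and os.stat(p).st_uid != uid]


# An empty list means every existing folder has the expected owner.
print(wrong_owner(["./mydata", "./postgres-data"]))
```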
Now that the directories are prepared, we can bring up the stack using Docker Compose. Simply run:
docker compose up -d
It may take a few minutes to pull all the required images from Docker Hub and set up Label Studio. Once the setup is complete, open http://localhost:8080 in your browser to access the Label Studio interface. You need to create a new account, and then you can log in with your credentials. You can enable a legacy API token by going to Organization → API Token Settings. This token lets you communicate with the Label Studio API, which is especially useful for automation tasks.
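As a rough illustration of what such automation might look like, the sketch below builds an authenticated request with Python's standard library. It assumes that the legacy token is sent as an `Authorization: Token ...` header and that projects are listed at `/api/projects`; treat both as assumptions to verify against the Label Studio API documentation for your version. The function names and the token value are placeholders.

```python
import json
import urllib.request


def build_projects_request(base_url, token):
    """Build (but do not send) an authenticated GET request for the project list."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/projects",  # assumed endpoint path
        headers={"Authorization": f"Token {token}"},  # assumed legacy-token scheme
        method="GET",
    )


def list_projects(base_url, token):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_projects_request(base_url, token)) as resp:
        return json.load(resp)


# Placeholder token; copy the real one from the Label Studio interface.
request = build_projects_request("http://localhost:8080", "YOUR_LEGACY_TOKEN")
print(request.full_url)
```

Calling `list_projects(...)` requires the stack to be running; building the request separately makes it easy to inspect the URL and headers first.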
Set up a Label Studio project
Now we can create our first data annotation project in Label Studio, specifically for an object detection workflow. But before starting to annotate your images, you need to define the classes to choose from. In the Pascal VOC dataset, there are 20 classes of pre-annotated objects.
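Typing 20 labels by hand is error-prone, so one option is to generate the labeling configuration and paste it into the project's code view. The sketch below assumes the standard Label Studio `Image` and `RectangleLabels` tags, and reuses the names `label` and `image` from the `from_name` and `to_name` fields of the task JSON; the helper `build_labeling_config` is hypothetical.

```python
from xml.sax.saxutils import quoteattr

# The 20 object classes of the Pascal VOC detection task.
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow",
    "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]


def build_labeling_config(classes):
    """Render a bounding-box labeling config with one <Label> per class."""
    labels = "\n".join(f"    <Label value={quoteattr(c)}/>" for c in classes)
    return (
        "<View>\n"
        '  <Image name="image" value="$image"/>\n'
        '  <RectangleLabels name="label" toName="image">\n'
        f"{labels}\n"
        "  </RectangleLabels>\n"
        "</View>"
    )


config = build_labeling_config(VOC_CLASSES)
print(config)
```

The `$image` value refers to the `image` key under `data` in each task, so the names here need to stay in sync with the converted JSON files.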

Upload images and tasks to MinIO
You can open the MinIO user interface in your browser at localhost:9000, and then log in using the credentials you specified under the relevant service in the docker-compose.yml file. The console itself is served on port 9009, as set by --console-address in the Compose file.
I created a bucket with folders, one of which is used for storing images and another for JSON tasks formatted according to the instructions above.

We set up an S3-like service locally that allows us to simulate S3 cloud storage without incurring any charges. If you want to transfer files to an S3 bucket on AWS, it's better to do this directly over the internet, considering the data transfer costs. The good news is that you can also interact with your MinIO bucket using the AWS CLI. To do this, you need to add a profile in ~/.aws/config and provide the corresponding credentials in ~/.aws/credentials under the same profile name.
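For reference, the two files could look like the output of the sketch below. The profile name `minio-local` and the region are hypothetical choices, and the credentials are the default values from the Compose file above; substitute your own. Note that the section header differs between the two files: `[profile NAME]` in the config file, plain `[NAME]` in the credentials file.

```python
import configparser
import io


def render_ini(section, values):
    """Render a single INI section as text."""
    parser = configparser.ConfigParser()
    parser[section] = values
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


PROFILE = "minio-local"  # hypothetical profile name

# Contents for ~/.aws/config
config_text = render_ini(
    f"profile {PROFILE}",
    {"region": "us-east-1", "output": "json"},
)

# Contents for ~/.aws/credentials (defaults from the Compose file)
credentials_text = render_ini(
    PROFILE,
    {
        "aws_access_key_id": "minio_admin_do_not_use_in_production",
        "aws_secret_access_key": "minio_admin_do_not_use_in_production",
    },
)

print(config_text)
print(credentials_text)
```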
After that, you can easily sync your local folder using the following commands:
#!/bin/bash
set -e
# Fill in these values before running.
PROFILE=
MINIO_ENDPOINT= # e.g. http://localhost:9000
BUCKET_NAME=
SOURCE_DIR=
DEST_DIR=
# Backslashes continue the command across lines.
aws s3 sync \
  --endpoint-url "$MINIO_ENDPOINT" \
  --no-verify-ssl \
  --profile "$PROFILE" \
  "$SOURCE_DIR" "s3://$BUCKET_NAME/$DEST_DIR"
Connect MinIO to Label Studio
After all the data, including the images and annotations, has been uploaded, we can move on to adding cloud storage to the project we created in the previous step.
From your project settings, go to Cloud Storage and add the required parameters, such as the endpoint (which points to the service name in the Docker stack along with the port number, e.g., minio:9000), the bucket name, and the relevant prefix where the annotation files are stored. Each path inside the JSON files will then point to the corresponding image.

After verifying that the connection is working, you can sync your project with the cloud storage. You may have to run the sync command multiple times because the dataset contains 22,263 images. It may appear to fail at first, but when you restart the sync, it continues to make progress. Eventually, all of the Pascal VOC data will be successfully imported into Label Studio.

You can see the imported tasks with their thumbnail images in the task list. When you click on a task, the image will appear with its pre-annotations.

Conclusions
In this tutorial, we demonstrated how to import the Pascal VOC dataset into Label Studio by converting XML annotations into Label Studio's JSON format, running a full stack with Docker Compose, and connecting MinIO as S3-compatible storage. This setup lets you work with large-scale, pre-annotated datasets in a reproducible and cost-effective way, all on your local machine. Testing your project settings and file formats locally first will ensure a smoother transition when moving to cloud environments.
I hope this tutorial helps you kickstart your data annotation project with pre-annotated data that you can easily expand or validate. Once your dataset is ready for training, you can export all the tasks in popular formats such as COCO or YOLO.