How to Structure Large CVAT Projects on AWS

Structuring a large CVAT project on AWS requires 4 decisions made before the first task is assigned:

Project organisation
S3 storage design
Team access control
Quality enforcement

CVAT is an open-source annotation platform that ML teams use to label images and videos for:

Object detection
Segmentation
Tracking
Classification models

On AWS, CVAT runs on EC2 for compute, S3 for storage, and IAM for access control across annotation teams. The Yobitel CVAT AMI on AWS Marketplace delivers this environment pre-configured, with Docker, Python, and all core dependencies installed, so teams skip the setup entirely and go straight to annotation work.

Defining label schemas per task causes class names to drift across batches, making exports incompatible with each other. An oversized task prevents quality tracking because it is impossible to isolate where a failure occurred. Storing raw data on EC2 volumes puts the entire dataset at risk when an instance fails. Overly broad roles allow annotators to modify label schemas and task configurations that they shouldn't touch. Every one of these problems has a direct fix, and every fix is applied before annotation begins.

This guide describes the exact configurations, structures, instance sizing, and access rules that keep a large CVAT deployment on AWS running smoothly from the first job to the final export.

1. Build the project hierarchy before you need it

CVAT operates on a 3-tier hierarchy, with each level having a specific role. Using them correctly is the foundation of a scalable annotation pipeline.

Projects

CVAT projects are more than folders. The project has authority over every label name, annotation type, and attribute used across all of its tasks. Define label schemas at the project level, not at the task level.

When label schemas are defined per task, label drift is inevitable. One task uses "car", another uses "vehicle", and a third uses "Car" with a capital C. Each looks like a minor naming variation. Collectively, they produce 3 incompatible label sets that cannot be merged into a single training dataset without manually correcting hundreds or thousands of annotations.

Before creating the first task, define every class name and attribute in the Project settings. Use lowercase, underscore-separated names consistently. "traffic_light" not "TrafficLight". Add placeholder labels for known edge cases now. Adding a label later is simple. Re-annotating tasks because a label was missing is not.
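The naming rule above can be checked before labels are entered into the Project settings. The following is a minimal sketch, not part of CVAT itself: the helper names `normalise_label` and `find_invalid_labels` are hypothetical, and the pattern assumes "lowercase, underscore-separated" means lowercase letters and digits joined by single underscores.

```python
import re

# Assumed convention from the text: lowercase words joined by single underscores.
LABEL_PATTERN = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def normalise_label(name: str) -> str:
    """Convert names such as 'TrafficLight' or 'traffic light' to 'traffic_light'."""
    # Insert an underscore at each lower-to-upper boundary (camelCase, PascalCase).
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name.strip())
    # Collapse runs of spaces and hyphens into one underscore, then lowercase.
    return re.sub(r"[\s\-]+", "_", spaced).lower()


def find_invalid_labels(labels: list[str]) -> list[str]:
    """Return the labels that do not follow the naming convention."""
    return [label for label in labels if not LABEL_PATTERN.fullmatch(label)]


if __name__ == "__main__":
    proposed = ["car", "Car", "traffic_light", "TrafficLight"]
    print(find_invalid_labels(proposed))
    print(sorted({normalise_label(label) for label in proposed}))
```

A check like this catches casing and separator drift such as "Car" versus "car". It cannot catch semantic drift such as "car" versus "vehicle", which still needs a single agreed class list at the project level.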

Tasks

Each Task should contain a fixed, manageable number of images. The recommended range for static images is 500 to 1,500 per task. Image sequences work best at 200 to 500. Video tasks should contain one continuous sequence per task.

Keeping tasks within these limits determines how granularly you can track progress, isolate quality problems, and reassign work. A task of 10,000 images gives you one progress data point and zero ability to identify which 2,000 images caused a quality failure. Ten tasks of 1,000 images each give you ten data points and full visibility into exactly where problems occurred.
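As a rough illustration of the batching described above, the sketch below splits a list of image paths into task-sized batches. The function name `split_into_batches` and the default of 1,000 images are assumptions chosen to match the example in the text, not CVAT settings.

```python
def split_into_batches(image_paths: list[str], batch_size: int = 1000) -> list[list[str]]:
    """Split image paths into consecutive batches of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [
        image_paths[start:start + batch_size]
        for start in range(0, len(image_paths), batch_size)
    ]


if __name__ == "__main__":
    # Hypothetical file names, used only to show the resulting batch sizes.
    paths = [f"frame_{index:05d}.jpg" for index in range(10_000)]
    batches = split_into_batches(paths)
    print(len(batches), len(batches[0]))
```

Each batch then becomes one task, so a quality failure can be traced to a specific batch rather than to the whole dataset.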

Name every task with the same format: {project-code}-{domain}-{batch-number}-{date}. For example: AVD-urban-night-B04-20250601.

This makes S3 export paths predictable, backup archives self-describing, and progress dashboards readable at a glance.
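A small helper can keep the naming format consistent across batches. This is a sketch under assumptions: the function name `build_task_name` is hypothetical, and the "B" prefix with two-digit padding is inferred from the single example above rather than stated as a rule.

```python
from datetime import date


def build_task_name(project_code: str, domain: str, batch_number: int, day: date) -> str:
    """Build a task name in the form {project-code}-{domain}-{batch-number}-{date}."""
    # Assumption: batch numbers are written as 'B' plus two digits, as in 'B04'.
    return f"{project_code}-{domain}-B{batch_number:02d}-{day:%Y%m%d}"


if __name__ == "__main__":
    print(build_task_name("AVD", "urban-night", 4, date(2025, 6, 1)))
```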

Jobs

When a Task is created, CVAT automatically divides it into Jobs based on a segment size you define. A task of 1,000 images with a segment size of 200 results in 5 jobs. Each job is assigned to a single annotator, so quality can be tracked per annotator.

Video tasks require a frame overlap of 10 to 20 frames between jobs. Without overlap, objects crossing a job boundary produce broken tracks in the export. Errors in tracking continuity are not visible frame by frame. They only appear when the full track is reviewed across job boundaries. This is also why honeypots are not supported for video tasks. For video, ground truth jobs are the only automated QA mechanism available.
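The effect of segment size and overlap on the number of jobs can be worked out ahead of task creation. The sketch below is an illustration of the arithmetic only; it is not taken from CVAT's source, and CVAT's own segmentation may differ in edge cases. The function name `split_into_jobs` is hypothetical.

```python
def split_into_jobs(frame_count: int, segment_size: int, overlap: int = 0) -> list[tuple[int, int]]:
    """Return inclusive (start, stop) frame ranges, one per job."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    if not 0 <= overlap < segment_size:
        raise ValueError("overlap must be at least 0 and smaller than segment_size")

    step = segment_size - overlap  # each new job starts this many frames later
    jobs = []
    start = 0
    while start < frame_count:
        stop = min(start + segment_size, frame_count) - 1
        jobs.append((start, stop))
        if stop == frame_count - 1:
            break  # the final frame is covered, so no further job is needed
        start += step
    return jobs


if __name__ == "__main__":
    print(len(split_into_jobs(1000, 200)))       # image task, no overlap
    print(split_into_jobs(1000, 200, overlap=10))  # video task, 10 shared frames
```

With an overlap of 10, each job shares its last 10 frames with the start of the next job, which is what allows a track to be followed across the boundary. Overlap also increases the job count, so it is worth accounting for when planning assignments.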

2. Select the EC2 instance for peak workload

The right EC2 instance is sized for your peak annotation workload. Undersizing does not produce an immediate error. It produces latency that accumulates across annotator sessions until jobs time out, exports stall, and productivity drops in ways that are hard to diagnose.

CVAT runs PostgreSQL, Redis, and multiple Docker containers simultaneously. Every active annotator session adds load to the same compute. The compute profile shifts significantly depending on the annotation types in use. Standard bounding box annotation places minimal load on the instance. AI-assisted polygon segmentation on video frames requires significantly more computing power and can saturate an undersized instance within hours of active annotation work.

A team of 12 annotators running polygon segmentation on a medical imaging dataset on a t3.large will see response times degrade past 2 seconds per interaction within days. Moving to a t3.xlarge resolves the latency immediately. Upgrading to a larger instance costs less than losing one week of productivity on an undersized one.

AI-assisted annotation runs as Nuclio serverless functions and requires dedicated compute headroom. Sharing one instance for both AI inference and annotation degrades CPU performance and slows down every annotator on the team. For Segment Anything Model or DEXTR, dedicated GPU inference is the right setup.

Attach a dedicated EBS volume for CVAT's PostgreSQL data and Docker volumes, separate from the root volume. Keep the root at 20 to 30 GB and allocate at least 50 GB to the data volume. For a 200,000-image dataset, size the data volume at 200 GB from the start. Expanding mid-project requires a maintenance window that interrupts active annotation work. Keeping volumes separate also means an instance failure or replacement never touches annotation state.

3. Use S3 as the only location for raw data

Every image in your annotation pipeline must be stored in S3, not on the EC2 volume, not on a local machine, and not split across multiple locations. S3 serves as the single data store for all raw annotation data. CVAT connects to it as the annotation layer, reading images directly from S3, writing annotation metadata to its own database, and exporting labelled datasets on demand.

Storing raw images on the EC2 volume ties data to compute. If the instance fails, is replaced, or is resized, the data is at risk. Rebuilding the instance is straightforward. Recovering annotation data from a failed volume is not. Keeping images in S3 means the compute lifecycle and the data lifecycle are completely independent.

Use IAM role authentication

CVAT supports native S3 integration with IAM role authentication. The setup is:

Create an IAM role with S3 read and write permissions scoped to your specific annotation bucket
Attach the role to the EC2 instance profile so the instance authenticates automatically with no keys stored anywhere in the application
Register the bucket in CVAT under Cloud Storage
Create all tasks by referencing the S3 path directly, so images are never copied to the EC2 volume

The IAM policy for the EC2 instance role:

json

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:ListBucket",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::your-annotation-bucket",
        "arn:aws:s3:::your-annotation-bucket/*"
      ]
    }
  ]
}

Static access keys have to be stored, rotated by hand, and can leak. The IAM instance profile issues short-lived credentials automatically. The application never handles a key at all.
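When the same policy is generated for several buckets, it can help to build it in code and confirm that every resource stays scoped to the one annotation bucket. The sketch below assumes nothing beyond the policy shown above; the function names `build_bucket_policy` and `is_scoped_to_bucket` are hypothetical helpers, not AWS APIs.

```python
import json


def build_bucket_policy(bucket: str) -> dict:
    """Build the instance-role policy shown above for a given bucket name."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3:PutObject",
                    "s3:ListBucket",
                    "s3:DeleteObject",
                ],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
            }
        ],
    }


def is_scoped_to_bucket(policy: dict, bucket: str) -> bool:
    """Return True if every statement only names the bucket or its objects."""
    allowed = {f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"}
    for statement in policy["Statement"]:
        resources = statement["Resource"]
        if isinstance(resources, str):
            resources = [resources]  # IAM allows a single string or a list
        if not set(resources) <= allowed:
            return False
    return True


if __name__ == "__main__":
    policy = build_bucket_policy("your-annotation-bucket")
    print(json.dumps(policy, indent=2))
    print(is_scoped_to_bucket(policy, "your-annotation-bucket"))
```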

Organise your S3 bucket to match the pipeline:

raw/ - organised by project and batch, e.g. raw/autonomous-driving/urban-day-batch-01/
exports/ - organised by format, e.g. exports/coco-json/ and exports/yolo-txt/
backups/ - task archives for completed work

The training script points directly at the exports folder when the task export runs. There is no intermediate copying step, no staging locally, and no format conversion between annotation completion and pipeline ingestion.
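The bucket layout above is easier to keep consistent if the key paths are built in one place. This is a minimal sketch: the function names `raw_prefix` and `export_key` are hypothetical, and the archive file name is an assumed pattern chosen to match the export script in this guide.

```python
def raw_prefix(project: str, batch: str) -> str:
    """Prefix for raw images, organised by project and batch."""
    return f"raw/{project}/{batch}/"


def export_key(export_format: str, task_id: int) -> str:
    """Object key for an exported annotation archive, organised by format."""
    # Assumed file name pattern: task_<id>_export.zip
    return f"exports/{export_format}/task_{task_id}_export.zip"


if __name__ == "__main__":
    print(raw_prefix("autonomous-driving", "urban-day-batch-01"))
    print(export_key("yolo-txt", 42))
```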

Automate exports via the CVAT REST API

Do not rely on manual exports in a production annotation operation:

bash

#!/bin/bash
# Export one task's annotations from CVAT and upload the archive to S3.
TASK_ID=$1
CVAT_HOST="http://:8080"   # set to your CVAT host
TOKEN=""                   # set to your CVAT API token
FORMAT="YOLO 1.1"

# ${FORMAT// /%20} URL-encodes the space in the format name.
curl -X GET "${CVAT_HOST}/api/tasks/${TASK_ID}/annotations?format=${FORMAT// /%20}&action=download" \
  -H "Authorization: Token ${TOKEN}" \
  --output "task_${TASK_ID}_export.zip"

aws s3 cp "task_${TASK_ID}_export.zip" "s3://your-annotation-bucket/exports/yolo-txt/task_${TASK_ID}_export.zip"

echo "Task ${TASK_ID} exported and uploaded."

Schedule this via AWS EventBridge on a nightly cadence, or trigger it from your MLOps pipeline on task completion events.
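If the export is triggered from a Python-based pipeline instead of a shell script, the request can be assembled with the standard library. The sketch below only builds the URL and headers and sends nothing. It mirrors the endpoint used in the script above, which may differ between CVAT versions, so check it against your deployment. The function name `build_export_request` and the host `cvat.example.internal` are hypothetical.

```python
from urllib.parse import quote, urlencode


def build_export_request(host: str, task_id: int, token: str,
                         export_format: str = "YOLO 1.1") -> tuple[str, dict]:
    """Return the (url, headers) pair for a task annotation export request."""
    # quote_via=quote encodes the space as %20, matching the shell script.
    query = urlencode({"format": export_format, "action": "download"}, quote_via=quote)
    url = f"{host.rstrip('/')}/api/tasks/{task_id}/annotations?{query}"
    headers = {"Authorization": f"Token {token}"}
    return url, headers


if __name__ == "__main__":
    # Placeholder host and token, for illustration only.
    url, headers = build_export_request("http://cvat.example.internal:8080", 17, "example-token")
    print(url)
    print(headers)
```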

Export format for your training framework

CVAT exports completed annotations in over 20 formats. The format you choose determines which training frameworks can consume the data directly, how well it handles complex geometries, and how the files behave at scale. The three formats most teams use, plus one recommended for long-term archiving, are covered below.

For teams running multiple model types on the same dataset, export to Datumaro as the master archive first. Datumaro preserves every annotation geometry, metadata layer, and data provenance attribute without loss. YOLO or COCO outputs are then generated programmatically during the training pipeline preprocessing phase, eliminating any risk of data loss from repeated format conversions.
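To give a sense of what generating YOLO output programmatically involves, the sketch below converts one bounding box from the COCO convention (top-left x, top-left y, width, height in pixels) to the YOLO convention (centre x, centre y, width, height, each normalised to the image size). It is an illustration of the coordinate change only, not a Datumaro call, and the function name `coco_to_yolo_bbox` is hypothetical.

```python
def coco_to_yolo_bbox(bbox: tuple[float, float, float, float],
                      image_width: int, image_height: int) -> tuple[float, float, float, float]:
    """Convert a COCO [x, y, w, h] pixel box to YOLO normalised centre format."""
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image dimensions must be positive")
    x, y, width, height = bbox
    x_center = (x + width / 2) / image_width
    y_center = (y + height / 2) / image_height
    return (x_center, y_center, width / image_width, height / image_height)


if __name__ == "__main__":
    print(coco_to_yolo_bbox((100, 50, 200, 100), image_width=400, image_height=200))
```

In practice a dataset tool handles this across whole exports; the point of keeping a lossless master archive is that conversions like this always start from the same source instead of from a previous conversion.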

4. Configure quality control before annotation starts

Manually reviewing every job at production scale creates a bottleneck at the reviewer, delays feedback to annotators, and still misses systematic errors that only appear statistically across a large sample.

Bad annotations lead to model rejection, wasted computation cycles, and annotation work that needs to be repeated. Catching quality problems during annotation costs a fraction of what it costs to catch them after training. CVAT provides 3 automated QA mechanisms. All 3 must be configured before the first annotator assignment.

Ground truth jobs

A Ground Truth job is a separately annotated reference set that CVAT uses to benchmark the quality of regular annotation work. A curated sample of 5 to 10 per cent of task images is sufficient to estimate quality across the full dataset. Annotate this job carefully before releasing regular jobs to annotators. CVAT then compares each completed regular job against the GT job and generates per-annotator quality scores automatically. For video tasks, ground truth jobs are the only automated QA option available.
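The size of a ground truth sample follows directly from the 5 to 10 per cent guideline. The sketch below picks a reproducible random set of frame indices as a starting point; it is not how CVAT selects frames, and a curated sample would still be reviewed by hand for coverage of rare classes. The function name `pick_ground_truth_frames` is hypothetical.

```python
import random


def pick_ground_truth_frames(frame_count: int, fraction: float = 0.05, seed: int = 0) -> list[int]:
    """Pick a reproducible random sample of frame indices for a ground truth job."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the range (0, 1]")
    if frame_count <= 0:
        return []
    # At least one frame, and never more than the task contains.
    sample_size = min(frame_count, max(1, round(frame_count * fraction)))
    rng = random.Random(seed)  # fixed seed so the same frames are picked each run
    return sorted(rng.sample(range(frame_count), sample_size))


if __name__ == "__main__":
    frames = pick_ground_truth_frames(1000, fraction=0.05)
    print(len(frames))
```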

Honeypots

Honeypots embed GT frames invisibly into regular jobs. CVAT randomly inserts reference frames into each annotator's job, and the annotator cannot tell which frames are being evaluated. After submission, CVAT scores the job by comparing the annotator's work on those hidden frames against the reference labels. This produces an accurate measure of real annotation behaviour. 2 constraints apply: honeypots are only supported for image tasks, not video, and the GT frame set is fixed at task creation. Configure honeypots before the task is published.

Immediate annotator feedback

Once a job is scored, CVAT surfaces the result to the annotator immediately. Set Max Validations per Job to 3 under Quality Control Settings. Annotators correct their own work rather than waiting for a reviewer to return it. Reviewer time is spent on genuinely difficult cases, and annotation accuracy improves faster across the team.

5. Set role boundaries at the organisation level

Set roles at the Organisation level, not per project or task. Annotators see only the jobs assigned to them and have no visibility into other projects or tasks. Quality Analysts access quality analytics and assignment controls without needing CTO-level permissions.

Contractors and external annotators receive the Annotator role without exception. A contractor with Operations Manager access can modify label schemas, change task configurations, and break completed work across the entire project.

6. Mistakes that compound at scale

All of these are common to large annotation teams. Their impact grows with dataset size.

Starting with a production-ready foundation like the Yobitel CVAT AMI on AWS Marketplace removes every infrastructure decision that stands between your team and productive annotation work. The only thing left to do is log in and start labelling.
