# Setup Guide
This guide walks you through setting up and running the AutoML CI/CD/CT (Continuous Training and Deployment) pipeline project.
## 1. Clone the Repository

```bash
git clone https://github.com/Capstone-AutoML/AutoML_Capstone.git
cd AutoML_Capstone
```
## 2. Run the Pipeline with Docker
**Important:** Docker cannot handle interactive Label Studio sessions for human review. Before running with Docker, you must disable human review in `automl_workspace/config/pipeline_config.json`:

```json
"process_options": {
  "skip_human_review": true
}
```
If you want to run human-in-the-loop validation using Label Studio, refer to the Human Intervention documentation section.
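If you prefer not to edit the file by hand, `jq` can flip the flag for you (a small sketch; it assumes `jq` is installed locally and that `process_options` sits at the top level of the config, as the snippet above suggests):

```bash
# Enable skip_human_review in the pipeline config
cfg=automl_workspace/config/pipeline_config.json
jq '.process_options.skip_human_review = true' "$cfg" > "$cfg.tmp" && mv "$cfg.tmp" "$cfg"
```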
### 💻 If You Have a GPU (CUDA Supported)

You can simply run:

```bash
docker compose up
```
This command will:

- Download the necessary datasets and models on the first run (downloads are skipped on later runs unless `automl_workspace/data_pipeline/`, `automl_workspace/data_pipeline/distillation/`, or `automl_workspace/model_registry/model/` have been removed).
- Automatically use your GPU if the following key is updated in both `automl_workspace/config/train_config.json` and `automl_workspace/config/pipeline_config.json`:

```json
"torch_device": "cuda"
```

The default is `"cpu"`, which forces CPU-only execution.
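To confirm that the container can actually see your GPU, you can run a quick check (a sketch that assumes your `docker-compose.yaml` grants the `capstone` service GPU access, and that PyTorch is installed in the image's `capstone_env` environment, which the pipeline already requires):

```bash
# Should print "True" when CUDA is visible inside the container
docker compose run capstone -c "source activate capstone_env && python -c 'import torch; print(torch.cuda.is_available())'"
```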
If you want to run the auto-labeling part of the pipeline separately, run:

```bash
docker compose run auto_labeling
```

This step should always come first. Then, to run the augmentation, training, and compression steps, use:

```bash
docker compose run train_compress
```
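To chain both stages in the required order with a single command (`&&` prevents the second stage from starting if auto-labeling fails):

```bash
docker compose run auto_labeling && docker compose run train_compress
```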
### 💻 If You Have a CPU-Only Machine (No NVIDIA GPU)

Before running, replace your `docker-compose.yaml` file with:
```yaml
services:
  capstone:
    image: celt313/automl_capstone:v0.0.3
    ipc: host
    platform: linux/x86_64
    container_name: automl_capstone
    working_dir: /app
    entrypoint: bash
    command: -c "source activate capstone_env && ./fetch_dataset.sh && python src/main.py"
    volumes:
      - .:/app

  generate_box:
    image: celt313/automl_capstone:v0.0.3
    ipc: host
    platform: linux/x86_64
    profiles: ["optional"]
    entrypoint: bash
    command: -c "source activate capstone_env && python src/generate_boxed_images.py"
    volumes:
      - .:/app

  auto_labeling:
    image: celt313/automl_capstone:v0.0.3
    ipc: host
    platform: linux/x86_64
    profiles: ["optional"]
    entrypoint: bash
    command: -c "source activate capstone_env && ./fetch_dataset.sh && python src/label_main.py"
    volumes:
      - .:/app

  train_compress:
    image: celt313/automl_capstone:v0.0.3
    ipc: host
    platform: linux/x86_64
    profiles: ["optional"]
    entrypoint: bash
    command: -c "source activate capstone_env && python src/train_compress.py"
    volumes:
      - .:/app

  test:
    image: celt313/automl_capstone:v0.0.3
    ipc: host
    platform: linux/x86_64
    profiles: ["optional"]
    entrypoint: bash
    command: -c "source activate capstone_env && pytest tests/"
    volumes:
      - .:/app
```
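After replacing the file, it is worth confirming that it parses cleanly before starting anything:

```bash
# Validate the compose file; exits non-zero and prints errors if it is malformed
docker compose config --quiet
```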
Then run:

```bash
docker compose up
```

to run the entire pipeline.
As in the GPU setup, you can run the auto-labeling stage separately first (`docker compose run auto_labeling`), then the augmentation, training, and compression steps (`docker compose run train_compress`). Auto-labeling should always come first.
## 3. Run Tests (Optional)

To verify the setup and run the unit tests:

```bash
docker compose run test
```
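Because the `test` service uses `bash` as its entrypoint, you can override its default command to run a subset of tests; the `-k` keyword below is only a placeholder for illustration:

```bash
# Run only tests whose names match a keyword (adjust the -k expression)
docker compose run test -c "source activate capstone_env && pytest tests/ -k 'some_keyword'"
```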
## 4. Generate Bounding Box Visualizations (Optional)

To run the script that overlays bounding boxes on sample and labeled images using predictions from YOLO, DINO, and mismatched sources:

```bash
docker compose run generate_box
```
This will:

- Sample and draw 10 images each from the YOLO, DINO, and mismatched directories.
- Draw bounding boxes on all images from the labeled directory.
- Save the visualized outputs under `automl_workspace/data_pipeline/boxed_images/`.
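Once the run finishes, you can spot-check the outputs directly:

```bash
ls automl_workspace/data_pipeline/boxed_images/
```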
## 5. Configuration Files

These config files control pipeline behavior:

- `pipeline_config.json`: Process options and distillation settings.
- `augmentation_config.json`: Augmentation parameters and seed.
- `train_config.json`: Training parameters, dataset paths, and device.
- `distillation_config.yaml`: Distillation settings (model paths, epochs, patience, etc.).
- `quantize_config.json`: Model quantization settings (labeled image paths, quantization method, etc.).
### Process Options

Control which pipeline steps to run via `pipeline_config.json`. Set a flag to `true` to skip that step (note that JSON does not allow comments, so keep any notes out of the file itself):

```json
"process_options": {
  "skip_human_review": false,
  "skip_training": false,
  "skip_distillation": false,
  "skip_quantization": false
}
```
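After hand-editing any of the JSON configs, a quick validity check saves a confusing pipeline failure later (this assumes a Python interpreter is available on the host):

```bash
# Exits non-zero and reports the position of the first syntax error, if any
python -m json.tool automl_workspace/config/pipeline_config.json > /dev/null && echo "valid JSON"
```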
### Device Configuration

For GPU usage, set the following in both `pipeline_config.json` and `train_config.json`:

```json
"torch_device": "cuda"
```

The default is `"cpu"` for CPU-only execution.
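As with the human-review flag earlier, `jq` can apply the change to both files in one go (a sketch assuming `torch_device` is a top-level key in each config, as the snippets in this guide suggest):

```bash
for f in automl_workspace/config/pipeline_config.json automl_workspace/config/train_config.json; do
  jq '.torch_device = "cuda"' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done
```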
## 6. Add Your Own Dataset

To start fresh with your own dataset:

1. Clear the existing data:

   ```bash
   rm -rf automl_workspace/data_pipeline/*
   ```

2. Add your images to:

   ```
   automl_workspace/data_pipeline/input/
   ├── image1.jpg
   ├── image2.jpg
   └── ...
   ```

Make sure the images are in `.jpg`, `.jpeg`, or `.png` format.
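To catch unsupported files before the pipeline runs, you can list anything in `input/` that is not a `.jpg`, `.jpeg`, or `.png`:

```bash
# Prints any files with an unsupported extension; no output means you are good
find automl_workspace/data_pipeline/input -type f \
  ! -iname '*.jpg' ! -iname '*.jpeg' ! -iname '*.png'
```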
## 7. Workspace Directory Structure
The data and model directories should be structured as follows:
```
automl_workspace/
├── config/              # All config files
├── data_pipeline/
│   ├── input/           # Add your images here
│   ├── labeled/         # Labeled images and annotations
│   ├── augmented/       # Augmented images
│   ├── label_studio/    # Label Studio related files
│   └── ...
├── model_registry/
│   ├── model/           # Model weights
│   ├── distilled/       # Distilled model outputs
│   └── quantized/       # Quantized model outputs
└── master_dataset/      # Archived labeled datasets
```
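If you want to confirm that your local checkout matches this layout, listing the directories two levels deep is a quick check:

```bash
find automl_workspace -maxdepth 2 -type d | sort
```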