# Introduction

Wildfires pose a critical threat to ecosystems, infrastructure, and human life. Timely and accurate detection is essential for effective intervention and mitigation. However, building accurate object detection models for this task is often constrained by scarce labeled data and the time-intensive process of manual annotation.

This project presents an **end-to-end AutoML pipeline** for wildfire detection using a **CI/CD/CT** (Continuous Integration, Continuous Deployment, and Continuous Training) architecture. The pipeline covers the full lifecycle of a detection model, from raw image collection through pre-labeling, human validation, training, distillation, quantization, and deployment.

## Motivation

Manual labeling of wildfire imagery is time-consuming and error-prone. In addition, models degrade over time as environmental conditions and data distributions shift. Our system aims to continuously learn from new data using a scalable, semi-supervised approach. It automates as much of the machine learning workflow as possible and involves human review only when necessary.

## Key Features

- Automated pre-labeling using YOLOv8 and Grounding DINO  
- Cross-model matching and validation of predictions using IoU and confidence thresholds  
- Human-in-the-loop review for mismatches via Label Studio  
- Image augmentation to improve generalization  
- End-to-end training, distillation, and quantization  
- CI/CD/CT-compatible design for regular updates and retraining  

## Workflow Overview

1. **Data Collection**  
   Unlabeled wildfire images are collected from remote sensors and placed into a raw image directory.

2. **Pre-labeling (YOLO and Grounding DINO)**  
   Both models generate bounding boxes independently. YOLO is fast and lightweight. Grounding DINO supports natural language prompts.

3. **Matching**  
   Predictions from the two models are paired by class name and by the IoU (intersection over union) of their bounding boxes. Predictions without a matching counterpart are flagged for human review.

4. **Human-in-the-Loop Review**  
   Label Studio is used to manually verify or correct mismatched results.

5. **Augmentation**  
   Verified labeled images are augmented to enrich the dataset.

6. **Training**  
   A new YOLO model is trained on the augmented dataset.

7. **Distillation and Quantization**  
   The full model is distilled into a lightweight version and then quantized for deployment.

8. **Model Registry Update**  
   Trained models are stored in the registry and used for future pre-labeling.
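
The matching step (step 3) can be illustrated with a minimal sketch. It is not the project's actual implementation: the `Detection` class, the `match_predictions` function, the greedy one-to-one pairing strategy, and the threshold values `0.5` and `0.25` are all assumptions made for illustration. It only shows one plausible way to pair predictions by class name and IoU and to flag the rest for review.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


@dataclass(frozen=True)
class Detection:
    """Hypothetical container for one predicted bounding box."""
    label: str
    box: Box
    confidence: float


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_predictions(
    yolo: List[Detection],
    dino: List[Detection],
    iou_threshold: float = 0.5,    # assumed value, not from the project
    min_confidence: float = 0.25,  # assumed value, not from the project
) -> Tuple[List[Tuple[Detection, Detection]], List[Detection]]:
    """Greedily pair predictions; return (matched_pairs, flagged_for_review)."""
    # Assumption: low-confidence predictions are discarded before matching.
    yolo = [d for d in yolo if d.confidence >= min_confidence]
    dino = [d for d in dino if d.confidence >= min_confidence]

    matched: List[Tuple[Detection, Detection]] = []
    flagged: List[Detection] = []
    unused = list(range(len(dino)))  # indices of DINO boxes not yet paired

    for y in yolo:
        best_j, best_iou = None, iou_threshold
        for j in unused:
            if dino[j].label != y.label:
                continue  # class names must agree
            score = iou(y.box, dino[j].box)
            if score >= best_iou:
                best_j, best_iou = j, score
        if best_j is None:
            flagged.append(y)  # no counterpart: send to human review
        else:
            matched.append((y, dino[best_j]))
            unused.remove(best_j)

    # Any DINO prediction left unpaired is also flagged.
    flagged.extend(dino[j] for j in unused)
    return matched, flagged


if __name__ == "__main__":
    yolo_preds = [
        Detection("fire", (0, 0, 10, 10), 0.9),
        Detection("smoke", (20, 20, 30, 30), 0.8),
    ]
    dino_preds = [Detection("fire", (1, 1, 10, 10), 0.7)]
    pairs, review = match_predictions(yolo_preds, dino_preds)
    print(len(pairs), "matched;", len(review), "flagged for review")
```

In this sketch the two "fire" boxes overlap enough to be paired, while the "smoke" box has no counterpart and would be routed to Label Studio. A greedy pass is simple but order-dependent; an optimal assignment (for example, the Hungarian algorithm) would be a reasonable alternative if many boxes overlap.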

---

This pipeline is designed for **scalability**, **adaptability**, and **model freshness** while keeping manual labeling to a minimum. Involving human reviewers only when the two models' predictions do not match balances efficiency with accuracy.