Grounding DINO Prelabeling

This script runs text-prompted object detection with a Grounding DINO model. It processes a folder of raw images and writes a structured JSON file for each image, containing metadata for every detected object.

Overview

Input: Directory of raw images
Output: JSON files with bounding boxes, confidence scores, and class labels
Model: Grounding DINO
Prompts: "fire", "smoke", "person", "vehicle", "lightning"
Thresholds: Box confidence and text match thresholds
Execution: Runs sequentially on a specified device

Constants

TEXT_PROMPTS: Default object classes to detect
BOX_THRESHOLD: Minimum bounding box confidence (default: 0.3)
TEXT_THRESHOLD: Minimum text-prompt alignment confidence (default: 0.25)
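As a rough sketch, the constants above could be declared as follows. The values come from this document. The `build_caption` helper is hypothetical: it assumes the prompts are joined into a single period-separated caption, which is the convention Grounding DINO prompts usually follow, but the script may do this differently.

```python
from typing import List

# Default classes and thresholds as listed in this document.
TEXT_PROMPTS: List[str] = ["fire", "smoke", "person", "vehicle", "lightning"]
BOX_THRESHOLD: float = 0.3
TEXT_THRESHOLD: float = 0.25


def build_caption(prompts: List[str]) -> str:
    """Hypothetical helper: join prompts into one lowercase caption.

    Assumes the period-separated caption style commonly used with
    Grounding DINO, e.g. "fire . smoke ."
    """
    return " . ".join(p.strip().lower() for p in prompts) + " ."
```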

Functions

`_get_image_files(directory)`

Scans a directory for .jpg, .jpeg, and .png files.

directory (Path): Path to image directory
Returns: List[Path] of image files

`generate_gd_prelabelling(raw_dir, output_dir, config, model_weights, config_path, text_prompts=TEXT_PROMPTS, box_threshold=None, text_threshold=None)`

Main function that runs Grounding DINO to detect prompted classes in images.

raw_dir (Path): Directory containing input images
output_dir (Path): Output folder for predictions
config (Dict): Dictionary with device and threshold options
model_weights (Path): Path to the Grounding DINO model checkpoint
config_path (Path): Path to the Grounding DINO config file
text_prompts (List[str]): List of classes to detect (default: TEXT_PROMPTS)
box_threshold (float, optional): Minimum bounding box confidence (default: None; can be set via the `dino_box_threshold` config key)
text_threshold (float, optional): Minimum text-prompt alignment confidence (default: None; can be set via the `dino_text_threshold` config key)
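Because both threshold arguments default to None, the function has to pick an effective value somewhere. The sketch below shows one plausible resolution order: explicit argument first, then the config key, then the module constant. This order is an assumption, not documented behavior, and `resolve_threshold` is a hypothetical helper.

```python
from typing import Any, Dict, Optional

BOX_THRESHOLD = 0.3
TEXT_THRESHOLD = 0.25


def resolve_threshold(
    explicit: Optional[float],
    config: Dict[str, Any],
    key: str,
    default: float,
) -> float:
    """Pick a threshold: explicit argument, else config[key], else default.

    The precedence is an assumption made for this sketch.
    """
    if explicit is not None:
        return float(explicit)
    return float(config.get(key, default))


# Example: no explicit argument, so the config value is used.
box = resolve_threshold(None, {"dino_box_threshold": 0.35}, "dino_box_threshold", BOX_THRESHOLD)
```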

Output per image

Image name
Class label
Bounding box: [x1, y1, x2, y2]
Confidence score
Source tag: "grounding_dino"
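The sketch below illustrates how one detection could be turned into a record with the fields listed above and written to disk. The JSON key names, the rounding, and the one-file-per-image naming are assumptions. The box conversion assumes the model returns normalized center-format boxes (cx, cy, w, h), as Grounding DINO's reference inference helper appears to; if the script already receives pixel corner coordinates, that step is unnecessary.

```python
import json
from pathlib import Path
from typing import Dict, List, Sequence


def cxcywh_norm_to_xyxy(box: Sequence[float], width: int, height: int) -> List[float]:
    """Convert a normalized (cx, cy, w, h) box to pixel [x1, y1, x2, y2]."""
    cx, cy, w, h = box
    return [
        (cx - w / 2) * width,
        (cy - h / 2) * height,
        (cx + w / 2) * width,
        (cy + h / 2) * height,
    ]


def build_record(image_name: str, label: str, box_xyxy: Sequence[float], confidence: float) -> Dict:
    """Assemble one detection record; the key names are hypothetical."""
    return {
        "image_name": image_name,
        "class_label": label,
        "bbox": [round(float(v), 2) for v in box_xyxy],
        "confidence": round(float(confidence), 4),
        "source": "grounding_dino",
    }


def write_predictions(records: List[Dict], output_dir: Path, image_name: str) -> Path:
    """Write all records for one image to <output_dir>/<image stem>.json."""
    output_dir.mkdir(parents=True, exist_ok=True)
    out_path = output_dir / f"{Path(image_name).stem}.json"
    out_path.write_text(json.dumps(records, indent=2))
    return out_path
```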

Summary Output

Number of successful detections
Number of skipped files (unreadable)
Number of failed detections
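One way to keep these three counts during a sequential run is sketched below. It assumes the counts are per image, that an unreadable file is signalled by the reader returning None, and that a failed detection surfaces as an exception. `RunSummary`, `run_sequentially`, `read_image`, and `detect` are hypothetical names.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Callable, Iterable, Optional


@dataclass
class RunSummary:
    succeeded: int = 0  # images processed without error
    skipped: int = 0    # unreadable files
    failed: int = 0     # images where detection raised an error


def run_sequentially(
    image_paths: Iterable[Path],
    read_image: Callable[[Path], Optional[Any]],
    detect: Callable[[Any], Any],
) -> RunSummary:
    """Process images one at a time and tally the outcome of each."""
    summary = RunSummary()
    for path in image_paths:
        image = read_image(path)
        if image is None:
            # The reader could not decode the file, so skip it.
            summary.skipped += 1
            continue
        try:
            detect(image)
        except Exception:
            # Keep going so one bad image does not stop the whole run.
            summary.failed += 1
            continue
        summary.succeeded += 1
    return summary
```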

Configuration Parameters (from `pipeline_config.json`)

The following fields from the configuration file directly control Grounding DINO’s behavior:

Key	Description
`torch_device`	Device to run the model on (`"cpu"` or `"cuda"`).
`dino_box_threshold`	Minimum confidence required for bounding boxes to be retained (default: `0.3`).
`dino_text_threshold`	Minimum alignment confidence between text prompt and region (default: `0.25`).
`dino_false_negative_threshold`	Confidence threshold to flag potential false negatives for review (default: `0.5`).

These values can be overridden or adjusted in the configuration dictionary passed to the function.
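A small sketch of reading these keys from `pipeline_config.json` and falling back to the documented defaults. The `load_dino_config` helper is hypothetical, and the `"cpu"` fallback for `torch_device` is an assumption, since the table does not state a default device.

```python
import json
from pathlib import Path
from typing import Any, Dict

# Threshold defaults as listed in the table; the device default is an assumption.
DINO_DEFAULTS: Dict[str, Any] = {
    "torch_device": "cpu",
    "dino_box_threshold": 0.3,
    "dino_text_threshold": 0.25,
    "dino_false_negative_threshold": 0.5,
}


def load_dino_config(path: Path) -> Dict[str, Any]:
    """Read the pipeline config and keep only the Grounding DINO keys."""
    with path.open("r", encoding="utf-8") as f:
        user_config = json.load(f)
    # Missing keys fall back to the defaults above; other keys are ignored.
    return {key: user_config.get(key, default) for key, default in DINO_DEFAULTS.items()}
```

The resulting dictionary can be passed as the `config` argument of `generate_gd_prelabelling`.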

Example Usage

from pathlib import Path

# generate_gd_prelabelling must also be imported from the module that defines it.
generate_gd_prelabelling(
    raw_dir=Path("data_pipeline/input/"),
    output_dir=Path("data_pipeline/prelabeled/gdino/"),
    config={
        "torch_device": "cuda",  # or "cpu"
        "dino_box_threshold": 0.3,
        "dino_text_threshold": 0.25,
    },
    model_weights=Path("model_registry/model/groundingdino_swinb_cogcoor.pth"),
    config_path=Path("model_registry/model/GroundingDINO_SwinB_cfg.py"),
)