Grounding DINO Prelabeling

This script performs object detection using a Grounding DINO model guided by text prompts. It processes a folder of raw images and outputs structured JSON files for each image containing detected object metadata.

Overview

  • Input: Directory of raw images

  • Output: JSON files with bounding boxes, confidence scores, and class labels

  • Model: Grounding DINO

  • Prompts: "fire", "smoke", "person", "vehicle", "lightning"

  • Thresholds: Box confidence and text match thresholds

  • Execution: Runs sequentially on a specified device

Constants

  • TEXT_PROMPTS: Default object classes to detect

  • BOX_THRESHOLD: Minimum bounding box confidence (default: 0.3)

  • TEXT_THRESHOLD: Minimum text-prompt alignment confidence (default: 0.25)


Functions

_get_image_files(directory)

Scans a directory for .jpg, .jpeg, and .png files.

  • directory (Path): Path to image directory

  • Returns: List[Path] of image files


generate_gd_prelabelling(raw_dir, output_dir, config, model_weights, config_path, text_prompts=TEXT_PROMPTS, box_threshold=None, text_threshold=None)

Main function that runs Grounding DINO to detect prompted classes in images.

  • raw_dir (Path): Directory containing input images

  • output_dir (Path): Output folder for predictions

  • config (Dict): Dictionary with device and threshold options

  • model_weights (Path): Path to DINO model checkpoint

  • config_path (Path): Path to DINO config file

  • text_prompts (List[str]): List of classes to detect (default: predefined)

  • box_threshold (float): Detection threshold (can be overridden via config)

  • text_threshold (float): Text alignment threshold (can be overridden via config)

Output per image

  • Image name

  • Class label

  • Bounding box: [x1, y1, x2, y2]

  • Confidence score

  • Source tag: "grounding_dino"

Summary Output

  • Number of successful detections

  • Number of skipped files (unreadable)

  • Number of failed detections


Configuration Parameters (from pipeline_config.json)

The following fields from the configuration file directly control Grounding DINO’s behavior:

Key

Description

torch_device

Device to run the model on ("cpu" or "cuda").

dino_box_threshold

Minimum confidence required for bounding boxes to be retained (default: 0.3).

dino_text_threshold

Minimum alignment confidence between text prompt and region (default: 0.25).

dino_false_negative_threshold

Confidence threshold to flag potential false negatives for review (default: 0.5).

These values can be overridden or adjusted in the configuration dictionary passed to the function.


Example Usage

generate_gd_prelabelling(
    raw_dir=Path("data_pipeline/input/"),
    output_dir=Path("data_pipeline/prelabeled/gdino/"),
    config={
        "torch_device": "cuda",
        "dino_box_threshold": 0.3,
        "dino_text_threshold": 0.25
    },
    model_weights=Path("model_registry/model/groundingdino_swinb_cogcoor.pth"),
    config_path=Path("model_registry/model/GroundingDINO_SwinB_cfg.py")
)