Matching Logic

This script compares YOLO and Grounding DINO predictions for the same image and flags mismatches for human review. It matches objects by class name and Intersection-over-Union (IoU), then applies configurable confidence thresholds to decide whether each detection is accepted or flagged.

Overview

  • Input:

    • YOLO-generated JSON files

    • DINO-generated JSON files

  • Output:

    • Matched files → saved to labeled directory

    • Mismatched files → saved to pending directory


Functions

compute_iou(box1, box2)

Computes the Intersection-over-Union between two bounding boxes.

  • box1, box2: Lists of [x1, y1, x2, y2] format

  • Returns: float IoU score
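
As an illustration, the standard IoU calculation for this box format can be sketched as follows. This is a minimal version, not necessarily the script's exact code; it assumes x1 <= x2 and y1 <= y2, and the zero-area guard is an added assumption.

```python
def compute_iou(box1, box2):
    """Return the IoU of two [x1, y1, x2, y2] boxes as a float in [0, 1]."""
    # Corners of the overlap rectangle.
    ix1 = max(box1[0], box2[0])
    iy1 = max(box1[1], box2[1])
    ix2 = min(box1[2], box2[2])
    iy2 = min(box1[3], box2[3])

    # Clamp to zero so disjoint boxes contribute no intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union = area1 + area2 - inter

    # Assumption: degenerate (zero-area) inputs score 0.0 instead of raising.
    return inter / union if union > 0 else 0.0
```

Two identical boxes score 1.0, and boxes that do not overlap score 0.0.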


normalize_class(c)

Cleans up class names for matching purposes.

  • Removes the "BSI" suffix and lowercases the name

  • Returns: normalized class name
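
One way this normalization might look is sketched below. The exact form of the suffix is not specified here, so the sketch assumes "BSI" is a trailing token separated from the name by whitespace, an underscore, or a hyphen; adjust the pattern if the real labels differ.

```python
import re

# Assumption: the suffix looks like "Helmet BSI", "helmet_bsi" or "helmet-BSI".
_BSI_SUFFIX = re.compile(r"[\s_-]+bsi$", re.IGNORECASE)


def normalize_class(c):
    """Lowercase a class name and drop a trailing "BSI" suffix, if present."""
    name = c.strip()
    name = _BSI_SUFFIX.sub("", name)
    return name.strip().lower()
```

With this pattern, names without a separated "BSI" token are only lowercased.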


match_predictions(yolo_preds, dino_preds, iou_thresh)

Matches YOLO predictions to DINO predictions by class name and IoU.

  • yolo_preds, dino_preds: Lists of prediction dictionaries

  • iou_thresh: Minimum IoU to consider a match

  • Returns: List[bool] indicating which YOLO predictions matched
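
The matching rule can be sketched as below. The dictionary keys "class" and "bbox" are assumptions about the prediction format, class names are compared with a plain lowercase match where the script would use normalize_class, and a compact IoU helper is inlined so the sketch runs on its own.

```python
def _iou(a, b):
    """IoU of two [x1, y1, x2, y2] boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_predictions(yolo_preds, dino_preds, iou_thresh):
    """Return one bool per YOLO prediction: True if some DINO box agrees."""
    matched = []
    for yp in yolo_preds:
        hit = any(
            # Assumed keys: "class" (str) and "bbox" ([x1, y1, x2, y2]).
            yp["class"].strip().lower() == dp["class"].strip().lower()
            and _iou(yp["bbox"], dp["bbox"]) >= iou_thresh
            for dp in dino_preds
        )
        matched.append(hit)
    return matched
```

This sketch lets one DINO box confirm several YOLO boxes. If one-to-one matching is needed, each DINO prediction would have to be marked as used once it has been matched.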


match_and_filter(yolo_dir, dino_dir, labeled_dir, pending_dir, config)

Main entry point. Matches predictions for each image and splits the files into labeled and pending sets.

  • Loads predictions from YOLO and DINO

  • Flags mismatches based on:

    • Low/medium YOLO confidence

    • High-confidence DINO detections missed by YOLO

Output Actions:

  • Adds confidence_flag to flagged predictions

  • Saves:

    • Confident matches to labeled_dir

    • Mismatches to pending_dir for human review
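
The flagging and routing rules above can be sketched for a single file as follows. The "confidence" key, the flag values "low" and "medium", and the helper name flag_file are all hypothetical; only the confidence_flag field and the threshold names come from this document.

```python
def flag_file(yolo_preds, dino_missed, config):
    """Annotate YOLO predictions in place and pick a destination.

    dino_missed: DINO predictions that no YOLO box matched.
    Returns "labeled" or "pending".
    """
    needs_review = False

    for pred in yolo_preds:
        conf = pred["confidence"]  # assumed key name
        if conf < config["low_conf_threshold"]:
            pred["confidence_flag"] = "low"  # assumed flag value
            needs_review = True
        elif conf < config["mid_conf_threshold"]:
            pred["confidence_flag"] = "medium"  # assumed flag value
            needs_review = True

    # A confident DINO detection that YOLO missed is a possible false negative.
    if any(
        d["confidence"] > config["dino_false_negative_threshold"]
        for d in dino_missed
    ):
        needs_review = True

    return "pending" if needs_review else "labeled"
```

A file is routed to the labeled set only when no prediction is flagged and DINO reports no confident detection that YOLO missed.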

Config Example:

config = {
    "iou_threshold": 0.5,
    "low_conf_threshold": 0.3,
    "mid_conf_threshold": 0.6,
    "dino_false_negative_threshold": 0.5
}

Summary Output:

  • Total successfully processed files

  • Skipped/unmatched files

  • Files that failed to process due to error
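
The three counters above could be collected along these lines. The sketch assumes that the YOLO and DINO files for an image share the same filename and that "skipped" means a YOLO file has no DINO counterpart; the helper name pair_and_count is hypothetical.

```python
import json
from pathlib import Path


def pair_and_count(yolo_dir, dino_dir):
    """Pair YOLO and DINO JSON files by filename and tally the outcomes."""
    summary = {"processed": 0, "skipped": 0, "errors": 0}
    pairs = []

    for yolo_file in sorted(Path(yolo_dir).glob("*.json")):
        dino_file = Path(dino_dir) / yolo_file.name
        if not dino_file.exists():
            summary["skipped"] += 1  # no DINO counterpart for this image
            continue
        try:
            yolo_data = json.loads(yolo_file.read_text())
            dino_data = json.loads(dino_file.read_text())
        except (OSError, json.JSONDecodeError):
            summary["errors"] += 1  # unreadable or malformed JSON
            continue
        pairs.append((yolo_file.name, yolo_data, dino_data))
        summary["processed"] += 1

    return pairs, summary
```

Catching errors per file keeps one malformed JSON file from aborting the whole run.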


Configuration Parameters (from pipeline_config.json)

The following fields in pipeline_config.json directly control YOLO–DINO matching behavior:

  • iou_threshold: Minimum IoU score to consider two boxes (YOLO and DINO) a match (default: 0.5).

  • low_conf_threshold: YOLO confidence below this is considered a likely false positive (default: 0.3).

  • mid_conf_threshold: YOLO confidence below this (but above low) triggers a human review (default: 0.6).

  • dino_false_negative_threshold: If DINO detects an object above this confidence and YOLO misses it, flag for review (default: 0.5).

These thresholds guide whether a prediction is confidently accepted, flagged for review, or rejected.
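
A loader that applies the documented defaults might look like the sketch below. It assumes the four keys sit at the top level of pipeline_config.json, which may not match the real file layout. The function name load_matching_config, the silent fallback when the file is missing, and the ordering check are hypothetical additions.

```python
import json
from pathlib import Path

# Default values as documented for each key.
DEFAULTS = {
    "iou_threshold": 0.5,
    "low_conf_threshold": 0.3,
    "mid_conf_threshold": 0.6,
    "dino_false_negative_threshold": 0.5,
}


def load_matching_config(path="pipeline_config.json"):
    """Read matching thresholds, falling back to the documented defaults."""
    path = Path(path)
    # Assumption: a missing file means "use the defaults" rather than an error.
    raw = json.loads(path.read_text()) if path.exists() else {}
    config = {key: float(raw.get(key, default)) for key, default in DEFAULTS.items()}

    # The low/medium bands only make sense if low <= mid.
    if not config["low_conf_threshold"] <= config["mid_conf_threshold"]:
        raise ValueError("low_conf_threshold must not exceed mid_conf_threshold")
    return config
```

The returned dictionary has the same shape as the config example and can be passed to match_and_filter as its config argument.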


Example Usage

from pathlib import Path

# config is the threshold dictionary shown under "Config Example".
match_and_filter(
    yolo_dir=Path("data_pipeline/prelabeled/yolo/"),
    dino_dir=Path("data_pipeline/prelabeled/gdino/"),
    labeled_dir=Path("data_pipeline/labeled/"),
    pending_dir=Path("data_pipeline/label_studio/pending/"),
    config=config
)