Matching Logic

This script compares YOLO and Grounding DINO predictions for the same image and flags mismatches for human review. Two detections are treated as a match when their class names agree and their Intersection-over-Union (IoU) reaches a configurable minimum. Separate confidence thresholds decide which predictions are accepted and which are flagged.

Overview

Input:
- YOLO-generated JSON files
- DINO-generated JSON files

Output:
- Matched files → saved to labeled directory
- Mismatched files → saved to pending directory

Functions

`compute_iou(box1, box2)`

Computes the Intersection-over-Union between two bounding boxes.

box1, box2: Bounding boxes as lists in [x1, y1, x2, y2] format
Returns: float IoU score in the range [0, 1]
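
The computation can be sketched as follows. This is a minimal illustration rather than the script's actual code; it assumes both boxes share one coordinate system and satisfy x1 <= x2 and y1 <= y2.

```python
def compute_iou(box1, box2):
    """IoU of two axis-aligned boxes given as [x1, y1, x2, y2]."""
    # Corners of the intersection rectangle.
    ix1 = max(box1[0], box2[0])
    iy1 = max(box1[1], box2[1])
    ix2 = min(box1[2], box2[2])
    iy2 = min(box1[3], box2[3])

    # Width and height are clamped to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union = area1 + area2 - inter

    # Guard against degenerate (zero-area) boxes.
    return inter / union if union > 0 else 0.0
```

For example, two 2x2 boxes offset by one unit in each direction overlap in a 1x1 square, so their IoU is 1 / (4 + 4 - 1) = 1/7.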

`normalize_class(c)`

Cleans up class names for matching purposes.

Removes "BSI" suffixes and lowercases the name
Returns: normalized class name
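
One possible normalization is sketched below. The exact suffix rule is not spelled out here, so this version assumes the suffix is the token "BSI" at the end of the name, in any letter case, optionally preceded by whitespace, "_" or "-". The class names in the usage lines are hypothetical.

```python
import re

# Assumption: "BSI" appears at the very end of the name, optionally preceded
# by whitespace, "_" or "-". The real script may apply a different rule.
_BSI_SUFFIX = re.compile(r"[\s_-]*bsi$", re.IGNORECASE)


def normalize_class(c):
    """Strip a trailing "BSI" suffix and lowercase the class name."""
    name = c.strip()
    name = _BSI_SUFFIX.sub("", name)
    return name.strip().lower()


# Hypothetical class names, for illustration only.
print(normalize_class("Helmet BSI"))
print(normalize_class("helmet_bsi"))
print(normalize_class("Person"))
```

Normalizing both sides before comparison lets a YOLO label that carries the suffix match a DINO label that does not.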

`match_predictions(yolo_preds, dino_preds, iou_thresh)`

Matches YOLO predictions to DINO predictions by class name and IoU.

yolo_preds, dino_preds: Lists of prediction dictionaries
iou_thresh: Minimum IoU to consider a match
Returns: List[bool] indicating which YOLO predictions matched
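
A minimal sketch of this matching step is shown below. The dictionary keys "class" and "bbox" are assumptions about the prediction format, and the two helpers are compact stand-ins so the example runs on its own. This version marks a YOLO prediction as matched if any DINO prediction of the same class overlaps it enough; it does not enforce one-to-one assignment, which the actual script may or may not do.

```python
def compute_iou(a, b):
    # Compact IoU for [x1, y1, x2, y2] boxes.
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def normalize_class(c):
    # Simplified stand-in: lowercase and drop a trailing "bsi" token.
    name = c.strip().lower()
    if name.endswith("bsi"):
        name = name[:-3].rstrip(" _-")
    return name


def match_predictions(yolo_preds, dino_preds, iou_thresh):
    """Return one bool per YOLO prediction: True if some DINO prediction
    has the same normalized class and IoU >= iou_thresh."""
    matched = []
    for yp in yolo_preds:
        y_cls = normalize_class(yp["class"])  # "class" key is an assumption
        matched.append(any(
            normalize_class(dp["class"]) == y_cls
            and compute_iou(yp["bbox"], dp["bbox"]) >= iou_thresh  # "bbox" key is an assumption
            for dp in dino_preds
        ))
    return matched
```

The returned list has the same length and order as yolo_preds, so it can be zipped with the predictions when deciding which ones to flag.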

`match_and_filter(yolo_dir, dino_dir, labeled_dir, pending_dir, config)`

Main function to match predictions and split into labeled or pending sets.

Loads predictions from YOLO and DINO
Flags mismatches based on:
- Low/medium YOLO confidence
- High-confidence DINO detections missed by YOLO

Output Actions:

Adds `confidence_flag` to flagged predictions
Saves:
- Confident matches to labeled_dir
- Mismatches to pending_dir for human review
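
The flagging decision for a single image might look like the sketch below. It is an illustration under several assumptions: predictions carry a "confidence" key, the flag values "low" and "medium" are hypothetical, the boundary comparisons (strict versus inclusive) are guesses, and the caller has already worked out which DINO predictions have no YOLO counterpart.

```python
def flag_for_review(yolo_preds, unmatched_dino_preds, config):
    """Return True when the image should be routed to the pending directory.

    Adds "confidence_flag" in place to YOLO predictions that fall below
    the low or mid confidence thresholds.
    """
    needs_review = False

    for pred in yolo_preds:
        conf = pred["confidence"]  # "confidence" key is an assumption
        if conf < config["low_conf_threshold"]:
            pred["confidence_flag"] = "low"  # hypothetical flag value
            needs_review = True
        elif conf < config["mid_conf_threshold"]:
            pred["confidence_flag"] = "medium"  # hypothetical flag value
            needs_review = True

    # High-confidence DINO detections that YOLO missed also trigger review.
    for pred in unmatched_dino_preds:
        if pred["confidence"] > config["dino_false_negative_threshold"]:
            needs_review = True

    return needs_review
```

An image for which this returns False would be saved to labeled_dir; any other image would go to pending_dir with its flags attached.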

Config Example:

config = {
    "iou_threshold": 0.5,
    "low_conf_threshold": 0.3,
    "mid_conf_threshold": 0.6,
    "dino_false_negative_threshold": 0.5
}

Summary Output:

Total successfully processed files
Skipped/unmatched files
Files that failed to process due to error

Configuration Parameters (from `pipeline_config.json`)

The following fields from the `pipeline_config.json` file directly control YOLO–DINO matching behavior:

Key	Description
`iou_threshold`	Minimum IoU score to consider two boxes (YOLO and DINO) a match (default: `0.5`).
`low_conf_threshold`	YOLO confidence below this is considered a likely false positive (default: `0.3`).
`mid_conf_threshold`	YOLO confidence below this (but not below `low_conf_threshold`) triggers human review (default: `0.6`).
`dino_false_negative_threshold`	If DINO detects an object above this confidence and YOLO misses it, flag for review (default: `0.5`).

These thresholds guide whether a prediction is confidently accepted, flagged for review, or rejected.
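
A small loader that falls back to the documented defaults could look like this. It is a sketch only: the function name load_matching_config is hypothetical, and it assumes the four keys sit at the top level of the JSON file, which may not reflect the real layout.

```python
import json
from pathlib import Path

# Defaults taken from the parameter descriptions above.
DEFAULTS = {
    "iou_threshold": 0.5,
    "low_conf_threshold": 0.3,
    "mid_conf_threshold": 0.6,
    "dino_false_negative_threshold": 0.5,
}


def load_matching_config(path):
    """Read matching thresholds from a JSON file, using defaults for
    missing keys (hypothetical helper, not part of the script)."""
    path = Path(path)
    loaded = json.loads(path.read_text()) if path.exists() else {}

    # Keep only the keys that control matching; ignore everything else.
    config = {key: loaded.get(key, default) for key, default in DEFAULTS.items()}

    # The low band must not sit above the mid band, or the review band vanishes.
    if config["low_conf_threshold"] > config["mid_conf_threshold"]:
        raise ValueError("low_conf_threshold must not exceed mid_conf_threshold")
    return config
```

Validating the ordering of the two YOLO thresholds up front avoids a silent misconfiguration in which no prediction can ever land in the review band.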

Example Usage

from pathlib import Path

# `config` is the dictionary shown under "Config Example".
match_and_filter(
    yolo_dir=Path("data_pipeline/prelabeled/yolo/"),
    dino_dir=Path("data_pipeline/prelabeled/gdino/"),
    labeled_dir=Path("data_pipeline/labeled/"),
    pending_dir=Path("data_pipeline/label_studio/pending/"),
    config=config
)