# Grounding DINO Prelabeling

This page outlines the unit test coverage for the `gdino_prelabelling.py` script. The module runs object detection on a directory of images using the Grounding DINO model and saves prediction results in JSON format.

---

## Coverage Overview

The test suite validates:

- File scanning via `_get_image_files`
- End-to-end behavior of `generate_gd_prelabelling`
- Device detection and fallback logic
- Handling of corrupted or invalid image files
- Handling of empty or missing directories
- JSON output structure and formatting

---

## Constants and Configs

The following settings are tested either directly or by injecting a mock configuration:

- `TEXT_PROMPTS`
- `BOX_THRESHOLD`
- `TEXT_THRESHOLD`
- Device resolution: `"cuda"`, `"cpu"`, `"auto"`
- Paths to model weights and config files
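
Device resolution lends itself to a table-driven test. The following is a minimal sketch that assumes a hypothetical `resolve_device` helper. This helper receives hardware availability as arguments instead of calling `torch.cuda.is_available()` itself, so every branch can be exercised on any machine. The CUDA-then-MPS-then-CPU preference order for `"auto"` is also an assumption, not confirmed behavior of the script.

```python
def resolve_device(requested: str, cuda_available: bool, mps_available: bool = False) -> str:
    """Hypothetical stand-in for the script's device logic.

    Explicit settings are returned unchanged; "auto" prefers CUDA, then MPS, then CPU
    (an assumed order, for illustration only).
    """
    if requested in ("cuda", "mps", "cpu"):
        return requested
    if requested != "auto":
        raise ValueError(f"Unknown device setting: {requested!r}")
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


# Table-driven cases: (requested, cuda_available, mps_available, expected)
CASES = [
    ("cpu", True, False, "cpu"),    # an explicit setting wins even if a GPU exists
    ("auto", True, True, "cuda"),   # CUDA is preferred when present
    ("auto", False, True, "mps"),
    ("auto", False, False, "cpu"),  # fallback when no accelerator is available
]


def test_resolve_device() -> None:
    for requested, cuda, mps, expected in CASES:
        assert resolve_device(requested, cuda, mps) == expected, (requested, cuda, mps)
```

Under pytest, the same table would typically feed `@pytest.mark.parametrize`, so each case is reported as its own test.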

---

## Functions Tested

### `_get_image_files(directory)`

- **Returns**: Paths to files with a supported image extension (`.jpg`, `.jpeg`, `.png`)
- **Tests**:
  - Valid image files
  - Mixed content (image + non-image)
  - Empty directories
  - Nonexistent directory (expected to raise an error upstream)
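
These file-scanning tests can be sketched with only the standard library. The sketch assumes a hypothetical `get_image_files` that mirrors `_get_image_files`. Extensions are matched case-insensitively, which is an assumption. Results are sorted so the assertions are deterministic. A missing directory lets `FileNotFoundError` propagate to the caller.

```python
import tempfile
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def get_image_files(directory: str) -> list:
    """Hypothetical stand-in for `_get_image_files`: sorted image paths in `directory`.

    A missing directory makes `iterdir()` raise FileNotFoundError, which is left
    to propagate to the caller.
    """
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS  # case-insensitive (assumed)
    )


def test_mixed_content() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("a.jpg", "b.PNG", "c.jpeg", "notes.txt", "data.json"):
            (Path(tmp) / name).write_bytes(b"")
        (Path(tmp) / "subdir.png").mkdir()  # a directory with an image-like name
        assert [p.name for p in get_image_files(tmp)] == ["a.jpg", "b.PNG", "c.jpeg"]


def test_empty_directory() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        assert get_image_files(tmp) == []


def test_missing_directory() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        try:
            get_image_files(str(Path(tmp) / "does_not_exist"))
        except FileNotFoundError:
            pass  # the caller is expected to handle this
        else:
            raise AssertionError("expected FileNotFoundError")
```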

---

### `generate_gd_prelabelling(...)`

**Core test focus:**

- Successful predictions are written to JSON
- Skipped or unreadable files are logged
- Each prediction includes a class label, confidence score, and bounding box
- All output files use the expected format
- Model loading errors are handled
- Config thresholds are applied correctly
- Missing output directories are created automatically
- The fallback device-detection logic is used when the device is set to `"auto"`
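
Several of these behaviors can be checked together by replacing the model with a fake predictor. The sketch below assumes a hypothetical, heavily simplified `generate_prelabels(image_dir, output_dir, predict, box_threshold)`. This is not the script's real signature. The sketch models only the box threshold, as a plain confidence cut-off. It writes one `<stem>.json` per readable image.

```python
import json
import logging
import tempfile
from pathlib import Path
from typing import Callable

logger = logging.getLogger(__name__)
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def generate_prelabels(
    image_dir: Path,
    output_dir: Path,
    predict: Callable[[Path], list],
    box_threshold: float,
) -> int:
    """Hypothetical, simplified stand-in for `generate_gd_prelabelling`.

    `predict` is injected so a test can swap the real model for a fake one.
    Returns the number of images for which a JSON file was written.
    """
    output_dir.mkdir(parents=True, exist_ok=True)  # auto-create a missing output folder
    written = 0
    for image in sorted(image_dir.iterdir()):
        if image.suffix.lower() not in IMAGE_EXTENSIONS:
            continue  # non-image files are skipped without error
        try:
            raw = predict(image)
        except (OSError, ValueError) as exc:  # e.g. a corrupted image
            logger.warning("Skipping %s: %s", image.name, exc)
            continue  # no JSON is written for unreadable images
        kept = [
            {**det, "source": image.name}
            for det in raw
            if det["confidence"] >= box_threshold  # single confidence cut-off
        ]
        (output_dir / f"{image.stem}.json").write_text(json.dumps(kept, indent=2))
        written += 1
    return written


def fake_predict(image: Path) -> list:
    """Fake model: fails on 'corrupt.jpg', otherwise returns two fixed detections."""
    if image.name == "corrupt.jpg":
        raise OSError("cannot identify image file")
    return [
        {"bbox": [0, 0, 10, 10], "confidence": 0.9, "class": "car"},
        {"bbox": [5, 5, 20, 20], "confidence": 0.2, "class": "person"},
    ]


def test_end_to_end() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        images = Path(tmp) / "images"
        images.mkdir()
        for name in ("a.jpg", "b.png", "corrupt.jpg", "notes.txt"):
            (images / name).write_bytes(b"")
        out = Path(tmp) / "nested" / "labels"  # does not exist yet
        assert generate_prelabels(images, out, fake_predict, box_threshold=0.35) == 2
        assert sorted(p.name for p in out.iterdir()) == ["a.json", "b.json"]
        records = json.loads((out / "a.json").read_text())
        assert [r["class"] for r in records] == ["car"]  # 0.2 < 0.35 was dropped
        assert set(records[0]) == {"bbox", "confidence", "class", "source"}
```

Injecting `predict` keeps the test independent of model weights. With the real script, the equivalent would be patching its model-loading call with `unittest.mock`.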

---

## Key Edge Cases Tested

- **Empty folder**: Returns no predictions and exits gracefully
- **Non-image files**: Skipped without error
- **Corrupted images**: Skipped and logged
- **Missing model file**: Raises `FileNotFoundError`
- **Output folder does not exist**: Auto-created
- **Multiple valid predictions per image**: Verified in output structure
- **Verbose mode**: Detailed logs are printed to stdout
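
The missing-model and verbose-mode cases can be exercised without real weights. This sketch assumes a hypothetical `load_model(weights_path, verbose)` that checks the path before loading and reports progress with `print`. Output is captured with `contextlib.redirect_stdout`. Under pytest, `pytest.raises` and the `capsys` fixture serve the same purpose.

```python
import contextlib
import io
import tempfile
from pathlib import Path


def load_model(weights_path: str, verbose: bool = False) -> dict:
    """Hypothetical loader: validates the weights path before any heavy work."""
    path = Path(weights_path)
    if not path.is_file():
        raise FileNotFoundError(f"Model weights not found: {weights_path}")
    if verbose:
        print(f"Loading weights from {path.name}")
    return {"weights": str(path)}  # placeholder for a real model object


def test_missing_weights_raises() -> None:
    try:
        load_model("/nonexistent/groundingdino.pth")
    except FileNotFoundError as exc:
        assert "not found" in str(exc)
    else:
        raise AssertionError("expected FileNotFoundError")


def test_verbose_logs_to_stdout() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        weights = Path(tmp) / "weights.pth"
        weights.write_bytes(b"\x00")

        loud = io.StringIO()
        with contextlib.redirect_stdout(loud):
            load_model(str(weights), verbose=True)
        assert "Loading weights from weights.pth" in loud.getvalue()

        quiet = io.StringIO()
        with contextlib.redirect_stdout(quiet):
            load_model(str(weights), verbose=False)
        assert quiet.getvalue() == ""  # nothing printed when verbose is off
```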

---

## Example Test Assertions

- The number of processed images matches the number of input images
- Prediction JSON contains the required keys: `bbox`, `confidence`, `class`, `source`
- Output filenames correspond to input image names
- Invalid images produce no JSON output
- The `torch_device` setting is passed to the model correctly
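
The output assertions can be collected into one reusable helper. This is a sketch that rests on several assumptions. Each image yields a `<stem>.json` file holding a list of prediction records. `bbox` has four coordinates, and `confidence` lies in [0, 1]. `check_output_dir` is a hypothetical name. If only the valid image names are passed, the same check also confirms that invalid images produce no JSON.

```python
import json
import tempfile
from pathlib import Path

REQUIRED_KEYS = {"bbox", "confidence", "class", "source"}


def check_output_dir(output_dir: Path, image_names: list) -> None:
    """Assert one JSON per input image, each record carrying the required keys."""
    expected = {Path(name).stem + ".json" for name in image_names}
    actual = {p.name for p in output_dir.glob("*.json")}
    assert actual == expected, f"output files {actual} != expected {expected}"
    for json_path in output_dir.glob("*.json"):
        for rec in json.loads(json_path.read_text()):
            missing = REQUIRED_KEYS - set(rec)
            assert not missing, f"{json_path.name} missing keys: {missing}"
            assert len(rec["bbox"]) == 4  # assumed [x0, y0, x1, y1] layout
            assert 0.0 <= rec["confidence"] <= 1.0


def test_check_output_dir() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp)
        record = {"bbox": [1, 2, 3, 4], "confidence": 0.8, "class": "dog", "source": "dog.jpg"}
        (out / "dog.json").write_text(json.dumps([record]))
        check_output_dir(out, ["dog.jpg"])  # one valid image, one JSON: passes
```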

---

## Summary

This test suite ensures that `gdino_prelabelling.py`:

- Resolves the compute device correctly across environments (CPU, GPU, MPS)
- Produces consistently formatted output
- Handles missing, corrupted, or unexpected files gracefully
- Is configurable via the pipeline-level `config` and can be extended

Mocks and temporary directories isolate the tests from model internals. As a result, unit-level behavior is verified reproducibly, without real weights or hardware.