Grounding DINO Prelabeling
This page outlines the unit test coverage for the gdino_prelabelling.py script. The module runs object detection on a directory of images using the Grounding DINO model and saves prediction results in JSON format.
Coverage Overview
This test suite includes validation for:
File scanning via _get_image_files
End-to-end behavior of generate_gd_prelabelling
Device detection and fallback logic
Handling of corrupted or invalid image files
Handling of empty or missing directories
JSON output structure and formatting
Constants and Configs
The following are tested either directly or via mock configuration injection:
TEXT_PROMPTS
BOX_THRESHOLD
TEXT_THRESHOLD
Device resolution: "cuda", "cpu", "auto"
Paths to model weights and config files
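The device resolution behavior can be illustrated with a minimal sketch. The helper name resolve_device, its parameters, and the order in which backends are probed are assumptions made for illustration; the module's actual logic may differ. The availability checks are injected as callables so the sketch runs without PyTorch installed; in a real setup they would typically be torch.cuda.is_available and torch.backends.mps.is_available.

```python
from typing import Callable


def resolve_device(
    requested: str,
    cuda_available: Callable[[], bool],
    mps_available: Callable[[], bool],
) -> str:
    """Hypothetical helper: map a configured device string to a concrete device."""
    # Explicit settings such as "cuda" or "cpu" are passed through unchanged.
    if requested != "auto":
        return requested
    # "auto" probes the accelerators in an assumed order, then falls back to CPU.
    if cuda_available():
        return "cuda"
    if mps_available():
        return "mps"
    return "cpu"


# Usage: simulate a machine with no accelerator.
device = resolve_device("auto", lambda: False, lambda: False)
```

Injecting the checks mirrors what a unit test would do with mocks: each environment (CPU, GPU, MPS) can be simulated without the matching hardware.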
Functions Tested
_get_image_files(directory)
Returns: Valid image paths (.jpg, .jpeg, .png)
Tests:
Valid image files
Mixed content (image + non-image)
Empty directories
Nonexistent directory (expected to raise error upstream)
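As a rough illustration of the scanning cases above, the sketch below uses a stand-in function rather than the real _get_image_files, whose implementation is not shown on this page. The name get_image_files_sketch, the case-insensitive extension match, and the sorted return order are assumptions.

```python
import tempfile
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def get_image_files_sketch(directory):
    """Stand-in for _get_image_files: return sorted image paths in `directory`."""
    root = Path(directory)
    # iterdir() raises FileNotFoundError for a nonexistent directory,
    # which is left to propagate to the caller.
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


def run_mixed_content_check():
    """Create a temporary folder with mixed content and list the image names found."""
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("a.jpg", "b.PNG", "notes.txt"):
            (Path(tmp) / name).write_bytes(b"")
        return [p.name for p in get_image_files_sketch(tmp)]


def run_empty_directory_check():
    """An empty folder should yield an empty list rather than an error."""
    with tempfile.TemporaryDirectory() as tmp:
        return get_image_files_sketch(tmp)
```

A temporary directory keeps each case isolated, so the checks do not depend on files already present on the machine.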
generate_gd_prelabelling(...)
Core test focus:
Successful predictions written to JSON
Skipped or unreadable files are logged
Predictions include class, confidence, bounding box
All output files use expected format
Handles model loading errors
Applies config thresholds correctly
Automatically creates output directories if missing
Uses fallback device detection logic when set to "auto"
Key Edge Cases Tested
Empty folder: Returns no predictions and exits gracefully
Non-image files: Skipped without error
Corrupted images: Skipped and logged
Missing model file: Raises FileNotFoundError
Output folder does not exist: Auto-created
Multiple valid predictions per image: Verified in output structure
Verbose mode: Confirms detailed logs are printed to stdout
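Several of these edge cases can be exercised together with a mocked predictor, as in the sketch below. Because the signature of generate_gd_prelabelling is not given here, the sketch uses a simplified stand-in loop; prelabel_sketch, fake_predict, the zero-byte "corruption" rule, and the placeholder prediction values are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def prelabel_sketch(image_dir, output_dir, predict):
    """Stand-in for the prelabelling loop; `predict` replaces the real model."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)  # auto-create the output folder
    written, skipped = [], []
    for path in sorted(Path(image_dir).iterdir()):
        if path.suffix.lower() not in {".jpg", ".jpeg", ".png"}:
            continue  # non-image files are skipped without error
        try:
            predictions = predict(path)
        except OSError:
            skipped.append(path.name)  # corrupted image: skip and record it
            continue
        target = out / f"{path.stem}.json"
        target.write_text(json.dumps(predictions))
        written.append(target.name)
    return written, skipped


def fake_predict(path):
    """Mock predictor: treats zero-byte files as corrupted (an assumption)."""
    if path.stat().st_size == 0:
        raise OSError(f"cannot read {path.name}")
    # Placeholder values; real predictions come from the model.
    return [{"bbox": [0, 0, 10, 10], "confidence": 0.9,
             "class": "object", "source": "placeholder"}]


def run_edge_case_check():
    """Build a folder with one good, one corrupted, and one non-image file."""
    with tempfile.TemporaryDirectory() as tmp:
        images = Path(tmp) / "images"
        images.mkdir()
        (images / "good.jpg").write_bytes(b"\xff\xd8")
        (images / "bad.png").write_bytes(b"")
        (images / "readme.txt").write_text("not an image")
        out = Path(tmp) / "out" / "nested"  # does not exist yet
        written, skipped = prelabel_sketch(images, out, fake_predict)
        return written, skipped, out.is_dir()
```

Only the readable image produces a JSON file, the corrupted one is recorded as skipped, and the missing output folder is created on demand.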
Example Test Assertions
Count of processed images matches input count
Prediction JSON contains required keys: bbox, confidence, class, source
Output filenames correspond to input image names
Invalid images do not result in written JSON
torch_device settings are passed to model correctly
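The required-keys assertion might look like the following sketch. It assumes each output file holds a JSON list of prediction objects, which this page does not confirm; the helper name missing_keys and the sample values are illustrative only.

```python
import json
import tempfile
from pathlib import Path

REQUIRED_KEYS = {"bbox", "confidence", "class", "source"}


def missing_keys(json_path):
    """Return, for each prediction in the file, the required keys it lacks."""
    predictions = json.loads(Path(json_path).read_text())
    return [sorted(REQUIRED_KEYS - set(p)) for p in predictions]


def run_schema_check():
    """Write one complete and one incomplete prediction, then inspect both."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "image_001.json"
        path.write_text(json.dumps([
            {"bbox": [1, 2, 3, 4], "confidence": 0.8,
             "class": "cat", "source": "placeholder"},
            {"bbox": [5, 6, 7, 8], "confidence": 0.4},
        ]))
        return missing_keys(path)
```

Reporting the missing keys per prediction, rather than a single pass/fail flag, makes a failing test easier to diagnose.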
Summary
This test suite ensures that gdino_prelabelling.py:
Works as expected across different environments (CPU, GPU, MPS)
Produces consistently formatted output
Handles missing, corrupted, or unexpected files gracefully
Is configurable via the pipeline-level config and can be extended
The use of mocks and temporary directories isolates test behavior from model internals, ensuring that unit-level functionality is verified in a reproducible way.