Augmentation
- src.pipeline.augmentation.augment_dataset(image_dir: Path, output_dir: Path, config: dict) → None
Orchestrates the full augmentation pipeline for a labeled image dataset.
This function matches labeled JSON files with their corresponding images, builds the augmentation pipeline from the provided config, and applies augmentations using augment_images.
- Parameters:
image_dir (Path) – Directory containing the original labeled images.
output_dir (Path) – Root directory under which the augmented ‘images/’ and ‘labels/’ subdirectories are written.
config (dict) – Dictionary containing augmentation settings, including number of augmentations and optional transform parameters.
- Behavior:
Loads label files from the labeled JSON directory, labeled_json_dir (‘automl_workspace/data_pipeline/labeled’)
Matches JSON labels to image files by filename stem
Builds an Albumentations transform pipeline using build_augmentation_transform
Applies the transform using augment_images with num_augmentations per image
Logs counts of label files, image files, and successful matches
- Returns:
None
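The stem-matching step described under Behavior can be sketched roughly as follows. This is a minimal illustration, not the module's actual code: the helper name match_labels_to_images and the set of accepted image suffixes are assumptions made for the example.

```python
from pathlib import Path

# Assumed set of image suffixes; the real pipeline may accept others.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def match_labels_to_images(labeled_json_dir: Path, image_dir: Path) -> list[tuple[Path, Path]]:
    """Pair each JSON label file with the image that shares its filename stem."""
    # Index images by stem; if two images share a stem, the last one in sorted order wins.
    images_by_stem = {
        p.stem: p
        for p in sorted(image_dir.iterdir())
        if p.suffix.lower() in IMAGE_EXTENSIONS
    }
    matched = []
    for json_path in sorted(labeled_json_dir.glob("*.json")):
        image_path = images_by_stem.get(json_path.stem)
        if image_path is not None:  # labels without an image are skipped
            matched.append((json_path, image_path))
    return matched
```

The result has the same shape as the matched_pairs argument of augment_images: a list of (JSON path, image path) tuples.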
- src.pipeline.augmentation.augment_images(matched_pairs: list, transform: Compose, output_img_dir: Path, output_json_dir: Path, num_augmentations: int, config: dict) → None
Apply augmentations to each labeled image and save the results.
For each (image, label) pair, this function applies the given transformation pipeline num_augmentations times. It saves the augmented images and their updated prediction labels (in JSON format) to the specified output directories.
If an image has no predictions (empty bounding box list), the original image is saved separately in a dedicated ‘no_prediction_images’ folder.
- Parameters:
matched_pairs (list) – List of tuples, each containing a Path to a JSON file and its corresponding image file.
transform (A.Compose) – Albumentations transformation pipeline.
output_img_dir (Path) – Directory to save augmented images.
output_json_dir (Path) – Directory to save augmented label files.
num_augmentations (int) – Number of times to apply augmentations per image.
config (dict) – Configuration dictionary that may include a base random seed.
- Returns:
None
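The per-image loop described above might look something like the sketch below. It is illustrative only: the helper name augment_pair, the class_labels keyword, and the base_seed + i seeding scheme are assumptions, and file I/O is left out. Seeding Python's random module is also only a placeholder, since how a seed reaches the transforms depends on the Albumentations version in use. A stand-in transform is used so the example runs without Albumentations installed.

```python
import random
from typing import Any, Callable, Optional


def augment_pair(
    image: Any,
    bboxes: list,
    class_labels: list,
    transform: Callable[..., dict],
    num_augmentations: int,
    base_seed: Optional[int] = None,
) -> Optional[list]:
    """Apply `transform` num_augmentations times to one labeled image.

    Returns None when there are no boxes, signalling that the caller should
    save the original image to the 'no_prediction_images' folder instead.
    """
    if not bboxes:
        return None
    outputs = []
    for i in range(num_augmentations):
        if base_seed is not None:
            random.seed(base_seed + i)  # hypothetical per-augmentation seed
        # Albumentations-style call: keyword arguments in, dict out.
        outputs.append(transform(image=image, bboxes=bboxes, class_labels=class_labels))
    return outputs


def fake_flip(image, bboxes, class_labels):
    """Stand-in transform: mirrors pascal_voc boxes in a 100 px wide image."""
    flipped = [[100 - x2, y1, 100 - x1, y2] for x1, y1, x2, y2 in bboxes]
    return {"image": image, "bboxes": flipped, "class_labels": class_labels}


results = augment_pair("img", [[10, 20, 30, 40]], ["object"], fake_flip, 3, base_seed=0)
```

Each entry in results carries the transformed image and its updated boxes, which is the information the real function writes to output_img_dir and output_json_dir.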
- src.pipeline.augmentation.build_augmentation_transform(config: dict) → Compose
Build the augmentation transform pipeline from the given configuration.
This function constructs an Albumentations Compose object with a sequence of image augmentation transforms. Each transform is applied with a configurable probability and parameter set drawn from the config dictionary.
Supported transforms: HorizontalFlip, RandomBrightnessContrast, HueSaturationValue, Blur, GaussNoise, ToGray, Rotate.
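The pipeline is configured with bounding box parameters in ‘pascal_voc’ format ([x_min, y_min, x_max, y_max] in absolute pixels), so bounding boxes are transformed together with the image.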
- Parameters:
config (dict) – Dictionary containing probability values and parameters for each augmentation transform.
- Returns:
An Albumentations Compose object with the specified transformations and bounding box handling.
- Return type:
A.Compose
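A pipeline of this kind could be assembled roughly as shown below. This is a sketch under stated assumptions, not the module's implementation: every config key name (horizontal_flip_prob, rotate_limit, and so on), every default value, and the class_labels label field are hypothetical. Only the Albumentations class names and the ‘pascal_voc’ format come from the description above.

```python
def transform_specs(config: dict) -> list[tuple[str, dict]]:
    """Translate a config dict into (Albumentations class name, kwargs) pairs.

    All config keys and defaults here are hypothetical.
    """
    return [
        ("HorizontalFlip", {"p": config.get("horizontal_flip_prob", 0.5)}),
        ("RandomBrightnessContrast", {"p": config.get("brightness_contrast_prob", 0.5)}),
        ("HueSaturationValue", {"p": config.get("hue_saturation_prob", 0.5)}),
        ("Blur", {"blur_limit": config.get("blur_limit", 3), "p": config.get("blur_prob", 0.3)}),
        ("GaussNoise", {"p": config.get("gauss_noise_prob", 0.3)}),
        ("ToGray", {"p": config.get("grayscale_prob", 0.1)}),
        ("Rotate", {"limit": config.get("rotate_limit", 15), "p": config.get("rotate_prob", 0.5)}),
    ]


def build_transform_sketch(config: dict):
    """Build an A.Compose from the specs, with pascal_voc bounding box handling."""
    # Imported here so transform_specs can be used without Albumentations installed.
    import albumentations as A

    transforms = [getattr(A, name)(**kwargs) for name, kwargs in transform_specs(config)]
    return A.Compose(
        transforms,
        # 'class_labels' is an assumed label field name; callers must pass it
        # as a keyword argument alongside `image` and `bboxes`.
        bbox_params=A.BboxParams(format="pascal_voc", label_fields=["class_labels"]),
    )
```

With this layout, a call such as build_transform_sketch({"rotate_limit": 30}) would override only the rotation limit and fall back to the assumed defaults for everything else.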