Augmentation

src.pipeline.augmentation.augment_dataset(image_dir: Path, output_dir: Path, config: dict) → None

Orchestrates the full augmentation pipeline for a labeled image dataset.

This function matches labeled JSON files with their corresponding images, builds the augmentation pipeline from the provided config, and applies augmentations using augment_images.

Parameters:
  • image_dir (Path) – Directory containing the original labeled images.

  • output_dir (Path) – Root directory under which the augmented ‘images/’ and ‘labels/’ subdirectories are written.

  • config (dict) – Dictionary containing augmentation settings, including number of augmentations and optional transform parameters.

Behavior:
  • Loads label files from labeled_json_dir (‘automl_workspace/data_pipeline/labeled’)

  • Matches JSON labels to image files by filename stem

  • Builds an Albumentations transform pipeline using build_augmentation_transform

  • Applies the transform via augment_images, producing num_augmentations augmented copies per image

  • Logs counts of label files, image files, and successful matches
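
The stem-matching step can be sketched as follows. This is a minimal illustration, not the module's actual code: the helper name match_labels_to_images and the IMAGE_SUFFIXES set are assumptions introduced here, and only the "match by filename stem" rule comes from the description above.

```python
from pathlib import Path

# Assumed set of image extensions; the real module may accept others.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def match_labels_to_images(labeled_json_dir: Path, image_dir: Path) -> list:
    """Pair each JSON label with the image that shares its filename stem.

    Hypothetical helper: returns a list of (json_path, image_path) tuples,
    the same shape that augment_images expects for matched_pairs.
    """
    # Index images by stem so each label lookup is a dictionary access.
    images_by_stem = {
        path.stem: path
        for path in sorted(image_dir.iterdir())
        if path.suffix.lower() in IMAGE_SUFFIXES
    }

    pairs = []
    for json_path in sorted(labeled_json_dir.glob("*.json")):
        image_path = images_by_stem.get(json_path.stem)
        if image_path is not None:  # labels without an image are skipped
            pairs.append((json_path, image_path))
    return pairs
```

Comparing len(pairs) with the number of label files and image files gives the three counts that the function is described as logging.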

Returns:

None

src.pipeline.augmentation.augment_images(matched_pairs: list, transform: Compose, output_img_dir: Path, output_json_dir: Path, num_augmentations: int, config: dict) → None

Apply augmentations to each labeled image and save the results.

For each (image, label) pair, this function applies the given transformation pipeline num_augmentations times. It saves the augmented images and their updated prediction labels (in JSON format) to the specified output directories.

If an image has no predictions (empty bounding box list), the original image is saved separately in a dedicated ‘no_prediction_images’ folder.
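
The branching described above (augment num_augmentations times, or set the original aside when there are no predictions) can be sketched without any image I/O. Everything named here is hypothetical: the helper plan_outputs, the ‘_aug{i}’ naming scheme, and the "base seed plus index" seeding are assumptions for illustration, not the module's documented behavior.

```python
def plan_outputs(stem: str, boxes: list, num_augmentations: int, base_seed=None) -> list:
    """List the (output_name, seed) jobs for one labeled image.

    Hypothetical planning helper: it only decides what would be written,
    it does not load, transform, or save anything.
    """
    if not boxes:
        # Empty bounding box list: keep the original in the dedicated folder.
        return [(f"no_prediction_images/{stem}", None)]

    jobs = []
    for i in range(num_augmentations):
        # Assumed seeding scheme: offset the base seed by the augmentation
        # index so every copy is different yet reproducible.
        seed = None if base_seed is None else base_seed + i
        jobs.append((f"{stem}_aug{i}", seed))
    return jobs


print(plan_outputs("img_001", [[10, 20, 50, 80]], num_augmentations=2, base_seed=42))
print(plan_outputs("img_002", [], num_augmentations=2, base_seed=42))
```

Deriving a distinct seed per copy is one common way to make augmentation runs repeatable; whether the module does this depends on how it reads the seed from config.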

Parameters:
  • matched_pairs (list) – List of tuples, each containing a Path to a JSON file and its corresponding image file.

  • transform (A.Compose) – Albumentations transformation pipeline.

  • output_img_dir (Path) – Directory to save augmented images.

  • output_json_dir (Path) – Directory to save augmented label files.

  • num_augmentations (int) – Number of times to apply augmentations per image.

  • config (dict) – Configuration dictionary that may include a base random seed.

Returns:

None

src.pipeline.augmentation.build_augmentation_transform(config: dict) → Compose

Build the augmentation transform pipeline from the given configuration.

This function constructs an Albumentations Compose object with a sequence of image augmentation transforms. Each transform is applied with a configurable probability and parameter set drawn from the config dictionary.

Supported transforms:
  • HorizontalFlip

  • RandomBrightnessContrast

  • HueSaturationValue

  • Blur

  • GaussNoise

  • ToGray

  • Rotate

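The pipeline also transforms bounding boxes together with the image so that they stay aligned; boxes are expected in ‘pascal_voc’ format, i.e. [x_min, y_min, x_max, y_max] in absolute pixel coordinates.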
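
To make the bounding box handling concrete, the sketch below shows what keeping a ‘pascal_voc’ box aligned means for a horizontal flip. It is a plain-Python geometric illustration with a hypothetical helper name, not Albumentations' internal code; in the real pipeline the library performs this update for every geometric transform.

```python
def hflip_pascal_voc(box: tuple, image_width: int) -> tuple:
    """Mirror a pascal_voc box (x_min, y_min, x_max, y_max) left to right.

    Hypothetical helper for illustration only.
    """
    x_min, y_min, x_max, y_max = box
    # The old right edge becomes the new left edge and vice versa;
    # the vertical coordinates are unaffected by a horizontal flip.
    return (image_width - x_max, y_min, image_width - x_min, y_max)


box = (10, 20, 50, 80)
flipped = hflip_pascal_voc(box, image_width=100)
print(flipped)  # (50, 20, 90, 80)
```

Flipping twice returns the original box, which is a quick sanity check when inspecting saved label files.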

Parameters:
  • config (dict) – Dictionary containing probability values and parameters for each augmentation transform.

Returns:

An Albumentations Compose object with the specified transformations and bounding box handling.

Return type:

A.Compose