File Cleaning and Archiving
This page outlines the unit test coverage for the clean_pipeline.py module, which handles archiving labeled data and cleaning up the data pipeline workspace.
Coverage Overview
The tests confirm that the cleaning process:
Archives labeled JSON and associated image files into timestamped folders under the master dataset directory
Cleans all pipeline folders except for label_studio/
Preserves the folder structure by keeping directories but removing their contents
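The three behaviors listed above can be sketched as a single function. This is a minimal, hypothetical sketch, not the actual clean_pipeline.py code. The function name archive_and_clean, the PRESERVED set, the %Y%m%d_%H%M%S timestamp format, and the rule that an image matches a label when it shares the label's file stem are all assumptions.

```python
import shutil
from datetime import datetime
from pathlib import Path

# Hypothetical sketch: the name, timestamp format and image-matching rule are
# assumptions, not the real clean_pipeline.py API.
PRESERVED = {"label_studio"}


def archive_and_clean(pipeline_dir: Path, master_dir: Path) -> Path:
    """Archive labeled data into a timestamped folder, then empty the workspace."""
    # Assumed timestamp format for the archive folder name.
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    archive = master_dir / stamp
    labels_out = archive / "labels"
    images_out = archive / "images"
    labels_out.mkdir(parents=True, exist_ok=True)
    images_out.mkdir(parents=True, exist_ok=True)

    labeled_dir = pipeline_dir / "labeled"
    input_dir = pipeline_dir / "input"
    inputs = [p for p in input_dir.iterdir() if p.is_file()] if input_dir.is_dir() else []

    # Copy each JSON label, plus any input image sharing its file stem (assumed rule).
    for label in sorted(labeled_dir.glob("*.json")):
        shutil.copy2(label, labels_out / label.name)
        for image in inputs:
            if image.stem == label.stem:
                shutil.copy2(image, images_out / image.name)

    # Empty every top-level folder except the preserved ones; the folders themselves stay.
    for folder in pipeline_dir.iterdir():
        if not folder.is_dir() or folder.name in PRESERVED:
            continue
        for entry in folder.iterdir():
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)
            else:
                entry.unlink()
    return archive
```

Emptying each folder entry by entry, rather than deleting and recreating it, is one straightforward way to meet the "keep directories but remove their contents" requirement.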
Fixtures
setup_clean_test_dirs
Creates a temporary mock workspace with:
A data_pipeline/ structure containing:
  labeled/: with a sample JSON label
  input/: with a matching image
  label_studio/: preserved during cleanup
A master_dataset/ directory to hold archived results
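A stdlib-only sketch of the layout this fixture builds is shown below. The helper name build_mock_workspace and the file names and contents (sample.json, sample.jpg, project.json) are hypothetical placeholders; the actual fixture may create different files.

```python
import json
from pathlib import Path


def build_mock_workspace(root: Path) -> tuple[Path, Path]:
    """Create a mock data_pipeline/ and master_dataset/ layout under `root`.

    Hypothetical helper: file names and contents are placeholders.
    """
    pipeline = root / "data_pipeline"
    master = root / "master_dataset"
    for name in ("labeled", "input", "label_studio"):
        (pipeline / name).mkdir(parents=True)
    master.mkdir()  # starts empty; holds archived results

    # A sample JSON label and an image that shares its file stem.
    (pipeline / "labeled" / "sample.json").write_text(json.dumps({"image": "sample.jpg"}))
    (pipeline / "input" / "sample.jpg").write_bytes(b"\xff\xd8\xff")  # JPEG magic bytes only
    # A file in label_studio/ lets a test check that the folder survives cleanup.
    (pipeline / "label_studio" / "project.json").write_text("{}")
    return pipeline, master
```

In a pytest suite, a fixture such as setup_clean_test_dirs could be a function decorated with @pytest.fixture that accepts pytest's built-in tmp_path fixture and returns build_mock_workspace(tmp_path), so each test gets its own fresh temporary directory.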
Function Tests
test_clean_pipeline_creates_archive_and_cleans
Verifies the creation of a timestamped archive folder containing:
  labels/: where the original .json file is stored
  images/: where the associated image file is copied
Confirms that:
  The label_studio/ folder and its contents remain untouched
  All other folders (labeled/, input/, etc.) are retained but emptied
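The checks this test performs can be expressed as reusable assertion helpers, sketched below. The helper names snapshot, check_archive and check_workspace_cleaned and their parameters are hypothetical. The preserved folder is compared against a snapshot taken before cleanup, so "untouched" is checked byte for byte rather than only by existence.

```python
from pathlib import Path

# Hypothetical assertion helpers; names and parameters are illustrative only.


def snapshot(folder: Path) -> dict[str, bytes]:
    """Map every file under `folder` (as a relative POSIX path) to its bytes."""
    return {
        p.relative_to(folder).as_posix(): p.read_bytes()
        for p in folder.rglob("*")
        if p.is_file()
    }


def check_archive(master_dir: Path, label_name: str, image_name: str) -> Path:
    """Assert exactly one archive folder exists and holds the label and the image."""
    archives = [p for p in master_dir.iterdir() if p.is_dir()]
    assert len(archives) == 1, f"expected one archive folder, found {len(archives)}"
    archive = archives[0]
    assert (archive / "labels" / label_name).is_file(), "label not archived"
    assert (archive / "images" / image_name).is_file(), "image not archived"
    return archive


def check_workspace_cleaned(
    pipeline_dir: Path,
    preserved_before: dict[str, bytes],
    preserved: str = "label_studio",
    expected: tuple[str, ...] = ("labeled", "input"),
) -> None:
    """Assert the preserved folder is unchanged and all other folders exist but are empty."""
    assert snapshot(pipeline_dir / preserved) == preserved_before, f"{preserved}/ was modified"
    for name in expected:
        assert (pipeline_dir / name).is_dir(), f"{name}/ was removed instead of emptied"
    for folder in pipeline_dir.iterdir():
        if folder.is_dir() and folder.name != preserved:
            assert not any(folder.iterdir()), f"{folder.name}/ was not emptied"
```

In the test, snapshot would be taken on label_studio/ before the cleanup call, and both checks would run afterward against the fixture's directories.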
Summary
This test ensures that the pipeline cleanup resets the workspace while keeping its folder structure (and the label_studio/ folder intact) and backing up labeled content to the master_dataset/ archive.