File Cleaning and Archiving
This page outlines the unit test coverage for the clean_pipeline.py module, which handles archiving labeled data and cleaning up the data pipeline workspace.
Coverage Overview
The tests confirm that the cleaning process:
Archives labeled JSON and associated image files into timestamped folders under the master dataset directory
Cleans all pipeline folders except for label_studio
Preserves the folder structure by keeping directories but removing their contents
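The three behaviors listed above can be sketched roughly as follows. This is not the code in clean_pipeline.py: the function name clean_pipeline_sketch, the timestamp format, and the rule of matching images to labels by file stem are all assumptions made for illustration.

```python
import shutil
from datetime import datetime
from pathlib import Path


def clean_pipeline_sketch(pipeline_dir: Path, master_dir: Path) -> Path:
    """Hypothetical stand-in for the real cleanup; names and layout are assumed."""
    # Timestamped archive folder under the master dataset directory.
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    archive = master_dir / stamp
    (archive / "labels").mkdir(parents=True)
    (archive / "images").mkdir(parents=True)

    # Archive each labeled JSON file plus any input file with the same stem.
    for label in (pipeline_dir / "labeled").glob("*.json"):
        shutil.copy2(label, archive / "labels" / label.name)
        for image in (pipeline_dir / "input").glob(label.stem + ".*"):
            shutil.copy2(image, archive / "images" / image.name)

    # Empty every pipeline folder except label_studio, keeping the folder itself.
    for folder in pipeline_dir.iterdir():
        if not folder.is_dir() or folder.name == "label_studio":
            continue
        for child in list(folder.iterdir()):
            if child.is_dir():
                shutil.rmtree(child)
            else:
                child.unlink()
    return archive
```

The function returns the archive path so that a caller, or a test, can inspect what was archived.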
Fixtures
setup_clean_test_dirs
Creates a temporary mock workspace with:
A data_pipeline/ structure containing:
  labeled/: with a sample JSON label
  input/: with a matching image
  label_studio/: preserved during cleanup
A master_dataset/ directory to hold archived results
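A workspace like the one described above could be built with a small helper such as the one below. It is a minimal sketch: the helper name build_clean_test_dirs, the file names (sample.json, sample.jpg, project.db), and the file contents are hypothetical, not taken from the actual fixture.

```python
import json
from pathlib import Path


def build_clean_test_dirs(root: Path) -> dict:
    """Create a mock workspace under an existing root directory (illustrative)."""
    pipeline = root / "data_pipeline"
    master = root / "master_dataset"
    for name in ("labeled", "input", "label_studio"):
        (pipeline / name).mkdir(parents=True)
    master.mkdir()

    # A sample JSON label and an image sharing the same stem.
    (pipeline / "labeled" / "sample.json").write_text(json.dumps({"label": "example"}))
    (pipeline / "input" / "sample.jpg").write_bytes(b"fake image bytes")
    # A file that should survive cleanup.
    (pipeline / "label_studio" / "project.db").write_text("keep me")
    return {"pipeline": pipeline, "master": master}
```

Assuming the suite runs under pytest, a fixture could call this helper with the built-in tmp_path fixture as root, so that each test gets a fresh, isolated workspace.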
Function Tests
test_clean_pipeline_creates_archive_and_cleans
Verifies the creation of a timestamped archive folder containing:
  labels/: where the original .json file is stored
  images/: where the associated image file is copied
Confirms that:
  The label_studio/ folder and its contents remain untouched
  All other folders (labeled/, input/, etc.) are retained but emptied
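The checks described above could be expressed as a set of post-cleanup assertions along these lines. The helper name check_cleanup_result is hypothetical, and the sketch assumes exactly one cleanup run has taken place, so that master_dataset/ holds a single archive folder.

```python
from pathlib import Path


def check_cleanup_result(pipeline_dir: Path, master_dir: Path) -> None:
    """Assert the post-cleanup state described on this page (illustrative only)."""
    # Assumption: one cleanup run, so exactly one timestamped archive folder.
    archives = [p for p in master_dir.iterdir() if p.is_dir()]
    assert len(archives) == 1
    archive = archives[0]

    # The archive holds the label JSON and the associated image.
    assert list((archive / "labels").glob("*.json"))
    assert any((archive / "images").iterdir())

    # label_studio/ keeps its contents.
    assert any((pipeline_dir / "label_studio").iterdir())

    # Every other pipeline folder still exists but is empty.
    for folder in pipeline_dir.iterdir():
        if folder.is_dir() and folder.name != "label_studio":
            assert not any(folder.iterdir())
```

Checking only for the presence of a timestamped folder, rather than its exact name, keeps the assertions independent of whatever timestamp format the module uses.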
Summary
This test ensures that the pipeline cleanup process effectively resets the workspace while preserving key folder structures and backing up labeled content to the master_dataset/ archive.