Distillation

src.pipeline.distillation.distillation.build_optimizer_and_scheduler(model: DetectionModel, detection_trainer: DetectionTrainer, model_args: Dict[str, Any]) → Tuple[Optimizer, LambdaLR][source]

Build the optimizer and learning rate scheduler.

Parameters:

model – DetectionModel instance
detection_trainer – DetectionTrainer instance
model_args – Model arguments

Returns:

Tuple of optimizer and learning rate scheduler
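The scheduler type in the signature is LambdaLR, which sets each learning rate to the initial learning rate multiplied by the value a user-supplied function returns for the current epoch. The sketch below illustrates only that arithmetic in plain Python. The linear-decay function and the names `linear_decay_factor`, `scheduled_learning_rates`, and `final_lr_fraction` are assumptions for illustration; the schedule this project actually uses is not stated here.

```python
from typing import Callable, List


def linear_decay_factor(num_epochs: int, final_lr_fraction: float) -> Callable[[int], float]:
    """Return a per-epoch multiplier decaying linearly from 1.0 to final_lr_fraction.

    Hypothetical schedule: the real lambda used by the project may differ.
    """
    def factor(epoch: int) -> float:
        progress = max(1.0 - epoch / num_epochs, 0.0)
        return progress * (1.0 - final_lr_fraction) + final_lr_fraction

    return factor


def scheduled_learning_rates(base_lr: float, num_epochs: int, final_lr_fraction: float) -> List[float]:
    """Mimic the LambdaLR rule: lr(epoch) = base_lr * factor(epoch)."""
    factor = linear_decay_factor(num_epochs, final_lr_fraction)
    return [base_lr * factor(epoch) for epoch in range(num_epochs + 1)]


rates = scheduled_learning_rates(base_lr=0.01, num_epochs=4, final_lr_fraction=0.1)
```

With these illustrative values the rate starts at 0.01 and ends at one tenth of that after the last epoch.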

src.pipeline.distillation.distillation.calculate_gradient_norm(model: Module) → float[source]

Calculate the total gradient norm across all parameters.

Parameters:

model – The model to calculate gradient norm for

Returns:

Total gradient norm as a float
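A total gradient norm is commonly computed as the square root of the sum of squared gradient entries over all parameters. The sketch below shows that arithmetic with plain Python lists standing in for gradient tensors. It assumes an L2 norm and assumes that parameters without a gradient are skipped; neither detail is confirmed by this page, and `total_gradient_norm` is a hypothetical name.

```python
import math
from typing import Iterable, List, Optional


def total_gradient_norm(gradients: Iterable[Optional[List[float]]]) -> float:
    """Return the L2 norm over all gradient entries (assumed norm type)."""
    squared_sum = 0.0
    for grad in gradients:
        if grad is None:
            # Assumption: parameters that received no gradient are ignored.
            continue
        squared_sum += sum(value * value for value in grad)
    return math.sqrt(squared_sum)


# Two parameters with gradients and one without.
norm = total_gradient_norm([[3.0, 0.0], None, [4.0]])
```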

src.pipeline.distillation.distillation.freeze_layers(model: Module, num_layers: int = 10) → None[source]

Freeze the first num_layers layers of the model. For example, if num_layers = 10, the first 10 layers (the backbone) are frozen. For background, see https://community.ultralytics.com/t/guidance-on-freezing-layers-for-yolov8x-seg-transfer-learning/189/2 and https://github.com/ultralytics/ultralytics/blob/3e669d53067ff1ed97e0dad0a4063b156f66686d/ultralytics/engine/trainer.py#L258

Parameters:

model – The model to freeze layers in
num_layers – Number of layers to freeze from the start
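One way to select the parameters of the first n layers is to match parameter names against prefixes of the form `model.<index>.`, similar in spirit to the name matching in the Ultralytics trainer linked above. The sketch below uses a plain dict as a stand-in for the model's named parameters and their `requires_grad` flags. The parameter names and the helper `names_to_freeze` are hypothetical, and whether `freeze_layers` works exactly this way is an assumption.

```python
from typing import Dict, List


def names_to_freeze(param_names: List[str], num_layers: int) -> List[str]:
    """Return the parameter names that belong to layers 0 .. num_layers - 1."""
    prefixes = [f"model.{index}." for index in range(num_layers)]
    return [name for name in param_names if any(prefix in name for prefix in prefixes)]


# Hypothetical parameter names standing in for model.named_parameters().
requires_grad: Dict[str, bool] = {
    "model.0.conv.weight": True,
    "model.1.conv.weight": True,
    "model.10.conv.weight": True,
    "model.22.cv2.0.0.conv.weight": True,
}

# Freeze the first two layers by clearing their requires_grad flag.
for name in names_to_freeze(list(requires_grad), num_layers=2):
    requires_grad[name] = False
```

The trailing dot in each prefix matters: without it, `model.1` would also match `model.10`.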

src.pipeline.distillation.distillation.head_features_decoder(head_feats: List[Tensor], nc: int, detection_criterion: v8DetectionLoss, reg_max: int = 16, strides: List[int] = [8, 16, 32], device: str = 'cpu') → Tensor[source]

Decode the head features into bounding boxes and class scores.

Parameters:

head_feats – List of tensors, each representing a feature map from a detection head
nc – Number of classes
detection_criterion – Detection loss criterion
reg_max – Number of discrete bins used to represent each of the four box sides in the head output
strides – List of strides for the feature maps
device – Device to perform computations on

Returns:

pred_concatted: Concatenated bounding boxes and raw class logits, with shape (batch_size, 4 + num_classes, total_predictions)

Return type:

Tensor
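The shapes involved can be worked out from the strides alone. The sketch below assumes the standard YOLOv8 head layout, where each head feature map has 4 * reg_max + nc channels and one prediction per grid cell; that layout, the 640-pixel input size, and both helper names are assumptions for illustration, not taken from the project source.

```python
from typing import List, Sequence, Tuple


def head_feature_shapes(
    image_size: int,
    nc: int,
    reg_max: int = 16,
    strides: Sequence[int] = (8, 16, 32),
    batch_size: int = 1,
) -> List[Tuple[int, int, int, int]]:
    """Expected (batch, channels, height, width) of each head feature map."""
    channels = 4 * reg_max + nc  # assumed YOLOv8 layout: box bins plus class logits
    return [(batch_size, channels, image_size // s, image_size // s) for s in strides]


def decoded_shape(
    image_size: int,
    nc: int,
    strides: Sequence[int] = (8, 16, 32),
    batch_size: int = 1,
) -> Tuple[int, int, int]:
    """Expected shape of the decoded output: (batch, 4 + nc, total_predictions)."""
    total_predictions = sum((image_size // s) ** 2 for s in strides)
    return (batch_size, 4 + nc, total_predictions)


feature_shapes = head_feature_shapes(image_size=640, nc=5)
output_shape = decoded_shape(image_size=640, nc=5)
```

Under these assumptions a 640 × 640 input with five classes yields 80 × 80 + 40 × 40 + 20 × 20 = 8400 predictions.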

src.pipeline.distillation.distillation.load_checkpoint(checkpoint_path: Path, student_model: Module, optimizer: Optimizer, learning_rate_scheduler: LambdaLR, device: str = 'cpu') → int[source]

Load a checkpoint and restore the model, optimizer, and learning rate scheduler state.

Parameters:

checkpoint_path – Path to the checkpoint file
student_model – Student model to restore state to
optimizer – Optimizer to restore state to
learning_rate_scheduler – Learning rate scheduler to restore state to
device – Device to load the checkpoint onto

Returns:

The epoch number from the checkpoint

src.pipeline.distillation.distillation.load_models(device: str, base_dir: Path, distillation_config: Dict[str, Any]) → Tuple[YOLO, YOLO][source]

Load teacher and student models.

Parameters:

device – Device to load models on
base_dir – Base directory for model paths
distillation_config – Configuration dictionary for distillation

Returns:

Tuple of (teacher_yolo, student_yolo) models

src.pipeline.distillation.distillation.log_training_metrics(log_file: Path, epoch: int, batch_idx: int | None, losses: Dict[str, float], grad_norm_before: float | None = None, grad_norm_after: float | None = None, is_new_file: bool = False, log_level: Literal['batch', 'epoch'] = 'epoch') → None[source]

Log training metrics to a CSV file.

Parameters:

log_file – Path to the log file
epoch – Current epoch number
batch_idx – Current batch index (None for epoch-level logging)
losses – Dictionary of loss values
grad_norm_before – Gradient norm before clipping
grad_norm_after – Gradient norm after clipping
is_new_file – Whether this is the first write to the file
log_level – Whether to log at batch or epoch level
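A minimal sketch of this kind of CSV logging with the standard library is shown below. The column names, the empty cell for a missing batch index, and the helper `append_metrics_row` are assumptions for illustration; the actual column layout written by `log_training_metrics` is not documented here.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict, Optional


def append_metrics_row(
    log_file: Path,
    epoch: int,
    batch_idx: Optional[int],
    losses: Dict[str, float],
    is_new_file: bool = False,
) -> None:
    """Append one row of metrics, writing the header on the first write."""
    row = {"epoch": epoch, "batch_idx": "" if batch_idx is None else batch_idx, **losses}
    mode = "w" if is_new_file else "a"
    with log_file.open(mode, newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(row))
        if is_new_file:
            writer.writeheader()
        writer.writerow(row)


# Epoch-level logging: batch_idx is None, so that cell is left empty.
with tempfile.TemporaryDirectory() as tmp_dir:
    log_path = Path(tmp_dir) / "metrics.csv"
    append_metrics_row(log_path, 1, None, {"total_loss": 1.5}, is_new_file=True)
    append_metrics_row(log_path, 2, None, {"total_loss": 1.2})
    logged_lines = log_path.read_text().splitlines()
```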

src.pipeline.distillation.distillation.prepare_dataset(img_path: Path, student_model: Module, batch_size: int = 16, mode: str = 'train') → Tuple[YOLODataset, DataLoader][source]

Prepare dataset and dataloader for training.

Notes

number_of_objects_detected: the number of objects detected in all images in the batch
batch_size: number of images in the batch

Each batch in the train_dataloader contains:
batch_idx: tensor of shape (number_of_objects_detected). For each object, the value is in 0, …, batch_size - 1 and gives the index within the batch of the image that the object belongs to
img: image tensor of shape (batch_size, 3, 640, 640)
bboxes: tensor of shape (number_of_objects_detected, 4), where 4 is for normalized x1, y1, x2, y2
cls: tensor of shape (number_of_objects_detected, 1), containing the class labels of all objects detected in the batch
resized_shape: resized 2D dimensions of the images. A list of tensors: the first tensor holds the first dimension, the second tensor holds the second dimension
ori_shape: original 2D dimensions of the images. A list of tensors: the first tensor holds the first dimension, the second tensor holds the second dimension

Parameters:

img_path – Directory containing images
student_model – Student model instance
batch_size – Batch size for training
mode – Dataset mode (“train” or “val”)

Returns:

Tuple of (dataset, dataloader)
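The batch layout described in the notes stores the objects of all images in one flat list, with batch_idx recording which image each object came from. The sketch below shows how that index can be used to regroup boxes per image. Plain Python lists stand in for the tensors, the box values are made up, and `group_boxes_by_image` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def group_boxes_by_image(
    batch_idx: Sequence[int],
    bboxes: Sequence[Sequence[float]],
) -> Dict[int, List[Sequence[float]]]:
    """Map each image index in the batch to the boxes that belong to it."""
    grouped: Dict[int, List[Sequence[float]]] = defaultdict(list)
    for image_index, box in zip(batch_idx, bboxes):
        grouped[int(image_index)].append(box)
    return dict(grouped)


# Three objects in a batch of two images: two in image 0, one in image 1.
batch_idx = [0, 0, 1]
bboxes = [
    [0.1, 0.1, 0.2, 0.2],
    [0.3, 0.3, 0.5, 0.5],
    [0.4, 0.4, 0.9, 0.9],
]
per_image = group_boxes_by_image(batch_idx, bboxes)
```

An image with no objects simply has no entry in the result, which is why batch_idx is needed in addition to the flat list of boxes.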

src.pipeline.distillation.distillation.save_checkpoint(checkpoint_dir: Path, epoch: int, student_model: Module, optimizer: Optimizer, learning_rate_scheduler: LambdaLR, losses: Dict[str, float]) → None[source]

Save model checkpoint.

Parameters:

checkpoint_dir – Directory to save checkpoint
epoch – Current epoch number
student_model – Student model to save
optimizer – Optimizer state to save
learning_rate_scheduler – Learning rate scheduler state to save
losses – Dictionary of loss values

src.pipeline.distillation.distillation.save_final_model(model: Module, output_dir: Path, model_name: str = 'model.pt') → None[source]

Save the final model after training.

Parameters:

model – The model to save
output_dir – Directory to save the model
model_name – Name of the saved model file

src.pipeline.distillation.distillation.start_distillation(device: str = 'cpu', base_dir: Path = PosixPath('..'), img_dir: Path = PosixPath('dataset'), save_checkpoint_every: int = 25, frozen_layers: int = 10, hyperparams: Dict[str, float] = {'lambda_detection': 1.0, 'lambda_dist_ciou': 1.0, 'lambda_dist_kl': 2.0, 'lambda_distillation': 2.0}, resume_checkpoint: Path | None = None, output_dir: Path = PosixPath('distillation_out'), final_model_dir: Path = PosixPath('automl_workspace/model_registry/distilled/latest'), log_level: Literal['batch', 'epoch'] = 'batch', debug: bool = False, distillation_config: Dict[str, Any] | None = None, pipeline_config: Dict[str, Any] | None = None) → Dict[str, List[float]][source]

Start the distillation training process.

Parameters:

device – Device to train on
base_dir – Base directory for paths (should be SCRIPT_DIR from main.py)
img_dir – Directory containing training images
save_checkpoint_every – Save checkpoint every n epochs
frozen_layers – Number of layers to freeze in the backbone
hyperparams – Dictionary of hyperparameters for loss functions
resume_checkpoint – Optional path to checkpoint to resume training from
output_dir – Directory to save output
final_model_dir – Directory to save final model
log_level – Whether to log at batch or epoch level
debug – Whether to print debug information
distillation_config – Configuration dictionary for distillation
pipeline_config – Configuration dictionary for pipeline

Returns:

Dictionary containing lists of loss values for each epoch

src.pipeline.distillation.distillation.train_epoch(student_model: Module, teacher_model: Module, train_dataloader: DataLoader, detection_trainer: DetectionTrainer, optimizer: Optimizer, detection_criterion: v8DetectionLoss, config_dict: Dict[str, Any], device: str = 'cpu', nc: int = 5, hyperparams: Dict[str, float] = {'lambda_detection': 1.0, 'lambda_dist_ciou': 1.0, 'lambda_dist_kl': 2.0, 'lambda_distillation': 2.0}, epoch: int = 1, log_file: Path | None = None, log_level: Literal['batch', 'epoch'] = 'batch', debug: bool = False) → Dict[str, float][source]

Train for one epoch.
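The hyperparams dictionary holds four weights for the loss terms. The sketch below shows one plausible way such weights could be combined into a single training loss: the CIoU and KL terms form a distillation loss, which is then weighted against the detection loss. This particular nesting is an assumption and is not confirmed by this page; `combine_losses` and its argument names are hypothetical.

```python
from typing import Dict

# Default weights as they appear in the function signature.
DEFAULT_HYPERPARAMS: Dict[str, float] = {
    "lambda_detection": 1.0,
    "lambda_dist_ciou": 1.0,
    "lambda_dist_kl": 2.0,
    "lambda_distillation": 2.0,
}


def combine_losses(
    detection_loss: float,
    ciou_loss: float,
    kl_loss: float,
    hyperparams: Dict[str, float],
) -> float:
    """Weighted sum of detection and distillation terms (assumed combination)."""
    distillation_loss = (
        hyperparams["lambda_dist_ciou"] * ciou_loss
        + hyperparams["lambda_dist_kl"] * kl_loss
    )
    return (
        hyperparams["lambda_detection"] * detection_loss
        + hyperparams["lambda_distillation"] * distillation_loss
    )


total_loss = combine_losses(1.0, 0.5, 0.25, DEFAULT_HYPERPARAMS)
```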

src.pipeline.distillation.distillation.train_loop(num_epochs: int, student_model: Module, student_yolo: YOLO, teacher_model: Module, train_dataloader: DataLoader, detection_trainer: DetectionTrainer, detection_validator: DetectionValidator, optimizer: Optimizer, stopper: EarlyStopping, learning_rate_scheduler: LambdaLR, detection_criterion: v8DetectionLoss, config_dict: Dict[str, Any], device: str, checkpoint_dir: Path, save_checkpoint_every: int, hyperparams: Dict[str, float] = {'lambda_detection': 1.0, 'lambda_dist_ciou': 1.0, 'lambda_dist_kl': 2.0, 'lambda_distillation': 2.0}, start_epoch: int = 1, log_file: Path | None = None, log_level: Literal['batch', 'epoch'] = 'epoch', final_model_dir: Path = PosixPath('automl_workspace/model_registry/distilled/latest'), debug: bool = False) → Dict[str, List[float]][source]

Execute the complete training process including all epochs.

Parameters:

num_epochs – Number of epochs to train
student_model – Student model to train
student_yolo – Student YOLO model instance for saving
teacher_model – Teacher model for distillation
train_dataloader – DataLoader for training data
detection_trainer – Detection trainer instance
detection_validator – Detection validator instance
optimizer – Optimizer for training
stopper – Early stopping instance
learning_rate_scheduler – Learning rate scheduler
detection_criterion – Detection loss criterion
config_dict – Configuration dictionary
device – Device to train on
checkpoint_dir – Directory to save checkpoints
save_checkpoint_every – Save checkpoint every n epochs
hyperparams – Dictionary of hyperparameters for loss functions
start_epoch – Start training from this epoch
log_file – Optional path to log file for metrics
log_level – Whether to log at batch or epoch level
final_model_dir – Directory to save final model
debug – Whether to print debug information

Returns:

Dictionary containing lists of loss values for each epoch
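The return value and the save_checkpoint_every parameter suggest a loop that collects each epoch's losses into per-name lists and saves a checkpoint at a fixed interval. The sketch below shows that control flow only. It assumes 1-based inclusive epoch numbering and a checkpoint whenever the epoch is a multiple of the interval, and it leaves out validation, early stopping, and scheduler steps; `run_epochs` and the dummy training callback are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def run_epochs(
    num_epochs: int,
    train_one_epoch: Callable[[int], Dict[str, float]],
    save_checkpoint_every: int,
    start_epoch: int = 1,
) -> Tuple[Dict[str, List[float]], List[int]]:
    """Collect per-epoch losses and record the epochs at which a checkpoint is due."""
    history: Dict[str, List[float]] = {}
    checkpoint_epochs: List[int] = []
    for epoch in range(start_epoch, num_epochs + 1):
        losses = train_one_epoch(epoch)
        for name, value in losses.items():
            history.setdefault(name, []).append(value)
        if epoch % save_checkpoint_every == 0:
            # A real loop would write the checkpoint here.
            checkpoint_epochs.append(epoch)
    return history, checkpoint_epochs


# Dummy training step whose loss shrinks with the epoch number.
history, checkpoint_epochs = run_epochs(
    num_epochs=4,
    train_one_epoch=lambda epoch: {"total_loss": 1.0 / epoch},
    save_checkpoint_every=2,
)
```

Passing a larger start_epoch mirrors resuming from a checkpoint: the history then covers only the epochs run in the current session.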