Distillation

src.pipeline.distillation.distillation.build_optimizer_and_scheduler(model: DetectionModel, detection_trainer: DetectionTrainer, model_args: Dict[str, Any]) → Tuple[Optimizer, LambdaLR]

Build the optimizer and learning rate scheduler.

Parameters:
  • model – DetectionModel instance

  • detection_trainer – DetectionTrainer instance

  • model_args – Model arguments

Returns:

Tuple of optimizer and learning rate scheduler
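LambdaLR scales the optimizer's base learning rate by a factor returned from a user-supplied function of the epoch index. The sketch below shows one such factor: a linear decay from 1.0 to a final fraction lrf, which is the default (non-cosine) schedule in the Ultralytics trainer. Whether build_optimizer_and_scheduler uses exactly this function is an assumption, and linear_lr_lambda is a hypothetical name.

```python
from typing import Callable


def linear_lr_lambda(epochs: int, lrf: float = 0.01) -> Callable[[int], float]:
    """Hypothetical helper: return epoch -> LR multiplier, decaying linearly from 1.0 to lrf."""

    def factor(epoch: int) -> float:
        # Fraction of training remaining, clamped at 0 once epoch >= epochs.
        remaining = max(1 - epoch / epochs, 0)
        # Interpolate between lrf (end of training) and 1.0 (start of training).
        return remaining * (1.0 - lrf) + lrf

    return factor


lf = linear_lr_lambda(epochs=100, lrf=0.01)
for epoch in (0, 50, 100):
    print(epoch, round(lf(epoch), 4))
```

In PyTorch the factor would be passed as torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lf); each scheduler.step() advances the epoch index by one.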

src.pipeline.distillation.distillation.calculate_gradient_norm(model: Module) → float

Calculate the total gradient norm across all parameters.

Parameters:

model – The model to calculate gradient norm for

Returns:

Total gradient norm as a float
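The total gradient norm is usually the L2 norm of all gradients taken together, which equals the square root of the sum of the squared per-parameter norms. The framework-free sketch below assumes the L2 norm and represents each parameter's gradient as a flat list of floats, with None for parameters that received no gradient (the p.grad is None case in PyTorch); total_grad_norm is a hypothetical name.

```python
import math
from typing import Iterable, Optional, Sequence


def total_grad_norm(grads: Iterable[Optional[Sequence[float]]]) -> float:
    """Hypothetical helper: L2 norm over all gradient entries of all parameters."""
    total_sq = 0.0
    for g in grads:
        if g is None:  # e.g. frozen parameters have no gradient
            continue
        # Sum of squares of this parameter's gradient (its squared L2 norm).
        total_sq += sum(x * x for x in g)
    return math.sqrt(total_sq)


# Two parameters with gradients and one frozen parameter without.
norm = total_grad_norm([[3.0, 0.0], None, [4.0]])
print(norm)  # 5.0
```

In PyTorch, torch.nn.utils.clip_grad_norm_ returns this same total norm (L2 by default, measured before clipping), so calling it with max_norm=float('inf') measures the norm without changing the gradients.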

src.pipeline.distillation.distillation.freeze_layers(model: Module, num_layers: int = 10) → None

Freeze the first num_layers layers of the model. For example, with num_layers = 10 the first 10 layers (the backbone) are frozen. References: https://community.ultralytics.com/t/guidance-on-freezing-layers-for-yolov8x-seg-transfer-learning/189/2 and https://github.com/ultralytics/ultralytics/blob/3e669d53067ff1ed97e0dad0a4063b156f66686d/ultralytics/engine/trainer.py#L258

Parameters:
  • model – The model to freeze layers in

  • num_layers – Number of layers to freeze from the start
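The Ultralytics trainer linked above freezes layers by parameter name: a parameter belongs to layer i if its name contains model.{i}., and every matching parameter gets requires_grad = False. The sketch below reproduces only that name matching on plain strings, so it runs without PyTorch. names_to_freeze is a hypothetical helper, and the example names assume the YOLOv8 DetectionModel naming scheme.

```python
from typing import Iterable, List


def names_to_freeze(param_names: Iterable[str], num_layers: int = 10) -> List[str]:
    """Hypothetical helper: names of parameters in layers 0 .. num_layers - 1."""
    # The trailing dot keeps "model.1." from matching "model.10." or "model.11.".
    prefixes = [f"model.{i}." for i in range(num_layers)]
    return [name for name in param_names if any(p in name for p in prefixes)]


names = [
    "model.0.conv.weight",
    "model.9.cv1.conv.weight",
    "model.10.cv1.conv.weight",
    "model.22.dfl.conv.weight",
]
print(names_to_freeze(names, num_layers=10))
# ['model.0.conv.weight', 'model.9.cv1.conv.weight']
```

With PyTorch, the same test would run inside a loop over model.named_parameters(), setting param.requires_grad = False for each match.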

src.pipeline.distillation.distillation.head_features_decoder(head_feats: List[Tensor], nc: int, detection_criterion: v8DetectionLoss, reg_max: int = 16, strides: List[int] = [8, 16, 32], device: str = 'cpu') → Tensor

Decode the head features into bounding boxes and class scores.

Parameters:
  • head_feats – List of tensors, each representing a feature map from a detection head

  • nc – Number of classes

  • detection_criterion – Detection loss criterion

  • reg_max – Number of DFL (Distribution Focal Loss) bins used to regress each box side; each head feature map has 4 * reg_max + nc channels

  • strides – List of strides for the feature maps

  • device – Device to perform computations on

Returns:

pred_concatted: Decoded bounding boxes concatenated with the raw class logits

Shape is (batch_size, 4 + nc, total_predictions)

Return type:

Tensor
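Assuming head_feats follow the YOLOv8 Detect-head layout, each feature map carries 4 * reg_max + nc channels per grid cell: reg_max logits for each of the four box sides, followed by the class logits. Each side's distance is decoded as the expected bin index under a softmax over its reg_max logits (Distribution Focal Loss decoding), in grid-cell units that are then multiplied by the map's stride. The framework-free sketch below shows that expectation and how total_predictions follows from the strides; all helper names are hypothetical.

```python
import math
from typing import Sequence


def channels_per_cell(nc: int, reg_max: int = 16) -> int:
    """4 box sides x reg_max DFL bins, plus one logit per class."""
    return 4 * reg_max + nc


def total_predictions(imgsz: int, strides: Sequence[int] = (8, 16, 32)) -> int:
    """One prediction per grid cell of each (square) feature map."""
    return sum((imgsz // s) ** 2 for s in strides)


def dfl_expectation(side_logits: Sequence[float]) -> float:
    """Softmax over the reg_max bins, then the expected bin index (grid-cell units)."""
    m = max(side_logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in side_logits]
    z = sum(exps)
    return sum(i * e for i, e in enumerate(exps)) / z


print(channels_per_cell(nc=5))  # 69
print(total_predictions(640))  # 6400 + 1600 + 400 = 8400
# A distribution peaked at bin 3 decodes to a distance of about 3 cells.
logits = [-10.0] * 16
logits[3] = 10.0
print(round(dfl_expectation(logits), 3))
```

For nc = 5 and a 640 × 640 input, this gives 69 channels per cell and 8400 predictions in total across the three feature maps.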

src.pipeline.distillation.distillation.load_checkpoint(checkpoint_path: Path, student_model: Module, optimizer: Optimizer, learning_rate_scheduler: LambdaLR, device: str = 'cpu') → int

Load a checkpoint and restore model and optimizer state.

Parameters:
  • checkpoint_path – Path to the checkpoint file

  • student_model – Student model to restore state to

  • optimizer – Optimizer to restore state to

  • learning_rate_scheduler – Learning rate scheduler to restore state to

Returns:

The epoch number from the checkpoint

src.pipeline.distillation.distillation.load_models(device: str, base_dir: Path, distillation_config: Dict[str, Any]) → Tuple[YOLO, YOLO]

Load teacher and student models.

Parameters:
  • device – Device to load models on

  • base_dir – Base directory for model paths

  • distillation_config – Configuration dictionary for distillation

Returns:

Tuple of (teacher_yolo, student_yolo) models

src.pipeline.distillation.distillation.log_training_metrics(log_file: Path, epoch: int, batch_idx: int | None, losses: Dict[str, float], grad_norm_before: float | None = None, grad_norm_after: float | None = None, is_new_file: bool = False, log_level: Literal['batch', 'epoch'] = 'epoch') → None

Log training metrics to a CSV file.

Parameters:
  • log_file – Path to the log file

  • epoch – Current epoch number

  • batch_idx – Current batch index (None for epoch-level logging)

  • losses – Dictionary of loss values

  • grad_norm_before – Gradient norm before clipping

  • grad_norm_after – Gradient norm after clipping

  • is_new_file – Whether this is the first write to the file

  • log_level – Whether to log at batch or epoch level
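A minimal sketch of CSV metric logging with the standard csv module, assuming one row per call and a header row written only when is_new_file is true. The function name append_metrics_row, the column layout, and overwriting the file on its first write are assumptions; the real log_training_metrics may name or order its columns differently.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict, Optional


def append_metrics_row(
    log_file: Path,
    epoch: int,
    batch_idx: Optional[int],
    losses: Dict[str, float],
    grad_norm_before: Optional[float] = None,
    grad_norm_after: Optional[float] = None,
    is_new_file: bool = False,
) -> None:
    """Hypothetical helper: write one metrics row, plus a header for a new file."""
    row = {
        "epoch": epoch,
        # Empty cells for values that do not apply (e.g. batch_idx at epoch level).
        "batch_idx": "" if batch_idx is None else batch_idx,
        **losses,
        "grad_norm_before": "" if grad_norm_before is None else grad_norm_before,
        "grad_norm_after": "" if grad_norm_after is None else grad_norm_after,
    }
    # Assumption: the first write overwrites any stale file from an earlier run.
    with open(log_file, "w" if is_new_file else "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        if is_new_file:
            writer.writeheader()
        writer.writerow(row)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "metrics.csv"
    append_metrics_row(path, 1, 0, {"total": 2.5}, 10.0, 5.0, is_new_file=True)
    append_metrics_row(path, 1, 1, {"total": 2.25}, 8.0, 5.0)
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
print(rows)
```

Because DictWriter writes values in fieldnames order, every call must pass the same loss keys for the rows to line up with the header.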

src.pipeline.distillation.distillation.prepare_dataset(img_path: Path, student_model: Module, batch_size: int = 16, mode: str = 'train') → Tuple[YOLODataset, DataLoader]

Prepare dataset and dataloader for training.

Notes

number_of_objects_detected: the total number of labeled (ground-truth) objects across all images in the batch
batch_size: the number of images in the batch

Each batch in the train_dataloader contains:

  • batch_idx: tensor of shape (number_of_objects_detected,); for each object, the value is the index (0, …, batch_size - 1) of the image it belongs to within the batch

  • img: image tensor of shape (batch_size, 3, 640, 640)

  • bboxes: tensor of shape (number_of_objects_detected, 4) holding normalized box coordinates in xywh format (x_center, y_center, width, height)

  • cls: tensor of shape (number_of_objects_detected, 1) containing the class labels of all objects in the batch

  • resized_shape: resized 2D dimensions of the images, as a list of two tensors: the first holds the first dimension (height), the second the second dimension (width)

  • ori_shape: original 2D dimensions of the images, as a list of two tensors laid out like resized_shape

Parameters:
  • img_path – Directory containing images

  • student_model – Student model instance

  • batch_size – Batch size for training

  • mode – Dataset mode (“train” or “val”)

Returns:

Tuple of (dataset, dataloader)
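Because the boxes and labels of all images are concatenated along the first dimension, batch_idx is what ties each object back to its image. A framework-free sketch of that grouping, with plain lists standing in for tensors; objects_per_image is a hypothetical helper.

```python
from typing import Dict, List, Sequence


def objects_per_image(
    batch_idx: Sequence[int], cls: Sequence[int], batch_size: int
) -> Dict[int, List[int]]:
    """Hypothetical helper: class labels of each image in the batch, keyed by image index."""
    groups: Dict[int, List[int]] = {i: [] for i in range(batch_size)}
    for image_index, label in zip(batch_idx, cls):
        groups[int(image_index)].append(int(label))
    return groups


# Three objects in a batch of three images: two in image 0, none in image 1, one in image 2.
print(objects_per_image(batch_idx=[0, 0, 2], cls=[1, 3, 0], batch_size=3))
# {0: [1, 3], 1: [], 2: [0]}
```

With the actual tensors, the same selection for image i is batch["cls"][batch["batch_idx"] == i] (and likewise for batch["bboxes"]), using a boolean mask.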

src.pipeline.distillation.distillation.save_checkpoint(checkpoint_dir: Path, epoch: int, student_model: Module, optimizer: Optimizer, learning_rate_scheduler: LambdaLR, losses: Dict[str, float]) → None

Save model checkpoint.

Parameters:
  • checkpoint_dir – Directory to save checkpoint

  • epoch – Current epoch number

  • student_model – Student model to save

  • optimizer – Optimizer state to save

  • learning_rate_scheduler – Learning rate scheduler state to save

  • losses – Dictionary of loss values

src.pipeline.distillation.distillation.save_final_model(model: Module, output_dir: Path, model_name: str = 'model.pt') → None

Save the final model after training.

Parameters:
  • model – The model to save

  • output_dir – Directory to save the model

  • model_name – Name of the saved model file

src.pipeline.distillation.distillation.start_distillation(device: str = 'cpu', base_dir: Path = PosixPath('..'), img_dir: Path = PosixPath('dataset'), save_checkpoint_every: int = 25, frozen_layers: int = 10, hyperparams: Dict[str, float] = {'lambda_detection': 1.0, 'lambda_dist_ciou': 1.0, 'lambda_dist_kl': 2.0, 'lambda_distillation': 2.0}, resume_checkpoint: Path | None = None, output_dir: Path = PosixPath('distillation_out'), final_model_dir: Path = PosixPath('automl_workspace/model_registry/distilled/latest'), log_level: Literal['batch', 'epoch'] = 'batch', debug: bool = False, distillation_config: Dict[str, Any] | None = None, pipeline_config: Dict[str, Any] | None = None) → Dict[str, List[float]]

Start the distillation training process.

Parameters:
  • device – Device to train on

  • base_dir – Base directory for paths (should be SCRIPT_DIR from main.py)

  • img_dir – Directory containing training images

  • save_checkpoint_every – Save checkpoint every n epochs

  • frozen_layers – Number of layers to freeze in the backbone

  • hyperparams – Dictionary of loss hyperparameters; the default defines lambda_detection, lambda_dist_ciou, lambda_dist_kl and lambda_distillation

  • resume_checkpoint – Optional path to checkpoint to resume training from

  • output_dir – Directory to save output

  • final_model_dir – Directory to save final model

  • log_level – Whether to log at batch or epoch level

  • debug – Whether to print debug information

  • distillation_config – Configuration dictionary for distillation

  • pipeline_config – Configuration dictionary for pipeline

Returns:

Dictionary containing lists of loss values for each epoch

src.pipeline.distillation.distillation.train_epoch(student_model: Module, teacher_model: Module, train_dataloader: DataLoader, detection_trainer: DetectionTrainer, optimizer: Optimizer, detection_criterion: v8DetectionLoss, config_dict: Dict[str, Any], device: str = 'cpu', nc: int = 5, hyperparams: Dict[str, float] = {'lambda_detection': 1.0, 'lambda_dist_ciou': 1.0, 'lambda_dist_kl': 2.0, 'lambda_distillation': 2.0}, epoch: int = 1, log_file: Path | None = None, log_level: Literal['batch', 'epoch'] = 'batch', debug: bool = False) → Dict[str, float]

Train for one epoch.

src.pipeline.distillation.distillation.train_loop(num_epochs: int, student_model: Module, student_yolo: YOLO, teacher_model: Module, train_dataloader: DataLoader, detection_trainer: DetectionTrainer, detection_validator: DetectionValidator, optimizer: Optimizer, stopper: EarlyStopping, learning_rate_scheduler: LambdaLR, detection_criterion: v8DetectionLoss, config_dict: Dict[str, Any], device: str, checkpoint_dir: Path, save_checkpoint_every: int, hyperparams: Dict[str, float] = {'lambda_detection': 1.0, 'lambda_dist_ciou': 1.0, 'lambda_dist_kl': 2.0, 'lambda_distillation': 2.0}, start_epoch: int = 1, log_file: Path | None = None, log_level: Literal['batch', 'epoch'] = 'epoch', final_model_dir: Path = PosixPath('automl_workspace/model_registry/distilled/latest'), debug: bool = False) → Dict[str, List[float]]

Execute the complete training process including all epochs.

Parameters:
  • num_epochs – Number of epochs to train

  • student_model – Student model to train

  • student_yolo – Student YOLO model instance for saving

  • teacher_model – Teacher model for distillation

  • train_dataloader – DataLoader for training data

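  • detection_trainer – Detection trainer instance

  • detection_validator – DetectionValidator instance used for validation

  • optimizer – Optimizer for training

  • stopper – EarlyStopping instance that decides when to stop training early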

  • learning_rate_scheduler – Learning rate scheduler

  • detection_criterion – Detection loss criterion

  • config_dict – Configuration dictionary

  • device – Device to train on

  • checkpoint_dir – Directory to save checkpoints

  • save_checkpoint_every – Save checkpoint every n epochs

  • hyperparams – Dictionary of hyperparameters for loss functions

  • start_epoch – Start training from this epoch

  • log_file – Optional path to log file for metrics

  • log_level – Whether to log at batch or epoch level

  • final_model_dir – Directory to save final model

  • debug – Whether to print debug information

Returns:

Dictionary containing lists of loss values for each epoch
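The control flow suggested by train_loop's parameters (a checkpoint every save_checkpoint_every epochs and an early-stopping check after each epoch) can be sketched without any training. PatienceStopper is a hypothetical, simplified stand-in for an early-stopping helper such as Ultralytics' EarlyStopping; run_schedule and the fitness values are also hypothetical, and the real loop may order these steps differently.

```python
from typing import List, Sequence, Tuple


class PatienceStopper:
    """Hypothetical, simplified early stopping: stop after `patience` epochs without improvement."""

    def __init__(self, patience: int) -> None:
        self.patience = patience
        self.best_fitness = float("-inf")
        self.best_epoch = 0

    def __call__(self, epoch: int, fitness: float) -> bool:
        if fitness > self.best_fitness:  # strict improvement resets the counter
            self.best_fitness = fitness
            self.best_epoch = epoch
        return epoch - self.best_epoch >= self.patience


def run_schedule(
    fitness_per_epoch: Sequence[float],
    save_checkpoint_every: int,
    patience: int,
    start_epoch: int = 1,
) -> Tuple[List[int], int]:
    """Return (epochs at which a checkpoint is saved, last epoch trained)."""
    stopper = PatienceStopper(patience)
    saved: List[int] = []
    last_epoch = start_epoch - 1
    for epoch, fitness in enumerate(fitness_per_epoch, start=start_epoch):
        last_epoch = epoch
        # ... train_epoch and validation would run here ...
        if epoch % save_checkpoint_every == 0:
            saved.append(epoch)  # stands in for save_checkpoint(...)
        if stopper(epoch, fitness):
            break
    return saved, last_epoch


fitness = [0.1, 0.2, 0.3] + [0.3] * 7  # improvement stalls after epoch 3
print(run_schedule(fitness, save_checkpoint_every=2, patience=3))
# ([2, 4, 6], 6)
```

Saving before the stop check means the final epoch is still checkpointed when it falls on the interval.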