Distillation
- src.pipeline.distillation.distillation.build_optimizer_and_scheduler(model: DetectionModel, detection_trainer: DetectionTrainer, model_args: Dict[str, Any]) → Tuple[Optimizer, LambdaLR]
Build the optimizer and learning rate scheduler.
- Parameters:
model – DetectionModel instance
detection_trainer – DetectionTrainer instance
model_args – Model arguments
- Returns:
Tuple of optimizer and learning rate scheduler
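The section does not say which optimizer or schedule is built. The sketch below is one possible construction. It assumes an SGD optimizer over the trainable (non-frozen) parameters and the linear decay used by Ultralytics trainers, where the learning-rate factor falls from 1 to lrf over the run. The function name and the defaults for lr0, lrf, momentum and weight_decay are illustrative assumptions. The documented function derives its settings from detection_trainer and model_args instead.

```python
import torch
from torch import nn
from torch.optim.lr_scheduler import LambdaLR


def build_optimizer_and_scheduler_sketch(
    model: nn.Module,
    epochs: int,
    lr0: float = 0.01,
    lrf: float = 0.01,
    momentum: float = 0.937,
    weight_decay: float = 5e-4,
):
    """Hypothetical helper: SGD over trainable params plus a linear LR decay."""
    # Only optimize parameters that were not frozen (e.g. by freeze_layers).
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(
        params, lr=lr0, momentum=momentum, nesterov=True, weight_decay=weight_decay
    )

    # Multiplicative factor applied to lr0: 1.0 at epoch 0, decaying linearly to lrf at `epochs`.
    def linear_factor(epoch: int) -> float:
        return max(1 - epoch / epochs, 0) * (1.0 - lrf) + lrf

    scheduler = LambdaLR(optimizer, lr_lambda=linear_factor)
    return optimizer, scheduler
```

With this design, scheduler.step() is called once per epoch, after the optimizer steps for that epoch.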
- src.pipeline.distillation.distillation.calculate_gradient_norm(model: Module) → float
Calculate the total gradient norm across all parameters.
- Parameters:
model – The model to calculate gradient norm for
- Returns:
Total gradient norm as a float
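A common definition of this quantity is the global L2 norm: the square root of the sum of squared per-parameter gradient norms. This is also the total norm that torch.nn.utils.clip_grad_norm_ reports by default. The sketch below assumes that definition, since the section does not state which norm the function uses. Parameters without a gradient, such as frozen layers, are skipped.

```python
import math

import torch
from torch import nn


def calculate_gradient_norm_sketch(model: nn.Module) -> float:
    """Global L2 norm over all parameter gradients (assumed definition)."""
    total_sq = 0.0
    for p in model.parameters():
        if p.grad is None:  # frozen or unused parameters have no gradient
            continue
        total_sq += p.grad.detach().norm(2).item() ** 2
    return math.sqrt(total_sq)
```

Calling it once before and once after gradient clipping yields the values that log_training_metrics accepts as grad_norm_before and grad_norm_after.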
- src.pipeline.distillation.distillation.freeze_layers(model: Module, num_layers: int = 10) → None
Freeze the first n layers of the model. For example, if num_layers = 10, the first 10 layers (the backbone) are frozen. See https://community.ultralytics.com/t/guidance-on-freezing-layers-for-yolov8x-seg-transfer-learning/189/2 and https://github.com/ultralytics/ultralytics/blob/3e669d53067ff1ed97e0dad0a4063b156f66686d/ultralytics/engine/trainer.py#L258
- Parameters:
model – The model to freeze layers in
num_layers – Number of layers to freeze from the start
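The Ultralytics trainer code linked above freezes layers by parameter name. Any parameter whose name contains model.{i}. for some i < num_layers gets requires_grad = False. The sketch below applies the same idea. It assumes the layers live in an indexable model.model container, as in Ultralytics' DetectionModel. The toy module used to exercise it is hypothetical.

```python
from torch import nn


def freeze_layers_sketch(model: nn.Module, num_layers: int = 10) -> None:
    """Freeze parameters of layers 0..num_layers-1, matched by parameter name."""
    # The trailing dot keeps "model.1." from also matching "model.10.", "model.11.", ...
    layer_tags = [f"model.{i}." for i in range(num_layers)]
    for name, param in model.named_parameters():
        if any(tag in name for tag in layer_tags):
            param.requires_grad = False
```

The match is by substring rather than prefix, so wrapped names such as model.model.0.conv.weight are also caught.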
- src.pipeline.distillation.distillation.head_features_decoder(head_feats: List[Tensor], nc: int, detection_criterion: v8DetectionLoss, reg_max: int = 16, strides: List[int] = [8, 16, 32], device: str = 'cpu') → Tensor
Decode the head features into bounding boxes and class scores.
- Parameters:
head_feats – List of tensors, each representing a feature map from a detection head
nc – Number of classes
detection_criterion – Detection loss criterion
reg_max – Number of DFL (Distribution Focal Loss) bins used to predict each of the four box-side distances
strides – List of strides for the feature maps
device – Device to perform computations on
- Returns:
- pred_concatted: Concatenated bounding boxes and raw class logits, with shape (batch_size, 4 + nc, total_predictions)
- Return type:
Tensor
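The decoding can be illustrated without Ultralytics by following the anchor-free scheme of YOLOv8 heads. Each level outputs 4 * reg_max distribution logits plus nc class logits per grid cell. A softmax over the reg_max bins, followed by an expectation, gives the distance from the cell center to each box side (DFL decoding). Multiplying by the level's stride converts grid units to pixels. This is a sketch under those assumptions. It returns boxes as center x, center y, width and height in input-image pixels, and it does not use detection_criterion or device. The documented function may differ in box format and may reuse helpers from the loss object.

```python
from typing import List, Sequence, Tuple

import torch
from torch import Tensor


def make_anchor_points(feats: List[Tensor], strides: Sequence[int]) -> Tuple[Tensor, Tensor]:
    """Grid-cell centers (in grid units) and the matching stride for every location."""
    points, stride_rows = [], []
    for feat, stride in zip(feats, strides):
        _, _, h, w = feat.shape
        xs = torch.arange(w, dtype=feat.dtype, device=feat.device) + 0.5
        ys = torch.arange(h, dtype=feat.dtype, device=feat.device) + 0.5
        grid_y, grid_x = torch.meshgrid(ys, xs, indexing="ij")
        # Row-major (h, w) order, matching the reshape of the features below.
        points.append(torch.stack((grid_x, grid_y), dim=-1).reshape(-1, 2))
        stride_rows.append(
            torch.full((h * w, 1), float(stride), dtype=feat.dtype, device=feat.device)
        )
    return torch.cat(points), torch.cat(stride_rows)


def decode_head_feats_sketch(head_feats: List[Tensor], nc: int, reg_max: int = 16,
                             strides: Sequence[int] = (8, 16, 32)) -> Tensor:
    """Return (batch, 4 + nc, total_predictions): xywh boxes in pixels + raw class logits."""
    b = head_feats[0].shape[0]
    channels = 4 * reg_max + nc  # channels per grid location
    x = torch.cat([f.reshape(b, channels, -1) for f in head_feats], dim=2)  # (B, C, N)
    box_logits, cls_logits = x.split((4 * reg_max, nc), dim=1)

    # DFL decoding: expected bin index under a softmax over the reg_max bins, per box side.
    bins = torch.arange(reg_max, dtype=x.dtype, device=x.device).view(1, 1, reg_max, 1)
    probs = box_logits.reshape(b, 4, reg_max, -1).softmax(dim=2)
    dist = (probs * bins).sum(dim=2)  # (B, 4, N): left, top, right, bottom in grid units

    anchors, stride_t = make_anchor_points(head_feats, strides)
    anchors = anchors.T.unsqueeze(0)  # (1, 2, N)
    lt, rb = dist.chunk(2, dim=1)
    x1y1, x2y2 = anchors - lt, anchors + rb
    boxes = torch.cat(((x1y1 + x2y2) / 2, x2y2 - x1y1), dim=1)  # xywh in grid units
    boxes = boxes * stride_t.T.unsqueeze(0)  # scale each location by its level's stride
    return torch.cat((boxes, cls_logits), dim=1)
```

With all-zero box logits the softmax is uniform, so every side distance is the mean bin index, (reg_max - 1) / 2 = 7.5 grid units for reg_max = 16.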
- src.pipeline.distillation.distillation.load_checkpoint(checkpoint_path: Path, student_model: Module, optimizer: Optimizer, learning_rate_scheduler: LambdaLR, device: str = 'cpu') → int
Load a checkpoint and restore model and optimizer state.
- Parameters:
checkpoint_path – Path to the checkpoint file
student_model – Student model to restore state to
optimizer – Optimizer to restore state to
learning_rate_scheduler – Learning rate scheduler to restore state to
device – Device onto which the checkpoint tensors are loaded
- Returns:
The epoch number from the checkpoint
- src.pipeline.distillation.distillation.load_models(device: str, base_dir: Path, distillation_config: Dict[str, Any]) → Tuple[YOLO, YOLO]
Load teacher and student models.
- Parameters:
device – Device to load models on
base_dir – Base directory for model paths
distillation_config – Configuration dictionary for distillation
- Returns:
Tuple of (teacher_yolo, student_yolo) models
- src.pipeline.distillation.distillation.log_training_metrics(log_file: Path, epoch: int, batch_idx: int | None, losses: Dict[str, float], grad_norm_before: float | None = None, grad_norm_after: float | None = None, is_new_file: bool = False, log_level: Literal['batch', 'epoch'] = 'epoch') → None
Log training metrics to a CSV file.
- Parameters:
log_file – Path to the log file
epoch – Current epoch number
batch_idx – Current batch index (None for epoch-level logging)
losses – Dictionary of loss values
grad_norm_before – Gradient norm before clipping
grad_norm_after – Gradient norm after clipping
is_new_file – Whether this is the first write to the file
log_level – Whether to log at batch or epoch level
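The sketch below is a minimal CSV logger of this kind, built on the standard library csv module. The column layout is an assumption: epoch, batch_idx, one column per loss key, then the two gradient norms. Missing values are written as empty strings, which is also an assumption. log_level is omitted here. In this sketch the caller marks an epoch-level row by passing batch_idx=None.

```python
import csv
from pathlib import Path
from typing import Dict, Optional


def log_training_metrics_sketch(
    log_file: Path,
    epoch: int,
    batch_idx: Optional[int],
    losses: Dict[str, float],
    grad_norm_before: Optional[float] = None,
    grad_norm_after: Optional[float] = None,
    is_new_file: bool = False,
) -> None:
    """Append one row of metrics; write the header only when starting a new file."""

    def blank_if_none(value):
        return "" if value is None else value

    row = {
        "epoch": epoch,
        "batch_idx": blank_if_none(batch_idx),
        **losses,
        "grad_norm_before": blank_if_none(grad_norm_before),
        "grad_norm_after": blank_if_none(grad_norm_after),
    }
    # "w" truncates any stale file on the first write; later calls append.
    with open(log_file, "w" if is_new_file else "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        if is_new_file:
            writer.writeheader()
        writer.writerow(row)
```

Every call must pass the same loss keys so that the rows stay aligned with the header written on the first call.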
- src.pipeline.distillation.distillation.prepare_dataset(img_path: Path, student_model: Module, batch_size: int = 16, mode: str = 'train') → Tuple[YOLODataset, DataLoader]
Prepare dataset and dataloader for training.
Notes
number_of_objects_detected: total number of labeled (ground-truth) objects across all images in the batch
batch_size: number of images in the batch
Each batch in the train_dataloader contains:
batch_idx: tensor of shape (number_of_objects_detected,); for each object, the value is in 0, …, batch_size - 1 and gives the index of the image in the batch that the object belongs to
img: image tensor of shape (batch_size, 3, 640, 640)
bboxes: bboxes tensor of shape (number_of_objects_detected, 4); the 4 values are the normalized box center x, center y, width and height (xywh)
cls: cls tensor of shape (number_of_objects_detected, 1), containing the class labels of all objects in the batch
resized_shape: Resized 2D dimensions of each image. A list of tensors: the first tensor holds the first dimension, the second tensor holds the second dimension
ori_shape: Original 2D dimensions of each image. A list of tensors: the first tensor holds the first dimension, the second tensor holds the second dimension
- Parameters:
img_path – Directory containing images
student_model – Student model instance
batch_size – Batch size for training
mode – Dataset mode (“train” or “val”)
- Returns:
Tuple of (dataset, dataloader)
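The labels for a whole batch are concatenated, so per-image targets have to be recovered through batch_idx. The sketch below assumes a batch dictionary with the batch_idx, cls and bboxes keys described in the notes above. The helper name split_labels_per_image is hypothetical.

```python
from typing import Dict, List

import torch
from torch import Tensor


def split_labels_per_image(batch: Dict[str, Tensor], batch_size: int) -> List[Dict[str, Tensor]]:
    """Group the concatenated labels of a batch by the image they belong to."""
    # batch_idx may be stored as floats; cast before comparing with integer image indices.
    image_ids = batch["batch_idx"].long()
    per_image = []
    for i in range(batch_size):
        mask = image_ids == i  # objects belonging to image i (possibly none)
        per_image.append({"cls": batch["cls"][mask], "bboxes": batch["bboxes"][mask]})
    return per_image
```

An image without any labeled objects gets empty cls and bboxes tensors instead of being skipped, so list position i always corresponds to image i.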
- src.pipeline.distillation.distillation.save_checkpoint(checkpoint_dir: Path, epoch: int, student_model: Module, optimizer: Optimizer, learning_rate_scheduler: LambdaLR, losses: Dict[str, float]) → None
Save model checkpoint.
- Parameters:
checkpoint_dir – Directory to save checkpoint
epoch – Current epoch number
student_model – Student model to save
optimizer – Optimizer state to save
learning_rate_scheduler – Learning rate scheduler state to save
losses – Dictionary of loss values
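save_checkpoint and load_checkpoint form a single round trip. The sketch below implements that round trip with torch.save and torch.load and assumes one dictionary per checkpoint file. The dictionary keys and the checkpoint_epoch_{epoch}.pt file name are hypothetical, because the section does not document the on-disk format.

```python
from pathlib import Path
from typing import Dict

import torch
from torch import nn
from torch.optim import Optimizer
from torch.optim.lr_scheduler import LambdaLR


def save_checkpoint_sketch(checkpoint_dir: Path, epoch: int, model: nn.Module, optimizer: Optimizer,
                           scheduler: LambdaLR, losses: Dict[str, float]) -> Path:
    """Write everything needed to resume training into one file."""
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    path = checkpoint_dir / f"checkpoint_epoch_{epoch}.pt"  # hypothetical naming scheme
    torch.save(
        {
            "epoch": epoch,
            "model_state_dict": model.state_dict(),
            "optimizer_state_dict": optimizer.state_dict(),
            "scheduler_state_dict": scheduler.state_dict(),
            "losses": losses,
        },
        path,
    )
    return path


def load_checkpoint_sketch(path: Path, model: nn.Module, optimizer: Optimizer,
                           scheduler: LambdaLR, device: str = "cpu") -> int:
    """Restore the states in place and return the stored epoch."""
    ckpt = torch.load(path, map_location=device)  # remap tensors saved on another device
    model.load_state_dict(ckpt["model_state_dict"])
    optimizer.load_state_dict(ckpt["optimizer_state_dict"])
    scheduler.load_state_dict(ckpt["scheduler_state_dict"])
    return ckpt["epoch"]
```

Storing optimizer and scheduler state next to the weights lets a resumed run continue with the same momentum buffers and learning-rate position, not just the same weights.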
- src.pipeline.distillation.distillation.save_final_model(model: Module, output_dir: Path, model_name: str = 'model.pt') → None
Save the final model after training.
- Parameters:
model – The model to save
output_dir – Directory to save the model
model_name – Name of the saved model file
- src.pipeline.distillation.distillation.start_distillation(device: str = 'cpu', base_dir: Path = PosixPath('..'), img_dir: Path = PosixPath('dataset'), save_checkpoint_every: int = 25, frozen_layers: int = 10, hyperparams: Dict[str, float] = {'lambda_detection': 1.0, 'lambda_dist_ciou': 1.0, 'lambda_dist_kl': 2.0, 'lambda_distillation': 2.0}, resume_checkpoint: Path | None = None, output_dir: Path = PosixPath('distillation_out'), final_model_dir: Path = PosixPath('automl_workspace/model_registry/distilled/latest'), log_level: Literal['batch', 'epoch'] = 'batch', debug: bool = False, distillation_config: Dict[str, Any] | None = None, pipeline_config: Dict[str, Any] | None = None) → Dict[str, List[float]]
Start the distillation training process.
- Parameters:
device – Device to train on
base_dir – Base directory for paths (should be SCRIPT_DIR from main.py)
img_dir – Directory containing training images
save_checkpoint_every – Save checkpoint every n epochs
frozen_layers – Number of layers to freeze in the backbone
hyperparams – Dictionary of hyperparameters for loss functions
resume_checkpoint – Optional path to checkpoint to resume training from
output_dir – Directory to save output
final_model_dir – Directory to save final model
log_level – Whether to log at batch or epoch level
debug – Whether to print debug information
distillation_config – Configuration dictionary for distillation
pipeline_config – Configuration dictionary for pipeline
- Returns:
Dictionary containing lists of loss values for each epoch
- src.pipeline.distillation.distillation.train_epoch(student_model: Module, teacher_model: Module, train_dataloader: DataLoader, detection_trainer: DetectionTrainer, optimizer: Optimizer, detection_criterion: v8DetectionLoss, config_dict: Dict[str, Any], device: str = 'cpu', nc: int = 5, hyperparams: Dict[str, float] = {'lambda_detection': 1.0, 'lambda_dist_ciou': 1.0, 'lambda_dist_kl': 2.0, 'lambda_distillation': 2.0}, epoch: int = 1, log_file: Path | None = None, log_level: Literal['batch', 'epoch'] = 'batch', debug: bool = False) → Dict[str, float]
Train for one epoch.
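train_epoch is documented without its loss. Going only by the hyperparameter names, one plausible reading is a detection loss plus a weighted distillation term, where the distillation term combines a box term (CIoU) and a class term (KL divergence). The sketch below shows that weighting, together with a temperature-scaled KL between softened teacher and student class distributions. The formula, the temperature argument and the softmax over the class dimension are all assumptions (YOLOv8 itself scores classes with independent sigmoids). None of this is the documented implementation.

```python
from typing import Dict

import torch
import torch.nn.functional as F
from torch import Tensor


def kl_distillation_loss(student_logits: Tensor, teacher_logits: Tensor,
                         temperature: float = 1.0) -> Tensor:
    """KL(teacher || student) over the class dimension (dim=1), scaled by T^2 as in Hinton-style KD."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2


def combine_losses(det_loss: Tensor, dist_ciou: Tensor, dist_kl: Tensor,
                   hyperparams: Dict[str, float]) -> Dict[str, Tensor]:
    """Hypothetical weighting of the four lambda_* hyperparameters."""
    distillation = (hyperparams["lambda_dist_ciou"] * dist_ciou
                    + hyperparams["lambda_dist_kl"] * dist_kl)
    total = (hyperparams["lambda_detection"] * det_loss
             + hyperparams["lambda_distillation"] * distillation)
    return {"detection": det_loss, "distillation": distillation, "total": total}
```

With the default hyperparams, detection 0.5, CIoU 0.2 and KL 0.1 would give a distillation term of 0.4 and a total of 1.3 under this weighting.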
- src.pipeline.distillation.distillation.train_loop(num_epochs: int, student_model: Module, student_yolo: YOLO, teacher_model: Module, train_dataloader: DataLoader, detection_trainer: DetectionTrainer, detection_validator: DetectionValidator, optimizer: Optimizer, stopper: EarlyStopping, learning_rate_scheduler: LambdaLR, detection_criterion: v8DetectionLoss, config_dict: Dict[str, Any], device: str, checkpoint_dir: Path, save_checkpoint_every: int, hyperparams: Dict[str, float] = {'lambda_detection': 1.0, 'lambda_dist_ciou': 1.0, 'lambda_dist_kl': 2.0, 'lambda_distillation': 2.0}, start_epoch: int = 1, log_file: Path | None = None, log_level: Literal['batch', 'epoch'] = 'epoch', final_model_dir: Path = PosixPath('automl_workspace/model_registry/distilled/latest'), debug: bool = False) → Dict[str, List[float]]
Execute the complete training process including all epochs.
- Parameters:
num_epochs – Number of epochs to train
student_model – Student model to train
student_yolo – Student YOLO model instance for saving
teacher_model – Teacher model for distillation
train_dataloader – DataLoader for training data
detection_trainer – Detection trainer instance
detection_validator – Detection validator instance
optimizer – Optimizer for training
stopper – Early stopping handler
learning_rate_scheduler – Learning rate scheduler
detection_criterion – Detection loss criterion
config_dict – Configuration dictionary
device – Device to train on
checkpoint_dir – Directory to save checkpoints
save_checkpoint_every – Save checkpoint every n epochs
hyperparams – Dictionary of hyperparameters for loss functions
start_epoch – Start training from this epoch
log_file – Optional path to log file for metrics
log_level – Whether to log at batch or epoch level
final_model_dir – Directory to save final model
debug – Whether to print debug information
- Returns:
Dictionary containing lists of loss values for each epoch
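The control flow of train_loop can be sketched in plain Python. Each epoch trains, records the per-epoch losses and saves a checkpoint every save_checkpoint_every epochs. The loop stops once validation fitness has not improved for a set number of epochs. The callables and the PatienceStopper class are hypothetical stand-ins for train_epoch, the DetectionValidator, save_checkpoint and the EarlyStopping object. The real loop also steps the learning rate scheduler and saves the final model to final_model_dir, which this sketch omits.

```python
from typing import Callable, Dict, List, Optional


class PatienceStopper:
    """Hypothetical early stopping: stop after `patience` epochs without a fitness improvement."""

    def __init__(self, patience: int):
        self.patience = patience
        self.best_fitness: Optional[float] = None
        self.best_epoch = 0

    def __call__(self, epoch: int, fitness: float) -> bool:
        if self.best_fitness is None or fitness > self.best_fitness:
            self.best_fitness, self.best_epoch = fitness, epoch
        return epoch - self.best_epoch >= self.patience


def train_loop_sketch(
    num_epochs: int,
    train_one_epoch: Callable[[int], Dict[str, float]],
    validate: Callable[[int], float],
    stopper: PatienceStopper,
    save_checkpoint: Callable[[int, Dict[str, float]], None],
    save_checkpoint_every: int,
    start_epoch: int = 1,
) -> Dict[str, List[float]]:
    """Run epochs start_epoch..num_epochs and return per-epoch loss histories."""
    history: Dict[str, List[float]] = {}
    for epoch in range(start_epoch, num_epochs + 1):
        losses = train_one_epoch(epoch)
        for name, value in losses.items():
            history.setdefault(name, []).append(value)
        if epoch % save_checkpoint_every == 0:
            save_checkpoint(epoch, losses)
        if stopper(epoch, validate(epoch)):
            break  # no improvement for `patience` epochs
    return history
```

Passing start_epoch mirrors resuming from the epoch that load_checkpoint returns. Periodic checkpoints are saved before the stopping check, so the checkpoint for the last completed multiple of save_checkpoint_every is never skipped.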