Setup & Installation
What This Skill Does
Trains and fine-tunes vision models on Hugging Face Jobs cloud GPUs. It covers object detection (D-FINE, RT-DETR v2, DETR, YOLOS), image classification (timm models including MobileNetV3, ResNet, and ViT), and SAM/SAM2 segmentation. A single workflow handles COCO-format dataset preparation (bbox format detection, string category remapping, and dataset validation), Albumentations augmentation, mAP/accuracy evaluation, Trackio monitoring, and automatic model persistence to the Hugging Face Hub. This removes the manual infrastructure work that typically precedes each training run.
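The dataset-prep steps named above (bbox format detection, string category remapping, validation) can be illustrated with a minimal, stdlib-only sketch. It is not the skill's actual code: the function names `guess_bbox_format`, `xyxy_to_xywh`, `remap_categories`, and `validate_xywh` are hypothetical. The format-detection heuristic is an assumption: it checks which reading of the numbers fits inside the image. The only fixed conventions it relies on are that COCO stores boxes as `[x, y, width, height]` and Pascal VOC as `[x_min, y_min, x_max, y_max]`.

```python
from typing import Dict, List, Sequence, Tuple

# (x1, y1, x2, y2) or (x, y, w, h), depending on the source format.
BBox = Tuple[float, float, float, float]


def guess_bbox_format(boxes: Sequence[BBox], img_w: float, img_h: float) -> str:
    """Hypothetical heuristic: guess 'xyxy' (Pascal VOC) vs 'xywh' (COCO).

    Returns 'ambiguous' when the boxes fit both readings (or neither),
    so the caller has to decide explicitly instead of silently guessing.
    """
    fits_xyxy = all(
        x2 > x1 and y2 > y1 and x1 >= 0 and y1 >= 0 and x2 <= img_w and y2 <= img_h
        for x1, y1, x2, y2 in boxes
    )
    fits_xywh = all(
        w > 0 and h > 0 and x >= 0 and y >= 0 and x + w <= img_w and y + h <= img_h
        for x, y, w, h in boxes
    )
    if fits_xyxy and not fits_xywh:
        return "xyxy"
    if fits_xywh and not fits_xyxy:
        return "xywh"
    return "ambiguous"


def xyxy_to_xywh(box: BBox) -> BBox:
    """Convert a corner-format box to COCO's [x, y, width, height]."""
    x1, y1, x2, y2 = box
    return (x1, y1, x2 - x1, y2 - y1)


def remap_categories(labels: List[str]) -> Tuple[Dict[int, str], Dict[str, int], List[int]]:
    """Map string category names to contiguous integer ids (sorted for determinism)."""
    names = sorted(set(labels))
    label2id = {name: idx for idx, name in enumerate(names)}
    id2label = {idx: name for name, idx in label2id.items()}
    return id2label, label2id, [label2id[name] for name in labels]


def validate_xywh(boxes: Sequence[BBox], img_w: float, img_h: float) -> List[int]:
    """Return indices of COCO-format boxes that are degenerate or out of bounds."""
    bad = []
    for i, (x, y, w, h) in enumerate(boxes):
        if w <= 0 or h <= 0 or x < 0 or y < 0 or x + w > img_w or y + h > img_h:
            bad.append(i)
    return bad


if __name__ == "__main__":
    raw = [(10, 10, 60, 80), (50, 20, 90, 70)]
    # Read as xywh, the second box would overflow a 100x100 image, so this is xyxy.
    fmt = guess_bbox_format(raw, img_w=100, img_h=100)
    coco_boxes = [xyxy_to_xywh(b) for b in raw] if fmt == "xyxy" else list(raw)
    print(fmt, coco_boxes)  # xyxy [(10, 10, 50, 70), (50, 20, 40, 50)]

    id2label, label2id, ids = remap_categories(["dog", "cat", "dog"])
    print(id2label, ids)  # {0: 'cat', 1: 'dog'} [1, 0, 1]

    print(validate_xywh(coco_boxes, 100, 100))  # []
```

Sorting the category names keeps the integer ids stable across runs. The `id2label`/`label2id` naming mirrors the label-mapping fields used by Transformers model configs. Returning `"ambiguous"` instead of picking a default is a deliberate choice in this sketch: small boxes near the image origin often fit both formats.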
When to use it
- Working with Hugging Face vision trainer functionality
- Implementing Hugging Face vision trainer features
- Debugging Hugging Face vision trainer issues
