Choose models based on your task category.
- Image classification: ResNet, EfficientNetV2 — assign a single label to an image.
- Object detection: YOLOX, RTMDet — locate and classify multiple objects.
- Segmentation: DeepLabv3+, TopFormer — assign a class to each pixel.
- Vision-Language Models (VLMs): OWL-ViT, LLaVA OneVision — enable multimodal reasoning and can be applied across classification, detection, and segmentation tasks.
The model garden provides curated recommendations for each task, including compute and memory requirements, to help you match a model to your silicon budget. VLMs are more resource-intensive than task-specific models, but they generalize well and often work without task-specific training.
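
The selection logic described above (filter by task, then by resource budget) can be sketched in a few lines of Python. This is only an illustrative sketch: the `ModelEntry` type, the `CATALOG` list, and the `candidates` function are hypothetical names, not part of any model garden API, and the memory figures are placeholders chosen for the example, not measured requirements. Consult the model garden for real numbers.

```python
from dataclasses import dataclass
from typing import List, Tuple

ALL_TASKS = ("classification", "detection", "segmentation")


@dataclass(frozen=True)
class ModelEntry:
    """Hypothetical catalog record; not a real model garden type."""
    name: str
    tasks: Tuple[str, ...]
    memory_mb: int          # PLACEHOLDER value, not a measured requirement
    is_vlm: bool = False


# Model names and task groupings come from the list above.
# Every memory figure is an invented placeholder for illustration only.
CATALOG = [
    ModelEntry("ResNet", ("classification",), 100),
    ModelEntry("EfficientNetV2", ("classification",), 120),
    ModelEntry("YOLOX", ("detection",), 200),
    ModelEntry("RTMDet", ("detection",), 220),
    ModelEntry("DeepLabv3+", ("segmentation",), 250),
    ModelEntry("TopFormer", ("segmentation",), 80),
    # Per the text, VLMs can be applied across all three task categories.
    ModelEntry("OWL-ViT", ALL_TASKS, 2000, is_vlm=True),
    ModelEntry("LLaVA OneVision", ALL_TASKS, 8000, is_vlm=True),
]


def candidates(task: str, memory_budget_mb: int) -> List[ModelEntry]:
    """Return models that support `task` and fit the budget, smallest first."""
    if task not in ALL_TASKS:
        raise ValueError(f"unknown task category: {task!r}")
    fits = [
        m for m in CATALOG
        if task in m.tasks and m.memory_mb <= memory_budget_mb
    ]
    return sorted(fits, key=lambda m: m.memory_mb)


if __name__ == "__main__":
    # With a tight budget only the task-specific detectors fit.
    print([m.name for m in candidates("detection", 500)])
    # With a larger budget the VLMs become candidates as well.
    print([m.name for m in candidates("detection", 10_000)])
```

Sorting by memory puts the lightest task-specific model first, so a VLM only surfaces when the budget allows it. In practice you might also weigh whether labeled training data is available, since that is where VLMs tend to pay for their extra cost.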