YOLOv8n + TensorRT (person class only) for Phase I detection
Context
The v1.9 plan committed to YOLOv8n person-class detection on the Jetson Orin Nano Super, compiled to TensorRT to reduce inference latency. Andrew Kent (mentor since 2026-05-02) suggested in conversation that we “skip custom training in Phase I; start with off-shelf models”, and Granola transcribed the suggested model family as “VLMs” (Vision-Language Models). VLMs are a meaningfully different model family from YOLO.
Two issues with switching:
- Granola transcription is known to be unreliable (e.g., it spells `lbzfai` as `LBCF` in the same session).
- VLMs on edge hardware in 2026 are not a drop-in replacement for YOLO on a Jetson Orin Nano Super: inference latency, memory footprint, and tooling maturity all differ.
Decision
Stay on YOLOv8n + TensorRT (person class only) for Phase I. Defer any VLM evaluation to Phase II.
Why:
- Off-shelf COCO-pretrained YOLOv8n meets the Phase I requirement (detect a person in a workstation ROI to time cycle start/stop) without custom training, satisfying Andrew’s “off-shelf only” framing.
- Proven inference latency. YOLOv8n on Orin Nano Super has well-documented benchmarks and engine compilation pipelines.
- Bench-test plan dependency. The pre-Argentina bench tests (G1) assume YOLOv8n — switching now would invalidate the demo path.
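The Phase I requirement above (detect a person in a workstation ROI to time cycle start/stop) can be sketched in plain Python, independent of the detector. This is a minimal illustration, not the project's implementation: the `Detection` type, `CycleTimer`, the centre-point ROI test, the 0.5 confidence threshold, and the frame-count debounce are all assumptions made for the example. The one fact it relies on is that `person` is class index 0 in the COCO label map used by the pretrained YOLOv8 weights.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

PERSON_CLASS_ID = 0  # "person" in the COCO label map of pretrained YOLOv8 weights

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass
class Detection:
    """Hypothetical container for one detector output."""
    class_id: int
    confidence: float
    box: Box


def box_center_in_roi(box: Box, roi: Box) -> bool:
    """True if the centre of `box` lies inside `roi` (assumed ROI test)."""
    cx = (box[0] + box[2]) / 2
    cy = (box[1] + box[3]) / 2
    return roi[0] <= cx <= roi[2] and roi[1] <= cy <= roi[3]


def person_in_roi(dets: Iterable[Detection], roi: Box, min_conf: float = 0.5) -> bool:
    """True if any sufficiently confident person detection is centred in the ROI."""
    return any(
        d.class_id == PERSON_CLASS_ID
        and d.confidence >= min_conf
        and box_center_in_roi(d.box, roi)
        for d in dets
    )


class CycleTimer:
    """Turns per-frame presence into (start, stop) cycle timestamps.

    A state change is accepted only after `debounce_frames` consecutive
    frames disagree with the current state, so a single missed detection
    does not end a cycle. The recorded timestamp is the frame on which
    the change was confirmed, not the first frame that hinted at it.
    """

    def __init__(self, debounce_frames: int = 3) -> None:
        self.debounce_frames = debounce_frames
        self.present = False
        self._streak = 0
        self.cycle_start: Optional[float] = None
        self.cycles: List[Tuple[float, float]] = []

    def update(self, observed: bool, timestamp: float) -> None:
        if observed == self.present:
            self._streak = 0  # observation agrees with current state
            return
        self._streak += 1
        if self._streak < self.debounce_frames:
            return  # not enough consecutive evidence yet
        self.present = observed
        self._streak = 0
        if observed:
            self.cycle_start = timestamp
        elif self.cycle_start is not None:
            self.cycles.append((self.cycle_start, timestamp))
            self.cycle_start = None
```

In use, each frame's detections would be reduced to a boolean with `person_in_roi(...)` and passed to `CycleTimer.update(...)` with the frame timestamp. Completed cycles accumulate in `cycles`.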
Consequences
- Per-station fine-tuning deferred to Phase II. Phase I uses the off-shelf weights as-is. If detection quality is unacceptable for Pereira’s lighting/operator-position conditions during the July install, fine-tuning becomes a Phase I emergency task; it is not the current default.
- JetPack version pinning. Use JetPack 6.0 or 6.1, NOT 6.2. JetPack 6.2 ships TensorRT 10.x, which currently breaks INT8 quantization for YOLOv8 (per Ultralytics docs). 6.0/6.1 ships TRT 8.6 and matches working tutorials.
- Engine compiled on the Jetson itself. TensorRT engines are hardware-bound: an engine built on another machine, such as a MacBook, will not run on the Jetson. Run `yolo export model=yolov8n.pt format=engine half=True device=0` on the Jetson.
- VLM revisit trigger. If Phase II behavioral monitoring (phone use, eating, talking, absence) needs richer scene understanding than person detection provides, re-open this decision.
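The version pin and the on-device export can be combined into a small guard script. The sketch below is an illustration under stated assumptions: it takes the TensorRT 8.6 pin described above at face value and checks the installed TensorRT version directly rather than the JetPack release. The helper names `tensorrt_major_minor`, `is_pinned_tensorrt`, and `export_engine` are hypothetical. The export call itself uses the Ultralytics Python API with the same arguments as the CLI command above.

```python
import re
from typing import Tuple

PINNED_TENSORRT = (8, 6)  # assumption: the TensorRT series this ADR pins to


def tensorrt_major_minor(version: str) -> Tuple[int, int]:
    """Parse 'MAJOR.MINOR[.PATCH...]' into (major, minor)."""
    match = re.match(r"(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognised TensorRT version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_pinned_tensorrt(version: str) -> bool:
    """True if `version` belongs to the pinned TensorRT series."""
    return tensorrt_major_minor(version) == PINNED_TENSORRT


def export_engine(weights: str = "yolov8n.pt") -> str:
    """Build an FP16 TensorRT engine on the current device.

    Intended to run on the Jetson itself. Imports are local so the
    version helpers above stay usable on machines without TensorRT
    or Ultralytics installed.
    """
    import tensorrt
    from ultralytics import YOLO

    if not is_pinned_tensorrt(tensorrt.__version__):
        raise RuntimeError(
            f"TensorRT {tensorrt.__version__} is outside the pinned "
            f"{PINNED_TENSORRT[0]}.{PINNED_TENSORRT[1]} series; refusing to export"
        )
    # Same arguments as the CLI form: format=engine half=True device=0
    return YOLO(weights).export(format="engine", half=True, device=0)
```

Failing early on a version mismatch keeps an engine built against the wrong TensorRT release from reaching the bench tests unnoticed.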
References
- Andrew Kent transition memo, 2026-05-02
- Granola transcript caveats noted in `docs/design/40-prototype/architecture.md` § “Post-v1.9 direction”