Architecture
High level
┌────────────────────────────────────────────────────────┐
│ Angela module floor                                    │
│                                                        │
│ workstation 1 ──[PoE camera]──┐                        │
│ workstation 2 ──[PoE camera]──┤                        │
│ workstation 3 ──[PoE camera]──┼──[8-port PoE switch]──┐│
│ workstation 4 ──[PoE camera]──┤                       ││
│ workstation 5 ──[PoE camera]──┤                       ││
│ workstation 6 ──[PoE camera]──┘                       ││
│                                                       ││
└───────────────────────────────────────────────────────┼┘
                                                        │ (Cat6 uplink)
                                                        ▼
┌────────────────────────────────────────────────────────┐
│ Computer room                                          │
│                                                        │
│ [Main 24-port PoE switch]──────┐                       │
│                                │                       │
│ [Jetson Orin Nano Super]◀──────┘                       │
│        │                                               │
│        │ H.264 sub-stream                              │
│        │ GStreamer → NVDEC → OpenCV                    │
│        │ YOLOv8n + TensorRT (person)                   │
│        │ Event logging → SQLite                        │
│        │                                               │
│        ├──▶ Local dashboard (Flask :5000)              │
│        │     ↳ via Tailscale from CA                   │
│        │                                               │
│        └──▶ Excel export (pandas/openpyxl)             │
│                                                        │
│ [4TB NVMe SSD] (rolling buffer)                        │
└────────────────────────────────────────────────────────┘
Distributed switch topology — and why
The naive layout is one long Cat6 run from each camera all the way to a central PoE switch in the computer room, so 6 cameras means 6 long runs. At Phase II’s ~20-camera scale, that’s 20 runs and a lot of conduit work.
Instead: one 8-port PoE switch per production line (TP-Link TL-SG1008P, ~$30), local to that line, with a single Cat6 uplink to the main switch. At full Phase III scale that’s 4 long runs total instead of 20.
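The run-count comparison above can be checked with a few lines of arithmetic. This is a minimal sketch: the function names are made up for illustration, and the default of 6 cameras per line is an assumption taken from the six-workstation floor in the diagram, not a figure from the plan.

```python
import math


def long_runs_central(cameras: int) -> int:
    """Home-run layout: every camera gets its own long Cat6 run."""
    return cameras


def long_runs_distributed(cameras: int, cameras_per_line: int = 6) -> int:
    """Per-line switch layout: one long uplink per production line.

    cameras_per_line=6 is an assumption based on the six-workstation
    floor in the diagram; an 8-port switch also needs a port for the uplink.
    """
    return math.ceil(cameras / cameras_per_line)


for cameras in (6, 20):
    print(cameras, long_runs_central(cameras), long_runs_distributed(cameras))
```

Under that assumption, 20 cameras need 4 long uplinks, which matches the 4-versus-20 comparison in the text.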
Stream profile
Cameras stream H.264 over RTSP. The Jetson consumes the sub-stream and downscales it to ~640×480 at 3–5 fps for inference. ADR-004 revised this down from v1.9’s 15 fps: 3–5 fps is sufficient for cycle-event detection at cycle times of 10–90 s, and it largely removes the thermal and storage concerns. The Amcrest sub-stream presets are 704×480, 352×240, CIF, and QCIF (640×480 itself is NOT a preset). The high-resolution main stream is recorded to disk for forensic playback only.
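As a rough sanity check on the 3–5 fps figure, the sketch below counts how many frames land inside one work cycle and what the total detector load looks like. The function names are illustrative, and the 6-camera count is an assumption taken from the high-level diagram.

```python
def frames_per_cycle(fps: float, cycle_seconds: float) -> float:
    """Number of inference frames that fall inside one work cycle."""
    return fps * cycle_seconds


def aggregate_fps(cameras: int, fps_per_camera: float) -> float:
    """Total frames per second the Jetson must run through the detector."""
    return cameras * fps_per_camera


# Worst case for detection: slowest frame rate on the shortest cycle.
worst_case = frames_per_cycle(fps=3, cycle_seconds=10)    # 30 frames
# Best case: fastest frame rate on the longest cycle.
best_case = frames_per_cycle(fps=5, cycle_seconds=90)     # 450 frames

# Detector load for 6 cameras (assumed count) at the old and new rates.
load_v19 = aggregate_fps(cameras=6, fps_per_camera=15)    # 90 frames/s
load_now = aggregate_fps(cameras=6, fps_per_camera=5)     # 30 frames/s

print(worst_case, best_case, load_v19, load_now)
```

Even the worst case leaves 30 frames per cycle, while the detector load at 5 fps is a third of the 15 fps figure.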
Decode pipeline (GStreamer + NVDEC)
rtspsrc location=rtsp://camera-N
  → rtph264depay
  → h264parse
  → nvv4l2decoder      # NVDEC hardware path, no CPU decode
  → appsink            # frames into OpenCV
NVDEC keeps decode off the CPU/GPU compute path, leaving the Jetson’s 67 TOPS available for inference.
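For reference, the element chain above could be assembled into a pipeline string along the following lines. This is a sketch, not the project's actual code: `build_pipeline` is a hypothetical helper, and the `nvvidconv`/`videoconvert` stages, the caps filters, the `latency` value, and the `appsink` properties are assumptions added so that the decoded frames arrive as 640×480 BGR images, which is the layout OpenCV expects.

```python
def build_pipeline(rtsp_url: str, width: int = 640, height: int = 480) -> str:
    """Return a gst-launch style pipeline string that ends in appsink."""
    elements = [
        # Source and H.264 depacketising, as listed in the decode pipeline.
        f"rtspsrc location={rtsp_url} latency=200",
        "rtph264depay",
        "h264parse",
        # NVDEC hardware decode on the Jetson.
        "nvv4l2decoder",
        # Assumed extra stages: scale and copy to system memory, then
        # convert to BGR for OpenCV.
        "nvvidconv",
        f"video/x-raw,format=BGRx,width={width},height={height}",
        "videoconvert",
        "video/x-raw,format=BGR",
        # Keep only the newest frame so a slow consumer never builds a backlog.
        "appsink drop=true max-buffers=1 sync=false",
    ]
    return " ! ".join(elements)


# "rtsp://camera-1" is a placeholder URL in the style of the listing above.
pipeline = build_pipeline("rtsp://camera-1")
print(pipeline)
```

On a Jetson whose OpenCV build includes GStreamer support, a string like this can be opened with `cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)`.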
Design decisions worth preserving
Why YOLOv8 (and not background subtraction)
The original design considered background subtraction for cycle-time detection. It was rejected because:
- YOLOv8 detects a person specifically, which makes it more accurate and more robust against lighting changes, fabric motion, and machine vibration than background subtraction.
- Switching from background subtraction to YOLOv8 mid-project (for Phase II behavioral monitoring) would require rearchitecting the detection layer. Better to commit upfront.
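One plausible way to turn per-frame person detections into timing data is to group them into presence intervals, tolerating short detection dropouts. The sketch below is hypothetical: `PresenceTracker`, the 2-second gap tolerance, and the idea that one presence interval corresponds to one work cycle are all assumptions for illustration, not part of the v1.9 spec.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class PresenceTracker:
    """Groups per-frame person/no-person flags into presence intervals."""

    max_gap_s: float = 2.0               # assumed tolerance for missed detections
    _start: Optional[float] = None       # start time of the open interval
    _last_seen: Optional[float] = None   # last time a person was detected
    intervals: List[Tuple[float, float]] = field(default_factory=list)

    def update(self, t: float, person_detected: bool) -> None:
        """Feed one frame: timestamp in seconds plus the detector's flag."""
        if person_detected:
            if self._start is None:
                self._start = t          # a new interval opens
            self._last_seen = t
        elif self._start is not None and t - self._last_seen > self.max_gap_s:
            # The person has been gone longer than the tolerance:
            # close the interval at the last confirmed sighting.
            self.intervals.append((self._start, self._last_seen))
            self._start = None
            self._last_seen = None


tracker = PresenceTracker(max_gap_s=2.0)
# (timestamp in seconds, detector saw a person); 0.5 is a one-frame dropout.
frames = [(0.0, True), (0.25, True), (0.5, False), (0.75, True),
          (3.0, False), (3.5, False)]
for t, seen in frames:
    tracker.update(t, seen)
print(tracker.intervals)
```

The single missed frame at 0.5 s does not split the interval, which is the kind of robustness the first bullet is after.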
Why central Jetson (and not per-camera AI)
The alternative considered was an Ambarella CV72 SoC running a VLM at each camera. It was rejected because:
- Per-camera AI multiplies costs at 20-camera scale.
- Central compute supports multi-camera coordination in Phase II/III (shared context, hand-offs between stations).
- One Tailscale node is easier to operate remotely than 20.
Why Amcrest (and not Dahua or Hikvision)
Per ADR-001, Phase I uses Amcrest IP8M-2779EW-AI varifocal turret cameras (2.7–13.5mm).
- Amcrest: current pick. The varifocal lens lets Ronald frame each workstation on-site, so we don’t have to pre-commit to a focal length measured remotely. It also offers built-in AI human/vehicle detection on the camera, is Amazon-returnable, and uses Dahua-rebadged silicon, so the underlying sensor stack is the same one v1.9 specced.
- Dahua IPC-HDW2849T-S-IL: original v1.9 pick (fixed lens). Replaced 2026-05-11 because Phase I dropped to 2 cameras and varifocal flexibility for on-site framing became the higher-value tradeoff. Buying Dahua from Amazon is also a gray-market risk; it would have required sourcing through Nelly’s Security or eBay.
- Hikvision DS-2CD2183G2-I: equivalent backup. Slightly more expensive; not adopted.
Security posture (per ADR-001): air-gap the camera VLAN, disable Amcrest Cloud/P2P, block outbound ports 37777 / 80 to mitigate CVE-2025-31700 and CVE-2020-5735.
Post-v1.9 direction (Andrew, 2026-05-02 onward)
The v1.9 plan was locked on 2026-04-27, five days before the mentor transition. Andrew Kent’s May 2 direction proposed “skip custom training in Phase I; start with off-shelf models”, which is broadly compatible with v1.9’s baseline of YOLOv8 pretrained on COCO. Granola transcribed the suggested model family as “VLMs” (Vision-Language Models), which would be meaningfully different from YOLO. However, Granola’s transcription has been unreliable (e.g., spelling lbzfai as LBCF), so this may be an audio-recognition artifact rather than an actual VLM proposal.
Treat the choice of off-shelf model as open until either:
- A v2.0 Drive plan is published, or
- The actual code starts importing a specific library
The architecture documented above (YOLOv8n + TensorRT) reflects v1.9 as the last formally locked spec. See meetings.md for the divergences.