Architecture

High level

                ┌────────────────────────────────────────┐
                │       Angela module floor              │
                │                                        │
  workstation 1 ──[PoE camera]──┐                        │
  workstation 2 ──[PoE camera]──┤                        │
  workstation 3 ──[PoE camera]──┼──[8-port PoE switch]──┐│
  workstation 4 ──[PoE camera]──┤                       ││
  workstation 5 ──[PoE camera]──┤                       ││
  workstation 6 ──[PoE camera]──┘                       ││
                                                        ││
                └───────────────────────────────────────┼┘
                                                        │ (Cat6 uplink)
                                                        ▼
                ┌────────────────────────────────────────┐
                │  Computer room                         │
                │                                        │
                │  [Main 24-port PoE switch]──────┐      │
                │                                 │      │
                │  [Jetson Orin Nano Super]◀──────┘      │
                │     │                                  │
                │     │ H.264 sub-stream                 │
                │     │ GStreamer → NVDEC → OpenCV       │
                │     │ YOLOv8n + TensorRT (person)      │
                │     │ Event logging → SQLite           │
                │     │                                  │
                │     ├──▶ Local dashboard (Flask :5000) │
                │     │      ↳ via Tailscale from CA     │
                │     │                                  │
                │     └──▶ Excel export (pandas/openpyxl)│
                │                                        │
                │  [4TB NVMe SSD] (rolling buffer)       │
                └────────────────────────────────────────┘
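The "Event logging → SQLite" step in the diagram could look roughly like the sketch below. The table name cycle_events, its columns, the event labels, and the helper names open_db / log_event are all hypothetical; the source does not specify a schema.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one row per detected event at a workstation.
SCHEMA = """
CREATE TABLE IF NOT EXISTS cycle_events (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    workstation INTEGER NOT NULL,
    event       TEXT    NOT NULL,   -- e.g. 'person_enter' / 'person_leave'
    ts_utc      TEXT    NOT NULL    -- ISO 8601 timestamp
)
"""

def open_db(path=":memory:"):
    """Open (or create) the event database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn

def log_event(conn, workstation, event, ts=None):
    """Insert one event row; defaults to the current UTC time."""
    ts = ts or datetime.now(timezone.utc)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO cycle_events (workstation, event, ts_utc) VALUES (?, ?, ?)",
            (workstation, event, ts.isoformat()),
        )

conn = open_db()  # in-memory here; a file path would be used on the Jetson
log_event(conn, 3, "person_enter")
log_event(conn, 3, "person_leave")
rows = conn.execute(
    "SELECT workstation, event FROM cycle_events ORDER BY id"
).fetchall()
print(rows)
```

A table shaped like this could feed both the dashboard and the Excel export, for example by reading it with pandas.read_sql_query and writing it out with DataFrame.to_excel; that part is not exercised here.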

Distributed switch topology — and why

The naive layout is one long Cat6 run from each camera all the way to a central PoE switch in the computer room: 6 cameras, 6 runs. At Phase II’s ~20-camera scale, that’s 20 runs and a lot of conduit work.

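Instead: one 8-port PoE switch per production line (TP-Link TL-SG1008P, ~$30), local to that line, with a single Cat6 uplink to the main switch. At full Phase III scale that’s 4 long runs total instead of 20. One caveat on the part: the TL-SG1008P supplies PoE on only 4 of its 8 ports, so a line with more than 4 cameras needs a switch with more PoE ports.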
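The run-count comparison can be checked with a few lines of arithmetic. This is only a sketch: the even split of 20 cameras across 4 production lines (5 per line) is an assumption inferred from the "4 long runs" figure, and the function names are illustrative.

```python
def long_runs_home_run(cameras_per_line):
    """Home-run layout: every camera gets its own long Cat6 run."""
    return sum(cameras_per_line)

def long_runs_distributed(cameras_per_line):
    """Distributed layout: one uplink per production line that has cameras."""
    return sum(1 for n in cameras_per_line if n > 0)

# Assumed split: 20 cameras across 4 production lines, 5 cameras each.
lines = [5, 5, 5, 5]
print(long_runs_home_run(lines), long_runs_distributed(lines))  # 20 4
```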

Stream profile

Cameras stream H.264 over RTSP. The Jetson consumes the sub-stream and downscales it to ~640×480 at 3–5 fps for inference. This was revised down from v1.9’s 15 fps per ADR-004: 3–5 fps is sufficient for cycle-event detection at 10–90 s cycles, and the lower rate largely removes the thermal and storage concerns. The Amcrest sub-stream presets are 704×480 / 352×240 / CIF / QCIF; 640×480 itself is NOT a preset, so the resize to 640×480 happens on the Jetson. The high-resolution main stream is recorded to disk for forensic playback only.
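To make the "sufficient at 3–5 fps" argument concrete, the sketch below works out how many frames sample each cycle and how many frames per second the Jetson has to process in total. The numbers come from this document (3, 5, and 15 fps; 10–90 s cycles; the 6 cameras shown in the diagram); the function names are illustrative.

```python
def frames_per_cycle(fps, cycle_seconds):
    """Number of inference frames that fall inside one work cycle."""
    return fps * cycle_seconds

def inference_rate(cameras, fps):
    """Aggregate frames per second the detector must process."""
    return cameras * fps

CAMERAS = 6
for fps in (3, 5, 15):
    shortest = frames_per_cycle(fps, 10)   # 10 s cycle
    longest = frames_per_cycle(fps, 90)    # 90 s cycle
    print(fps, shortest, longest, inference_rate(CAMERAS, fps))
# 3 30 270 18
# 5 50 450 30
# 15 150 1350 90
```

Even in the worst case (3 fps, 10 s cycle) each cycle is sampled 30 times, while the aggregate load across 6 cameras drops from 90 frames/s at 15 fps to 18–30 frames/s.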

Decode pipeline (GStreamer + NVDEC)

rtspsrc location=rtsp://camera-N
  → rtph264depay
  → h264parse
  → nvv4l2decoder            # NVDEC hardware path — no CPU decode
  → nvvidconv                # copy out of NVMM memory, convert to BGRx
  → videoconvert             # BGRx → BGR for OpenCV
  → appsink                  # frames into OpenCV

NVDEC keeps decode off the CPU/GPU compute path, leaving the Jetson’s 67 TOPS available for inference.
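As a rough illustration, the pipeline can be written as a GStreamer launch string and handed to OpenCV. The sketch includes nvvidconv and videoconvert stages, which are commonly needed on Jetson to move decoded frames out of NVMM memory and into the BGR layout OpenCV expects. The function names, the 200 ms latency, and the appsink settings are assumptions, not values from the project.

```python
def build_pipeline(rtsp_url, width=640, height=480, latency_ms=200):
    """Return a GStreamer launch string suitable for cv2.VideoCapture."""
    return (
        f"rtspsrc location={rtsp_url} latency={latency_ms} ! "
        "rtph264depay ! h264parse ! "
        "nvv4l2decoder ! "  # NVDEC hardware decode
        # nvvidconv scales and copies frames out of NVMM memory as BGRx
        f"nvvidconv ! video/x-raw,width={width},height={height},format=BGRx ! "
        "videoconvert ! video/x-raw,format=BGR ! "  # BGR is what OpenCV expects
        "appsink drop=true max-buffers=1 sync=false"  # keep only the newest frame
    )

def open_capture(rtsp_url):
    """Open the stream; needs an OpenCV build with GStreamer support."""
    import cv2  # imported lazily so build_pipeline works without OpenCV
    return cv2.VideoCapture(build_pipeline(rtsp_url), cv2.CAP_GSTREAMER)

print(build_pipeline("rtsp://camera-1"))
```

Dropping stale buffers at the appsink is one way to keep a 3–5 fps consumer from falling behind the camera's native frame rate; whether that suits this deployment is a judgment call.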

Design decisions worth preserving

Why YOLOv8 (and not background subtraction)

Original design considered background subtraction for cycle-time detection. Rejected because:

YOLOv8 detects a person specifically — more accurate, more robust against lighting changes, fabric motion, machine vibration.
Switching from background subtraction to YOLOv8 mid-project (for Phase II behavioral monitoring) would require rearchitecting the detection layer. Better to commit upfront.
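Whichever detector is used, cycle timing needs the per-frame "person present" results turned into discrete events. A minimal sketch, assuming the detector yields one boolean per frame; the class name PresenceDebouncer and the hold parameter are hypothetical and not part of the project.

```python
class PresenceDebouncer:
    """Turn noisy per-frame 'person present' flags into enter/leave events.

    A state change is reported only after `hold` consecutive frames disagree
    with the current state, which suppresses single-frame detector flicker.
    """

    def __init__(self, hold=3):
        self.hold = hold
        self.present = False   # current debounced state
        self._streak = 0       # consecutive frames disagreeing with the state

    def update(self, detected):
        """Feed one frame's flag; return 'enter', 'leave', or None."""
        if detected == self.present:
            self._streak = 0
            return None
        self._streak += 1
        if self._streak < self.hold:
            return None
        self.present = detected
        self._streak = 0
        return "enter" if detected else "leave"

# One flag per inference frame; the lone 1 at index 1 is detector flicker.
flags = [0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1]
deb = PresenceDebouncer(hold=3)
events = [(i, e) for i, f in enumerate(flags) if (e := deb.update(bool(f)))]
print(events)  # [(5, 'enter'), (9, 'leave')]
```

With the Ultralytics API, the per-frame flag could come from something like len(model(frame, classes=[0])[0].boxes) > 0, since class 0 is person in COCO; that call is not exercised here. At 3–5 fps, hold=3 corresponds to roughly one second of agreement.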

Why central Jetson (and not per-camera AI)

Considered: Ambarella CV72 SoC with VLM-at-camera. Rejected:

Per-camera AI multiplies costs at 20-camera scale.
Central compute supports multi-camera coordination in Phase II/III (shared context, hand-offs between stations).
One Tailscale node is easier to remote-ops than 20.

Why Amcrest (and not Dahua or Hikvision)

Per ADR-001, Phase I uses Amcrest IP8M-2779EW-AI varifocal turret cameras (2.7–13.5mm).

Amcrest: current pick. The varifocal lens lets Ronald frame each workstation on-site, so we don’t have to pre-commit to a focal length measured remotely. It also has built-in AI human/vehicle detection, is Amazon-returnable, and uses Dahua-rebadge silicon, so the underlying sensor stack is the same one v1.9 specced.
Dahua IPC-HDW2849T-S-IL: original v1.9 pick (fixed lens). Replaced 2026-05-11 because Phase I dropped to 2 cameras and varifocal flexibility for on-site framing became the higher-value tradeoff. Dahua bought through Amazon also carries gray-market risk; the pick would have required Nelly’s Security or eBay sourcing.
Hikvision DS-2CD2183G2-I: equivalent backup. Slightly more expensive; not adopted.

Security posture (per ADR-001): air-gap the camera VLAN, disable Amcrest Cloud/P2P, block outbound ports 37777 / 80 to mitigate CVE-2025-31700 and CVE-2020-5735.

Post-v1.9 direction (Andrew, 2026-05-02 onward)

The v1.9 plan was locked 2026-04-27 — five days before the mentor transition. Andrew Kent’s May 2 direction proposed “skip custom training in Phase I; start with off-shelf models”, which is broadly compatible with v1.9’s YOLOv8-pretrained-on-COCO baseline. Granola transcribed the suggested model family as “VLMs” (Vision-Language Models), which is meaningfully different from YOLO — but Granola’s transcription has been unreliable (e.g., spelling lbzfai as LBCF), so this may be an audio-recognition artifact rather than an actual VLM proposal.

Treat the choice of off-the-shelf model as open until either:

A v2.0 Drive plan is published, or
The actual code starts importing a specific library.

The architecture documented above (YOLOv8n + TensorRT) reflects v1.9 as the last formally locked spec. See meetings.md for the divergences.