Cycle-event detection — spec
Bucket: technical/ml (Agent D) · Status: reviewed (Phase B seams applied: 3–5 fps per ADR-004; CV writer commits to direct SQLite) · Owner: Sophia Mann · Phase: I · Last updated: 2026-05-12
Context
Phase I’s product is a stream of cycle-time events in the cycle_events SQLite table (one row per garment-unit-cycle per workstation), feeding the dashboard and the Excel export that mirrors INDICADORES ABRIL.xlsx. The cycle event is the only thing the CV pipeline produces in Phase I — everything else (efficiency %, bottleneck flagging, behavioral codes) is downstream math.
A “cycle” at LBZF means one unit of garment work at a workstation — e.g., one collar topstitched at the PESPUNTAR CUELLO station. The Angela module’s reference garment (Ref22 Slim, 24.16 min total SAM, 21 workstations, target 49 units/hr) defines per-station Standard Allowed Minutes (SAMs) in the Balanceo tab of Ref22 Slim - Angela.xlsx. Per-cycle durations therefore range roughly 20 seconds to 4 minutes across stations (24.16 min / 21 ops ≈ 1.15 min mean; the slowest assembly steps run ~3–4 min, the fastest hem/press ops run well under a minute).
The v1.9 spec says: YOLOv8n + TensorRT, person class only (COCO class 0), rtspsrc → nvv4l2decoder → appsink → OpenCV. This doc defines what the inference loop does with the detections it gets — specifically how the start and end of a cycle are decided, and how the system avoids counting noise as cycles.
Naive approach: person enters station ROI → cycle starts; person leaves ROI → cycle ends. This is wrong in at least seven ways (operator leans back, talks to neighbor, supervisor enters frame, machine runs unattended, partial occlusion, hi-vis vest, lighting). The sections below specify the actual algorithm.
Goals
- G1: Generate one cycle_events row per real garment unit produced at the workstation, with start_ts, end_ts, duration_seconds, and a confidence tag.
- G2: Per-cycle duration agreement with Ronald’s stopwatch: target Lin’s CCC ≥ 0.85 and Bland-Altman 95% limits of agreement (LoA) within ±10 s for cycles with mean duration ≤ 90 s, and within ±15 s for longer cycles (calibrated to the validation doc).
- G3: False-cycle rate (logged events that the operator or Ronald judges were not real cycles): ≤ 5% over the validation window.
- G4: Missed-cycle rate (real cycles the system did not log): ≤ 5%.
- G5: End-to-end latency from frame capture to event write ≤ 2 s p95. Phase I is not real-time-critical, but a lag of more than a few seconds makes the “live” dashboard visibly trail the floor.
- G6: Inference budget on the Orin Nano: ≤ 10 ms/frame per camera for YOLOv8n FP16 TensorRT. The Phase I load is 2 cameras × 3–5 fps, which is ≤ 10 frames/s aggregate per ADR-004. At that load the detector uses at most ~100 ms of GPU time per second, which leaves large headroom. Public benchmarks show YOLOv8n at 640×480 FP16 TensorRT on Orin Nano at around 7.5–10 ms per frame [verified, see literature review].
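G2 names two agreement statistics. The sketch below shows one standard way to compute them in standard-library Python. It is illustrative and is not the validation doc’s estimator.

It assumes two things:
- System and stopwatch durations are already paired one-to-one per cycle.
- The caller has split them into the ≤ 90 s group and the longer group, and passes 10.0 or 15.0 as the tolerance.

lins_ccc uses population moments. bland_altman_loa returns bias ± 1.96 × the sample SD of the differences. All function names are hypothetical.

```python
import statistics


def lins_ccc(system: list[float], stopwatch: list[float]) -> float:
    """Lin's concordance correlation coefficient (population moments)."""
    if len(system) != len(stopwatch) or len(system) < 2:
        raise ValueError("need >= 2 paired durations")
    mx, my = statistics.fmean(system), statistics.fmean(stopwatch)
    vx = statistics.pvariance(system, mx)
    vy = statistics.pvariance(stopwatch, my)
    cov = statistics.fmean((x - mx) * (y - my) for x, y in zip(system, stopwatch))
    return 2 * cov / (vx + vy + (mx - my) ** 2)


def bland_altman_loa(system: list[float], stopwatch: list[float],
                     z: float = 1.96) -> tuple[float, float, float]:
    """Return (bias, lower LoA, upper LoA) of system - stopwatch, in seconds."""
    diffs = [x - y for x, y in zip(system, stopwatch)]
    bias = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the paired differences
    return bias, bias - z * sd, bias + z * sd


def meets_g2(system: list[float], stopwatch: list[float], loa_tolerance_s: float) -> bool:
    """G2 for one duration group: CCC >= 0.85 and both LoA inside ±tolerance."""
    _, lo, hi = bland_altman_loa(system, stopwatch)
    return (lins_ccc(system, stopwatch) >= 0.85
            and lo >= -loa_tolerance_s and hi <= loa_tolerance_s)
```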
Non-goals
- Garment-type classification (Phase II).
- Per-operation labeling within a workstation (Phase III, if a station performs multiple ops).
- Quality / defect detection of the produced garment.
- Anything pose-, skeleton-, or action-segmentation-based. Phase I uses bounding boxes only.
Proposed approach
1. Detection layer
- Model: yolov8n.pt (Ultralytics), pinned to a specific commit / release tag (e.g., Ultralytics v8.2.x; OPEN[Sophia, pre-Pereira]: pin exact tag before flight to Pereira).
- Export: yolo export model=yolov8n.pt format=engine half=True imgsz=640 device=0, to produce yolov8n.engine on the actual Jetson Orin Nano Super. TensorRT engines are not portable across GPU + JetPack versions, so this must be built on-device or on an identical Jetson.
- Classes: only person (class 0). All other class indexes are filtered out before the post-processing layer ever sees them.
- Per-frame inference:
  - conf threshold = 0.35. The Ultralytics default of 0.25 is too liberal for industrial scenes with mannequins, posters, and photos on walls.
  - iou (NMS) threshold = 0.5.
  - max_det = 8 per frame. On the factory floor there are rarely more than 3–4 people legitimately in any one camera ROI at once; the extra slots are headroom for a supervisor or mechanic.
  - imgsz = 640, the model’s native training size. 640×480 input is letterboxed to 640×640 (see the “What’s weak in this doc” section).
- Output for each frame: a list of (x1, y1, x2, y2, conf) tuples for the person class.
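A minimal sketch of this layer, assuming the Ultralytics Python API. In that API, a model loaded with YOLO("yolov8n.engine") accepts conf, iou, classes, max_det, imgsz, half and verbose in predict(). It returns results whose boxes expose xyxy, conf and cls tensors.

run_detector and person_tuples are hypothetical names. person_tuples re-applies the class and confidence filters in plain Python. That way, downstream layers only ever see person boxes, even if the predict() arguments drift.

```python
from typing import Sequence

PERSON_CLASS = 0          # COCO "person"
CONF_THRESHOLD = 0.35
IOU_THRESHOLD = 0.5
MAX_DET = 8
IMGSZ = 640

Box = tuple[float, float, float, float, float]  # (x1, y1, x2, y2, conf)


def person_tuples(xyxy: Sequence[Sequence[float]], confs: Sequence[float],
                  classes: Sequence[float], conf_threshold: float = CONF_THRESHOLD,
                  max_det: int = MAX_DET) -> list[Box]:
    """Keep person boxes at or above the threshold, highest confidence first."""
    kept = [
        (float(b[0]), float(b[1]), float(b[2]), float(b[3]), float(c))
        for b, c, k in zip(xyxy, confs, classes)
        if int(k) == PERSON_CLASS and c >= conf_threshold
    ]
    kept.sort(key=lambda box: box[4], reverse=True)
    return kept[:max_det]


def run_detector(model, frame) -> list[Box]:
    """Run one BGR frame through an Ultralytics YOLO model and return person boxes."""
    result = model.predict(frame, conf=CONF_THRESHOLD, iou=IOU_THRESHOLD,
                           classes=[PERSON_CLASS], max_det=MAX_DET, imgsz=IMGSZ,
                           half=True, verbose=False)[0]
    boxes = result.boxes
    # Tensors -> plain Python lists so the rest of the pipeline is framework-free.
    return person_tuples(boxes.xyxy.tolist(), boxes.conf.tolist(), boxes.cls.tolist())
```

Typical use would be `from ultralytics import YOLO; model = YOLO("yolov8n.engine")`. After that, call run_detector(model, frame) on each decoded frame from the appsink.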
2. Workstation-association layer
For each camera there is a set of ROI polygons: one polygon per workstation that the camera sees (typically 1, occasionally 2; see roi-calibration.md). For each detected person bbox we compute:
overlap_ratio = area(bbox ∩ roi_polygon) / area(bbox)
A bbox is associated with a workstation iff overlap_ratio ≥ 0.5 and the bbox center lies inside the polygon. If multiple workstations claim the bbox, we take the one with the highest overlap.
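The association rule can be sketched without a geometry library. The bbox is an axis-aligned rectangle and therefore convex. So Sutherland-Hodgman clipping of the ROI polygon against the bbox gives the intersection region even when the ROI polygon is concave. The shoelace formula then gives its area, and a ray-casting test handles the center check.

This is a dependency-free sketch. A production version might use a geometry package instead. The function names and the dict-of-polygons input are assumptions.

```python
from typing import Optional

Point = tuple[float, float]


def polygon_area(poly: list[Point]) -> float:
    """Shoelace formula: absolute area of a simple polygon (0 for < 3 points)."""
    s = 0.0
    for (xa, ya), (xb, yb) in zip(poly, poly[1:] + poly[:1]):
        s += xa * yb - xb * ya
    return abs(s) / 2.0


def clip_to_box(poly: list[Point], x1: float, y1: float, x2: float, y2: float) -> list[Point]:
    """Sutherland-Hodgman: clip `poly` against the rectangle [x1, x2] x [y1, y2]."""
    def at_x(xc):  # intersection of segment p->q with the vertical line x = xc
        return lambda p, q: (xc, p[1] + (q[1] - p[1]) * (xc - p[0]) / (q[0] - p[0]))

    def at_y(yc):  # intersection of segment p->q with the horizontal line y = yc
        return lambda p, q: (p[0] + (q[0] - p[0]) * (yc - p[1]) / (q[1] - p[1]), yc)

    pts = list(poly)
    for inside, cross in ((lambda p: p[0] >= x1, at_x(x1)), (lambda p: p[0] <= x2, at_x(x2)),
                          (lambda p: p[1] >= y1, at_y(y1)), (lambda p: p[1] <= y2, at_y(y2))):
        out = []
        for i, cur in enumerate(pts):
            prev = pts[i - 1]
            if inside(cur):
                if not inside(prev):
                    out.append(cross(prev, cur))
                out.append(cur)
            elif inside(prev):
                out.append(cross(prev, cur))
        pts = out
    return pts


def point_in_polygon(x: float, y: float, poly: list[Point]) -> bool:
    """Even-odd ray casting toward +x."""
    inside = False
    for (xa, ya), (xb, yb) in zip(poly, poly[1:] + poly[:1]):
        if (ya > y) != (yb > y) and x < xa + (y - ya) * (xb - xa) / (yb - ya):
            inside = not inside
    return inside


def associate(bbox, rois: dict[str, list[Point]], min_overlap: float = 0.5) -> Optional[str]:
    """Return the workstation that claims this bbox, or None."""
    x1, y1, x2, y2 = bbox[:4]
    box_area = (x2 - x1) * (y2 - y1)
    if box_area <= 0:
        return None
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    best, best_ratio = None, 0.0
    for station, poly in rois.items():
        ratio = polygon_area(clip_to_box(poly, x1, y1, x2, y2)) / box_area
        if ratio >= min_overlap and point_in_polygon(cx, cy, poly) and ratio > best_ratio:
            best, best_ratio = station, ratio
    return best
```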
3. Per-workstation occupancy signal
For every camera frame t and every workstation, occupied(t) ∈ {0, 1}. It equals 1 iff at least one person bbox is associated with that workstation in frame t. Multiple people at the same station do not split a cycle; a supervisor walking by is the canonical case (see §5).
4. State machine for cycle detection
States: IDLE, OCCUPIED, PAUSED.
- IDLE → OCCUPIED: when occupied(t) == 1 for ≥ ENTER_HYST consecutive frames → tentative start_ts = t - (ENTER_HYST - 1) * frame_period
- OCCUPIED → PAUSED: when occupied(t) == 0 for ≥ EXIT_HYST consecutive frames → tentative end_candidate = t - (EXIT_HYST - 1) * frame_period
- PAUSED → OCCUPIED: when occupied(t) == 1 within REENTRY_WINDOW after entering PAUSED → cancel end_candidate and remain in the same cycle. This is the “operator turned to talk to a neighbor / went to the bobbin shelf” case.
- PAUSED → IDLE: when PAUSED persists ≥ REENTRY_WINDOW → finalize the cycle with end_ts = end_candidate; emit an event if the cycle is valid (see §6)

Default parameters (per-workstation overridable, persisted in the rois table per roi-calibration.md):
| Param | Default | Rationale |
|---|---|---|
| ENTER_HYST | 2 frames (≈ 400–667 ms at 3–5 fps) | Robust against single-frame false positives (model flicker, motion blur). At the lower 3 fps tier, 2 frames is still sub-second. |
| EXIT_HYST | 4 frames (≈ 0.8–1.3 s at 3–5 fps) | A brief operator lean-back or reach for a spool should not end a cycle. |
| REENTRY_WINDOW | 8 s | The operator can turn fully to a neighbor or step to the bobbin shelf and return. Calibrated against operation videos. OPEN[Sophia, training set]: empirically tune from the 41 videos. |
| MIN_CYCLE_DURATION | 8 s | Cycles shorter than this are noise (someone walking through, a supervisor inspecting). Below the fastest legitimate Angela SAM per Ref22 Slim - Angela.xlsx. |
| MAX_CYCLE_DURATION | 3 × SAM_station | Anything longer than 3× the standard time is flagged as garbage (operator absent but machine running, system stuck in OCCUPIED). |
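A minimal Python sketch of the state machine and default parameters above. It assumes frames arrive as (timestamp_seconds, occupied) pairs at a fixed fps. The class and field names, and the 60 s placeholder SAM, are hypothetical.

Two behaviors here are assumptions that the spec does not fix:
- Re-entry from PAUSED takes a single occupied frame, as the transition list literally says; it does not require ENTER_HYST frames.
- An OCCUPIED cycle that outlives MAX_CYCLE_DURATION is force-closed and tagged too_long, so a stuck state cannot hang forever.

The dwell-ratio and confidence checks (§6, §7) are left out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class State(Enum):
    IDLE = "IDLE"
    OCCUPIED = "OCCUPIED"
    PAUSED = "PAUSED"


@dataclass
class Params:
    enter_hyst: int = 2            # frames
    exit_hyst: int = 4             # frames
    reentry_window_s: float = 8.0
    min_cycle_s: float = 8.0
    sam_station_s: float = 60.0    # hypothetical per-station SAM, in seconds
    max_sam_multiple: float = 3.0  # MAX_CYCLE_DURATION = 3 x SAM_station


@dataclass
class Cycle:
    start_ts: float
    end_ts: float
    duration_seconds: float
    reject_reason: Optional[str]   # None -> cycle_events, else cycle_events_rejected


class CycleStateMachine:
    def __init__(self, params: Params, fps: float):
        self.p = params
        self.frame_period = 1.0 / fps
        self.state = State.IDLE
        self.run = 0               # consecutive frames supporting a pending transition
        self.start_ts = 0.0
        self.end_candidate = 0.0
        self.paused_at = 0.0

    @property
    def max_cycle_s(self) -> float:
        return self.p.max_sam_multiple * self.p.sam_station_s

    def _finalize(self, end_ts: float) -> Cycle:
        duration = end_ts - self.start_ts
        if duration < self.p.min_cycle_s:
            reason = "too_short"
        elif duration > self.max_cycle_s:
            reason = "too_long"
        else:
            reason = None
        self.state, self.run = State.IDLE, 0
        return Cycle(self.start_ts, end_ts, duration, reason)

    def step(self, t: float, occupied: bool) -> Optional[Cycle]:
        """Feed one inference frame; return a finished Cycle or None."""
        if self.state is State.OCCUPIED and t - self.start_ts > self.max_cycle_s:
            return self._finalize(t)  # assumption: force-close a stuck cycle
        if self.state is State.IDLE:
            self.run = self.run + 1 if occupied else 0
            if self.run >= self.p.enter_hyst:
                self.state, self.run = State.OCCUPIED, 0
                self.start_ts = t - (self.p.enter_hyst - 1) * self.frame_period
        elif self.state is State.OCCUPIED:
            self.run = 0 if occupied else self.run + 1
            if self.run >= self.p.exit_hyst:
                self.state, self.run = State.PAUSED, 0
                self.end_candidate = t - (self.p.exit_hyst - 1) * self.frame_period
                self.paused_at = t
        elif occupied:  # PAUSED: operator back within REENTRY_WINDOW, same cycle
            self.state = State.OCCUPIED
        elif t - self.paused_at >= self.p.reentry_window_s:  # PAUSED: window expired
            return self._finalize(self.end_candidate)
        return None
```

Feed it one (t, occupied) pair per inference frame. A 20 s occupancy span with a 3 s break in the middle then comes out as a single cycle, because the break is shorter than REENTRY_WINDOW.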
5. Multi-person disambiguation
The “supervisor checking work” failure mode is common: Ronald or a line supervisor walks up to a station, leans in, looks at the garment, walks away. Naive code starts a new cycle.
Mitigation, in order of cost:
- Same-station rule: a station that is already OCCUPIED cannot be re-started by an additional person entering. The cycle ends only when all associated bboxes leave.
- Operator-track persistence (lightweight): assign each bbox a short-lived track ID using IoU-based matching frame-to-frame. Phase I does not need full ByteTrack/DeepSORT: at 3–5 fps, IoU > 0.3 across consecutive frames is enough, since tracks only need to survive the gap between inference frames. The “primary operator” of a cycle is the track with the longest dwell time inside the ROI. The cycle ends when the primary operator leaves, even if a supervisor remains briefly. (This is a stretch goal; Phase I can ship with rule 1 alone.)
- Phase II add-on: hi-vis vest classifier or a face-ID-free “operator embedding” so a supervisor in a different uniform is recognizable as not-the-operator. Out of Phase I scope.
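A sketch of the lightweight track persistence (rule 2). It assumes greedy IoU matching between consecutive inference frames, using the IoU > 0.3 rule named above.

Class names, track IDs and the max_missed grace period are hypothetical. This is deliberately simpler than ByteTrack/DeepSORT. It does not resolve the heavy-overlap case where a supervisor leans directly behind the operator.

```python
from dataclasses import dataclass, replace
from itertools import count
from typing import Optional

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Track:
    track_id: int
    box: Box
    dwell_frames: int = 1   # frames this track has been matched inside the ROI
    missed: int = 0         # consecutive frames without a match


class IouTracker:
    def __init__(self, iou_threshold: float = 0.3, max_missed: int = 2):
        self.iou_threshold = iou_threshold
        self.max_missed = max_missed  # hypothetical grace period, in frames
        self.tracks: list[Track] = []
        self._ids = count(1)

    def update(self, boxes: list[Box]) -> list[Track]:
        """Greedily match this frame's ROI-associated boxes to existing tracks."""
        pairs = sorted(
            ((iou(t.box, b), ti, bi)
             for ti, t in enumerate(self.tracks) for bi, b in enumerate(boxes)),
            reverse=True,
        )
        used_t, used_b, updated = set(), set(), []
        for score, ti, bi in pairs:
            if score <= self.iou_threshold:
                break  # remaining pairs are all at or below the threshold
            if ti in used_t or bi in used_b:
                continue
            used_t.add(ti)
            used_b.add(bi)
            t = self.tracks[ti]
            updated.append(Track(t.track_id, boxes[bi], t.dwell_frames + 1, 0))
        for ti, t in enumerate(self.tracks):  # unmatched tracks age out
            if ti not in used_t and t.missed < self.max_missed:
                updated.append(replace(t, missed=t.missed + 1))
        for bi, b in enumerate(boxes):        # unmatched boxes start new tracks
            if bi not in used_b:
                updated.append(Track(next(self._ids), b))
        self.tracks = updated
        return updated

    def primary(self) -> Optional[Track]:
        """The 'primary operator': the live track with the longest dwell."""
        return max(self.tracks, key=lambda t: t.dwell_frames, default=None)
```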
6. Cycle validity filter (write or drop)
A cycle is written to cycle_events iff:
- MIN_CYCLE_DURATION ≤ duration_seconds ≤ MAX_CYCLE_DURATION, and
- the cycle’s primary-operator track has a dwell ratio ≥ 0.6 (it was present for at least 60% of duration_seconds).
Otherwise the cycle is written to a parallel cycle_events_rejected table with a reason enum (too_short, too_long, low_dwell, confidence_drop). This table is critical for the paper — it is the only audit trail of detector behavior and the substrate for the failure-mode section.
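The write-or-drop decision reduces to a small pure function. A sketch, assuming the caller already has duration_seconds and one presence flag per frame for the primary-operator track.

The RejectReason name and the order in which the checks run are assumptions. This doc does not define when confidence_drop fires, so the sketch lists it but never returns it.

```python
from enum import Enum
from typing import Optional


class RejectReason(str, Enum):
    TOO_SHORT = "too_short"
    TOO_LONG = "too_long"
    LOW_DWELL = "low_dwell"
    CONFIDENCE_DROP = "confidence_drop"  # trigger not specified in this doc


def dwell_ratio(present_flags: list[bool]) -> float:
    """Fraction of the cycle's frames in which the primary-operator track was present."""
    return sum(present_flags) / len(present_flags) if present_flags else 0.0


def validate_cycle(duration_s: float, dwell: float, sam_station_s: float,
                   min_cycle_s: float = 8.0, max_multiple: float = 3.0,
                   min_dwell: float = 0.6) -> Optional[RejectReason]:
    """None -> write to cycle_events; otherwise the reason for cycle_events_rejected."""
    if duration_s < min_cycle_s:
        return RejectReason.TOO_SHORT
    if duration_s > max_multiple * sam_station_s:
        return RejectReason.TOO_LONG
    if dwell < min_dwell:
        return RejectReason.LOW_DWELL
    return None
```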
7. Confidence tagging
Each cycle event carries a quality score in {green, yellow, red}:
- green: primary-operator dwell ≥ 0.85, mean per-frame max-person conf ≥ 0.6, no PAUSED excursions, and duration within ±2σ of the station’s running mean.
- yellow: one of the above criteria degraded (still emitted; the dashboard styles it).
- red: multiple criteria degraded; written but flagged for human spot-check. The red rate over a sliding window is one of the failure-monitor metrics (see failure-modes-and-monitoring.md).
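The green/yellow/red rule counts how many of the four green criteria fail. A minimal sketch with hypothetical names. The station’s running mean and standard deviation come from Welford’s online algorithm.

As an assumption, the standard deviation is reported as 0 until two cycles exist. That makes the ±2σ criterion fail, so early cycles get at most yellow.

```python
def quality_tag(dwell: float, mean_conf: float, paused_excursions: int,
                duration_s: float, station_mean_s: float, station_std_s: float) -> str:
    """Return 'green', 'yellow', or 'red' from the number of degraded criteria."""
    degraded = sum([
        dwell < 0.85,
        mean_conf < 0.6,
        paused_excursions > 0,
        abs(duration_s - station_mean_s) > 2 * station_std_s,
    ])
    if degraded == 0:
        return "green"
    return "yellow" if degraded == 1 else "red"


class RunningStats:
    """Welford's online mean / sample variance of a station's cycle durations."""

    def __init__(self) -> None:
        self.n, self.mean, self._m2 = 0, 0.0, 0.0

    def push(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        return (self._m2 / (self.n - 1)) ** 0.5 if self.n > 1 else 0.0
```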
8. The “operator absent, machine running” case
There is no Phase I CV signal for “the operator’s foot is on the pedal but the operator’s torso is not in the ROI”, because Phase I has no machine-on signal at all. Two mitigations:
- MAX_CYCLE_DURATION = 3 × SAM catches a stuck cycle.
- Phase II add-on: an optional machine-on signal. This could be an audio classifier on camera audio (if the Amcrest mic is enabled) or a current-clamp sensor on the sewing machine motor. Out of Phase I scope.
9. The “operator working but occluded” case
If the operator leans forward over the fabric, much of the torso is hidden and the person detection can fall below the confidence threshold. Mitigations:
- ROI polygons are deliberately drawn around the seated chest+head region, not the full body. This way even a leaned-over operator’s head/upper back keeps the bbox overlap ≥ 0.5. Coordinated with roi-calibration.md.
- EXIT_HYST (4 frames, ≈ 0.8–1.3 s at 3–5 fps) plus REENTRY_WINDOW = 8 s jointly tolerate occlusions of at least 8 s.
- Some workstations have chronic occlusion, where the operator’s body geometry occludes the head (the press / planchadora). For these, EXIT_HYST is raised per station via the override table to ≈ 2 s (6 frames at 3 fps, 10 frames at 5 fps).
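Under the §4 transitions, a detection gap starts at the first missed frame. The cycle only ends once EXIT_HYST consecutive misses have moved the machine to PAUSED and a further REENTRY_WINDOW has passed. The tolerated gap is therefore (EXIT_HYST - 1) × frame_period + REENTRY_WINDOW.

Two small helpers (hypothetical names) make the per-station override arithmetic explicit:

```python
import math


def occlusion_tolerance_s(exit_hyst_frames: int, fps: float, reentry_window_s: float) -> float:
    """Longest detection gap, from the first missed frame, that does not end a cycle."""
    return (exit_hyst_frames - 1) / fps + reentry_window_s


def frames_for(seconds: float, fps: float) -> int:
    """Frames needed to span at least `seconds` at `fps` (for hysteresis overrides)."""
    return math.ceil(seconds * fps - 1e-9)  # epsilon guards float round-up
```

At the default EXIT_HYST = 4 this gives ≈ 8.6 s at 5 fps and ≈ 9.0 s at 3 fps. A 2 s hysteresis corresponds to 6 frames at 3 fps or 10 frames at 5 fps.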
Alternatives considered
| Alt | Pros | Cons | Why rejected |
|---|---|---|---|
| Background subtraction (OpenCV MOG2) | Cheap; no model dependency. | Sensitive to lighting changes (dusk shift), fabric motion, machine vibration. Cannot distinguish operator from supervisor. | Person-class detection is more robust and forward-compatible with Phase II behavioral models. Documented in docs/technical/architecture.md. |
| YOLOv6 / YOLOv7 | Comparable accuracy; some variants faster on edge. | Ultralytics + YOLOv8 has the best Python ergonomics, TensorRT export path, and community Jetson recipes. YOLOv8n latency on Orin Nano is already well under budget. | Ergonomic / momentum reasons; not a performance call. |
| YOLOv8s or YOLOv8m | More accurate, better small-object recall. | 2–4× the latency. Even at the Phase I 2 cameras × 3–5 fps load the Orin Nano can absorb it, but the n-tier already meets recall targets and leaves capacity for Phase II behavioral models. | Phase I task is person detection in mostly-clean ROIs at a known scale — YOLOv8n is sufficient. Re-evaluate for Phase II. |
| Optical-flow-based motion detection | Could detect hand motion → “actively working” signal. | Costly per pixel; flow on garment fabric is itself noisy (the fabric moves); doesn’t disambiguate operator from supervisor. | Defer to Phase II as a possible “active work” overlay signal. |
| Temporal action segmentation (I3D / SPOT / temporal-segment networks) | State-of-the-art for assembly-line action recognition [Rashid 2024, Ghoddoosian 2023 — needs-lit-review]. | Requires labeled action segments; far more compute than Orin Nano provides; introduces a second model. | Phase II direction. Phase I cycle = ROI dwell, not action class. |
| Vision-Language Models (Andrew/Granola 5/2 transcript) | Could in principle zero-shot detect richer behaviors. | Costly per-frame; latency on Orin Nano hostile to real-time; the Granola “VLM” transcription is itself suspect (see docs/references/meetings.md). | Stay on YOLOv8 baseline until either a v2.0 plan or actual code change confirms a VLM direction. |
| Pose-based cycle detection (HRNet keypoints) | Lets you decide “operator is seated and facing machine” precisely. | Heavier model; pose at 3–5 fps × 2 cameras fits Phase I budget, but the value of fine pose info on cycle counting is marginal vs the implementation cost. | Phase II direction for behavioral monitoring; overkill for Phase I cycle counting. |
Open questions
- OPEN[Sophia, by 2026-06-15]: Pin the exact Ultralytics release tag and commit SHA we ship to Pereira. (Version churn between v8.0 and v8.3 has changed default args.)
- OPEN[Ronald via Armando, by 2026-06-01]: Confirm typical and worst-case occlusion patterns at each Angela workstation. Specifically: at PLANCHAR (press), does the operator’s full upper body stay in frame from the planned camera angle? If not, override EXIT_HYST.
- OPEN[Andrew, by 2026-05-20]: Is the “primary operator track” (§5 rule 2) worth shipping in Phase I, or is the same-station rule (§5 rule 1) enough? Andrew’s Form-AI experience is directly relevant; they likely have a default here.
- OPEN[ITBA, by 2026-06-15]: Do the ITBA install conditions (different garment line) need different ENTER_HYST/EXIT_HYST values? They should run the same algorithm with their own per-workstation thresholds, not a divergent algorithm.
- OPEN[Sophia, before Argentina trip 2026-05-15]: Should the cycle-event schema include model_version and roi_version foreign keys from day one? (The answer should be yes; see reproducibility-and-artifacts.md. Confirm with Agent B before they finalize the schema.)
- OPEN[Ronald, by 2026-06-15]: Are there workstations where the operator legitimately leaves the station for > 8 s during a cycle (e.g., walks to a fabric cart 2 m away to grab the next bundle and comes back)? If yes, REENTRY_WINDOW is too short. One real cycle will then be split into two events, or truncated if one fragment falls below MIN_CYCLE_DURATION.
- OPEN[Sophia, paper deadline]: Do we report cycle_events_rejected rates in the paper? Recommendation: yes, with a breakdown by reason. It is the only honest measure of how much the algorithm throws away.
CV writer → SQLite (direct write, no HTTP)
Canonical Phase I boundary (decided 2026-05-12): the CV writer process opens its own SQLite connection (/data/lbzf.db, WAL mode) and writes cycle_events, cycle_events_rejected, and manual_observations rows directly. There is no /api/v1/internal/cycles endpoint; the writer is not an HTTP client of the dashboard.
Why direct SQLite (not an internal HTTP endpoint):
- One process writes cycle data (lbzf-cv-writer.service). The exporter opens the DB read-only. The dashboard also reads it read-only, apart from the migration and admin-write path described under Coordination.
- SQLite WAL allows concurrent readers without blocking the writer, so the dashboard’s reads do not interfere with the inference hot loop.
- An HTTP hop between the writer and the DB would add latency and a failure mode (dashboard down → writer can’t persist) for no architectural benefit at Phase I scale.
Coordination:
- The dashboard process applies migrations at startup. The writer waits for schema_version.version >= REQUIRED_MIN before opening writes. If migrations are not applied within 30 s, the writer alerts and exits.
- Both processes set PRAGMA busy_timeout=5000 (5 s).
- The CV writer is the only writer of cycle_events, cycle_events_rejected, and manual_observations (one row inserted per cycle-end transition, one update on commit).
- All other tables (workstations, standard_times, operators) are seeded once and read-only at runtime. Admin edits go through the dashboard API and are serialized by a WriteLock in-process mutex on the dashboard side. The dashboard process opens a read-write connection only for these admin writes.
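A minimal sketch of the writer side of this contract, using Python’s standard sqlite3 module.

It assumes:
- the /data/lbzf.db path given above;
- a schema_version table with a version column;
- a cycle_events table whose column names are hypothetical stand-ins for the data-model.md definitions (workstation_id in particular);
- a placeholder value for REQUIRED_MIN.

The sketch sets busy_timeout and WAL mode, then polls for the required schema version for up to 30 s. On failure it raises TimeoutError instead of alerting; the alerting hook is out of scope here.

```python
import sqlite3
import time

DB_PATH = "/data/lbzf.db"
REQUIRED_MIN = 1  # hypothetical minimum schema version


def schema_version(conn: sqlite3.Connection) -> int:
    """Highest applied migration, or 0 if the dashboard has not created the table yet."""
    try:
        row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    except sqlite3.OperationalError:
        return 0
    return row[0] or 0


def open_writer(path: str = DB_PATH, required_min: int = REQUIRED_MIN,
                timeout_s: float = 30.0, poll_s: float = 0.5) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA busy_timeout=5000")  # wait up to 5 s on a locked DB
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode.lower() != "wal":
        conn.close()
        raise RuntimeError(f"could not enable WAL (got {mode!r})")
    deadline = time.monotonic() + timeout_s
    while schema_version(conn) < required_min:  # each SELECT sees a fresh snapshot
        if time.monotonic() >= deadline:
            conn.close()
            raise TimeoutError("migrations not applied in time; writer exits")
        time.sleep(poll_s)
    return conn


def write_cycle(conn: sqlite3.Connection, workstation_id: str, start_ts: float,
                end_ts: float, quality: str, roi_version_id: int) -> None:
    with conn:  # one transaction per cycle-end transition; commits on success
        conn.execute(
            "INSERT INTO cycle_events (workstation_id, start_ts, end_ts,"
            " duration_seconds, quality, roi_version_id) VALUES (?, ?, ?, ?, ?, ?)",
            (workstation_id, start_ts, end_ts, end_ts - start_ts, quality, roi_version_id),
        )
```

On the reader side, the exporter could open the same file read-only via a URI such as sqlite3.connect("file:/data/lbzf.db?mode=ro", uri=True).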
The internal-HTTP-API alternative is parked in 60-parking/, next to websocket-live-updates.md (see backend/dashboard-api.md for the rejection note).
Cross-bucket dependencies
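- Agent A (frontend): the cycle_events.quality field must surface in the dashboard as green/yellow/red station tiles. Confirm against Agent A’s dashboard spec.
- Agent B (backend): data-model.md now includes cycle_events.quality, cycle_events.primary_track_dwell_ratio, cycle_events.mean_conf and cycle_events.roi_version_id (FK). It also includes the rois, cycle_events_rejected and manual_observations tables. The CV writer reads rois to pick roi_version_id and writes the other tables directly.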
data-model.mdnow includescycle_events.quality,cycle_events.primary_track_dwell_ratio,cycle_events.mean_conf,cycle_events.roi_version_id(FK), plus therois,cycle_events_rejected, andmanual_observationstables. The CV writer readsroisto pickroi_version_idand writes the other tables directly. - Agent C (hardware): camera mount must keep the operator’s seated head+chest in frame; the Amcrest dome’s varifocal lens is wide-angle but ROI shrinks at oblique angles. Camera height ~2.5 m, tilted ~30° down from horizontal is the working assumption. Confirm with on-site measurements.
- Agent E (business/legal): the cycle_events_rejected table and the per-frame confidence logs are personal-data adjacent under Colombian Law 1581/2012. They are linked to a workstation, which is linked to a named operator via the Angela roster. A retention policy must be defined before deployment.
What’s weak in this doc
- MIN_CYCLE_DURATION = 8 s is asserted, not derived. It is below the fastest Angela SAM in the spreadsheet, but it is not derived from a distribution of actual cycle durations observed in Ronald’s 41 videos. A reviewer will ask “where’s the histogram?”, and rightly. The honest answer is “we’ll calibrate it during the training-set construction sprint in June 2026 and update this doc.” Until then, the 8 s number is a placeholder.
- The “primary operator track” mechanism is hand-waved. “IoU > 0.3 across consecutive frames” gets you most of the way. The well-known failure case is when the supervisor’s and the operator’s bboxes overlap heavily (supervisor leaning behind the operator), and this doc does not specify what happens then. A real implementation needs at least a track-length tiebreaker, possibly an appearance embedding. It is currently spec’d as a “stretch goal”; a reviewer will read that as “didn’t think it through.”
- 640×480 is asserted as sufficient without a hand-motion experiment.
  - Sewing operations involve fine motor movement. Phase II’s behavioral cases (phone-to-ear, fork-to-mouth) live at exactly the resolution scale that 640×480 will smear.
  - We have not experimentally verified that 1080p is unnecessary on Orin Nano. The latency budget (Orin Nano at 1080p sub-stream × 6 cameras) needs a real benchmark, not a thought experiment.
  - This is logged in phase-ii-preview.md, but the resolution decision for Phase I is also weak.
- The state machine does not handle camera disconnects gracefully. If a camera drops out mid-cycle and reconnects 30 s later, the current spec will sit in OCCUPIED until MAX_CYCLE_DURATION, then emit a garbage cycle. A real implementation needs a “stream stale → reset state machine, drop in-flight cycle” rule. That rule must be coordinated with the GStreamer pipeline’s reconnection behavior (Agent C).
- There is no principled story for SAM_station-dependent thresholds. MAX_CYCLE_DURATION = 3 × SAM is a round number.
  - The literature ([Rashid 2024 — needs-lit-review]) suggests operator-time distributions in apparel manual work are heavy-tailed (lognormal). A fixed multiplier then under-rejects on short SAMs and over-rejects on long ones.
  - A percentile-based threshold (e.g., the 99th percentile of a fitted lognormal) is more defensible. It requires fitting the distribution first, which is a chicken-and-egg problem relative to deployment.
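Once per-station durations exist, the percentile-based alternative is cheap to compute. A sketch, assuming a lognormal fitted by taking the mean and sample standard deviation of the log-durations. This is a simple moment fit on the log scale, not a claim about which estimator the paper should use.

The function name is hypothetical. Identical durations give sigma = 0, and statistics.NormalDist then raises StatisticsError.

```python
import math
from statistics import NormalDist, fmean, stdev


def lognormal_max_cycle_s(durations_s: list[float], percentile: float = 0.99) -> float:
    """Given percentile of a lognormal fitted to observed cycle durations (seconds)."""
    if len(durations_s) < 2 or min(durations_s) <= 0:
        raise ValueError("need >= 2 positive durations")
    logs = [math.log(d) for d in durations_s]
    mu, sigma = fmean(logs), stdev(logs)  # moment fit on the log scale
    return math.exp(NormalDist(mu, sigma).inv_cdf(percentile))
```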
Rollout
| Date | Gate |
|---|---|
| 2026-05-15 | Algorithm spec frozen for Argentina handoff — ITBA must be able to run the same state machine on their twin hardware. |
| 2026-06-01 | First end-to-end run against one Angela operation video locally on the M1 (CPU PyTorch, no TensorRT) — produces a valid cycle_events row. |
| 2026-06-15 | First end-to-end run on the Orin Nano against a recorded Pereira clip — meets latency budget. |
| 2026-07-01 (Pereira) | Day-1 on-site: ROI calibration (see roi-calibration.md), thresholds defaulted, one shift recorded for offline validation. |
| 2026-07-02 → 07-15 | Validation window vs Ronald stopwatch (see validation-methodology.md). Adjust thresholds. |
| 2026-08-01 | Phase I “stable” — kill switch: if false-cycle + missed-cycle rate exceeds 15% over any 4-hour window, the dashboard auto-falls-back to “manual count required” and the system writes only to cycle_events_rejected. |
Paper alignment
- Methods: state machine, hysteresis values, validity filter — Section 3.2 (“Cycle Detection”) of the paper.
- Experimental setup: hardware + model + thresholds table, one row per parameter — Section 4.1.
- Results: per-station cycle_events count vs ground truth (Table 2), false-cycle and missed-cycle rate over the validation window (Table 3).
- Limitations: the multi-person disambiguation, the resolution choice and the camera-disconnect handling (discussed in Section 6).
- Figures: state-machine diagram (Fig. 3), example timeline of occupied(t) with annotated state transitions (Fig. 4), confusion-matrix-equivalent vs stopwatch (Fig. 5).