Data model — SQLite on the Jetson

Bucket: Backend (Agent B) Status: Reviewed — 2026-05-12 (Phase B seams applied: ML schema fields added; audit_log parked) Supersedes / refines: docs/technical/software.md §“Data model (SQLite)” Owner: Sophia · Reviewers: Andrew, Ronald, Agent D (CV/ML), Agent E (business/privacy)

Context

The v1.9 plan locks SQLite on the Jetson as the source-of-truth event store for Phase I, with three tables: workstations, cycle_events, standard_times. docs/technical/software.md sketches a minimal DDL. This doc finishes the spec — full columns, indexes, FKs, enums, migrations, retention, and backup — so it is buildable without further interpretation.

The decision being made: what is the exact shape of the source-of-truth tables, what gets retained vs rolled, and how do we keep this data alive when the Jetson SD card dies or the SQLite file corrupts.

Goals

A single canonical DDL that the dashboard, exporter, and CV writer all read against
Schema that handles multi-module from day 1 (no forklift between Phase I → II → III)
Explicit retention policy for cycle_events, raw frames, and recorded video
Backup / replication story that survives a Jetson SD-card or NVMe failure
IEEE-paper-grade metadata captured on every cycle_events row (CV model version, confidence, frame path)
Drop-in mapping to Ronald’s INDICADORES ABRIL.xlsx and Ref22 Slim - Angela.xlsx column vocabulary

Non-goals

A general-purpose ERP schema (orders, billing, finished-goods inventory). Out of scope.
Multi-tenant org tables on the Jetson — those live in the cloud user DB (see user-org-and-auth.md).
Behavioral / phone-use / pause events — Phase II concern; the schema reserves space but does not implement detection.
Real-time frame streaming. SQLite stores events, not pixels; frames go to disk and roll independently.

Proposed approach

File layout on the Jetson

/data/                         # 4TB NVMe (ext4), mounted from /etc/fstab
  lbzf.db                      # primary SQLite database (WAL mode)
  lbzf.db-wal                  # write-ahead log
  lbzf.db-shm                  # shared memory
  backups/
    lbzf-YYYYMMDD-HHMMSS.db    # nightly snapshot, sqlite3 .backup
    latest.db -> lbzf-...      # symlink for the most recent
  frames/                      # snapshot frames referenced by cycle_events
    YYYY/MM/DD/cam{N}_<eventid>.jpg
  video/                       # 10-min H.264 segments, rolling
    YYYY/MM/DD/cam{N}_HHMMSS.mp4
  exports/                     # generated .xlsx (and .csv mirrors)
    indicadores-YYYY-MM-DD.xlsx
  logs/                        # journald already has these; this is dashboard app log

SQLite is opened with:

PRAGMA journal_mode = WAL;           -- concurrent readers don't block the writer
PRAGMA synchronous = NORMAL;         -- WAL + NORMAL is durable enough for our event rate
PRAGMA foreign_keys = ON;
PRAGMA busy_timeout = 5000;          -- 5s — CV writer can wait for the dashboard
PRAGMA temp_store = MEMORY;
PRAGMA cache_size = -65536;          -- 64 MiB page cache

WAL mode is the single most important choice — without it, the dashboard’s read queries lock the CV writer.

Schema (DDL)

-- ---------------------------------------------------------------
-- modules: a production line (Angela, Carmenza, Camisetas, Empaque)
-- Lifted out of workstations.module so Phase II/III scaling is a
-- single insert rather than a string-typo lottery.
-- ---------------------------------------------------------------
CREATE TABLE modules (
  id                INTEGER PRIMARY KEY,
  code              TEXT NOT NULL UNIQUE,        -- 'ANGELA','CARMENZA','CAMISETAS','EMPAQUE'
  display_name      TEXT NOT NULL,               -- 'Angela'
  product_reference TEXT,                        -- 'Ref22 Slim' for ANGELA in Phase I
  active            INTEGER NOT NULL DEFAULT 1,  -- 0=archived
  created_at        TEXT NOT NULL DEFAULT (datetime('now')),
  updated_at        TEXT NOT NULL DEFAULT (datetime('now'))
);

-- ---------------------------------------------------------------
-- cameras: physical camera devices (separate from workstations
-- because Phase II may add a second camera angle per station).
-- ---------------------------------------------------------------
CREATE TABLE cameras (
  id              INTEGER PRIMARY KEY,
  module_id       INTEGER NOT NULL REFERENCES modules(id),
  label           TEXT NOT NULL,                 -- 'cam1', 'cam2', ...
  rtsp_url        TEXT NOT NULL,                 -- secret; consider env var indirection
  resolution      TEXT NOT NULL DEFAULT '640x480',
  fps             INTEGER NOT NULL DEFAULT 5,    -- inference fps; ADR-004 locks 3-5 fps for Phase I
  status          TEXT NOT NULL DEFAULT 'active' -- 'active','inactive','faulty'
                  CHECK (status IN ('active','inactive','faulty')),
  installed_at    TEXT NOT NULL DEFAULT (datetime('now')),
  notes           TEXT,
  UNIQUE (module_id, label)
);

-- ---------------------------------------------------------------
-- workstations: a logical station (one operation), Phase-I-mapped
-- 1:1 to a camera but the schema allows 1:N.
-- ---------------------------------------------------------------
CREATE TABLE workstations (
  id              INTEGER PRIMARY KEY,
  module_id       INTEGER NOT NULL REFERENCES modules(id),
  primary_camera_id INTEGER REFERENCES cameras(id),
  code            TEXT NOT NULL,                 -- 'PUESTO_01', 'PERRILLAS', ...
  display_name    TEXT NOT NULL,                 -- 'Puesto 1' / 'Perrillas'
  operation       TEXT,                          -- 'PESPUNTAR CUELLO' — matches standard_times.operation
  sequence_pos    INTEGER,                       -- 1..N position in the module flow
  active          INTEGER NOT NULL DEFAULT 1,
  notes           TEXT,
  UNIQUE (module_id, code)
);

CREATE INDEX idx_workstations_module ON workstations(module_id);

-- ---------------------------------------------------------------
-- standard_times: SAM per operation, seeded from
-- 'Ref22 Slim - Angela.xlsx' / "Balanceo" tab.
-- Versioned by (reference, version) so Phase II can ship a
-- new SAM revision without overwriting history.
-- ---------------------------------------------------------------
CREATE TABLE standard_times (
  id              INTEGER PRIMARY KEY,
  reference       TEXT NOT NULL,                 -- 'Ref22 Slim'
  version         TEXT NOT NULL DEFAULT 'v1',    -- bump on revision
  operation       TEXT NOT NULL,                 -- 'PESPUNTAR CUELLO'
  sam_minutes     REAL NOT NULL,                 -- SAM (industrial-engineering term)
  sec_per_cycle   REAL GENERATED ALWAYS AS (sam_minutes * 60.0) STORED,
  machine_code    TEXT,                          -- 'MPP','MREV','MOJA','MBOT','PLCH'
  source          TEXT NOT NULL DEFAULT 'tiempo std manual',
  effective_from  TEXT NOT NULL DEFAULT (datetime('now')),
  effective_to    TEXT,                          -- NULL = currently active
  notes           TEXT,
  UNIQUE (reference, version, operation)
);

CREATE INDEX idx_std_times_lookup ON standard_times(reference, operation, version);

-- ---------------------------------------------------------------
-- operators: deliberately minimal in Phase I — see "Operator ID"
-- section below for why this is OPEN.
-- ---------------------------------------------------------------
CREATE TABLE operators (
  id              INTEGER PRIMARY KEY,
  external_code   TEXT UNIQUE,                   -- whatever LBZF uses internally
  display_name    TEXT,                          -- 'Operario A', 'Operario B', ... in design docs (Confidencial in prod)
  active          INTEGER NOT NULL DEFAULT 1,
  notes           TEXT
);

-- ---------------------------------------------------------------
-- cv_model_versions: which model produced an event (paper requirement)
-- Bumped every time we redeploy a new weight set.
-- ---------------------------------------------------------------
CREATE TABLE cv_model_versions (
  id              INTEGER PRIMARY KEY,
  name            TEXT NOT NULL,                 -- 'yolov8n-coco'
  version         TEXT NOT NULL,                 -- '8.1.30-trt'
  framework       TEXT NOT NULL,                 -- 'ultralytics+tensorrt'
  weights_hash    TEXT,                          -- sha256 of weights file
  deployed_at     TEXT NOT NULL DEFAULT (datetime('now')),
  notes           TEXT,
  UNIQUE (name, version)
);

-- ---------------------------------------------------------------
-- rois: per-camera polygonal regions associating bboxes to
-- workstations. Append-only; each re-calibration creates a new row
-- and sets superseded_by on the old one. Foreign-keyed from
-- cycle_events.roi_version_id so events can be re-interpreted
-- against the ROI configuration active at detection time.
-- See ml/roi-calibration.md.
-- ---------------------------------------------------------------
CREATE TABLE rois (
  id                       INTEGER PRIMARY KEY,
  camera_id                INTEGER NOT NULL REFERENCES cameras(id),
  workstation_id           INTEGER NOT NULL REFERENCES workstations(id),
  polygon_json             TEXT NOT NULL,        -- JSON array of {x,y} in pixel space, 0..640 × 0..480
  ref_frame_path           TEXT NOT NULL,        -- path on disk to the reference frame at calibration time
  ref_homography_features  BLOB,                 -- ORB keypoint+descriptor blob for drift check
  created_at               TEXT NOT NULL DEFAULT (datetime('now')),
  created_by               TEXT NOT NULL,        -- 'ronald', 'sophia', 'itba-rmarino', etc.
  notes                    TEXT,
  superseded_by            INTEGER REFERENCES rois(id)  -- NULL when active
);
CREATE INDEX idx_rois_active ON rois(camera_id, workstation_id) WHERE superseded_by IS NULL;

-- ---------------------------------------------------------------
-- cycle_events: the heart of the dataset. One row per detected
-- start->end pair at a workstation.
-- ---------------------------------------------------------------
CREATE TABLE cycle_events (
  id                          INTEGER PRIMARY KEY,
  workstation_id              INTEGER NOT NULL REFERENCES workstations(id),
  camera_id                   INTEGER REFERENCES cameras(id),
  operator_id                 INTEGER REFERENCES operators(id),  -- nullable in Phase I
  cv_model_version_id         INTEGER NOT NULL REFERENCES cv_model_versions(id),
  roi_version_id              INTEGER NOT NULL REFERENCES rois(id),  -- which ROI was active when this cycle was detected
  start_ts                    TEXT NOT NULL,             -- ISO 8601 UTC: '2026-07-15T13:24:11Z'
  end_ts                      TEXT,                      -- NULL while in-flight
  duration_seconds            REAL GENERATED ALWAYS AS (
                                CASE WHEN end_ts IS NULL THEN NULL
                                     ELSE (julianday(end_ts) - julianday(start_ts)) * 86400.0
                                END
                              ) STORED,
  status                      TEXT NOT NULL DEFAULT 'in_progress'
                              CHECK (status IN ('in_progress','complete','aborted','manual_override')),
  quality                     TEXT NOT NULL DEFAULT 'green'
                              CHECK (quality IN ('green','yellow','red')),  -- detector self-rating; see ml/cycle-event-detection.md §7
  confidence                  REAL,                      -- mean detection confidence over the cycle (alias: mean_conf at write time)
  mean_conf                   REAL,                      -- explicit mean per-frame max-person confidence (paper-grade column)
  primary_track_dwell_ratio   REAL,                      -- 0..1; primary-operator track's dwell fraction over duration_seconds
  detection_count             INTEGER,                   -- frames with a person detected
  source_frame_path           TEXT,                      -- /data/frames/2026/07/15/cam1_12345.jpg
  source_video_path           TEXT,                      -- /data/video/2026/07/15/cam1_132300.mp4
  ref_op_order                TEXT,                      -- e.g. 'OP-2026-0712' — the production order this cycle belongs to (optional)
  notes                       TEXT,                      -- manual override reason, etc.
  created_at                  TEXT NOT NULL DEFAULT (datetime('now'))
);

-- Most-common query: dashboard asks "last N events for workstation X".
CREATE INDEX idx_events_ws_start    ON cycle_events(workstation_id, start_ts DESC);
-- Excel export queries by day-range.
CREATE INDEX idx_events_start_ts    ON cycle_events(start_ts);
-- Operator query for Phase II per-operator efficiency.
CREATE INDEX idx_events_operator    ON cycle_events(operator_id, start_ts);
-- Finding in-flight cycles on startup recovery.
CREATE INDEX idx_events_inflight    ON cycle_events(status) WHERE status='in_progress';
-- Failure-mode dashboards filter by quality band.
CREATE INDEX idx_events_quality     ON cycle_events(quality, start_ts);

-- ---------------------------------------------------------------
-- cycle_events_rejected: cycles the detector saw but dropped
-- (too short, too long, low dwell, confidence collapse). Critical
-- for the paper — the only audit trail of detector throw-away
-- behavior. See ml/cycle-event-detection.md §6.
-- ---------------------------------------------------------------
CREATE TABLE cycle_events_rejected (
  id                          INTEGER PRIMARY KEY,
  workstation_id              INTEGER NOT NULL REFERENCES workstations(id),
  camera_id                   INTEGER REFERENCES cameras(id),
  cv_model_version_id         INTEGER NOT NULL REFERENCES cv_model_versions(id),
  roi_version_id              INTEGER NOT NULL REFERENCES rois(id),
  start_ts                    TEXT NOT NULL,
  end_ts                      TEXT,
  duration_seconds            REAL,
  reason                      TEXT NOT NULL
                              CHECK (reason IN ('too_short','too_long','low_dwell','confidence_drop','drift_unknown')),
  mean_conf                   REAL,
  primary_track_dwell_ratio   REAL,
  detection_count             INTEGER,
  notes                       TEXT,
  created_at                  TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX idx_rejected_ws_start ON cycle_events_rejected(workstation_id, start_ts DESC);

-- ---------------------------------------------------------------
-- manual_observations: Ronald's manual notes attached to a
-- workstation or cycle event — "operator went to bathroom",
-- "machine broke", "ground truth: stopwatch said 31 s".
-- This is the human side of the audit trail. Validation methodology
-- (see 50-research) reads this table for stopwatch ground truth.
-- ---------------------------------------------------------------
CREATE TABLE manual_observations (
  id              INTEGER PRIMARY KEY,
  workstation_id  INTEGER NOT NULL REFERENCES workstations(id),
  cycle_event_id  INTEGER REFERENCES cycle_events(id),   -- nullable: observation may not link to a specific cycle
  observer        TEXT NOT NULL,                         -- 'ronald', 'sophia', 'itba-rmarino'
  observed_ts     TEXT NOT NULL,                         -- when the event being observed happened
  kind            TEXT NOT NULL
                  CHECK (kind IN ('ground_truth_duration','lost_time_note','data_quality_flag','other')),
  payload_json    TEXT NOT NULL,                         -- e.g. {"stopwatch_seconds": 31.4, "notes": "..."}
  created_at      TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX idx_manual_obs_ws ON manual_observations(workstation_id, observed_ts);

-- ---------------------------------------------------------------
-- shifts: a named work shift. Lets the exporter group events by
-- jornada without re-deriving it from timestamps every time.
-- ---------------------------------------------------------------
CREATE TABLE shifts (
  id          INTEGER PRIMARY KEY,
  module_id   INTEGER NOT NULL REFERENCES modules(id),
  date        TEXT NOT NULL,                     -- 'YYYY-MM-DD' (local CO date)
  label       TEXT NOT NULL,                     -- 'jornada_1', 'overtime', ...
  start_ts    TEXT NOT NULL,
  end_ts      TEXT,
  notes       TEXT,
  UNIQUE (module_id, date, label)
);

-- ---------------------------------------------------------------
-- lost_time_events: stub for Phase II's 22-code taxonomy.
-- Phase I writes nothing here; defined now so we don't forklift later.
-- ---------------------------------------------------------------
CREATE TABLE lost_time_events (
  id              INTEGER PRIMARY KEY,
  workstation_id  INTEGER REFERENCES workstations(id),
  module_id       INTEGER REFERENCES modules(id),
  shift_id        INTEGER REFERENCES shifts(id),
  code            TEXT NOT NULL,                 -- 'C001'..'C022'
  start_ts        TEXT NOT NULL,
  end_ts          TEXT,
  detected_auto   INTEGER NOT NULL DEFAULT 0,    -- 0=manual entry, 1=CV-detected (Phase II)
  notes           TEXT
);

CREATE INDEX idx_lost_time_module_shift ON lost_time_events(module_id, shift_id);

-- ---------------------------------------------------------------
-- schema_version: single-row table, bumped by every migration.
-- ---------------------------------------------------------------
CREATE TABLE schema_version (
  id         INTEGER PRIMARY KEY CHECK (id=1),
  version    INTEGER NOT NULL,
  applied_at TEXT NOT NULL DEFAULT (datetime('now'))
);
INSERT INTO schema_version (id, version) VALUES (1, 1);

audit_log is parked. Phase I uses journald for write-trail (every dashboard write logs {ts, request_id, user_email, user_role, action, target_table, target_id} to the Jetson’s systemd journal). The full audit_log table with before/after JSON is preserved in docs/design/60-parking/audit-log.md for Phase II.

Enum vocabularies (single source of truth)

Field	Allowed values	Rationale
`cameras.status`	`active`, `inactive`, `faulty`	enforced by CHECK
`cycle_events.status`	`in_progress`, `complete`, `aborted`, `manual_override`	`aborted` covers crashes / startup-recovery cleanup
`lost_time_events.code`	`C001`..`C022` (LBZF taxonomy)	NOT enforced by CHECK — codes loaded at runtime from a config so we don’t redeploy DDL when the taxonomy shifts
`modules.code`	uppercase ASCII, snake_case	`ANGELA`, `CARMENZA`, `CAMISETAS`, `EMPAQUE`
`workstations.code`	uppercase, snake_case, unique per module	`PUESTO_01`, `PERRILLAS`, `PLANCHA_ESPALDA`
`standard_times.machine_code`	`MPP`, `MREV`, `MOJA`, `MBOT`, `PLCH`	matches LBZF spreadsheet vocabulary; not CHECK-enforced because new machines arrive

All operator names and module names that travel to the front end go through the display_name columns, never through code — code is the API/CSV-stable identifier, display_name is human-facing.

Timestamp convention

All *_ts columns are ISO 8601 UTC strings (YYYY-MM-DDTHH:MM:SSZ). Display in es-CO (dd/mm/yyyy HH:MM) is the dashboard / exporter’s job. The reason for UTC at rest: Colombia is UTC-5 with no DST and Argentina is UTC-3 with rare DST politics — UTC at rest sidesteps both. (Open: see excel-export.md for whether the export rows use UTC or local.)

shifts.date is the local Colombia date (YYYY-MM-DD) because shifts are reported per local day on the Excel — keeping that as text avoids midnight-rollover ambiguity in queries.

Operator ID — the hard decision

Phase I question: do we tie cycle_events to a specific operator?

Option	Pros	Cons
A. Don’t capture operator (NULL)	Trivial; no privacy concern	Excel export cannot fill the `operario` column; loses a key axis the IEEE paper would want
B. Manual entry at shift start	No CV cost; simple UI on dashboard	Operators / supervisors must remember; high data quality risk
C. Badge / NFC tap at station	Reliable; clean data	Hardware not in v1.9 BOM; ~$30/station + integration cost
D. Face recognition from existing camera	Zero additional hardware	Privacy red flag — Mariana/Ronald sign-off required; Colombian labor norms; not on the v1.9 plan
E. Defer to Phase II, ship NULL in Phase I	Honest about scope	The “Ronald walks the floor with a stopwatch” workflow he replaces does name the operator, so the Excel will look incomplete

Recommended Phase I: B (manual entry, shift-start dashboard form, NULL allowed) with the schema already capable of D so Phase II can flip the source without forklift. operators.external_code is the stable join key — face-recognition or badge-tap would just be a different mechanism for setting cycle_events.operator_id.

OPEN: privacy / consent gating on face-rec — owner: Agent E (business/privacy), decision: Mariana (CEO sign-off). Blocker for: Phase II behavioral.

Retention policy

Two retention questions get conflated in the v1.9 doc. Separate them:

Asset	Lives where	Retention	Triggered by
`cycle_events` (rows)	SQLite	Forever in Phase I. Paper dataset; ~tens of thousands of rows even at full year.	Never rolled.
`source_frame_path` JPEGs	`/data/frames/`	120 days (matches video retention from v1.9). After 120d, file deleted but the DB row remains, with `source_frame_path` rewritten to `NULL` and a `frame_archived_at` audit entry.	nightly cron `prune_frames.py`
Recorded H.264 segments	`/data/video/`	Rolling, LRU when free space < 10%, soft target 120d at Phase I scale	space-watchdog daemon
`lost_time_events`	SQLite	Forever	n/a
`cycle_events_rejected`	SQLite	Forever	n/a
`manual_observations`	SQLite	Forever	n/a
`cv_model_versions`	SQLite	Forever	n/a
`rois`	SQLite	Forever (append-only via `superseded_by`)	n/a
Nightly DB backups	`/data/backups/`	Last 14 daily + last 12 monthly + last 3 yearly snapshots	nightly cron

The “show me the event from 4 months ago and the frame is gone” case: the dashboard returns the cycle metrics (start/end, duration, efficiency vs SAM, confidence) but displays a “frame archived” placeholder. The paper-dataset-grade fields (start_ts, end_ts, confidence, model_version) all survive.

OPEN: does Ronald need forensic playback older than 120d? — owner: Ronald, decision blocker for: pricing the NVMe upgrade or moving cold video to S3 Glacier.

Backup / replication — what happens when the disk dies

This is the section most likely to bite us in production, so it gets its own subsection.

Failure modes ranked:

NVMe failure (drive death) — likelihood ~1% over a year, impact total. Mitigation: nightly sqlite3 /data/lbzf.db ".backup '/data/backups/lbzf-$(date +%F).db'" plus a rsync --bwlimit=1M /data/backups/latest.db sophia@<tailscale-ip-of-cloud-backup-target>:/srv/lbzf-backups/ over Tailscale. Result: at worst, 24 hours of events lost on full drive failure.
SQLite WAL corruption — likelihood very low given WAL mode + SQLite’s track record, impact partial. Mitigation: nightly PRAGMA integrity_check job; alert on failure; backups available.
Filesystem-full crash mid-write — likelihood moderate (we deliberately keep the disk near full with rolling video). Mitigation: the space-watchdog daemon owns video deletion; the DB lives on the same FS but has a 5 GB reservation enforced by checking df -P /data before every commit-batch.
Power loss mid-write — likelihood high (Colombian power, “Falta de energía” is in the lost-time taxonomy). Mitigation: WAL + synchronous=NORMAL is durable across crashes; at most lose the in-flight WAL frame (~ms). UPS recommendation goes in the hardware bucket, not here.
Jetson SD card death (root filesystem) — likelihood ~5% over a year on consumer SD. Critical that /data is the NVMe, not the SD. The systemd unit, the app code, and /data mount config must be backed up too; see “Image backup” below.

Image backup: monthly dd of the SD card image to /data/backups/sdcard/, plus copy to S3 (when AWS account is live) gives us a flash-rebuild path. OPEN: who owns this — Agent A (hardware) or Agent B (this doc)? Cross-bucket — flagging.

OPEN: where does the off-Jetson backup go? — owner: Andrew/Sophia. Options: (a) S3 bucket via AWS account once live, (b) Backblaze B2 (cheaper), (c) a second Tailscale-connected box in Pereira (CO-side resilience), (d) sync to a teammate’s machine over Tailscale (cheap but not durable). My pick: (a) S3 with Glacier lifecycle for >30 days; ~$0.50/mo at our data volume.

Migrations

Three options:

Option	Pros	Cons	Pick
Alembic	Pythonic, transactional, used everywhere	Heavy for SQLite; declarative-vs-imperative awkwardness	No
yoyo-migrations	Lightweight, plain `.sql` files, made for SQLite	Less common; team unfamiliar	Yes
Hand-rolled `migrations/000N_*.sql` runner	No deps; we own the failure modes	Re-implementing yoyo poorly	No

Migration files live in app/migrations/0001_initial.sql etc., applied in version order by the systemd unit at startup if schema_version.version is behind.

Sample data lifecycle (concrete example)

T=0     Camera 1 sub-stream produces a frame batch.
T=0+5ms YOLOv8n detects "person at workstation 1, conf 0.91" → emits a
        START candidate; CV writer INSERTs cycle_events(workstation_id=1,
        cv_model_version_id=2, start_ts=now, status='in_progress',
        confidence=0.91, source_frame_path='/data/frames/2026/07/15/cam1_47123.jpg').

T=+42s  Person leaves workstation 1; YOLOv8n loses detection for >threshold
        → CV writer UPDATEs cycle_events SET end_ts=now,
        status='complete', detection_count=210 (5fps*42, per ADR-004).
        duration_seconds is auto-computed by the generated column.

T=+1d   Nightly export job reads cycle_events JOIN standard_times JOIN
        workstations JOIN modules → indicadores-2026-07-15.xlsx
        (see excel-export.md).

T=+120d Nightly prune sets source_frame_path=NULL and unlinks the JPG.
        The cycle_events row is untouched.

T=+1yr  Aggregate report uses cycle_events directly — frame is gone but
        duration_seconds, confidence, model_version are all queryable.

IEEE paper dataset requirements

The methods section of an IEEE CASE / Access / T-ASE paper typically needs, per detection event:

timestamp (UTC) — ✅ start_ts
model version — ✅ cv_model_version_id → cv_model_versions.version
weights hash — ✅ cv_model_versions.weights_hash
detection confidence — ✅ confidence (mean over the cycle)
frame count contributing — ✅ detection_count
ground-truth label source — MISSING — we don’t track which events were spot-checked / hand-labeled vs auto-emitted
evaluator identity (for inter-rater reliability) — MISSING

Proposed addition for Phase I:

-- For paper-grade evaluation, mark a subset of events as hand-verified.
ALTER TABLE cycle_events ADD COLUMN gt_verified_by TEXT;        -- email of verifier
ALTER TABLE cycle_events ADD COLUMN gt_verified_at TEXT;        -- iso ts
ALTER TABLE cycle_events ADD COLUMN gt_corrected_duration_s REAL; -- if verifier overrode

These are NULL for the vast majority of rows; a held-out set of N≈200 per module gets labeled during the July deployment week. This unblocks reporting a precision/recall figure in the paper.

Alternatives considered

Postgres on the Jetson — overkill; SQLite handles thousands of events/day with two orders of magnitude of headroom. Lost: nothing actually needed.
TimescaleDB / InfluxDB for time-series cycle data — would be lovely for Phase II’s continuous metrics; ahead-of-need for Phase I. SQLite + a start_ts index covers our query patterns.
DuckDB instead of SQLite — fast analytics, but we need concurrent writes from the CV loop and concurrent reads from the dashboard. SQLite WAL is the better fit.
Cloud DB primary (Aurora / Cloud SQL) with local cache — fails the “Tailscale dies and the plant keeps producing” requirement. Plant operations must not depend on California-side connectivity.
Flat cycle_events.module string instead of module_id FK — v1.9 sketches this; rejected because Phase II’s “scale to 3 more modules” becomes a typo-search rather than a config change.
cycle_events as wide table with operator name, station name, module name denormalized — denormalization tempting for reporting performance, but the rename surface (an operator changes their preferred display name, a station gets renamed) wreaks havoc.

Open questions

OPEN: operator-ID capture mechanism for Phase I — owner: Agent E + Mariana, blocker for: Excel operario column fidelity. Recommended default: manual shift-start entry.
OPEN: off-Jetson backup target (S3 vs B2 vs Tailscale-peer) — owner: Andrew/Sophia. Blocker for: pre-Pereira deploy checklist.
OPEN: who owns SD-card image backup — Agent A (hardware) or Agent B (this doc)? — owner: Sophia + Agent A. Suggested split: Agent A specs the hardware path, Agent B owns the application-DB backup cron.
OPEN: does Ronald need forensic frame playback older than 120 days? — owner: Ronald (via Armando), affects: NVMe sizing, cold-storage cost.
OPEN: shift schedule — does LBZF run one shift or two? — owner: Ronald, blocker for: shifts seeding and per-shift Excel export. The jornada column in INDICADORES ABRIL.xlsx will answer this when we can open the file.
OPEN: how do we get the actual contents of Ref22 Slim - Angela.xlsx / “Balanceo” tab into the seed file? Currently we have folder IDs (1ulRLEc4pUr4x2JzlzX9d_abl8-hleluq) but not file contents in the repo. Owner: Sophia / Armando, blocker for: seeding standard_times.
OPEN: production-order (OP-...) — is one per garment, per batch, per shift? — owner: Ronald, affects: cycle_events.ref_op_order semantics.
OPEN: do we want events for “person present at station but no cycle activity” (i.e., idle time)? That’s Phase II behavioral, but the schema could already express it as cycle_events rows with a special status. Owner: Agent D + Andrew.

Cross-bucket dependencies

This doc depends on	Owner bucket	What we need
Operation-name vocabulary (`PESPUNTAR CUELLO` etc.)	ML/CV (Agent D)	Stable string that the CV writer puts in `workstations.operation` and that joins to `standard_times.operation`
Camera RTSP credentials, IPs	Hardware (Agent A)	Populates `cameras.rtsp_url`; consider env-var indirection so secrets don’t sit in the DB
Shift schedule	Business (Agent E / Ronald)	`shifts` seed rows; affects export grouping
Operator privacy / consent posture	Business (Agent E)	Decides which `operators` capture mechanism we can ship
Cloud backup destination	Deployment (this bucket, file `deployment-and-cicd.md`)	S3 vs B2 vs Tailscale peer

This doc implies a change in	Owner bucket	What we’re asking them to do
`docs/technical/software.md`	Existing repo	The 3-table sketch is now superseded by this 10-table schema; either redirect or fold in
ML inference loop	ML (Agent D)	Must INSERT/UPDATE against `cycle_events`, set `cv_model_version_id`, write the frame path before committing the row
Frontend (Agent C)	Frontend	Reads use `display_name`, never `code`; user-facing dates use es-CO format
Auth model (Agent C + this bucket’s `user-org-and-auth.md`)	Backend + Frontend	journald log line `actor=<email> (<role>)` must be set on every dashboard write — wire from the Auth0 identity. (When `audit_log` thaws per `60-parking/audit-log.md`, this becomes a DB write.)

What’s weak in this doc

standard_times seeding is described but not executed — without the actual Ref22 Slim - Angela.xlsx / Balanceo rows in the repo, “seed from spreadsheet” is hand-wavy. We should commit a CSV mirror of that tab to the repo and treat the CSV as the source-of-truth for the seed migration. Right now there’s a real risk the schema is correct but the data doesn’t land in time.
The retention split (events forever, frames 120d) is asserted without Ronald’s input. If Ronald, on first use, asks for an event from 130 days ago expecting to see the frame, we’ll have shipped a usability regression vs the current “stopwatch + memory” workflow he replaced. We should ask him explicitly during the May 15 trip prep.
Operator ID is the load-bearing decision and we punted to “B with capability for D.” That’s defensible but it means every other doc (Excel export, dashboard API, auth) has to assume NULL-operator and degrade gracefully. There’s a real chance Phase I ships with the operario column blank, which Ronald will hate.
Backup story is well-described but unimplemented — nightly backup cron + off-Jetson sync is two scripts we haven’t written. Likely first thing to slip if July deploy gets crunched.
No load-tested numbers. 2 cameras × 3–5 fps × ~one cycle/min is well within SQLite’s tens-of-thousands-of-inserts-per-second envelope, but I haven’t proven it on the actual Jetson. The PRAGMAs are educated guesses, not measurements.
No tombstoning / soft delete pattern. Hard delete is allowed (only via a journald-logged dashboard action), but if Ronald says “delete operator A’s data” we don’t have a clean GDPR-style erasure path — the paper dataset would lose those rows. Probably fine for Phase I; revisit when LBZF asks.

Rollout

Now (May 2026): commit app/db/schema.sql (this DDL) + app/migrations/0001_initial.sql to the repo. Add app/db/seed_standard_times.py that reads a committed seeds/ref22_slim_balanceo.csv (placeholder until the real Excel can be parsed). Add a stub app/db/backup.py.
Before Argentina (2026-05-15): schema initialises on the dev Jetson; demo sqlite3 lbzf.db .tables showing all tables; ITBA team gets the same schema on their twin hardware.
Before Pereira (Jul 2026): real Ref22 Slim - Angela.xlsx data committed; backup cron live and tested by simulating drive failure; SD-card image baseline taken.
Phase II: add lost_time_events writers; add cv_behavioral_events table; add operator_id auto-population via face-rec only after privacy sign-off.
Phase III: scale modules rows; no schema change needed.

Dependency chain: schema DDL unblocks dashboard API (dashboard-api.md) and Excel exporter (excel-export.md); operator-ID decision unblocks the export operario column and the auth/identity story.