Dashboard API spec — Flask/FastAPI on :5000
Bucket: Backend (Agent B) · Status: Reviewed, 2026-05-12 (Phase B seams applied) · Owner: Sophia · Reviewers: Andrew, Agent C (frontend / user roles), Agent D (ML/CV writer client)
Context
The Jetson hosts a local dashboard on :5000 that Tailscale-connected clients reach. v1.9 names “Flask or FastAPI (decision deferred; Flask easier)” and lists “live cycle times per workstation vs SAM benchmark, color-coded by efficiency.” This doc finishes the spec: framework choice, every endpoint, auth model, schemas, sample payloads, and the error contract. Live-update transport (WS / push channel) is parked to docs/design/60-parking/websocket-live-updates.md; Phase I uses REST polling.
The decision being made: which endpoints every authenticated user, every dashboard page, and every external automation actually hit, with what payloads, and behind what auth. Goals:
- A single set of REST endpoints that the dashboard, the lbzfai.com integration, and any CLI client all use
- Auth that works both when Mariana opens her phone on the plant Wi-Fi AND when Sophia hits the dashboard from California over Tailscale
- Response shapes the IEEE paper can cite (versioned URL + versioned schema)
- Sub-second p95 for the read endpoints at Phase III’s 20-station scale
- Honest error contract — no swallowed exceptions, no 500s without context
Non-goals
- A public API on lbzfai.com: that’s a separate surface (see lbzfai-jetson-integration.md)
- User CRUD (creating users, password resets): that’s Auth0’s job, not this API’s
- Direct video frame serving at scale: frames return as paths to the static file server and are not piped through the API process
- Multi-tenant isolation at this layer: Phase I is single-tenant LBZF; org-tagging is at the DB layer (modules.id), and the API does not enforce it yet
- WS / push-channel live updates: parked to docs/design/60-parking/websocket-live-updates.md. Phase I polls REST every 5 s.
Proposed approach
Framework: FastAPI
| Criterion | Flask | FastAPI | Pick |
|---|---|---|---|
| Async serialization for read-heavy JSON | needs gevent / quart-style shim | first-class | FastAPI |
| Auto request/response validation | manual (marshmallow / pydantic add-on) | pydantic native | FastAPI |
| OpenAPI generation | optional plugin | first-class — Swagger UI at /docs | FastAPI |
| Team familiarity | Higher | Slightly less | Tie |
| Footprint on Jetson | ~30 MB | ~40 MB | Don’t care |
FastAPI wins. Auto-generated OpenAPI at /docs is also a cheap demo for the Argentina presentation, and the framework choice survives Phase II’s eventual WS / push channel move (parked) without a rewrite.
Process model
```bash
uvicorn app.main:app --host 0.0.0.0 --port 5000 --workers 1
```
- One worker: the CV writer is the only DB writer and the dashboard is read-mostly, which is exactly what SQLite WAL (one writer, many readers) handles well. Multiple uvicorn workers would each open their own SQLite connections, complicate WAL handling, and split any in-process state (such as rate-limit counters) across processes
- systemd unit lbzf-dashboard.service, restart on failure
- TLS terminated by Tailscale (HTTPS via Tailscale’s MagicDNS); the uvicorn process itself binds plain HTTP
Auth model — three principals, one decision
Three caller classes:
- Browser user on the Jetson’s network or via Tailscale (Sophia, Andrew, Mariana, Ronald)
- Browser user on lbzfai.com wanting to see live data (via the cloud integration in lbzfai-jetson-integration.md)
- Service-to-service: the CV writer, the exporter cron, the backup job (all local to the Jetson)
Auth strategy:
| Principal | Phase I mechanism |
|---|---|
| Browser user via Tailscale | Tailscale Serve acts as the identity proxy (Funnel vs Serve is still an open question below). The dashboard accepts a Tailscale-User-Login header, which Tailscale injects when the request arrives through its HTTPS proxy. Falls back to local password login when reached directly over the LAN. |
| Browser user via lbzfai.com (cloud) | Auth0 JWT in Authorization: Bearer <jwt>. The dashboard’s cloud-side proxy (see lbzfai-jetson-integration.md) validates the JWT and forwards an X-LBZF-Identity header containing the user’s email + role. |
| Service-to-service | Local Unix socket OR shared HMAC token in env var — the CV writer doesn’t authenticate as a human |
A single middleware resolves identity in this order:
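1. X-LBZF-Identity (set by the cloud proxy; only trusted if the shared-secret header X-LBZF-Proxy-Auth matches)
2. Tailscale-User-Login (set by Tailscale Serve when reached through Tailscale; only trusted when the request comes from the local Serve proxy, because uvicorn also listens on the LAN, where any client could send this header)
3. Authorization: Bearer <jwt> (Auth0 JWT; verified against the Auth0 JWKS)
4. Local cookie session (if signed in via /auth/local)
5. Reject with 401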
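The resolution order above can be written as a pure function, independent of FastAPI, which makes it straightforward to cover with the test suite this doc asks for. This is a minimal sketch under stated assumptions: the X-LBZF-Identity payload format (JSON with email and role), the loopback check for Tailscale Serve, and the `verify_jwt` callback (standing in for real Auth0 JWKS verification) are all hypothetical, as are the function and class names.

```python
import hmac
import json
from dataclasses import dataclass
from typing import Callable, Mapping, Optional

# Assumption: Tailscale Serve forwards to uvicorn over loopback, so only a
# loopback peer may assert Tailscale-User-Login; LAN clients cannot.
LOOPBACK_HOSTS = frozenset({"127.0.0.1", "::1"})
ROLE_CLAIM = "https://lbzfai.com/role"


@dataclass(frozen=True)
class Identity:
    email: str
    source: str                      # which mechanism resolved the caller
    role_hint: Optional[str] = None  # role carried by the credential, if any


class Unauthenticated(Exception):
    """No mechanism produced an identity; the middleware answers 401."""


def resolve_identity(
    headers: Mapping[str, str],                   # lowercase keys, as in ASGI
    peer_host: str,                               # TCP peer of the request
    proxy_secret: str,                            # cloud-proxy shared secret
    verify_jwt: Callable[[str], Optional[dict]],  # hypothetical JWKS check
    session_email: Optional[str] = None,          # from a validated session
) -> Identity:
    # 1. Cloud proxy header, trusted only if X-LBZF-Proxy-Auth matches
    #    (constant-time comparison to avoid leaking the secret via timing).
    forwarded = headers.get("x-lbzf-identity")
    presented = headers.get("x-lbzf-proxy-auth", "")
    if forwarded and proxy_secret and hmac.compare_digest(
        presented.encode(), proxy_secret.encode()
    ):
        try:
            data = json.loads(forwarded)  # hypothetical {"email", "role"} format
            return Identity(data["email"], "cloud-proxy", data.get("role"))
        except (ValueError, KeyError, TypeError) as exc:
            raise Unauthenticated("malformed X-LBZF-Identity") from exc

    # 2. Tailscale Serve identity header, accepted only from loopback.
    ts_login = headers.get("tailscale-user-login")
    if ts_login and peer_host in LOOPBACK_HOSTS:
        return Identity(ts_login, "tailscale")

    # 3. Auth0 bearer token; verify_jwt returns verified claims or None.
    scheme, _, token = headers.get("authorization", "").partition(" ")
    if scheme.lower() == "bearer" and token:
        claims = verify_jwt(token.strip())
        if claims and claims.get("email"):
            return Identity(claims["email"], "auth0", claims.get(ROLE_CLAIM))

    # 4. Local cookie session established via /auth/local.
    if session_email:
        return Identity(session_email, "local-session")

    # 5. Nothing matched.
    raise Unauthenticated("no valid identity")
```

A spoofed X-LBZF-Identity without the secret, or a Tailscale-User-Login arriving from a LAN address, simply falls through to the next mechanism and ends in a 401 if nothing else matches.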
Role mapping (echoes Agent C’s user/org model — see user-org-and-auth.md):
Phase I uses 3 effective roles (admin, engineer, executive). Supervisor and research are parked in docs/design/60-parking/full-role-matrix.md; viewer exists only as the deny-by-default landing for any authenticated stranger.
| Email pattern / Auth0 role | App role | Permissions |
|---|---|---|
| sophiamann@evasglobal.com, sophiainesmann@gmail.com | admin | full read/write incl. overrides |
| Andrew | admin | same |
| ingenierialbarton@gmail.com (Ronald) | engineer | read all, override cycles, trigger exports |
| Mariana | executive | read all aggregates, no writes |
| anyone else with a valid Auth0 token | viewer | nothing in Phase I; locked out |
The role is computed in middleware from the resolved identity (for Auth0 callers, read from the https://lbzfai.com/role JWT claim, a single string rather than an array; otherwise from the email mapping above), attached to request.state.user, and checked by per-endpoint decorators.
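A minimal sketch of the role computation and of the check the per-endpoint decorators would call. It assumes the Phase I roles form a strict ladder (executive < engineer < admin), that a recognized JWT role claim takes precedence over the email table, and that viewer, being locked out in Phase I, passes no check at all. `ROLE_RANK`, `role_for` and `allowed` are hypothetical names; only the emails listed in the table above are included, and the canonical mapping stays in user-org-and-auth.md.

```python
from typing import Optional

# Hypothetical ordering for illustration; "viewer" ranks lowest.
ROLE_RANK = {"viewer": 0, "executive": 1, "engineer": 2, "admin": 3}

# Email -> role, copied from the table above. Andrew and Mariana are omitted
# because their addresses are not listed in this doc.
EMAIL_ROLES = {
    "sophiamann@evasglobal.com": "admin",
    "sophiainesmann@gmail.com": "admin",
    "ingenierialbarton@gmail.com": "engineer",
}

PHASE_I_LOCKED_OUT = {"viewer"}


def role_for(email: str, jwt_role: Optional[str] = None) -> str:
    """Known JWT role claim first (assumed precedence), then the email table."""
    if jwt_role in ROLE_RANK:
        return jwt_role
    return EMAIL_ROLES.get(email.strip().lower(), "viewer")


def allowed(user_role: str, min_role: str) -> bool:
    """True if user_role meets min_role; Phase I viewers never pass."""
    if user_role in PHASE_I_LOCKED_OUT:
        return False
    return ROLE_RANK.get(user_role, -1) >= ROLE_RANK[min_role]
```

Under these assumptions an endpoint marked "viewer" in the table below is, in practice, reachable by executive and above during Phase I, which matches "nothing in Phase I; locked out" for unknown Auth0 users.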
URL conventions
- All endpoints under /api/v1/ (URL-versioned for IEEE paper citability)
- Static frames at /static/frames/... (served by uvicorn; Tailscale handles auth)
- Swagger UI at /docs, OpenAPI JSON at /openapi.json (admin-only in prod, open in dev)
- The WS / push live channel is parked (Phase II); see docs/design/60-parking/websocket-live-updates.md. The frontend polls REST every 5 s.
Endpoint table
| Method | Path | Auth (min role) | Purpose |
|---|---|---|---|
| GET | /api/v1/health | none | liveness + DB-reachable + WAL-status |
| GET | /api/v1/version | viewer | API version, CV model version, DB schema version |
| GET | /api/v1/modules | viewer | list modules |
| GET | /api/v1/modules/{module_id} | viewer | module detail incl. workstations |
| GET | /api/v1/workstations | viewer | list workstations (filter by module) |
| GET | /api/v1/workstations/{ws_id} | viewer | station detail incl. current operator, current cycle, last 10 cycles |
| GET | /api/v1/standard-times | viewer | list standard times (active version) |
| GET | /api/v1/efficiency/station/{ws_id} | viewer | per-station efficiency (params: from, to, granularity) |
| GET | /api/v1/efficiency/module/{module_id} | viewer | per-module efficiency aggregate |
| GET | /api/v1/efficiency/operator/{operator_id} | engineer | per-operator efficiency (Phase II) |
| GET | /api/v1/cycles | viewer | paginated cycle events (filters: module_id, workstation_id, operator_id, from, to, status) |
| GET | /api/v1/cycles/{event_id} | viewer | single event detail + frame URL |
| POST | /api/v1/cycles/{event_id}/override | engineer | manually override duration_seconds or end_ts; writes audit_log |
| POST | /api/v1/cycles/{event_id}/verify | engineer | mark as ground-truth verified (sets gt_verified_by, gt_verified_at) |
| GET | /api/v1/exports | viewer | list available exports |
| POST | /api/v1/exports | engineer | trigger a new export run (body: {module_id, date}) |
| GET | /api/v1/exports/{filename} | viewer | download an .xlsx or .csv |
| GET | /api/v1/operators | engineer | list operators |
| POST | /api/v1/operators | engineer | create operator (manual entry; Confidencial) |
| GET | /api/v1/cv-model-versions | viewer | list model versions deployed on this Jetson |
| POST | /api/v1/auth/local/login | none | local username/password login (fallback when neither Auth0 nor Tailscale is available) |
| POST | /api/v1/auth/local/logout | viewer | clear session cookie |
| GET | /api/v1/me | viewer | current user identity + role |
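The /api/v1/health row above reports DB reachability and WAL status. A minimal stdlib sketch of how those fields could be gathered, assuming the schema version lives in SQLite's PRAGMA user_version (data-model.md may store it differently), that the WAL sits next to the database as `<db>-wal` (SQLite's default naming), and using hypothetical degradation thresholds that this doc does not specify:

```python
import shutil
import sqlite3
from contextlib import closing
from pathlib import Path

WAL_WARN_BYTES = 64 * 1024 * 1024  # hypothetical WAL-bloat threshold
DISK_WARN_PCT = 10.0               # hypothetical low-disk threshold


def probe_db(db_path: str) -> dict:
    """Gather the DB/disk fields of HealthOut without raising."""
    db = Path(db_path).resolve()
    wal = db.with_name(db.name + "-wal")  # SQLite's default WAL file name
    wal_size = wal.stat().st_size if wal.exists() else 0
    usage = shutil.disk_usage(db.parent)
    disk_free_pct = round(100.0 * usage.free / usage.total, 1)
    try:
        uri = db.as_uri() + "?mode=ro"  # read-only: never creates the file
        with closing(sqlite3.connect(uri, uri=True, timeout=1.0)) as conn:
            (version,) = conn.execute("PRAGMA user_version").fetchone()
        reachable = True
    except sqlite3.Error:
        version, reachable = 0, False

    if not reachable:
        status = "down"
    elif wal_size > WAL_WARN_BYTES or disk_free_pct < DISK_WARN_PCT:
        status = "degraded"
    else:
        status = "ok"
    return {
        "status": status,
        "db_reachable": reachable,
        "db_schema_version": version,
        "db_wal_size_bytes": wal_size,
        "disk_free_pct": disk_free_pct,
    }
```

The remaining HealthOut fields (last_event_ts, camera counts, uptime) would come from the telemetry table and process start time.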
CV writer → SQLite (direct, no HTTP)
The CV writer process opens its own SQLite connection (WAL mode) and writes cycle_events rows directly. There is no /api/v1/internal/cycles endpoint. SQLite WAL gives us one writer and many readers without an HTTP hop in the hottest loop.
- The dashboard API process opens the same DB read-only.
- Schema migrations are applied by the dashboard process on startup (that step needs a read-write connection); the CV writer waits for schema_version ≥ its required minimum before starting writes.
- CV writer heartbeat is a row in a telemetry table the dashboard reads, not an inbound HTTP call.
(An internal HTTP cycle-write endpoint was considered and parked alongside docs/design/60-parking/websocket-live-updates.md; see ml/cycle-event-detection.md for the canonical CV-side spec.)
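The writer/reader split above maps onto two kinds of SQLite connection. A minimal sketch, assuming the schema version is kept in PRAGMA user_version (a hypothetical choice; data-model.md owns the real mechanism); the helper names are also hypothetical. Python's `timeout` argument sets SQLite's busy timeout, so 5.0 matches busy_timeout=5000.

```python
import sqlite3
import time
from pathlib import Path


def open_writer(db_path: str) -> sqlite3.Connection:
    """CV-writer side: the single read-write connection."""
    conn = sqlite3.connect(db_path, timeout=5.0)  # busy timeout of 5000 ms
    conn.execute("PRAGMA journal_mode=WAL")       # persists in the DB file
    return conn


def open_reader(db_path: str) -> sqlite3.Connection:
    """Dashboard side: read-only connection through a SQLite URI."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True, timeout=5.0)


def schema_version(conn: sqlite3.Connection) -> int:
    (version,) = conn.execute("PRAGMA user_version").fetchone()
    return version


def wait_for_schema(conn: sqlite3.Connection, required_min: int,
                    timeout_s: float = 60.0, poll_s: float = 0.5) -> bool:
    """Block until migrations reach required_min; False on timeout."""
    deadline = time.monotonic() + timeout_s
    while schema_version(conn) < required_min:
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
    return True
```

The CV writer would call wait_for_schema once at startup and refuse to write (and log loudly) if it returns False, rather than writing into an older schema.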
Schema examples (pydantic shapes)
```python
from datetime import datetime
from typing import Literal

from pydantic import BaseModel


# response: GET /api/v1/health
class HealthOut(BaseModel):
    status: Literal["ok", "degraded", "down"]
    db_reachable: bool
    db_schema_version: int
    db_wal_size_bytes: int
    last_event_ts: datetime | None  # is the CV writer alive?
    cameras_online: int
    cameras_total: int
    disk_free_pct: float
    uptime_seconds: int


# Defined before WorkstationOut, which embeds it.
class CycleEventOut(BaseModel):
    id: int
    workstation_id: int
    operator_id: int | None
    start_ts: datetime  # serialized as ISO 8601 UTC
    end_ts: datetime | None
    duration_seconds: float | None
    status: Literal["in_progress", "complete", "aborted", "manual_override"]
    confidence: float | None
    cv_model_version: str  # 'yolov8n-coco/8.1.30-trt'
    source_frame_url: str | None  # /static/frames/... or null if archived
    notes: str | None


# response: GET /api/v1/workstations/{ws_id}
class WorkstationOut(BaseModel):
    id: int
    module_id: int
    code: str  # 'PUESTO_01'
    display_name: str  # 'Puesto 1'
    operation: str | None  # 'PESPUNTAR CUELLO'
    sequence_pos: int | None
    current_cycle: CycleEventOut | None
    last_cycles: list[CycleEventOut]  # last 10
    efficiency_24h: float | None
    sam_seconds: float | None


class EfficiencyPoint(BaseModel):
    ts: datetime
    units_produced: int
    units_target: int
    efficiency_pct: float
    avg_cycle_seconds: float


class EfficiencySummary(BaseModel):
    """Placeholder: the summary fields are not specified in this doc yet."""


# response: GET /api/v1/efficiency/module/{module_id}?from=...&to=...&granularity=hour
class ModuleEfficiencyOut(BaseModel):
    module_id: int
    from_ts: datetime
    to_ts: datetime
    granularity: Literal["minute", "hour", "shift", "day"]
    series: list[EfficiencyPoint]
    summary: EfficiencySummary
```
Sample payloads
```jsonc
// GET /api/v1/health
{
  "status": "ok",
  "db_reachable": true,
  "db_schema_version": 1,
  "db_wal_size_bytes": 4194304,
  "last_event_ts": "2026-07-15T13:24:53Z",
  "cameras_online": 6,
  "cameras_total": 6,
  "disk_free_pct": 71.4,
  "uptime_seconds": 144203
}
```

```jsonc
// GET /api/v1/workstations/1
{
  "id": 1,
  "module_id": 1,
  "code": "PUESTO_01",
  "display_name": "Puesto 1",
  "operation": "PESPUNTAR CUELLO",
  "sequence_pos": 1,
  "current_cycle": {
    "id": 47891,
    "workstation_id": 1,
    "operator_id": 7,
    "start_ts": "2026-07-15T13:24:11Z",
    "end_ts": null,
    "duration_seconds": null,
    "status": "in_progress",
    "confidence": 0.91,
    "cv_model_version": "yolov8n-coco/8.1.30-trt",
    "source_frame_url": "/static/frames/2026/07/15/cam1_47891.jpg",
    "notes": null
  },
  "last_cycles": [/* 10 most recent completed cycles */],
  "efficiency_24h": 0.612,
  "sam_seconds": 22.8
}
```
Live updates: REST polling (Phase I)
The frontend polls GET /api/v1/cycles?after_id=<last_seen> every 5 s while a tab is foregrounded. Per-station status comes from GET /api/v1/workstations/{ws_id}. Two cameras at roughly one cycle per minute per station is a trivial polling load for SQLite.
The WS / push live-channel design is preserved in docs/design/60-parking/websocket-live-updates.md for Phase II, including reconnect/replay semantics. Phase I does not ship it.
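The after_id poll is keyset pagination over the cycle_events primary key. A minimal sketch of the query behind GET /api/v1/cycles, assuming an integer id primary key; `cycles_after`, the column subset, and the `MAX_PAGE` cap are hypothetical:

```python
import sqlite3
from typing import Optional

MAX_PAGE = 200  # hypothetical page-size cap


def cycles_after(conn: sqlite3.Connection, after_id: int = 0, limit: int = 100,
                 workstation_id: Optional[int] = None) -> tuple[list[tuple], int]:
    """Return (rows, next_after_id) for GET /api/v1/cycles?after_id=...

    Filters on the primary key instead of using OFFSET, so each poll is an
    index range scan however large cycle_events grows.
    """
    limit = max(1, min(limit, MAX_PAGE))
    sql = "SELECT id, workstation_id, status FROM cycle_events WHERE id > ?"
    params: list = [after_id]
    if workstation_id is not None:
        sql += " AND workstation_id = ?"
        params.append(workstation_id)
    sql += " ORDER BY id LIMIT ?"
    params.append(limit)
    rows = conn.execute(sql, params).fetchall()
    next_after = rows[-1][0] if rows else after_id  # unchanged if nothing new
    return rows, next_after
```

One consequence worth noting: an id cursor only surfaces new rows. When an in_progress row later becomes complete, the /cycles poll does not re-deliver it, which is why per-station state comes from /workstations/{ws_id}.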
Error contract
All errors return JSON:
```json
{
  "error": {
    "code": "CYCLE_NOT_FOUND",
    "message": "No cycle_events row with id=99999",
    "request_id": "01HVK..."
  }
}
```

| Status | When |
|---|---|
| 200 | success |
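| 201 | resource created (POST /exports, POST /operators) |
| 204 | success, no body (reserved; the internal PATCH endpoint it referred to is not part of Phase I) |
| 400 | malformed request that fails outside pydantic’s schema validation (schema failures return 422) |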
| 401 | no valid identity |
| 403 | identity OK, role insufficient |
| 404 | resource missing |
| 409 | conflict (e.g., trying to override an already-overridden event without force=true) |
| 422 | pydantic validation error (FastAPI default) |
| 429 | rate-limited (exports endpoint: max 1 in flight per user) |
| 500 | unhandled — never desired |
| 503 | DB unreachable / WAL corruption suspected — health check failing |
Every response carries X-Request-Id for correlation with audit_log and journalctl.
Rate limits
| Endpoint | Limit |
|---|---|
| POST /api/v1/exports | 1 in flight per user, 12/hour |
| POST /api/v1/auth/local/login | 5 attempts / 15 min / source IP |
| everything else | 60 req/s / user (very lenient; SQLite, not the API, is the bottleneck) |
Implemented with slowapi for the time-window limits. slowapi counts requests per window, so the 1-in-flight rule for exports needs a separate per-user guard.
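The export rule combines a concurrency cap (1 in flight) with a rate (12/hour). A minimal in-process sketch of a per-user gate for POST /api/v1/exports; `ExportGate`, `try_start` and `finish` are hypothetical names, and in-process state is only adequate because the API runs a single uvicorn worker.

```python
import threading
import time
from collections import defaultdict, deque
from typing import Optional


class ExportGate:
    """Per-user admission: at most max_in_flight concurrent runs and at most
    max_per_window starts in any sliding window of window_s seconds."""

    def __init__(self, max_in_flight: int = 1, max_per_window: int = 12,
                 window_s: float = 3600.0):
        self.max_in_flight = max_in_flight
        self.max_per_window = max_per_window
        self.window_s = window_s
        self._lock = threading.Lock()
        self._in_flight = defaultdict(int)  # user -> running exports
        self._starts = defaultdict(deque)   # user -> recent start times

    def try_start(self, user: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        with self._lock:
            starts = self._starts[user]
            while starts and now - starts[0] >= self.window_s:
                starts.popleft()            # drop starts outside the window
            if self._in_flight[user] >= self.max_in_flight:
                return False
            if len(starts) >= self.max_per_window:
                return False
            self._in_flight[user] += 1
            starts.append(now)
            return True

    def finish(self, user: str) -> None:
        with self._lock:
            if self._in_flight[user] > 0:
                self._in_flight[user] -= 1
```

A handler would answer 429 when try_start returns False, and call finish in a finally block once the export run ends, so a crashed export cannot hold the slot forever.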
CORS: Access-Control-Allow-Origin is restricted to https://lbzfai.com plus https://*.ts.net (Tailscale HTTPS hostnames). No *.
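Access-Control-Allow-Origin can carry only one concrete origin (or *), never a subdomain pattern, so the *.ts.net allowance has to be enforced server-side by matching the request's Origin and echoing it back. A minimal sketch of that check; `cors_origin` and the regex are hypothetical. FastAPI's CORSMiddleware (from Starlette) accepts allow_origins and allow_origin_regex, which could express the same policy.

```python
import re
from typing import Optional

ALLOWED_ORIGINS = {"https://lbzfai.com"}
# Hypothetical pattern for Tailscale HTTPS hostnames such as
# https://jetson.tailnet-name.ts.net (lowercase labels, no port, no path).
TS_NET_ORIGIN = re.compile(r"https://[a-z0-9-]+(\.[a-z0-9-]+)*\.ts\.net")


def cors_origin(origin: Optional[str]) -> Optional[str]:
    """Value for Access-Control-Allow-Origin, or None to omit the header."""
    if not origin:
        return None
    if origin in ALLOWED_ORIGINS or TS_NET_ORIGIN.fullmatch(origin):
        return origin  # echo the caller's origin; never "*"
    return None
```

Since the header value then varies per request, responses should also carry Vary: Origin so caches do not serve one origin's headers to another.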
OpenAPI / docs
/docs and /redoc are available but locked to the admin role in production; they are open to all in --dev mode for the demo.
Observability
- Every request is logged as structured JSON: {ts, request_id, method, path, status, latency_ms, user_email, user_role}
- Journald is the sink (systemctl status lbzf-dashboard; journalctl -u lbzf-dashboard -f)
- /api/v1/health returns db_wal_size_bytes so we can alert on WAL bloat
- Phase II: ship logs to CloudWatch via Tailscale; Phase I just lives in journald
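The per-request log line above can be produced with the standard logging module. A minimal sketch with the field list taken from this doc; the logger name lbzf.access and the class and function names are hypothetical. Writing to stderr is enough because systemd forwards a service's stdout/stderr to journald by default.

```python
import json
import logging
from datetime import datetime, timezone

FIELDS = ("request_id", "method", "path", "status", "latency_ms",
          "user_email", "user_role")


class JsonLineFormatter(logging.Formatter):
    """Render each record as one JSON object on a single line."""

    def format(self, record: logging.LogRecord) -> str:
        ts = datetime.fromtimestamp(record.created, tz=timezone.utc)
        entry = {"ts": ts.isoformat().replace("+00:00", "Z")}
        for name in FIELDS:
            entry[name] = getattr(record, name, None)  # set via extra=...
        return json.dumps(entry, separators=(",", ":"))


def build_access_logger() -> logging.Logger:
    logger = logging.getLogger("lbzf.access")  # hypothetical logger name
    handler = logging.StreamHandler()          # stderr, captured by journald
    handler.setFormatter(JsonLineFormatter())
    logger.handlers[:] = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False                   # avoid duplicate lines
    return logger
```

A request middleware would then call something like `logger.info("request", extra={"request_id": rid, "status": 200, ...})` once per response.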
Alternatives considered
- WS / push instead of REST polling for Phase I: works, but adds a meaningful new surface (reconnect logic, subscribe filters, Cloudflare-Worker WS proxy unknowns) for a load profile that REST polling handles trivially. Parked to docs/design/60-parking/websocket-live-updates.md for Phase II, when behavioral overlays may want sub-second push.
- Internal HTTP cycle-write endpoint (POST /api/v1/internal/cycles): considered, but it adds an HTTP hop in the hottest CV loop. Direct SQLite with WAL gives the same architectural cleanliness (one writer process) without the hop. See ml/cycle-event-detection.md for the writer-side contract.
- Flask + flask-sock: works, but pydantic + FastAPI + auto-OpenAPI is the better full stack, and the per-developer learning cost is small.
- Django + Channels: overkill; we don’t need an ORM or admin.
- gRPC: wrong layer for a browser client; it would suit CV-writer → API, but the writer can use SQLite directly.
- Auth0 SPA token directly to the Jetson: possible, but tying the JWT audience to a Jetson hostname is awkward across Tailscale’s dynamic DNS. It is cleaner to terminate Auth0 in the cloud proxy and forward an X-LBZF-Identity header to the Jetson.
- No local-auth fallback: risky; if Tailscale and Auth0 are both down, we have no way to log in to diagnose. The /auth/local route gives Sophia a break-glass option (long random password kept in 1Password).
Open questions
- OPEN: Tailscale Funnel vs Tailscale Serve for HTTPS / identity. Owner: Sophia / Andrew. Funnel exposes to the public internet (we don’t want that); Serve is internal-only (what we want). Confirm our Tailscale plan tier supports Serve with identity headers.
- OPEN: role list and email→role mapping. Owner: Agent C (user/org bucket). This doc assumes the mapping in user-org-and-auth.md; if Agent C changes it, this table must follow.
- OPEN: pagination convention: keyset (recommended for cycle_events, which is append-only) vs offset. This doc assumes keyset (?after_id=) but doesn’t specify the cursor format.
- OPEN: which endpoints are safe to expose on lbzfai.com vs Tailscale-only? Cycles and efficiency are OK; operator names and override are Tailscale-only. Owner: Agent E + Mariana.
- OPEN: 4xx error message localization — Spanish for Ronald? English-only is fine for Phase I but flag for Phase II.
- OPEN: do we support API tokens (long-lived) for the CSV/Excel automations Ronald might script in his own time? Phase II.
Cross-bucket dependencies
| This doc depends on | Owner bucket | What we need |
|---|---|---|
| DB schema (every endpoint reads from it) | This bucket (data-model.md) | DDL finalized |
| Auth identity flows (Auth0 + Tailscale) | Frontend (Agent C) + lbzfai-jetson-integration | Identity proxy decisions |
| User role definitions | Frontend (Agent C) — user-org-and-auth.md | Email→role table |
| CV writer writes directly to SQLite (not through HTTP) | ML (Agent D) | cycle-event-detection.md owns the writer contract; this API reads only |
| Excel exporter trigger | This bucket (excel-export.md) | POST /exports calls into it |

| This doc implies | Owner | Ask |
|---|---|---|
| Frontend polls REST every 5 s for live state (WS parked) | Agent C | Implement after-cursor polling + visibility-pause |
| ML writer is the only SQLite writer | Agent D | Dashboard API opens DB read-only |
| Cloud proxy injects X-LBZF-Identity | this bucket (lbzfai-jetson-integration.md) | Build the proxy (Tailscale-only Phase I per ADR-005) |
What’s weak in this doc
- Direct SQLite from the CV writer means schema migrations must be coordinated. The writer must wait for schema_version >= required_min at startup before writing; if migrations are slow, the writer is blocked. The dashboard API process owns migrations. Documented in data-model.md.
- The auth model has three mechanisms (Tailscale header, Auth0 JWT, local session) plus the CV writer’s local file access. Each is reasonable in isolation, but the middleware that composes them is the single most error-prone surface in the whole backend. It needs a written test suite, not just hand-waving.
- 5 s REST polling is OK at Phase I scale but not at Phase III. Going from two cameras to twenty at one 5 s poll each is 20 / 5 = 4 reads/s of /cycles?after_id= per open tab. SQLite handles it, but Phase II should reconsider WS push from the parked design.
- Role names are locked to admin / engineer / executive for Phase I. If Agent C ships divergent names, every endpoint’s @require_role("executive") is wrong; the canonical list lives in user-org-and-auth.md.
- No explicit answer to “what happens when SQLite is locked for 6+ seconds because the exporter is doing a big read.” The busy_timeout=5000 in data-model.md says we’ll 503 after 5 s. That’s correct, but Ronald clicking “export” shouldn’t degrade the live dashboard for everyone else; the exporter should use a read-only connection on a snapshot. Mentioned but not specced. (In WAL mode a long read blocks neither other readers nor the writer; the more likely costs are I/O contention and WAL growth, because checkpoints cannot complete while an old snapshot is still open.)
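The “read-only connection on a snapshot” idea could look like this minimal sketch, which relies on SQLite’s WAL behaviour: a read transaction sees the database as of its first read, so every export query inside it agrees while the CV writer keeps committing. The context-manager name is hypothetical, and the same WAL-growth caveat applies, so exports inside it should stay short.

```python
import sqlite3
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def snapshot(db_path: str):
    """Yield a read-only connection pinned to one consistent WAL snapshot.

    While the snapshot is open, checkpoints cannot fully complete and the
    WAL can grow, so keep the export inside it short.
    """
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    # isolation_level=None: we issue BEGIN/ROLLBACK ourselves.
    conn = sqlite3.connect(uri, uri=True, timeout=5.0, isolation_level=None)
    try:
        conn.execute("BEGIN")  # deferred: snapshot is taken at the first SELECT
        yield conn
    finally:
        if conn.in_transaction:
            conn.execute("ROLLBACK")  # read-only, nothing to commit
        conn.close()
```

The exporter would run all of its SELECTs for one workbook inside a single `with snapshot(path) as conn:` block, so totals and per-station rows cannot disagree.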
Rollout
- Now (May 2026): scaffold app/api/v1/ with FastAPI; implement /health, /version, /modules, /workstations, /cycles against the seed DB; auth middleware stubbed (X-Test-User header) for local dev.
- Before Argentina (2026-05-15): demo REST polling against a synthetic event generator on the Jetson over Tailscale. ITBA gets the same scaffolding.
- Before Pereira (Jul 2026): all viewer + engineer endpoints live; Auth0 JWT verification path tested end-to-end via Tailscale Serve identity header.
- Day-1 Pereira: local-auth break-glass tested; Mariana able to view via Tailscale on her phone; Ronald able to trigger an export.
- Phase II: WS / push live channel (per parked design); per-operator endpoints; OpenAPI client generation for ITBA’s analysis scripts.
The dashboard API unblocks: frontend (cannot render anything without it), exporter (uses POST /exports).