Dashboard API spec — Flask/FastAPI on :5000

Bucket: Backend (Agent B) · Status: Reviewed, 2026-05-12 (Phase B seams applied) · Owner: Sophia · Reviewers: Andrew, Agent C (frontend / user roles), Agent D (ML/CV writer client)

The Jetson hosts a local dashboard on :5000 that Tailscale-connected clients reach. v1.9 names “Flask or FastAPI (decision deferred; Flask easier)” and lists “live cycle times per workstation vs SAM benchmark, color-coded by efficiency.” This doc finishes the spec: framework choice, every endpoint, auth model, schemas, sample payloads, and the error contract. Live-update transport (WS / push channel) is parked to docs/design/60-parking/websocket-live-updates.md — Phase I uses REST polling.

The decision being made: what does every authenticated user, every page of the dashboard, and every external automation actually hit, with what payloads, behind what auth.

Goals:

  • A single set of REST endpoints that the dashboard, the lbzfai.com integration, and any CLI client all use
  • Auth that works both when Mariana opens her phone on the plant Wi-Fi AND when Sophia hits the dashboard from California over Tailscale
  • Response shapes the IEEE paper can cite (versioned URL + versioned schema)
  • Sub-second p95 for the read endpoints at Phase III’s 20-station scale
  • Honest error contract — no swallowed exceptions, no 500s without context

Non-goals:

  • A public API on lbzfai.com: that’s a separate surface (see lbzfai-jetson-integration.md)
  • User CRUD (creating users, password resets) — that’s Auth0’s job, not this API’s
  • Direct video frame serving at scale — frames return as paths to the static file server; not piped through the API process
  • Multi-tenant isolation at this layer — Phase I is single-tenant LBZF; org-tagging is at the DB layer (modules.id), the API does not enforce it yet
  • WS / push-channel live updates: parked to docs/design/60-parking/websocket-live-updates.md. Phase I polls REST every 5 s.

| Criterion | Flask | FastAPI | Pick |
|---|---|---|---|
| Async serialization for read-heavy JSON | needs gevent / quart-style shim | first-class | FastAPI |
| Auto request/response validation | manual (marshmallow / pydantic add-on) | pydantic native | FastAPI |
| OpenAPI generation | optional plugin | first-class; Swagger UI at /docs | FastAPI |
| Team familiarity | Higher | Slightly less | Tie |
| Footprint on Jetson | ~30 MB | ~40 MB | Don’t care |

FastAPI wins. Auto-generated OpenAPI at /docs is also a cheap demo for the Argentina presentation, and the framework choice survives Phase II’s eventual move to a WS / push channel (parked) without a rewrite.

  • uvicorn app.main:app --host 0.0.0.0 --port 5000 --workers 1
  • One worker: the CV writer is the only DB writer and the dashboard is read-mostly. SQLite WAL is fine with one writer and many readers, whereas multiple uvicorn workers would each open their own SQLite connection pool and complicate WAL handling
  • systemd unit lbzf-dashboard.service, restart on failure
  • TLS terminated by Tailscale (HTTPS via Tailscale’s MagicDNS); the uvicorn process itself binds plain HTTP

Auth model — three principals, one decision

Three caller classes:

  1. Browser user on the Jetson’s network or via Tailscale (Sophia, Andrew, Mariana, Ronald)
  2. Browser user on lbzfai.com wanting to see live data (via the cloud integration in lbzfai-jetson-integration.md)
  3. Service-to-service — the CV writer, the exporter cron, the backup job (all local to the Jetson)

Auth strategy:

| Principal | Phase I mechanism |
|---|---|
| Browser user via Tailscale | Tailscale Funnel / Tailscale Serve acts as the identity proxy. The dashboard accepts a Tailscale-User-Login header (Tailscale injects it when accessed via Tailscale’s HTTPS proxy). Falls back to local password if direct LAN. |
| Browser user via lbzfai.com (cloud) | Auth0 JWT in Authorization: Bearer <jwt>. The dashboard’s cloud-side proxy (see lbzfai-jetson-integration.md) validates the JWT and forwards an X-LBZF-Identity header containing the user’s email + role. |
| Service-to-service | Local Unix socket OR shared HMAC token in env var; the CV writer doesn’t authenticate as a human |

A single middleware resolves identity in this order:

  1. X-LBZF-Identity (set by the cloud proxy, only trusted if a shared secret header X-LBZF-Proxy-Auth matches)
  2. Tailscale-User-Login (set by Tailscale Serve when reached through Tailscale)
  3. Authorization: Bearer <jwt> (Auth0 JWT; verified against Auth0 JWKS)
  4. Local cookie session (if signed in via /auth/local)
  5. Reject — 401
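
The resolution order above can be sketched as one pure function, which is also the easiest shape to unit-test. This is a minimal sketch, not the middleware itself: `Identity`, `resolve_identity`, and the injected `verify_jwt` / `lookup_session` callables are hypothetical names, and the format of the X-LBZF-Identity value is treated as opaque because this doc does not pin it down.

```python
import hmac
from dataclasses import dataclass
from typing import Callable, Mapping, Optional


@dataclass(frozen=True)
class Identity:
    subject: str  # email or opaque identity string
    source: str   # which mechanism resolved the caller


def resolve_identity(
    headers: Mapping[str, str],
    proxy_secret: str,
    verify_jwt: Callable[[str], Optional[str]],      # hypothetical: token -> email or None
    lookup_session: Callable[[str], Optional[str]],  # hypothetical: cookie -> email or None
    session_cookie: Optional[str] = None,
) -> Optional[Identity]:
    """Return the first identity that resolves, or None (caller answers 401)."""
    h = {k.lower(): v for k, v in headers.items()}

    # 1. Cloud proxy header, trusted only if the shared secret matches.
    #    On a mismatch the header is ignored and resolution continues.
    proxy_identity = h.get("x-lbzf-identity")
    if proxy_identity:
        presented = h.get("x-lbzf-proxy-auth", "")
        if hmac.compare_digest(presented.encode(), proxy_secret.encode()):
            return Identity(proxy_identity, "cloud-proxy")

    # 2. Header injected by Tailscale Serve.
    ts_login = h.get("tailscale-user-login")
    if ts_login:
        return Identity(ts_login, "tailscale")

    # 3. Auth0 bearer token; signature/JWKS checks live inside verify_jwt.
    auth = h.get("authorization", "")
    if auth.lower().startswith("bearer "):
        email = verify_jwt(auth[7:].strip())
        if email:
            return Identity(email, "auth0-jwt")

    # 4. Local cookie session.
    if session_cookie:
        email = lookup_session(session_cookie)
        if email:
            return Identity(email, "local-session")

    # 5. Nothing resolved: the caller should reject with 401.
    return None
```

One caveat the sketch does not handle: step 2 trusts Tailscale-User-Login unconditionally. Any client that reaches uvicorn directly could send that header itself, so the real middleware would need to accept it only for requests that demonstrably arrived through the Tailscale proxy.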

Role mapping (echoes Agent C’s user/org model — see user-org-and-auth.md):

Phase I uses 3 effective roles (admin, engineer, executive). Supervisor and research are parked in docs/design/60-parking/full-role-matrix.md; viewer exists only as the deny-by-default landing for any authenticated stranger.

| Email pattern / Auth0 role | App role | Permissions |
|---|---|---|
| sophiamann@evasglobal.com, sophiainesmann@gmail.com | admin | full read/write incl. overrides |
| Andrew | admin | same |
| ingenierialbarton@gmail.com (Ronald) | engineer | read all, override cycles, trigger exports |
| Mariana | executive | read all aggregates, no writes |
| anyone else with a valid Auth0 token | viewer | nothing in Phase I; locked out |

role is computed in middleware from the resolved identity (read from the https://lbzfai.com/role JWT claim — singular string), attached to request.state.user, and checked by per-endpoint decorators.
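
A minimal sketch of the per-endpoint check, assuming the roles form a strict ladder (viewer < executive < engineer < admin). That ordering is an assumption read off the "min role" column; this doc does not state it, and the canonical list lives in user-org-and-auth.md. `ROLE_RANK`, `Forbidden`, and this plain-function form of `require_role` are illustrative names only.

```python
# Assumed ordering, lowest to highest privilege (not confirmed by this doc).
ROLE_RANK = {"viewer": 0, "executive": 1, "engineer": 2, "admin": 3}


class Forbidden(Exception):
    """Maps to HTTP 403: identity OK, role insufficient."""


def require_role(user_role: str, min_role: str) -> None:
    """Raise Forbidden unless user_role is at least min_role."""
    # Unknown roles rank below everything, so they are denied by default.
    if ROLE_RANK.get(user_role, -1) < ROLE_RANK[min_role]:
        raise Forbidden(f"role {user_role!r} is below required {min_role!r}")


# Example: an engineer may override cycles, an executive may not.
require_role("engineer", "engineer")
try:
    require_role("executive", "engineer")
except Forbidden as exc:
    print("403:", exc)
```

In FastAPI this check would most likely sit in a dependency or decorator that reads request.state.user; the comparison itself stays the same.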

  • All endpoints under /api/v1/ (URL-versioned for IEEE paper citability)
  • Static frames at /static/frames/... (served by uvicorn; Tailscale handles auth)
  • Swagger UI at /docs, OpenAPI JSON at /openapi.json (admin-only in prod, open in dev)
  • The WS / push live channel is parked (Phase II); see docs/design/60-parking/websocket-live-updates.md. Frontend polls REST every 5 s.

| Method | Path | Auth (min role) | Purpose |
|---|---|---|---|
| GET | /api/v1/health | none | liveness + DB-reachable + WAL-status |
| GET | /api/v1/version | viewer | API version, CV model version, DB schema version |
| GET | /api/v1/modules | viewer | list modules |
| GET | /api/v1/modules/{module_id} | viewer | module detail incl. workstations |
| GET | /api/v1/workstations | viewer | list workstations (filter by module) |
| GET | /api/v1/workstations/{ws_id} | viewer | station detail incl. current operator, current cycle, last 10 cycles |
| GET | /api/v1/standard-times | viewer | list standard times (active version) |
| GET | /api/v1/efficiency/station/{ws_id} | viewer | per-station efficiency (params: from, to, granularity) |
| GET | /api/v1/efficiency/module/{module_id} | viewer | per-module efficiency aggregate |
| GET | /api/v1/efficiency/operator/{operator_id} | engineer | per-operator efficiency (Phase II) |
| GET | /api/v1/cycles | viewer | paginated cycle events (filters: module_id, workstation_id, operator_id, from, to, status) |
| GET | /api/v1/cycles/{event_id} | viewer | single event detail + frame URL |
| POST | /api/v1/cycles/{event_id}/override | engineer | manually override duration_seconds or end_ts; writes audit_log |
| POST | /api/v1/cycles/{event_id}/verify | engineer | mark as ground-truth verified (sets gt_verified_by, gt_verified_at) |
| GET | /api/v1/exports | viewer | list available exports |
| POST | /api/v1/exports | engineer | trigger a new export run (body: {module_id, date}) |
| GET | /api/v1/exports/{filename} | viewer | download an .xlsx or .csv |
| GET | /api/v1/operators | engineer | list operators |
| POST | /api/v1/operators | engineer | create operator (manual entry; Confidencial) |
| GET | /api/v1/cv-model-versions | viewer | list model versions deployed on this Jetson |
| POST | /api/v1/auth/local/login | none | local username/password login (fallback when neither Auth0 nor Tailscale is available) |
| POST | /api/v1/auth/local/logout | viewer | clear session cookie |
| GET | /api/v1/me | viewer | current user identity + role |

The CV writer process opens its own SQLite connection (WAL mode) and writes cycle_events rows directly. There is no /api/v1/internal/cycles endpoint. SQLite WAL gives us one writer + many readers without an HTTP hop in the hottest loop.

  • The dashboard API process opens the same DB read-only.
  • Schema migrations are applied by the dashboard process on startup; the CV writer waits for schema_version ≥ its required minimum before starting writes.
  • CV writer heartbeat is a row in a telemetry table the dashboard reads, not an inbound HTTP call.

(An internal HTTP cycle-write endpoint was considered and parked alongside docs/design/60-parking/websocket-live-updates.md; see ml/cycle-event-detection.md for the canonical CV-side spec.)
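
The writer/reader split and the schema-version gate described above can be sketched with the standard library alone. Assumptions, none of them confirmed by this doc: schema_version is stored in SQLite’s `PRAGMA user_version`, and the helper names (`open_writer`, `open_reader`, `wait_for_schema`) are illustrative. The real contract belongs to data-model.md and ml/cycle-event-detection.md.

```python
import sqlite3
import time
from pathlib import Path


def open_writer(path: str) -> sqlite3.Connection:
    """CV writer side: read-write connection with WAL journaling."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    return conn


def open_reader(path: str) -> sqlite3.Connection:
    """Dashboard side: mode=ro makes any write attempt fail."""
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def schema_version(conn: sqlite3.Connection) -> int:
    # Assumption: the schema version lives in PRAGMA user_version.
    return conn.execute("PRAGMA user_version").fetchone()[0]


def wait_for_schema(
    conn: sqlite3.Connection,
    required_min: int,
    timeout_s: float = 30.0,
    poll_s: float = 0.5,
) -> bool:
    """Block until schema_version >= required_min; False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        if schema_version(conn) >= required_min:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

The writer would call `wait_for_schema(conn, required_min)` once at startup and refuse to write if it returns False, which makes a slow or failed migration visible instead of producing rows against an old schema.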

    from __future__ import annotations  # defer annotation evaluation (EfficiencySummary is not defined here)

    from datetime import datetime
    from typing import Literal

    from pydantic import BaseModel


    # response: GET /api/v1/health
    class HealthOut(BaseModel):
        status: Literal["ok", "degraded", "down"]
        db_reachable: bool
        db_schema_version: int
        db_wal_size_bytes: int
        last_event_ts: datetime | None  # is the CV writer alive?
        cameras_online: int
        cameras_total: int
        disk_free_pct: float
        uptime_seconds: int


    # cycle event; defined before WorkstationOut, which embeds it
    class CycleEventOut(BaseModel):
        id: int
        workstation_id: int
        operator_id: int | None
        start_ts: datetime  # serialized as ISO 8601 UTC
        end_ts: datetime | None
        duration_seconds: float | None
        status: Literal["in_progress", "complete", "aborted", "manual_override"]
        confidence: float | None
        cv_model_version: str  # 'yolov8n-coco/8.1.30-trt'
        source_frame_url: str | None  # /static/frames/... or null if archived
        notes: str | None


    # response: GET /api/v1/workstations/{ws_id}
    class WorkstationOut(BaseModel):
        id: int
        module_id: int
        code: str  # 'PUESTO_01'
        display_name: str  # 'Puesto 1'
        operation: str | None  # 'PESPUNTAR CUELLO'
        sequence_pos: int | None
        current_cycle: CycleEventOut | None
        last_cycles: list[CycleEventOut]  # last 10
        efficiency_24h: float | None
        sam_seconds: float | None


    class EfficiencyPoint(BaseModel):
        ts: datetime
        units_produced: int
        units_target: int
        efficiency_pct: float
        avg_cycle_seconds: float


    # response: GET /api/v1/efficiency/module/{module_id}?from=...&to=...&granularity=hour
    class ModuleEfficiencyOut(BaseModel):
        module_id: int
        from_ts: datetime
        to_ts: datetime
        granularity: Literal["minute", "hour", "shift", "day"]
        series: list[EfficiencyPoint]
        summary: EfficiencySummary  # referenced here but not yet specified in this doc

    // GET /api/v1/health
    {
      "status": "ok",
      "db_reachable": true,
      "db_schema_version": 1,
      "db_wal_size_bytes": 4194304,
      "last_event_ts": "2026-07-15T13:24:53Z",
      "cameras_online": 6,
      "cameras_total": 6,
      "disk_free_pct": 71.4,
      "uptime_seconds": 144203
    }

    // GET /api/v1/workstations/1
    {
      "id": 1,
      "module_id": 1,
      "code": "PUESTO_01",
      "display_name": "Puesto 1",
      "operation": "PESPUNTAR CUELLO",
      "sequence_pos": 1,
      "current_cycle": {
        "id": 47891,
        "workstation_id": 1,
        "operator_id": 7,
        "start_ts": "2026-07-15T13:24:11Z",
        "end_ts": null,
        "duration_seconds": null,
        "status": "in_progress",
        "confidence": 0.91,
        "cv_model_version": "yolov8n-coco/8.1.30-trt",
        "source_frame_url": "/static/frames/2026/07/15/cam1_47891.jpg",
        "notes": null
      },
      "last_cycles": [/* 10 most recent completed cycles */],
      "efficiency_24h": 0.612,
      "sam_seconds": 22.8
    }

The frontend polls GET /api/v1/cycles?after_id=<last_seen> every 5 s while a tab is foregrounded. Per-station status comes from GET /api/v1/workstations/{ws_id}. Two cameras × ~one cycle/min/station is trivial polling load on SQLite.
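
The after_id polling above is keyset pagination over an append-only table. A minimal sketch of the server-side query follows; the table name cycle_events and the selected columns come from this doc, while the function name, the default limit, and the "cursor is simply the last row id" convention are assumptions (the cursor format is still an open question).

```python
import sqlite3


def fetch_cycles_after(conn: sqlite3.Connection, after_id: int = 0, limit: int = 100):
    """Return (rows, next_after_id) for rows with id > after_id, oldest first."""
    rows = conn.execute(
        "SELECT id, workstation_id, status FROM cycle_events "
        "WHERE id > ? ORDER BY id ASC LIMIT ?",
        (after_id, limit),
    ).fetchall()
    # If nothing new arrived, the cursor stays where it was.
    next_after = rows[-1][0] if rows else after_id
    return rows, next_after


# Demo against an in-memory table with five events.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cycle_events (id INTEGER PRIMARY KEY, workstation_id INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO cycle_events (workstation_id, status) VALUES (?, ?)",
    [(1, "complete")] * 5,
)
page, cursor = fetch_cycles_after(conn, after_id=0, limit=2)
print([row[0] for row in page], cursor)  # [1, 2] 2
```

An id cursor only surfaces new rows. A cycle that was already delivered as in_progress and later completes will not reappear in this feed, which is presumably why per-station status is read from the workstation endpoint instead.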

The WS / push live-channel design is preserved in docs/design/60-parking/websocket-live-updates.md for Phase II, including reconnect/replay semantics. Phase I does not ship it.

All errors return JSON:

    {
      "error": {
        "code": "CYCLE_NOT_FOUND",
        "message": "No cycle_events row with id=99999",
        "request_id": "01HVK..."
      }
    }

| Status | When |
|---|---|
| 200 | success |
| 201 | resource created (POST /exports) |
| 204 | success, no body (PATCH internal) |
| 400 | malformed request / validation error |
| 401 | no valid identity |
| 403 | identity OK, role insufficient |
| 404 | resource missing |
| 409 | conflict (e.g., trying to override an already-overridden event without force=true) |
| 422 | pydantic validation error (FastAPI default) |
| 429 | rate-limited (exports endpoint: max 1 in flight per user) |
| 500 | unhandled; never desired |
| 503 | DB unreachable / WAL corruption suspected; health check failing |

Every response carries X-Request-Id for correlation with audit_log and journalctl.
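
A minimal sketch of how the envelope and the X-Request-Id header could be produced together, so the id in the body and the id in the header can never drift apart. The helper names are hypothetical, and `uuid4().hex` is only a stand-in: the sample id above looks like a ULID, but this doc does not specify the id format.

```python
import json
import uuid


def new_request_id() -> str:
    # Stand-in id generator; the real format is not specified in this doc.
    return uuid.uuid4().hex


def error_response(status: int, code: str, message: str, request_id: str):
    """Build (status, headers, body) for the error contract."""
    body = {"error": {"code": code, "message": message, "request_id": request_id}}
    headers = {"Content-Type": "application/json", "X-Request-Id": request_id}
    return status, headers, json.dumps(body)


status, headers, body = error_response(
    404, "CYCLE_NOT_FOUND", "No cycle_events row with id=99999", new_request_id()
)
print(status, headers["X-Request-Id"] == json.loads(body)["error"]["request_id"])  # 404 True
```

In the FastAPI app the same function would typically be called from exception handlers, with the request id generated once per request in middleware and reused for logging.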

| Endpoint | Limit |
|---|---|
| POST /api/v1/exports | 1 in flight per user, 12/hour |
| POST /api/v1/auth/local/login | 5 attempts / 15 min / source IP |
| everything else | 60 req/s / user (very lenient; SQLite is the bottleneck not the API) |

Implemented with slowapi.
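
Window limits such as 12/hour are request counts per time window, but "1 in flight per user" is a concurrency cap, which probably needs a small guard of its own next to the rate limiter. A minimal sketch, assuming the single-worker deployment described earlier (an in-process set is only correct with one worker); `InFlightGuard` and `TooManyInFlight` are hypothetical names.

```python
import threading
from contextlib import contextmanager


class TooManyInFlight(Exception):
    """Maps to HTTP 429: the user already has an export running."""


class InFlightGuard:
    """Allow at most one concurrent operation per user (single process only)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._active: set[str] = set()

    @contextmanager
    def slot(self, user: str):
        with self._lock:
            if user in self._active:
                raise TooManyInFlight(user)
            self._active.add(user)
        try:
            yield
        finally:
            # Always release the slot, even if the export fails.
            with self._lock:
                self._active.discard(user)


guard = InFlightGuard()
with guard.slot("ronald"):
    pass  # run the export here; a second request from the same user would get 429
```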

Access-Control-Allow-Origin: https://lbzfai.com plus https://*.ts.net (Tailscale Funnel hostnames). No *.
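
Access-Control-Allow-Origin carries either one exact origin or `*`, so `https://*.ts.net` cannot be sent as a literal header value. It has to be implemented as a pattern match on the incoming Origin, echoing the origin back when it matches. The check can be sketched as below; the regex and names are assumptions, and ports are deliberately not handled. Starlette’s CORSMiddleware, which FastAPI uses, accepts `allow_origins` together with `allow_origin_regex` for this purpose.

```python
import re
from typing import Optional

ALLOWED_EXACT = {"https://lbzfai.com"}
# Any https host ending in .ts.net, with at least one label in front of it.
ALLOWED_PATTERN = re.compile(r"https://[a-z0-9-]+(\.[a-z0-9-]+)*\.ts\.net")


def allowed_origin(origin: str) -> Optional[str]:
    """Return the value to echo in Access-Control-Allow-Origin, or None."""
    if origin in ALLOWED_EXACT or ALLOWED_PATTERN.fullmatch(origin):
        return origin
    return None
```

Using `fullmatch` rather than `search` matters here: it rejects look-alike hosts such as `https://ts.net.evil.com`.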

/docs and /redoc available, but locked to admin role in production; opened to all in --dev mode for the demo.

  • Every request logged structured JSON: {ts, request_id, method, path, status, latency_ms, user_email, user_role}
  • Journald is the sink (systemctl status lbzf-dashboard; journalctl -u lbzf-dashboard -f)
  • /api/v1/health returns db_wal_size_bytes so we can alert on WAL bloat
  • Phase II: ship logs to CloudWatch via Tailscale; Phase I just lives in journald

Alternatives considered:

  • WS / push channel instead of REST polling for Phase I: works, but adds a meaningful new surface (reconnect logic, subscribe filters, Cloudflare-Worker WS proxy unknowns) for a load profile that REST polling handles trivially. Parked to docs/design/60-parking/websocket-live-updates.md for Phase II, when behavioral overlays may want sub-second push.
  • Internal HTTP cycle-write endpoint (POST /api/v1/internal/cycles) — considered. Adds an HTTP hop in the hottest CV loop. Direct SQLite-with-WAL gives the same architectural cleanliness (one writer process) without the hop. See ml/cycle-event-detection.md for the writer-side contract.
  • Flask + flask-sock: works, but pydantic + FastAPI + auto-OpenAPI is the better full stack, and switching costs only a small amount of developer time.
  • Django + Channels — overkill; we don’t need an ORM or admin
  • gRPC — wrong layer for a browser client; lovely for CV-writer → API but the writer can use SQLite directly.
  • Auth0 SPA token directly to Jetson — possible but tying the JWT audience to a Jetson hostname is awkward across Tailscale’s dynamic DNS. Cleaner to terminate Auth0 in the cloud proxy and forward an X-LBZF-Identity header to the Jetson.
  • No local-auth fallback — risky; if Tailscale is down and Auth0 is down we have no way to log in to diagnose. The /auth/local route gives Sophia a break-glass option (long random password kept in 1Password).

Open questions:

  • OPEN: Tailscale Funnel vs Tailscale Serve for HTTPS / identity. Owner: Sophia / Andrew. Funnel exposes to public internet (we don’t want that), Serve is internal-only (what we want). Confirm Tailscale plan tier supports Serve with identity headers.
  • OPEN: role list and email→role mapping — owner: Agent C (user/org bucket). This doc assumes the mapping in user-org-and-auth.md; if Agent C changes it, this table must follow.
  • OPEN: pagination convention — keyset (recommended for cycle_events which is append-only) vs offset. This doc assumes keyset (?after_id=) but doesn’t specify the cursor format.
  • OPEN: which endpoints are safe to expose on lbzfai.com vs Tailscale-only? Cycles, efficiency = OK; operator names, override = Tailscale-only. Owner: Agent E + Mariana.
  • OPEN: 4xx error message localization — Spanish for Ronald? English-only is fine for Phase I but flag for Phase II.
  • OPEN: do we support API tokens (long-lived) for the CSV/Excel automations Ronald might script in his own time? Phase II.

| This doc depends on | Owner bucket | What we need |
|---|---|---|
| DB schema (every endpoint reads from it) | This bucket (data-model.md) | DDL finalized |
| Auth identity flows (Auth0 + Tailscale) | Frontend (Agent C) + lbzfai-jetson-integration | Identity proxy decisions |
| User role definitions | Frontend (Agent C), user-org-and-auth.md | Email→role table |
| CV writer writes directly to SQLite (not through HTTP) | ML (Agent D) | cycle-event-detection.md owns the writer contract; this API reads only |
| Excel exporter trigger | This bucket (excel-export.md) | POST /exports calls into it |

| This doc implies | Owner | Ask |
|---|---|---|
| Frontend polls REST every 5 s for live state (WS parked) | Agent C | Implement after-cursor polling + visibility-pause |
| ML writer is the only SQLite writer | Agent D | Dashboard API opens DB read-only |
| Cloud proxy injects X-LBZF-Identity | this bucket (lbzfai-jetson-integration.md) | Build the proxy (Tailscale-only Phase I per ADR-005) |

  1. Direct-SQLite from the CV writer means schema migrations must be coordinated. The writer must wait for schema_version >= required_min at startup before writes; if migrations are slow the writer is blocked. The dashboard API process owns migrations. Documented in data-model.md.
  2. The auth model has three mechanisms (Tailscale header, Auth0 JWT, local session) plus the CV writer’s local file access. Each is reasonable in isolation, but the middleware that composes them is the single most error-prone surface in the whole backend. Needs a written test suite, not just hand-waving.
  3. 5 s REST polling is OK at Phase I scale but not at Phase III. Two cameras → twenty cameras at 5 s/poll = 4×/s read of /cycles?after_id=. SQLite handles it, but Phase II should reconsider WS push from the parked design.
  4. Role names locked to admin / engineer / executive for Phase I. If Agent C ships divergent names, every endpoint’s @require_role("executive") is wrong; the canonical list lives in user-org-and-auth.md.
  5. No explicit answer to “what happens when SQLite is locked for 6+ seconds because the exporter is doing a big read.” The busy_timeout=5000 in data-model.md says we’ll 503 after 5s. That’s correct but Ronald clicking “export” shouldn’t degrade the live dashboard for everyone else — the exporter should use a read-only connection on a snapshot. Mentioned but not specced.
  • Now (May 2026): scaffold app/api/v1/ with FastAPI; implement /health, /version, /modules, /workstations, /cycles against the seed DB; auth middleware stubbed (X-Test-User header) for local dev.
  • Before Argentina (2026-05-15): demo REST polling against a synthetic event generator on the Jetson over Tailscale. ITBA gets the same scaffolding.
  • Before Pereira (Jul 2026): all viewer + engineer endpoints live; Auth0 JWT verification path tested end-to-end via Tailscale Serve identity header.
  • Day-1 Pereira: local-auth break-glass tested; Mariana able to view via Tailscale on her phone; Ronald able to trigger an export.
  • Phase II: WS / push live channel (per parked design); per-operator endpoints; OpenAPI client generation for ITBA’s analysis scripts.

The dashboard API unblocks: frontend (cannot render anything without it), exporter (uses POST /exports).