Workstation ROI calibration

Bucket: technical/ml (Agent D) · Status: reviewed · Owner: Sophia Mann · Phase: I · Last updated: 2026-05-12

Context

The cycle-event detector (cycle-event-detection.md) requires, per camera, a set of polygonal regions of interest (ROIs) — one per workstation that camera sees — so that each person-bbox can be associated with a workstation. Without ROIs the system has no notion of which workstation an operator is at.
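As a rough illustration of what "associating a person-bbox with a workstation" could look like, the sketch below runs a standard ray-casting point-in-polygon test against each ROI. Testing the bbox center is an assumption made for this example; the names `point_in_polygon` and `assign_workstation`, and the sample polygons, are hypothetical and not taken from the detector.

```python
from typing import Optional

Point = tuple[float, float]


def point_in_polygon(pt: Point, polygon: list[Point]) -> bool:
    """Ray casting: count polygon edges crossed by a ray going right from pt."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through pt?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_workstation(bbox, rois: dict[int, list[Point]]) -> Optional[int]:
    """Return the workstation_id whose ROI contains the bbox center, else None.

    bbox is (x_min, y_min, x_max, y_max) in the 640x480 detector frame.
    Using the bbox center is an assumption of this sketch.
    """
    cx = (bbox[0] + bbox[2]) / 2
    cy = (bbox[1] + bbox[3]) / 2
    for workstation_id, polygon in rois.items():
        if point_in_polygon((cx, cy), polygon):
            return workstation_id
    return None


# Hypothetical ROIs for two workstations seen by one camera.
rois = {
    7: [(100, 100), (300, 100), (300, 300), (100, 300)],
    8: [(340, 100), (540, 100), (540, 300), (340, 300)],
}
print(assign_workstation((150, 120, 250, 280), rois))  # bbox center is (200, 200)
print(assign_workstation((0, 0, 40, 60), rois))        # bbox center is (20, 30)
```

Given the upper-torso guidance in §5, an anchor point higher up in the bbox might turn out to be a better choice than the center; that is a tuning decision this sketch does not settle.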

ROIs depend on the physical installation: camera mount height, tilt, lens FoV, and machine layout. The Pereira install (Jul 2026) and the ITBA twin install (~2026-05-15) are physically different facilities, and the calibration tool has to work for operators other than Sophia (Ronald, Armando, the ITBA team) who have no ML expertise.

A camera bump (cleaning crew, a passing forklift, the dome wall-mount slowly loosening over 6 months) silently invalidates all the ROIs on that camera. The system must detect the shift and prompt for re-calibration.

Goals

G1: A non-ML user can calibrate a single camera’s ROIs in ≤ 5 minutes end-to-end.
G2: Calibration state survives Jetson reboot, Jetson power loss, GStreamer pipeline restart, and Docker container redeploy.
G3: System auto-detects “camera has moved” with ≥ 90% recall for shifts whose median keypoint displacement is ≥ 15 px (in the 640-px-wide frame), with ≤ 1 false alarm per camera per week.
G4: ROI versions are foreign-keyed from cycle_events so historical events can be re-interpreted against the ROI configuration that was active at the time. (Coordinated with Agent B.)
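To make G3 concrete, the following sketch converts the weekly false-alarm budget into a per-check false-positive rate, assuming the hourly drift-check cadence from §3 and assuming every firing check counts as one alarm. The function names are hypothetical, and the trial counts in the recall example are purely illustrative, not measurements.

```python
def false_alarm_budget(checks_per_day: int, max_alarms_per_week: float = 1.0) -> float:
    """Largest per-check false-positive rate that still meets the weekly budget."""
    checks_per_week = checks_per_day * 7
    return max_alarms_per_week / checks_per_week


def recall(detected: int, true_shifts: int) -> float:
    """Fraction of real shifts (>= 15 px median displacement) that were flagged."""
    if true_shifts == 0:
        raise ValueError("recall is undefined when there are no true shifts")
    return detected / true_shifts


# Hourly checks (section 3) give 24 * 7 = 168 checks per camera per week.
hourly_budget = false_alarm_budget(checks_per_day=24)
print(f"per-check false-positive rate must be <= {hourly_budget:.4f}")

# Illustrative only: 19 of 20 deliberately induced shifts detected.
print(f"G3 recall target met: {recall(detected=19, true_shifts=20) >= 0.90}")
```

With hourly checks the detector can afford roughly one false positive in 168 checks, which is a useful number to keep in mind when tuning the thresholds in §3.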

Non-goals

Auto-discovery of workstations from raw video (Phase III stretch; too unreliable in Phase I when we have ≤ 6 cameras and a human operator on-site).
3-D camera calibration with intrinsics/extrinsics. Phase I uses 2-D polygon ROIs in pixel space; no homography to floor-plane needed for cycle counting.
Multi-camera fusion (a single workstation seen by two cameras with stitched ROIs). Phase II topic.
Camera mount mechanical design — Agent C bucket.

Proposed approach

1. Calibration data model

CREATE TABLE rois (
  id INTEGER PRIMARY KEY,
  camera_id INTEGER NOT NULL,             -- FK to cameras
  workstation_id INTEGER NOT NULL,        -- FK to workstations
  polygon_json TEXT NOT NULL,             -- JSON array of {x,y} in pixel space, 0..640 × 0..480
  ref_frame_path TEXT NOT NULL,           -- path on disk to a reference frame at calibration time
  ref_homography_features BLOB,           -- ORB/AKAZE keypoint+descriptor blob for drift check (§3)
  created_at TIMESTAMP NOT NULL,
  created_by TEXT NOT NULL,               -- 'ronald', 'sophia', 'itba-rmarino', etc.
  notes TEXT,
  superseded_by INTEGER REFERENCES rois(id)  -- null when active
);
CREATE INDEX idx_rois_active ON rois(camera_id, workstation_id) WHERE superseded_by IS NULL;

A workstation has exactly one active ROI per camera at a time. ROIs are append-only: re-calibration creates a new row and sets superseded_by on the old one. cycle_events.roi_version (Agent B) references rois.id.
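The append-only rule can be sketched with Python's built-in `sqlite3` module. The table below is a reduced subset of the schema above (the reference-frame, feature, and audit columns are left out for brevity), and the helper names `recalibrate` and `active_roi_id` are hypothetical. The key point is the ordering: the new row is inserted first so that its id exists, and only then is the previously active row pointed at it.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced subset of the rois table, enough to show the supersede chain.
conn.execute("""
    CREATE TABLE rois (
        id INTEGER PRIMARY KEY,
        camera_id INTEGER NOT NULL,
        workstation_id INTEGER NOT NULL,
        polygon_json TEXT NOT NULL,
        superseded_by INTEGER REFERENCES rois(id)
    )
""")


def recalibrate(conn, camera_id, workstation_id, polygon):
    """Insert a new ROI row, then mark the previously active row as superseded."""
    with conn:  # one transaction: commit on success, roll back on error
        cur = conn.execute(
            "INSERT INTO rois (camera_id, workstation_id, polygon_json) VALUES (?, ?, ?)",
            (camera_id, workstation_id, json.dumps(polygon)),
        )
        new_id = cur.lastrowid
        conn.execute(
            "UPDATE rois SET superseded_by = ? "
            "WHERE camera_id = ? AND workstation_id = ? "
            "AND superseded_by IS NULL AND id != ?",
            (new_id, camera_id, workstation_id, new_id),
        )
    return new_id


def active_roi_id(conn, camera_id, workstation_id):
    """Return the id of the active ROI for this camera/workstation, or None."""
    row = conn.execute(
        "SELECT id FROM rois WHERE camera_id = ? AND workstation_id = ? "
        "AND superseded_by IS NULL",
        (camera_id, workstation_id),
    ).fetchone()
    return row[0] if row else None


first = recalibrate(conn, 1, 7, [{"x": 100, "y": 100}, {"x": 300, "y": 100}, {"x": 200, "y": 300}])
second = recalibrate(conn, 1, 7, [{"x": 110, "y": 100}, {"x": 310, "y": 100}, {"x": 210, "y": 300}])
print(active_roi_id(conn, 1, 7) == second)
```

Because old rows are never deleted, an event that recorded the first id can still be joined back to the polygon that was active when it was written.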

2. Calibration UI

A small browser-based tool served from the same Jetson host as the dashboard, accessible via Tailscale:

/calibrate/<camera_id>

Workflow:

The tool freezes a current frame from the camera at 640×480 (the same resolution the detector sees) and displays it.
The user clicks “Add workstation ROI”, selects an existing workstations row from a dropdown (or creates a new one), and clicks points on the frozen frame to draw a polygon. 4–8 points is typical (a tilted-rectangle around the chair-and-machine).
Save creates a new rois row, stores the frozen frame at ref_frame_path, extracts ORB keypoints from that frame, and persists them to ref_homography_features.
Live-preview overlay shows the ROI on the live stream so the user can spot-check.

Polygon drawing runs entirely client-side; the server only supplies the frozen frame and handles the save endpoint. A Spanish-language UI is required because Ronald is the calibration operator. Coordinate strings with Agent A.
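Since the drawing happens in the browser, the save endpoint would plausibly want to sanity-check the polygon it receives. The sketch below is one way to do that, assuming the `{x, y}` point format and the 640×480 frame from §1; the function name `validate_polygon` and the specific checks are assumptions for illustration, not part of the spec.

```python
FRAME_W, FRAME_H = 640, 480


def validate_polygon(polygon):
    """Return a list of problems; an empty list means the polygon is acceptable."""
    if not isinstance(polygon, list) or len(polygon) < 3:
        return ["polygon needs at least 3 points"]
    problems = []
    for i, pt in enumerate(polygon):
        if not isinstance(pt, dict) or not {"x", "y"} <= pt.keys():
            problems.append(f"point {i} must be an object with x and y")
            continue
        x, y = pt["x"], pt["y"]
        if not (isinstance(x, (int, float)) and isinstance(y, (int, float))):
            problems.append(f"point {i} has non-numeric coordinates")
            continue
        if not (0 <= x <= FRAME_W and 0 <= y <= FRAME_H):
            problems.append(f"point {i} lies outside the {FRAME_W}x{FRAME_H} frame")
    if not problems:
        # Shoelace formula: twice the signed area; zero means all points are collinear.
        closed = polygon[1:] + polygon[:1]
        twice_area = sum(p["x"] * q["y"] - q["x"] * p["y"] for p, q in zip(polygon, closed))
        if twice_area == 0:
            problems.append("polygon has zero area")
    return problems


print(validate_polygon([{"x": 100, "y": 100}, {"x": 300, "y": 100}, {"x": 300, "y": 300}]))
print(validate_polygon([{"x": 100, "y": 100}, {"x": 700, "y": 100}, {"x": 300, "y": 300}]))
```

Returning a list of messages rather than raising on the first problem lets the UI show every issue at once, which may matter for a non-developer operator.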

3. Drift / mount-shift detection

Once per hour (via cron on the Jetson), for each camera:

1. Grab a current frame.
2. Run ORB keypoint matching against the reference frame stored for that camera's
   most recently-calibrated ROI.
3. Estimate a 2-D affine transform via RANSAC. Compute the median pixel
   displacement of matched keypoints.
4. If median_displacement > 15 px (≈ 2.3% of 640-px width):
     - WARN: log + dashboard banner "Camera N may have moved — please re-calibrate"
     - The system continues to use the old ROIs (don't silently invalidate)
   If median_displacement > 40 px (≈ 6%):
     - HARD: cycle events from this camera are flagged quality=red, dashboard
       displays a recalibration-required modal, Excel export annotates the
       affected rows.

The 15 px / 40 px thresholds are starting points; tune during the validation window.

Edge case: if matched-keypoint count drops below 20 (camera is heavily occluded, lens fogged, lighting dramatically different), drift cannot be computed reliably — the system logs drift_unknown and asks for re-calibration without claiming the mount moved.
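The decision logic above can be sketched independently of the feature matcher. The example below assumes the ORB matching and RANSAC step have already produced one-to-one matched keypoint coordinates in the reference and current frames; it then applies the 15 px / 40 px thresholds and the 20-match guard from the text. The function name `classify_drift` and the status strings other than drift_unknown are hypothetical.

```python
from math import hypot
from statistics import median

WARN_PX = 15.0     # warning threshold from the text (a starting point)
HARD_PX = 40.0     # hard threshold from the text (a starting point)
MIN_MATCHES = 20   # below this, drift cannot be computed reliably


def classify_drift(ref_pts, cur_pts):
    """Classify mount drift from matched keypoint positions.

    ref_pts[i] and cur_pts[i] are the (x, y) positions of the same keypoint
    in the reference frame and the current frame.
    """
    if len(ref_pts) != len(cur_pts):
        raise ValueError("ref_pts and cur_pts must be matched one-to-one")
    if len(ref_pts) < MIN_MATCHES:
        return "drift_unknown"
    displacement = median(
        hypot(cx - rx, cy - ry) for (rx, ry), (cx, cy) in zip(ref_pts, cur_pts)
    )
    if displacement > HARD_PX:
        return "hard"
    if displacement > WARN_PX:
        return "warn"
    return "ok"


# Synthetic matches: 25 keypoints, shifted uniformly.
ref = [(10.0 * i, 5.0 * i) for i in range(25)]
shifted_18 = [(x + 18.0, y) for x, y in ref]         # 18 px shift
shifted_50 = [(x + 30.0, y + 40.0) for x, y in ref]  # 50 px shift
print(classify_drift(ref, shifted_18))
print(classify_drift(ref, shifted_50))
print(classify_drift(ref[:10], shifted_18[:10]))     # too few matches
```

Using the median rather than the mean keeps a handful of bad matches from tripping the alarm, which is presumably why the text specifies it.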

4. Re-calibration cadence (planned vs reactive)

Day-1 install: full ROI calibration for all cameras (≤ 30 min for 6 cameras).
Reactive: whenever drift detection trips (§3).
Planned: monthly walk-through during the first 3 months (“does the dashboard’s ROI overlay still look right?”). Reduce to quarterly once the installation is stable.
After any reported camera-bump event (cleaning, conduit work, layout change): re-calibrate that camera before the next shift.
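The planned cadence could be turned into a simple due-date check, for example to drive a reminder. This is a minimal sketch: it assumes "monthly" means 30 days, "quarterly" means 90 days, and "the first 3 months" means the first 90 days after install, none of which the text pins down. The function names are hypothetical.

```python
from datetime import date, timedelta

MONTHLY = timedelta(days=30)     # assumed length of "monthly"
QUARTERLY = timedelta(days=90)   # assumed length of "quarterly"
RAMP_UP = timedelta(days=90)     # assumed length of "first 3 months"


def next_walkthrough(install_date: date, last_check: date) -> date:
    """Date by which the next planned walk-through should happen."""
    in_ramp_up = last_check < install_date + RAMP_UP
    return last_check + (MONTHLY if in_ramp_up else QUARTERLY)


def is_overdue(install_date: date, last_check: date, today: date) -> bool:
    """True when the planned walk-through date has passed."""
    return today > next_walkthrough(install_date, last_check)


install = date(2026, 7, 1)  # Pereira day-1 from the rollout table
print(next_walkthrough(install, last_check=date(2026, 7, 1)))
print(next_walkthrough(install, last_check=date(2026, 10, 15)))
print(is_overdue(install, last_check=date(2026, 7, 1), today=date(2026, 8, 17)))
```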

5. ROI shape guidance

The ROI polygon should be drawn around the seated upper torso + head envelope of the operator at the workstation, not the full body. Rationale:

Operators lean forward over fabric, so feet and legs often fall outside a tight bbox while head and chest stay in frame.
A larger ROI invites supervisor-walking-past false starts.
Rule of thumb: cover 70–80% of the seated body height, centered on the chair, with a ~20% horizontal margin.

A calibration guide image in the tool shows this pattern for the existing Angela operators’ camera angles (one-time work; uses frames from “Shared by Ronald 4/15” videos as reference).
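The shape guidance can be expressed as a small helper that proposes a starting rectangle from a seated operator's bbox. This is only a sketch: it reads "70–80% of the seated body height" as the top 75% of the bbox, and "~20% horizontal margin" as 20% of the bbox width added on each side. Both readings are assumptions, and the function name `suggest_roi` is hypothetical.

```python
def suggest_roi(seated_bbox, height_fraction=0.75, margin_fraction=0.20,
                frame_w=640, frame_h=480):
    """Propose a 4-point ROI covering the upper part of a seated-person bbox.

    seated_bbox is (x_min, y_min, x_max, y_max) in detector pixel space.
    Returns points in the {x, y} format used by polygon_json, clockwise
    from the top-left corner.
    """
    x_min, y_min, x_max, y_max = seated_bbox
    width = x_max - x_min
    height = y_max - y_min
    margin = margin_fraction * width
    # Clamp to the frame so the suggestion is always drawable.
    left = max(0, x_min - margin)
    right = min(frame_w, x_max + margin)
    top = max(0, y_min)
    bottom = min(frame_h, y_min + height_fraction * height)
    return [
        {"x": left, "y": top},
        {"x": right, "y": top},
        {"x": right, "y": bottom},
        {"x": left, "y": bottom},
    ]


# A seated operator detected at x 200..300, y 100..300.
for point in suggest_roi((200, 100, 300, 300)):
    print(point)
```

A suggestion like this would only pre-fill the polygon; the operator would still adjust the points by hand, since the tool is meant to leave the final call to a human.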

6. ITBA-specific concerns

ITBA’s twin install will face the same calibration problem on different garments (their workshop does not produce Ref22 dress shirts). Workflow:

ITBA gets the same calibration tool, Spanish + English UI (or English-first for the BA install).
ROI database is per-install — the schema is identical but the rows differ. No need to sync.
We share a short “calibration playbook” PDF in docs/design/technical/ml/ (out-of-scope-for-this-doc artifact; produce before 2026-05-15).

7. Calibration tool implementation sketch

Stack: simple HTML + vanilla JS (canvas-based polygon drawing) + a Flask/FastAPI endpoint on the Jetson. No build step. Total: ~300 lines. The choice is deliberate — calibration tooling should be the most boring code in the project, because non-developers maintain it.

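# Sketch (Flask). capture_one_frame, save_frame, extract_orb, now, current_user
# and db (assumed to be a sqlite3 connection) are project helpers not shown here.
import json
from flask import Flask, request

app = Flask(__name__)

@app.post("/calibrate/<int:cam_id>")
def save_roi(cam_id):
    body = request.get_json()
    # body: { workstation_id: int, polygon: [{x,y}, ...], notes: str }
    frame = capture_one_frame(cam_id, 640, 480)
    ref_path = save_frame(frame, cam_id)
    features = extract_orb(frame)
    # Insert the new row first so that its id exists, then point the old active row at it.
    cur = db.execute("INSERT INTO rois(camera_id, workstation_id, polygon_json, ref_frame_path, ref_homography_features, created_at, created_by, notes) VALUES (?,?,?,?,?,?,?,?)",
                     (cam_id, body["workstation_id"], json.dumps(body["polygon"]), ref_path, features, now(), current_user(), body.get("notes", "")))
    db.execute("UPDATE rois SET superseded_by=? WHERE camera_id=? AND workstation_id=? AND superseded_by IS NULL AND id<>?",
               (cur.lastrowid, cam_id, body["workstation_id"], cur.lastrowid))
    db.commit()
    return {"id": cur.lastrowid}, 201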

Alternatives considered

Alt	Why rejected
Auto-ROI discovery — cluster bbox centroids over a 1-hour warmup, fit polygons.	Brittle when workstations have similar geometries; needs ≥ 1 hour of data before the system works at all; gives Ronald no override. Reconsider for Phase III with many cameras.
Floor-plane homography + world coordinates — map detections to a known floor plane, attach workstations to physical (x,y) coordinates.	Useful if cameras overlap or workstations move. Phase I has no overlap and a fixed layout — overkill. Phase II/III revisit.
Hand-edit a YAML config file instead of a UI.	Faster to ship, but excludes Ronald entirely. Calibration is a recurring operation, not a one-time install — UI pays for itself.
AprilTag fiducials on each workstation to anchor ROIs.	A sound technique (it auto-recovers from camera shifts), but it requires physical tags on every workstation and risks looking like surveillance theatre to operators. Out of Phase I.
Pose-based “where do operators sit?” auto-fit.	Same downsides as bbox-clustering auto-discovery, plus more compute. Phase III.

Open questions

OPEN[Andrew, by 2026-05-20]: Does Form AI have a battle-tested calibration UI we can port the UX from? Don’t reinvent.
OPEN[Ronald via Armando, by 2026-06-15]: Camera-mount stability: has the existing LBZF security CCTV ever physically shifted? If not, the drift-detection cadence could relax from hourly to daily.
OPEN[ITBA, by 2026-05-25]: Will ITBA’s twin install use the same calibration UI, or are they writing one? (Strong preference: share the tool. Divergence here means two unmaintained calibration UIs by Phase II.)
OPEN[Sophia, by 2026-06-01]: Which camera serves which workstations? With 6 cameras for 21 Angela workstations, the original “one camera per workstation” plan does not hold: each camera covers several workstations, so ROIs-per-camera is the right level of abstraction. Confirm the actual camera-to-station mapping before the trip.
OPEN[Mariana, before Pereira install]: Will LBZF object to the system retaining a “reference frame” per camera as a calibration artifact? The reference frame contains the workstation’s empty geometry but may incidentally show passing operators. Coordinate retention with Agent E.

Cross-bucket dependencies

Agent A (frontend): dashboard must show ROI overlays in live-camera tiles (transparent polygon over the stream) so Ronald can sanity-check at a glance. Spec the SVG/canvas overlay.
Agent B (backend): rois table schema + cycle_events.roi_version FK; calibration endpoints; ORB feature blob storage.
Agent C (hardware): mount cameras with cable-tension management so cables don’t tug the dome out of alignment. Verify the Amcrest IP8M-2779EW-AI mount torque spec.
Agent E (business/legal): per the open question for Mariana above, retention of reference frames and per-hour drift-check frames is a personal-data-adjacent question.

What’s weak in this doc

The 15 px / 40 px drift thresholds are pure conjecture. We have no empirical data on how stable an Amcrest dome mount actually is under daily plant operation. The numbers look reasonable, but a reviewer will ask “show me the displacement-vs-time plot from your validation window.” We do not have that plot yet; we will collect it during the validation window. Until then, expect to revise these thresholds.
ORB keypoint matching may break under genuine scene lighting changes that are not mount shifts. Sunset rolling across the floor, a fluorescent fixture going out, a new poster on the wall: all of these will drop matched-keypoint counts without the camera moving. The “≥ 20 matched keypoints” guard helps but isn’t sufficient; a robust system would track keypoint count over time and raise a mount-shift alert only when displacement is elevated, not merely when the match count is depressed. The spec could be tightened here.
Re-calibration cadence is policy, not engineered. “Monthly walk-through” requires Ronald to remember; the system does not nudge. A real product would surface a “last calibrated 47 days ago” badge in the dashboard and prompt at the 30-day mark. This is straightforward to add but is not specified here.
No story for “operator moved their chair 30 cm to the left permanently.” ROI still works (operator’s torso is still inside the polygon margin) but the system has no way to know whether the ROI is now mis-aligned-but-still-functional or genuinely mis-aligned. A “quiet drift” failure mode worth documenting.
The calibration UI spec assumes Ronald, a Spanish speaker, is the user, but leaves open whether he uses a desktop browser or a phone browser. The tool should be responsive on small screens (tablet at minimum), since Ronald may calibrate from his phone on the floor.

Rollout

Date	Gate
2026-05-20	Calibration UI prototype runs locally against a recorded video file (mocking RTSP).
2026-06-01	UI runs on Jetson + live Amcrest RTSP; polygons persist; reference frames saved.
2026-06-15	Drift detector running hourly in background, surfacing warnings to logs (no UI yet).
2026-07-01	Pereira day-1: Ronald (with Sophia / Armando assist) calibrates all 6 cameras. Time budget: 30 min.
2026-07-08	First “did the camera move?” drift warning hits Ronald (or it doesn’t — both outcomes are data).

Paper alignment

Methods: ROI calibration procedure, polygon shape rationale — Section 3.1 (“System Setup and Calibration”).
Experimental setup: per-camera ROI overlay figures (anonymized — operator’s face blurred) — Fig. 2.
Discussion: drift-detection effectiveness over the validation window — short paragraph in Section 6.
Replicability angle: the calibration tool itself is a paper-worthy artifact for a small-factory CV deployment (“here’s a 5-min calibration workflow non-ML users can run”). Worth a paragraph in Discussion.