Deployment + CI/CD spec — two targets, two stories
Bucket: Backend (Agent B)
Status: Reviewed, 2026-05-12 (Phase B seams applied: R2 release manifest auto-update parked; Phase I Jetson updates are SSH push)
Owner: Sophia · Reviewers: Andrew, Agent A (Jetson side)
Supersedes / refines: `docs/technical/deployment.md` is on-site procedure; this doc is the CI/CD plumbing layer above it.
Context
Two completely different deploy targets, often conflated:
- lbzfai.com: Cloudflare auto-deploys on push to `main` via its native GitHub integration. A scoped Cloudflare API token now lives in `.env` so agents can also trigger a deploy directly.
- The Jetson: sits on LBZF’s network in Pereira; the auto-deploy story has to handle “what if the deploy bricks the box and Sophia is in California?”
The decision being made: the safe blast-radius pattern for agent-driven Cloudflare deploys, and how the Jetson updates without a Sophia-flies-to-Colombia recovery scenario.
Goals
- An agent can deploy a website change with the minimum required token scope and a clear blast-radius preview
- Jetson updates work without a physical visit, with a rollback path that doesn’t require Tailscale (in case the update breaks Tailscale)
- The two deploy targets are clearly named, clearly scoped, never confused for each other
- The CI loop covers test, build, deploy, smoke-check
- Secrets never end up in commits or in agent transcripts
Non-goals
- Multi-environment (staging vs prod) for the Jetson: Phase I has one Jetson; staging is “Sophia’s dev Jetson in CA before flight”
- Blue/green deploys; canarying — overkill at our scale
- IaC for the Cloudflare account itself (Terraform / Pulumi) — Cloudflare dashboard is fine until we have >5 routes
- Container orchestration on the Jetson — systemd is sufficient
Proposed approach
Target 1: lbzfai.com (Cloudflare Workers Static Assets)

How it deploys today

- Push to `main` on `github.com/sophiamann/lbzfai`
- Cloudflare’s GitHub integration receives a webhook
- Cloudflare build VM clones the repo
- Auto-runs `astro add cloudflare` (build VM only; does NOT touch the repo’s `astro.config.mjs`)
- Runs `npm run build` → static `dist/`
- Deploys to the `lbzfai-com` Worker (and `lbzfai.sophiainesmann.workers.dev` route)
- Cache flush is automatic
- Build vars (`PUBLIC_AUTH0_DOMAIN`, `PUBLIC_AUTH0_CLIENT_ID`) come from Cloudflare dashboard → Settings → Builds → Build variables

This works and shouldn’t be changed casually.
Agent-driven deploys (the new path)
A scoped Cloudflare API token is in `.env`. Token scope (recommended; verify in dashboard):

| Scope | Why |
|---|---|
| `Account → Workers Scripts → Edit` | publish the Worker |
| `Account → Workers Routes → Edit` | add/move routes |
| `Account → Workers KV Storage → Edit` | (Phase II) write user table once it moves to KV/D1 |
| `User → User Details → Read` | API health check |
| Zone scope: only `lbzfai.com` | blast-radius |

Do NOT grant Account → Account Settings → Edit, Account → DNS → Edit (for new zones), or User → API Tokens → Edit — those are nuclear.
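
The allow and deny lists above lend themselves to a mechanical check. The following is a minimal sketch, assuming the granted permissions have been copied out of the dashboard by hand as plain strings; `audit_token_scopes` and the string format are hypothetical, and nothing here calls the Cloudflare API. The zone restriction (only lbzfai.com) is a resource scope rather than a permission, so it is not covered.

```python
from __future__ import annotations

# Hypothetical helper: permission strings are typed in by hand from the
# Cloudflare dashboard. This does not query Cloudflare.
ALLOWED = {
    "Account → Workers Scripts → Edit",
    "Account → Workers Routes → Edit",
    "Account → Workers KV Storage → Edit",  # Phase II only
    "User → User Details → Read",
}
FORBIDDEN = {
    "Account → Account Settings → Edit",
    "Account → DNS → Edit",
    "User → API Tokens → Edit",
}


def audit_token_scopes(granted: set[str]) -> dict[str, list[str]]:
    """Report permissions that are forbidden or simply not on the allow list."""
    return {
        "forbidden": sorted(granted & FORBIDDEN),
        "unexpected": sorted(granted - ALLOWED - FORBIDDEN),
    }
```

An empty `forbidden` list and an empty `unexpected` list would mean the token matches the recommended scope; anything else is worth a look before the token goes into `.env`.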
Smallest-blast-radius agent pattern
Three rules:

- Agents do not push to `main`. Agents open a PR from a worktree branch. The auto-deploy fires only after a human merges. Cloudflare’s GitHub integration is the deploy mechanism; agents are just authors.
- For preview deploys, agents use `wrangler deploy --env preview` (or equivalent Worker route like `pr-<n>.lbzfai-com.preview`). Previews are namespaced and do not touch `lbzfai.com`. The Cloudflare API token used by agents is preview-scoped if possible; if Cloudflare doesn’t support a preview-only token, agents call a `wrangler deploy --dry-run` to surface the intended change before any actual write.
- The `.env` Cloudflare token is dev-only. It is in `.gitignore` and exists in `.env.example` only as a placeholder. Token rotation is manual; rotate quarterly and after every contractor offboard.

Smallest blast-radius deploy pattern, in order of preference:
| Action | Affected surface | Risk |
|---|---|---|
| Open PR; let CI build a preview Worker | `pr-N.lbzfai-com.preview.workers.dev` | None to prod |
| `wrangler deploy --dry-run` to validate config | nothing changes | None |
| `wrangler deploy --env preview` | preview Worker only | None to prod |
| Push to a non-main branch | nothing deploys (GitHub integration is main-only) | None |
| Merge to `main` (after PR review) | `lbzfai.com` prod | Owned by reviewer |
| Direct `wrangler deploy --env production` (skipping git) | `lbzfai.com` prod | Forbidden for agents; emergency only, Sophia/Andrew |
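
The wrangler rows of the table reduce to a small decision on the command line. Below is a sketch of a guard an agent harness could run before invoking wrangler. The function name is hypothetical, and two rules are assumptions rather than anything stated above: any environment whose name starts with `preview` counts as a preview, and a deploy with no `--env` at all is treated as production.

```python
from __future__ import annotations


def agent_may_run(argv: list[str]) -> bool:
    """Return True if an agent may run this `wrangler deploy` invocation."""
    if argv[:2] != ["wrangler", "deploy"]:
        return False  # not a deploy command; out of scope, deny by default
    if "--dry-run" in argv:
        return True  # validates config, changes nothing
    env = None
    for i, arg in enumerate(argv):
        if arg == "--env" and i + 1 < len(argv):
            env = argv[i + 1]
        elif arg.startswith("--env="):
            env = arg.split("=", 1)[1]
    # Assumption: no --env means the top-level config, treated here as prod.
    return env is not None and env.startswith("preview")
```

For example, `agent_may_run(["wrangler", "deploy", "--env", "production"])` is rejected, while the same command with `--dry-run` added is allowed because nothing is written.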
Agent checklist for any Cloudflare-touching change
- Run `wrangler whoami` to confirm the token is the scoped agent token, not a personal token
- If editing `wrangler.toml` or `astro.config.mjs`, dry-run the build locally first (`npm run build`)
- Open PR; do not push to `main`
- Include the URL of the preview deploy in the PR description
- Note any new Cloudflare resources (routes, secrets) so they’re caught at review

Target 2: The Jetson
The problem statement

The Jetson is in Pereira. The only inbound channel is Tailscale. If a deploy breaks Tailscale, recovery requires Armando or a local technical contact physically rebooting the box. This is the operational gating constraint on Jetson CI/CD.

Phase I: SSH push via Tailscale
Phase I uses Option A (SSH push) for Jetson updates. The Watchtower-style R2-manifest auto-pull (Option C) is parked to `docs/design/60-parking/r2-release-manifest.md`.

Reasoning:

- Phase I has one production Jetson + one ITBA twin and one release author (Sophia). The R2 manifest design pays off when there is a fleet to coordinate or a non-author operator; neither holds in Phase I.
- SSH push is the simplest mechanism that works and is fully reversible: `ssh sophia@<jetson> "cd /opt/lbzf && git pull && ./bin/migrate.py && systemctl restart lbzf-*"`.
- Rollback is `git checkout <prev_sha> && systemctl restart lbzf-*` over the same SSH session.
- The ITBA Jetson uses the same workflow over its own Tailscale tag (`tag:itba-dev`).

Trigger to thaw the R2 design: more than one production Jetson or a non-Sophia engineer regularly merging to main.
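
The update and rollback commands differ only in their git step, so both can come from one builder. This is a sketch under assumptions: the function names are hypothetical, and nothing is executed here. Unlike the commands quoted above, it quotes the `lbzf-*` pattern so the remote shell cannot expand it against files in `/opt/lbzf` before `systemctl` sees it.

```python
from __future__ import annotations

import shlex

REPO_DIR = "/opt/lbzf"   # checkout path used in the commands above
UNIT_PATTERN = "lbzf-*"  # systemctl matches this against loaded units


def remote_script(prev_sha: str | None = None) -> str:
    """Build the remote shell line: update if no SHA is given, else rollback."""
    if prev_sha is None:
        git_step = "git pull && ./bin/migrate.py"
    else:
        git_step = f"git checkout {shlex.quote(prev_sha)}"
    # Quote the pattern so the remote shell cannot expand it against files.
    restart = f"systemctl restart {shlex.quote(UNIT_PATTERN)}"
    return f"cd {REPO_DIR} && {git_step} && {restart}"


def ssh_argv(host: str, prev_sha: str | None = None) -> list[str]:
    """Argument list suitable for subprocess.run; nothing is executed here."""
    return ["ssh", f"sophia@{host}", remote_script(prev_sha)]
```

Keeping the rollback in the same function as the update makes it harder for the two paths to drift apart, which matters because the rollback is the one that rarely gets exercised.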
Tailscale-breaking changes (still applies)
The single failure that defeats any unattended update path: a release that breaks Tailscale itself. Mitigations:

- `tailscaled` is pinned and updated separately from `lbzf-*`.
- The systemd unit `tailscaled` has `Restart=always` and is independent of `lbzf-*` units.
- Tested in CA: Sophia does a “deliberately break Tailscale” rehearsal once before the Pereira flight to ensure the local technical contact at LBZF has the runbook (out-of-band: phone Armando, who walks LBZF staff through the reboot).

Rollback (Phase I)
- SSH via Tailscale: `ssh sophia@<jetson> "cd /opt/lbzf && git checkout <prev_sha> && systemctl restart lbzf-*"`.
- If Tailscale is broken, the runbook escalates to Armando → LBZF on-site reboot.

Rolling back is always faster than diagnosing; we roll back first and debug later.

Secrets management
| Secret | Where | Rotation |
|---|---|---|
| Cloudflare API token (agent) | `.env` (dev), Cloudflare Worker secret (prod) | Quarterly + on offboard |
| Auth0 tenant config (public client ID, domain) | Repo + Cloudflare build vars | Never (it’s public) |
| `JETSON_PROXY_SECRET` (Worker → Jetson) | Cloudflare Worker secret + `/etc/lbzf/proxy.token` on Jetson | Quarterly |
| Internal HMAC for CV writer → API | `/etc/lbzf/internal.token` on Jetson | On model deploy |
| SQLite encryption key (if we add SQLCipher) | Not in Phase I | n/a |
| Camera RTSP passwords | `/etc/lbzf/cameras.yaml` (mode 0600) on Jetson | When cameras change |
| Tailscale auth key | Per-machine; not stored after enrollment | n/a |
| AWS credentials | Per Andrew’s “AWS day 1”; not yet provisioned | TBD |

Rule: no secret in the repo, ever. .env.example shows the keys; .env is .gitignored. Pre-commit hook scans for AKIA / sk_live_ / etc. patterns (gitleaks or hand-rolled).
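
For the hand-rolled option, a scanner can be as small as the sketch below. The two patterns are illustrative guesses at the `AKIA` and `sk_live_` shapes mentioned above, not a vetted rule set; gitleaks ships a far larger one. `find_secrets` is a hypothetical name.

```python
from __future__ import annotations

import re

# Illustrative patterns only; a real scanner such as gitleaks covers far more.
PATTERNS = {
    "aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "sk-live-key": re.compile(r"\bsk_live_[0-9A-Za-z]{8,}\b"),
}


def find_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_name) for every line that matches a pattern."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits
```

A pre-commit hook would feed each staged file's contents through `find_secrets` and abort the commit on any hit. Reporting the rule name and line number rather than the matched text keeps the secret out of hook output and agent transcripts.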
CI matrix (Phase I)
| Workflow | Trigger | Target | Job |
|---|---|---|---|
| `web-build.yml` | push to any branch | lbzfai.com Cloudflare | (handled by Cloudflare native integration, not GH Actions) |
| `web-preview.yml` (new) | PR | preview Worker | `wrangler deploy --env preview-pr-<n>` |
| `lint.yml` (new) | every push | none | ruff, mypy, eslint |
| `pytest.yml` (new) | push to `main` if `app/**` changed | none (test results only) | lint, mypy, pytest; does not publish artifacts |

jetson-build.yml and jetson-promote.yml are parked alongside the R2 manifest design. Phase II re-introduces them when fleet size justifies.
GH Actions secrets (Phase I):
- `CLOUDFLARE_API_TOKEN_CI`: scoped only to `Workers Scripts → Edit` for the preview worker. No prod write, no R2 (since no manifest in Phase I).
- `CLOUDFLARE_ACCOUNT_ID`: public-ish, but in secret store for tidiness.

Observability for deploys
- Every Worker deploy logs a structured line `{deploy_id, sha, user, ts}` to Cloudflare Logs (free; 7-day retention).
- Every SSH-driven Jetson update is a manual operation. Sophia logs the git SHA range she deployed in a `deploy-log.md` (or just `git log --oneline <prev>..<new>`) and pastes it into the project Slack/Discord channel.
- Phase II adds structured deploy logs when the R2-manifest auto-pull thaws.
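
The four-field record is small enough that the hand-kept Jetson log could reuse the same shape. A sketch of producing one such line follows; the function name, the random hex `deploy_id`, and the ISO 8601 UTC timestamp are all assumptions, since the field formats are not specified above.

```python
import json
import uuid
from datetime import datetime, timezone


def deploy_log_line(sha: str, user: str) -> str:
    """Serialize one deploy record with the fields deploy_id, sha, user, ts."""
    record = {
        "deploy_id": uuid.uuid4().hex,  # assumption: any unique ID will do
        "sha": sha,
        "user": user,
        "ts": datetime.now(timezone.utc).isoformat(),  # assumption: UTC, ISO 8601
    }
    return json.dumps(record)
```

Appending the output of `deploy_log_line("<sha>", "sophia")` to `deploy-log.md` would give Phase I a record that can later be parsed by whatever Phase II introduces.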
Argentina ITBA twin Jetson
ITBA’s Jetson uses the same SSH-push workflow as LBZF’s, scoped to the `tag:itba-dev` tailnet tag. Configuration that differs (camera RTSP URLs, `org_id`, Tailscale auth) lives in `/etc/lbzf/site.yaml` and is not in the git repo; it’s per-machine.
When ITBA wants to diverge (try a different CV model, etc.), they branch from main locally on their Jetson and pin to that branch.
Alternatives considered
- GitHub Actions deploys directly to Cloudflare instead of Cloudflare’s native GitHub integration: works, but duplicates the existing wiring. Native integration handles the build VM, secrets, and route configuration; reimplementing in Actions is a regression.
- R2 release manifest + Watchtower-style auto-pull on the Jetson: the right design at fleet scale. Parked to `60-parking/r2-release-manifest.md` until fleet exists or a non-author engineer ships.
- systemd-timer `git pull` on the Jetson: every commit to `main` ships immediately. Fast but no decoupling. Sophia explicitly opts into SSH-push instead so “merged ≠ deployed” stays true.
- Container image deploys (Docker on Jetson): adds Docker to the Jetson stack we don’t otherwise need. JetPack + apt + systemd is the simpler base; revisit in Phase II if we get tired of dependency drift.
- Ansible / Salt / cloud-init for Jetson provisioning: overkill for one box. A bash script + this doc is fine.
- Cloudflare Pages instead of Workers Static Assets: we’re already on Workers Static Assets (per `CLAUDE.md`); the README is stale on this. Migrating is not a Phase I priority.

Open questions
- PARKED: Cloudflare R2 cost at our scale. Moot for Phase I; revisit when R2 manifest thaws.
- OPEN: agent push permission scope. Currently `.env` has the Cloudflare token; should an agent ever directly call wrangler against prod, or only against preview? Default: preview only. Owner: Sophia.
- OPEN: pre-commit secret scanner. `gitleaks` is the standard; install + add to CI. Owner: Sophia.
- OPEN: how do migrations roll back? If `0.4.7` adds a column, `0.4.6` should still work; if `0.4.7` removes a column, downgrade silently breaks. Defaulting to additive-only migrations in Phase I; document the policy.
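
An additive-only policy is easier to keep if something checks it. The sketch below assumes migrations are available as plain SQL text, which this doc does not confirm, and flags statements that drop or rename schema objects. The keyword list and the split on `;` are deliberately naive, and `destructive_statements` is a hypothetical name.

```python
from __future__ import annotations

import re

# Statements that remove or rename schema objects, so an older release could
# stop working after a rollback. Illustrative, not exhaustive.
DESTRUCTIVE = re.compile(
    r"\b(DROP\s+(TABLE|COLUMN|INDEX)|RENAME\s+(COLUMN|TO))\b",
    re.IGNORECASE,
)


def destructive_statements(sql: str) -> list[str]:
    """Return the statements in a migration that are not purely additive."""
    # Naive split: a ';' inside a string literal would break this.
    statements = [s.strip() for s in sql.split(";") if s.strip()]
    return [s for s in statements if DESTRUCTIVE.search(s)]
```

Run against a migration that both adds and drops a column, it returns only the `DROP COLUMN` statement, which is the one that would break a downgrade to the previous release.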
Cross-bucket dependencies
| This doc depends on | Owner bucket | What we need |
|---|---|---|
| Tailscale on Jetson | Hardware (Agent A) + existing `deployment.md` | The always-on Tailscale daemon is the rollback escape hatch |
| Cloudflare Worker route config | This bucket (`lbzfai-jetson-integration.md`) | We deploy the Worker |
| systemd unit files for `lbzf-dashboard`, `lbzf-cv-writer`, `lbzf-exporter` | this bucket + ML (Agent D for cv-writer) | All units co-evolve. `lbzf-updater` ships only when the R2-manifest design thaws. |
| DB migration policy | This bucket (`data-model.md`) | Additive-only by default |

| This doc implies | Owner | Ask |
|---|---|---|
| ML/CV writer is part of the same git repo on the Jetson | Agent D | A single git pull updates all Jetson processes |
| Frontend deploys are decoupled from backend Jetson deploys | Frontend (Agent C) | They can ship UI changes without waiting for a Jetson SSH push |
What’s weak in this doc
- SSH-push depends on Sophia being available. A 3-day Sophia outage (sickness, travel without internet) means no Jetson updates. Phase II thaws the R2-manifest auto-pull design specifically to remove this single-person bottleneck.
- Rollback rehearsal is mentioned but not scheduled. Without a documented dry-run on the dev Jetson, the first actual rollback in production is going to be exciting.
- `.env` Cloudflare token + agents: a clever attacker who gets read access to `.env` (e.g., a careless `cat .env` in an agent transcript) has Worker write access. The blast radius is the lbzfai.com Worker, not the AWS account, not Auth0, not the Jetson. Acceptable but worth being clear-eyed about.
- No security scan in CI yet (`pip-audit`, `npm audit`, `gitleaks`). Should be table stakes.
- No structured deploy audit trail in Phase I. Sophia’s hand-written “I deployed `<sha>` at `<time>`” notes are the only record. Acceptable at this scale; the R2-manifest thaw replaces this with proper structured logs.

Rollout
- Now (May 2026):
  - Document the scoped Cloudflare token in `.env.example`
  - Write `web-preview.yml` GH Action for PR previews
  - Add `gitleaks` pre-commit
- Before Argentina (2026-05-15):
  - Demo a PR-driven preview deploy of lbzfai.com
  - ITBA gets the SSH-push workflow on the BA handoff Jetson
- Before Pereira (Jul 2026):
  - SSH-push runbook tested end-to-end on the dev Jetson in CA, including a forced rollback
  - Runbook for local technical contact: “if the dashboard is down, call Armando, who phones Sophia”
- Day-1 Pereira:
  - Deploy a known-good `main` SHA before flight; freeze the SHA during the first week of operation
  - First in-prod release is the bug-fix release that comes out of the deploy week
- Phase II:
  - Thaw `60-parking/r2-release-manifest.md`: build R2 bucket + manifest + `lbzf-updater.service`
  - Canary on ITBA’s Jetson first; promote to LBZF after 72h of green
  - Auto-rollback richer signals (cycle_events count drops to zero for 5 min → automatic rollback)
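
The Phase II auto-rollback signal (cycle_events at zero for 5 minutes) can be expressed as a check over periodic readings. This is a sketch only: how cycle_events would be sampled is not specified in this doc, so the `(timestamp, count)` input shape and the function name are assumptions, as is regular sampling.

```python
from __future__ import annotations

WINDOW_SECONDS = 5 * 60  # "zero for 5 min" from the Phase II bullet


def should_roll_back(samples: list[tuple[float, int]], now: float) -> bool:
    """samples: (unix_timestamp, cycle_events_count) pairs, oldest first."""
    if not samples or samples[-1][1] != 0:
        return False  # no data, or events are still flowing
    # Walk backwards to find where the current run of zero readings began.
    zero_since = samples[-1][0]
    for ts, count in reversed(samples):
        if count != 0:
            break
        zero_since = ts
    return now - zero_since >= WINDOW_SECONDS
```

Treating "no samples" as "do not roll back" is a deliberate choice in this sketch: a broken metrics pipeline should not by itself trigger a rollback of an otherwise healthy release.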
Unblocks: every other backend doc’s path to running on the Jetson; safer agent-driven website iteration; ITBA’s parallel work.