Excel export spec — `INDICADORES` drop-in

Bucket: Backend (Agent B) Status: Draft v0.1 — 2026-05-10 Owner: Sophia · Reviewers: Ronald (drop-in compatibility), Andrew, Agent E Source-of-truth files we are mirroring: INDICADORES ABRIL.xlsx, Ref22 Slim - Angela.xlsx (“Balanceo” tab)

Context

Ronald Gonzalez’s adoption of the CV system is conditional on the Excel export looking exactly like the spreadsheet he produces today. If he has to retype rows into his existing template, the system is a net negative for him. This doc pins the export format, the cadence, the delivery channel, and the es-CO locale rules, even though we currently do not have direct read access to INDICADORES ABRIL.xlsx in the repo — assumptions are documented explicitly so they can be checked against the real file.

The decision being made: what file, what columns, what cadence, what locale, what delivery channel.

Goals

A single Excel file per shift / per day that Ronald can drop into his workflow without re-entry
es-CO locale throughout: dd/mm/yyyy dates, comma decimal separator, period thousand separator
Deterministic file naming and a stable storage location on the Jetson + cloud
Reproducible from the SQLite source-of-truth — re-running the exporter for a past day yields the same output
Both human-readable formatting (formulas, banding, freeze panes) and a parallel .csv mirror for any downstream automation

Non-goals

Replacing Ronald’s monthly summary roll-ups (lost-time by code, presence %, monthly efficiency curves). Phase I exports the daily indicador rows; the monthly summary is computed by Ronald’s own spreadsheet — we feed its inputs.
Live charts inside the Excel file. The dashboard provides live; the Excel is a record.
PDF export. Out of scope; if needed, generate via LibreOffice --convert-to pdf later.
Pushing data into Ronald’s existing spreadsheet file (no in-place mutation of an .xlsx he owns).

Proposed approach

File format and library

.xlsx (Excel 2007+) — what Ronald’s current workflow uses.
openpyxl to write — already in the v1.9 stack. Pandas only for the SQL-pull layer; we hand the DataFrame to openpyxl ourselves so we can control formatting (we cannot get banding, freeze-panes, and locale-aware number formats from df.to_excel() alone).
.csv mirror written alongside (UTF-8 with BOM, semicolon separator — see locale below) so downstream automation can ingest without an xlsx parser.

Filename and storage location

/data/exports/
  indicadores-<module>-<YYYY-MM-DD>.xlsx        # daily, per module
  indicadores-<module>-<YYYY-MM-DD>.csv         # csv mirror
  indicadores-<module>-<YYYY-MM>-monthly.xlsx   # monthly summary (Phase II)

Example: indicadores-ANGELA-2026-07-15.xlsx.

Mirrored to Google Drive at LBZF Computer Vision Project / Exports / <YYYY-MM>/ once the AWS / Drive integration is live (Phase II), and to S3 at s3://lbzfai-exports/<YYYY-MM>/ for retention.

Cadence — when does the exporter run

Decision: one file per (module, local-date), generated at end-of-shift plus on-demand from the dashboard.

Cadence	Why
End of shift (automatic)	Mirrors Ronald’s current “end of day, do the Excel” routine. Triggered by a systemd timer that runs at 18:00 CO time on weekdays.
On-demand from dashboard	If the day’s data needs a re-run (a manual operator correction, a model re-deploy), Ronald or Sophia can re-export.
Per-week summary	Phase I non-goal. The day files concatenate trivially in Excel.
Per-shift (if LBZF runs two)	Re-run with `--shift=1` and `--shift=2`. OPEN: confirm shift count with Ronald.

Delivery channel — where does the file land

Phase I:

Written to /data/exports/ on the Jetson (canonical).
Emailed to Ronald (ingenierialbarton@gmail.com) and Mariana, with the file attached. SMTP via SendGrid (1k/day free tier) or AWS SES once the account is live.
Downloadable from the dashboard (GET /api/exports/<filename>, see dashboard-api.md).

Phase II adds Google Drive upload to a Ronald-owned folder.

Email is the Phase I primary delivery because (a) Ronald’s existing inbox is the workflow he checks daily, (b) Drive integration requires OAuth into Ronald’s account and that’s not on the May 15 critical path.

OPEN: SMTP credentials and sender domain — owner: Sophia / Andrew, blocker for: end-to-end email delivery. Recommended sender: reports@lbzfai.com (configure SPF/DKIM on the lbzfai.com Cloudflare zone).

Locale (es-CO) — applied everywhere, not just the export

Dates: dd/mm/yyyy (e.g., 15/07/2026)
Datetimes (where shown): dd/mm/yyyy HH:MM (24-hour)
Decimal separator: , (comma)
Thousand separator: . (period)
Percentage: 45,7% (not 45.7%)
Excel number format strings: "#.##0,00" for numbers, "0,0%" for percentages, "dd/mm/yyyy" for dates
CSV: semicolon delimiter (not comma) because the comma is the decimal separator in es-CO. UTF-8 with BOM so Excel opens it correctly.
Negative numbers in efficiency columns rendered red via conditional formatting (Ronald’s spreadsheet does this).

Sheet layout (assumed — confirm against actual file)

All structural details below are assumptions based on what a typical Latin-American industrial-engineer’s daily indicadores sheet contains. They WILL need correction once the real INDICADORES ABRIL.xlsx is parsed. Flag the diff for Ronald.

Sheet 1: Indicadores (per-cycle detail, one row per cycle_event)

Col	Header (Spanish)	Source	Format	Notes
A	`fecha`	`cycle_events.start_ts` → CO local date	`dd/mm/yyyy`
B	`hora_inicio`	`cycle_events.start_ts` → CO local time	`HH:MM:SS`
C	`hora_fin`	`cycle_events.end_ts` → CO local time	`HH:MM:SS`	blank if in_progress
D	`módulo`	`modules.display_name`	text
E	`puesto`	`workstations.display_name`	text
F	`operación`	`workstations.operation`	text	matches Spanish vocab in `operations.md`
G	`operario`	`operators.display_name`	text	blank in Phase I if operator capture deferred
H	`OP`	`cycle_events.ref_op_order`	text	optional in Phase I
I	`SAM (min)`	`standard_times.sam_minutes` joined on `operation`	`#.##0,00`
J	`tiempo std (s)`	`standard_times.sec_per_cycle`	`#.##0,0`	derived from SAM × 60
K	`tiempo real (s)`	`cycle_events.duration_seconds`	`#.##0,0`
L	`eficiencia`	`tiempo_std / tiempo_real`	`0,0%`	conditional format: <80% red, 80–95% yellow, >95% green
M	`piezas`	always 1 in Phase I	`0`	one cycle = one piece; aggregated on Sheet 2
N	`máquina`	`standard_times.machine_code`	text	MPP / MREV / MOJA / MBOT / PLCH
O	`confianza CV`	`cycle_events.confidence`	`0,00`	so Ronald can see when CV was uncertain
P	`modelo CV`	`cv_model_versions.name + version`	text
Q	`observaciones`	`cycle_events.notes`	text	manual override reason, etc.

Header row frozen (Window > Freeze top row). Autofilter on row 1.

Sheet 2: Resumen Módulo (per-day per-module summary, mirrors INDICADORES ABRIL.xlsx’s monthly headers)

Col	Header	Source / Formula
A	`fecha`	local date
B	`módulo`
C	`jornada (min)`	from `shifts.end_ts - shifts.start_ts`
D	`#operarios`	distinct `cycle_events.operator_id` for the day OR seeded `modules.operator_count` (Phase I default 22 for ANGELA)
E	`OP`	distinct `cycle_events.ref_op_order` for the day
F	`SAM/garment (min)`	`standard_times` total for the reference (24,16 for Ref22)
G	`unidades`	distinct completed cycles at the last station in the sequence (i.e., `DOBLAR Y PRESENTAR CAMISA ML`)
H	`total minutos productivos`	sum(`cycle_events.duration_seconds`) / 60
I	`eficiencia módulo`	`(unidades × SAM/garment) / (jornada × #operarios)`
J	`rendimiento módulo`	`unidades / capacidad_objetivo_horaria * 100`

Column names and formulas need to be diffed against INDICADORES ABRIL.xlsx once it’s parseable. The columns above are the v1.9 doc’s stated list — see docs/factory/current-state.md.

Sheet 3: Tiempos Estándar (read-only mirror of standard_times)

So the file is self-contained for an auditor — they don’t need a second spreadsheet to interpret it.

Col	Header	Source
A	`referencia`	`standard_times.reference`
B	`operación`	`standard_times.operation`
C	`SAM (min)`	`standard_times.sam_minutes`
D	`máquina`	`standard_times.machine_code`
E	`fuente`	`standard_times.source` (default `tiempo std manual`)

Sheet 4 (Phase II placeholder): Tiempo Perdido

For the 22-code C001..C022 taxonomy. Blank in Phase I.

Implementation sketch

from datetime import date
from zoneinfo import ZoneInfo
import pandas as pd
from openpyxl import Workbook
from openpyxl.styles import PatternFill, Font, NamedStyle
from openpyxl.utils import get_column_letter

CO = ZoneInfo("America/Bogota")

def export_day(db_path: str, module_code: str, target_date: date, out_dir: str) -> str:
    rows = _query_cycles(db_path, module_code, target_date)        # Sheet 1
    summary = _summarize(rows, module_code, target_date)            # Sheet 2
    std_times = _query_std_times(db_path, module_code)              # Sheet 3

    wb = Workbook()
    _write_indicadores(wb.active, rows)
    _write_resumen(wb.create_sheet("Resumen Módulo"), summary)
    _write_std_times(wb.create_sheet("Tiempos Estándar"), std_times)
    wb.create_sheet("Tiempo Perdido")                                # Phase II placeholder

    out = f"{out_dir}/indicadores-{module_code}-{target_date.isoformat()}.xlsx"
    wb.save(out)
    _write_csv_mirror(rows, out.replace(".xlsx", ".csv"))
    return out

Excel-write functions own the number-format strings, the conditional-format rules, the freeze pane, and the column widths. pandas.DataFrame.to_excel() is not used as the final writer — we want full control.

Determinism

Re-running export_day(db, "ANGELA", 2026-07-15) must produce byte-identical output if the underlying data is unchanged. This means:

Stable row ordering: ORDER BY workstation_id, start_ts
No datetime.now() in cell values (only in a “generated_at” footer cell, which is fine)
openpyxl’s Workbook.save() is deterministic except for the modified-at metadata; we strip / pin it via wb.properties.modified = wb.properties.created = datetime(2026,1,1) so file hashes match across re-runs

This matters for the paper: reviewers ask for the exact export file used in figures.

Sample row (illustrative)

fecha       hora_inicio  hora_fin   módulo  puesto      operación          operario      OP           SAM (min)  tiempo std (s)  tiempo real (s)  eficiencia  piezas  máquina  confianza CV  modelo CV          observaciones
15/07/2026  07:14:22     07:15:08   Angela  Puesto 1    PESPUNTAR CUELLO   Operario A    OP-2026-0712  0,38       22,8            46,1             49,5%       1       MPP      0,87          yolov8n-coco/8.1.30
15/07/2026  07:15:35     07:16:09   Angela  Perrillas   PRENDER PERILLA    Operario B    OP-2026-0712  0,44       26,4            34,2             77,2%       1       MPP      0,91          yolov8n-coco/8.1.30

Note the comma decimal separator and dd/mm/yyyy. This is what Ronald sees.

Confidentiality marker

The footer of each sheet writes Confidencial — Uso Interno per the LBZF Needs Definition. Sheet headers do not embed operator names in the file path (so screenshots of the filename in a project status report don’t leak names).

Alternatives considered

CSV-only. Cheap to write, hostile to Ronald’s existing workflow. Rejected.
Google Sheets API direct writes (skip the .xlsx file). Couples us to Ronald’s Drive auth on day one; defers cleanly to Phase II. Rejected for Phase I.
Live Excel “linked” file (Excel opens and refreshes from a SQL connection) — fragile, requires Excel + ODBC on Ronald’s machine. Rejected.
pandas.to_excel() only — no formatting control, no freeze pane, no conditional format. The 30-line openpyxl wrapper buys Ronald the visual parity. Rejected.
Per-cycle PDF receipts — interesting for IEEE paper figures, not for Ronald’s workflow. Defer.

Open questions

OPEN: actual column set in INDICADORES ABRIL.xlsx — owner: Ronald / Armando, blocker for: every column header above being right. Action: pull a copy of the April file into the repo as docs/references/sample-indicadores.xlsx (gitignored if it contains operator names; committed as anonymized CSV mirror).
OPEN: shift count — owner: Ronald. Phase I assumes single shift, M-F.
OPEN: email sender and SMTP credentials — owner: Andrew / Sophia.
OPEN: who receives the email automatically every day? Ronald + Mariana is the safe default; add Sophia / Andrew as BCC? Owner: Mariana.
OPEN: how should the exporter behave if a cycle is still in_progress at export time — drop it, include it with end_ts blank, or wait? Default: include with end_ts blank and an “(en curso)” observation. Owner: Ronald.
OPEN: monthly summary — Phase II per scope, but Ronald may ask immediately. Defer.
OPEN: locale of weekday names (e.g., “lunes 15/07/2026”) — Spanish, but do we need it? Ronald’s existing sheet doesn’t, per the v1.9 doc.

Cross-bucket dependencies

This doc depends on	Owner bucket	What we need
Standard-times seed	This bucket (`data-model.md`)	Without `standard_times` populated, columns I/J are blank
Operator capture decision	This bucket + Agent E	Column G blank or not
Email sending infra	This bucket (`deployment-and-cicd.md`)	SES / SendGrid / SMTP credentials
Drive integration (Phase II)	Frontend or new sub-bucket	OAuth into Ronald’s account

This doc implies	Owner bucket	Ask
Auth0 user identity flows into `audit_log` when Ronald downloads an export	Frontend (Agent C) + this bucket	Pass identity through
ML/CV writer populates `cycle_events.confidence`, `detection_count`, `cv_model_version_id`	ML (Agent D)	Otherwise columns O/P are NULL

What’s weak in this doc

The single biggest assumption is the column set in INDICADORES ABRIL.xlsx — we cannot guarantee drop-in until we open the file. If even one column header differs, Ronald has to copy-paste between sheets and we lose his adoption.
No locale-correctness test in CI. A late-introduced bug (e.g., openpyxl version that defaults to US format) would silently break Ronald’s workflow. Need a golden-file test: fixture DB → exporter → byte-diff against a checked-in golden xlsx.
Email delivery is the brittlest part of the pipeline. SMTP from the Jetson is fragile (Colombia ISPs often block port 25); we should use SES/SendGrid HTTP API, but that means an outbound network dependency, which means a Tailscale-vs-public-internet question we haven’t answered.
No charts on Sheet 2. Ronald’s existing sheet may rely on hand-drawn charts; if so we’ll need to add an openpyxl chart object or accept a degraded version-1.
Deterministic / paper-reproducible export is asserted but not tested. Even with metadata pinning, openpyxl can drift across minor versions and break byte-equality.
“Resumen Módulo” formulas are educated guesses. The eficiencia formula in particular needs to be verified against Ronald’s actual computation — there are several reasonable formulas (unidades × SAM / (jornada × operarios), vs. total_minutos_productivos / (jornada × operarios)) and the wrong choice will produce numbers that disagree with his existing reports.

Rollout

Now (May 2026): stub exporter that runs against the seed DB and produces a one-row demo file with hard-coded data, to validate locale formatting visually.
Before Argentina (2026-05-15): show ITBA the demo .xlsx so they can mirror format in their own twin.
Before Pereira (Jul 2026): real exporter against a 1-week run on the dev Jetson; diff against an actual INDICADORES ABRIL.xlsx page that Armando obtains; iterate until Ronald says “this is the file I would have made.”
Day-1 of Pereira deploy: end-of-shift cron + email sending live; Sophia BCC’s herself for the first week.
Phase II: Drive upload; monthly roll-up; lost-time sheet populated by automated detectors.