Reproducibility & artifacts
Bucket: technical/ml (Agent D) · Status: draft · Owner: Sophia Mann · Phase: I (paper-grade) · Last updated: 2026-05-10
Context
For the IEEE submission (CASE / Access / T-ASE / IROS targets), reproducibility is increasingly an explicit reviewer criterion. IEEE Access in particular asks authors to make data and code available “to the extent permitted by the institution / data provider.” A paper whose reviewer can re-run the analysis script against the released dataset and see the paper’s numbers come out stands a much better chance of acceptance; one whose reviewer cannot risks rejection at R2.
LBZF data is classified Confidencial — Uso Interno (per the Needs Definition). Operator identifiers are linked to named individuals via Ref22 Slim - Angela.xlsx and INDICADORES ABRIL.xlsx. So the reproducibility plan cannot be “publish everything.” It must be: publish enough that the methodology is reproducible without compromising the operators or LBZF.
This doc specifies the artifact set, the open-source / data-release posture, and the storage / version discipline.
Goals
- G1: Every figure/table in the paper can be regenerated from a small, named set of artifacts that lives in a known location.
- G2: A third-party reviewer can run an evaluation script that exercises at least the model-side reproducibility, against a public sample.
- G3: Operator identifiability is removable at the artifact level (face-blurring is a step in the pipeline, not a manual figure edit).
- G4: LBZF retains control over its own footage; nothing exits without Mariana’s sign-off.
Non-goals
- Long-term archive of all 4TB rolling video (this is operational, not artifact).
- Public release of operator-identifiable production data.
- Open-sourcing the dashboard/landing-page web stack — Agent A’s call.
- Hosting infrastructure for downloadable artifacts beyond what’s already in scope (S3, GitHub, Drive).
Artifact set
The full set a paper-reviewer should be able to access (modulo LBZF approval per category):
| Artifact | Format | Public? | Location |
|---|---|---|---|
| Code: detector + state machine | Python package, lbzf-cv/ | YES | GitHub sophiamann/lbzfai-cv (separate repo from the landing page) |
| Code: calibration tool | Python + JS in same repo | YES | same |
| Code: training pipeline | Same repo, training/ subdir | YES | same |
| Code: validation analysis script (validate_v1.py) | Same repo | YES | same |
| Model: yolov8n_lbzf_v0.pt (fine-tuned, if Stage B runs) | PyTorch checkpoint | MAYBE — needs LBZF approval | If yes: GitHub LFS or HuggingFace Hub |
| Model: yolov8n.engine (TensorRT) | TensorRT engine | NO — non-portable; rebuild on user’s Jetson | (doc only) |
| Dataset: full Phase I training set | COCO-format JSON + frames | NO (operator-identifiable) | LBZF-owned S3 |
| Dataset: 100-frame “public sample” | COCO-format + frames, face-blurred + manual review | MAYBE — LBZF sign-off | HuggingFace Datasets or Zenodo |
| Dataset: split CSV (dataset_split_v0.csv) | CSV, frame-id only (no images) | YES | GitHub |
| Validation: raw cycle_events.csv (anonymized) | CSV | MAYBE (aggregated only) | OSF or Zenodo |
| Validation: per-shift manual_observations.csv (Ronald stopwatch) | CSV | MAYBE (aggregated only) | OSF or Zenodo |
| Validation: paired Bland-Altman input data | CSV | YES (aggregated, no operator IDs) | OSF or Zenodo |
| Telemetry: health-score timeseries for validation window | CSV | YES | same |
| Paper figures source files (notebooks / matplotlib scripts) | .ipynb / .py | YES | same |
| Design docs (this directory) | Markdown | YES | this repo |
The artifact set is intentionally conservative on raw data and liberal on code and aggregates. A reviewer can:
- Read every design decision (markdown in this repo).
- Re-run the model architecture and training pipeline (code).
- Re-run the analysis pipeline against the released aggregate-level CSVs to confirm the paper’s statistical conclusions (analysis script).
- Run the model on their own footage or, if LBZF signs off on its release, on the public-sample dataset.
What a reviewer cannot do:
- Re-train the model on the full LBZF dataset (data is LBZF-owned).
- Audit operator-level performance (deliberately anonymized).
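The aggregate-level re-run listed above rests largely on the paired Bland-Altman input data. The following is a minimal sketch of that computation, not the actual validate_v1.py. It assumes the released CSV has one row per paired cycle with two columns, hypothetically named `manual_s` and `cv_s`, and the sample values are made up for illustration.

```python
import csv
import io
import statistics


def load_pairs(csv_text):
    """Parse (manual, cv) cycle-time pairs from CSV text.

    Column names manual_s and cv_s are assumptions for this sketch.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(float(r["manual_s"]), float(r["cv_s"])) for r in reader]


def bland_altman(pairs):
    """Return (bias, lower_loa, upper_loa) for paired measurements.

    bias is the mean difference (cv - manual); the limits of agreement
    are bias +/- 1.96 sample standard deviations of the differences.
    """
    diffs = [cv - manual for manual, cv in pairs]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Illustrative values only, not LBZF data.
SAMPLE = """manual_s,cv_s
10.0,11.0
20.0,19.0
30.0,31.0
40.0,39.0
"""

pairs = load_pairs(SAMPLE)
bias, lower, upper = bland_altman(pairs)
print(f"bias={bias:.3f}s  limits of agreement=[{lower:.3f}, {upper:.3f}]s")
```

Because the inputs are only paired durations, a reviewer could run this kind of check without any operator identifiers being present in the file.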
Versioning discipline
Model versions
| Field | Convention | Example |
|---|---|---|
| model_name | yolov8n_lbzf_vN (N = 0, 1, …) or yolov8n_coco_pretrained (off-shelf baseline) | yolov8n_lbzf_v0 |
| model_sha256 | SHA-256 of the .pt file | a3b9... |
| trained_on | dataset hash, see below | lbzf_person_v0 |
| trained_at | UTC timestamp of training run | 2026-06-12T14:33:00Z |
| git_sha | commit SHA of the training repo at training time | f4e5... |
| framework_version | ultralytics==8.x.y, torch==2.z.w | (pinned in requirements.txt) |
| tensorrt_engine_jetpack | JetPack version the .engine was built against (TRT engines are non-portable) | 6.0-r36.3 |
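One way to keep these fields together is a small record written next to the checkpoint at the end of a training run. The sketch below is an assumption about how that could look, not the existing training code: the `ModelVersion` class and `build_model_version` helper are hypothetical names, and the elided example values from the table are copied as placeholders rather than filled in.

```python
import hashlib
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large checkpoints need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


@dataclass(frozen=True)
class ModelVersion:
    """Hypothetical record mirroring the fields in the table above."""
    model_name: str
    model_sha256: str
    trained_on: str
    trained_at: str
    git_sha: str
    framework_version: str
    tensorrt_engine_jetpack: str


def build_model_version(checkpoint, **fields):
    """Compute the checkpoint hash and combine it with caller-supplied fields."""
    return ModelVersion(model_sha256=sha256_file(checkpoint), **fields)


PLACEHOLDER = b"placeholder bytes, not a real checkpoint"

with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "yolov8n_lbzf_v0.pt"
    ckpt.write_bytes(PLACEHOLDER)
    record = build_model_version(
        ckpt,
        model_name="yolov8n_lbzf_v0",
        trained_on="lbzf_person_v0",
        trained_at="2026-06-12T14:33:00Z",
        git_sha="f4e5...",  # elided in the table; left as a placeholder
        framework_version="ultralytics==8.x.y, torch==2.z.w",  # placeholder
        tensorrt_engine_jetpack="6.0-r36.3",
    )

# A JSON sidecar that could be stored alongside the .pt file.
sidecar = json.dumps(asdict(record), indent=2)
print(sidecar)
```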
Dataset versions
Each dataset version is a directory with:
- images/ (frames, anonymized if released)
- annotations.coco.json
- dataset_split_vN.csv: frame_id, split (train/val/test), source_video, recorded_at
- manifest.json: hash of every image + annotation
- LICENSE: for the public sample; the private full set has an LBZF data-use agreement (Agent E)
Hash the manifest. dataset_id = sha256(manifest.json). This is what model.trained_on references.
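A minimal sketch of the manifest step, assuming the directory layout described above. The helper names are hypothetical, and the choice to serialize the manifest with sorted keys (so the same files always yield the same bytes, and therefore the same dataset_id) is an assumption of this sketch rather than something the doc specifies.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_sha256(path):
    """SHA-256 of one file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def build_manifest(dataset_dir):
    """Map each image and the annotation file to its SHA-256.

    Keys are POSIX-style relative paths so the result does not depend
    on where the dataset directory lives or on the host OS.
    """
    root = Path(dataset_dir)
    targets = sorted((root / "images").rglob("*")) + [root / "annotations.coco.json"]
    return {p.relative_to(root).as_posix(): file_sha256(p) for p in targets if p.is_file()}


def write_manifest(dataset_dir):
    """Write manifest.json deterministically and return dataset_id."""
    manifest_bytes = json.dumps(
        build_manifest(dataset_dir), sort_keys=True, indent=2
    ).encode("utf-8")
    (Path(dataset_dir) / "manifest.json").write_bytes(manifest_bytes)
    return hashlib.sha256(manifest_bytes).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "images").mkdir()
    (root / "images" / "frame_000001.jpg").write_bytes(b"placeholder pixels")
    (root / "annotations.coco.json").write_text('{"images": [], "annotations": []}')

    id_first = write_manifest(root)
    id_again = write_manifest(root)   # unchanged data gives the same id
    (root / "images" / "frame_000001.jpg").write_bytes(b"edited pixels")
    id_edited = write_manifest(root)  # any changed byte gives a new id

print(id_first)
```

Hashing the manifest rather than the raw files keeps the identifier cheap to recompute and lets the manifest itself be published even when the images are not.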
Cycle-event log versions
Every row in cycle_events carries model_version, roi_version (per roi-calibration.md), and git_sha of the inference code. So any historical analysis can name the exact configuration that produced any given event.
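As an illustration of why the three provenance columns matter, the sketch below refuses to write an event that lacks any of them and then groups events by configuration. It uses an in-memory CSV; the `cycle_id` and `duration_s` columns and all values (including `roi_v1` and `abc1234`) are hypothetical, and the real schema is owned by the backend.

```python
import csv
import io
from collections import Counter

PROVENANCE = ("model_version", "roi_version", "git_sha")
FIELDS = ("cycle_id", "duration_s") + PROVENANCE  # non-provenance columns are assumed


def write_events(rows, out):
    """Write cycle events as CSV, rejecting rows without full provenance."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        missing = [key for key in PROVENANCE if not row.get(key)]
        if missing:
            raise ValueError(f"cycle {row.get('cycle_id')} lacks {missing}")
        writer.writerow(row)


def configurations(csv_text):
    """Count events per (model_version, roi_version, git_sha) triple."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(tuple(row[key] for key in PROVENANCE) for row in reader)


buffer = io.StringIO()
write_events(
    [
        {"cycle_id": "1", "duration_s": "41.2", "model_version": "yolov8n_lbzf_v0",
         "roi_version": "roi_v1", "git_sha": "abc1234"},
        {"cycle_id": "2", "duration_s": "39.8", "model_version": "yolov8n_lbzf_v0",
         "roi_version": "roi_v1", "git_sha": "abc1234"},
        {"cycle_id": "3", "duration_s": "40.5", "model_version": "yolov8n_coco_pretrained",
         "roi_version": "roi_v1", "git_sha": "abc1234"},
    ],
    buffer,
)
by_config = configurations(buffer.getvalue())
for config, count in by_config.items():
    print(config, count)
```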
Storage tiers and access control
| Tier | What | Where | Who |
|---|---|---|---|
| Tier 0 — public | Code, design docs, dataset split CSV, aggregate CSVs, public-sample dataset (if approved), figures source. | GitHub sophiamann/lbzfai-cv; HuggingFace Hub for model + sample dataset (if approved); Zenodo / OSF for aggregate CSVs (durable DOIs for paper citation). | Anyone |
| Tier 1 — restricted | Full training dataset, full validation manual_observations, raw video clips used for figures. | LBZF-owned S3 bucket (or LBZF Google Drive). Access via signed-URL for reviewers under NDA, or via Mariana-approved researcher list. | LBZF + named collaborators (Sophia, Andrew, ITBA, designated reviewers) |
| Tier 2 — private | 4TB rolling video buffer; per-frame logs; operator-identity coding keys; consent forms. | Jetson NVMe (locally); not exported. | LBZF only |
GitHub LFS for model checkpoints (Tier 0). Avoid putting large files in the main repo. The training-pipeline repo is sophiamann/lbzfai-cv (or whatever Sophia + Andrew settle on; the landing-page repo is intentionally separate per project hygiene).
Open-source posture (the LBZF coordination question)
Coordinate with Agent E. This doc proposes the following default, to be confirmed with Mariana before any public release:
- Code (Tier 0): permissive license (MIT or Apache 2.0). No business reason to restrict the CV pipeline code; it’s a reference implementation of a small-factory deployment. Andrew’s “open-source ambition” framing (docs/overview/project.md) supports this.
- Public-sample dataset (~100 frames, face-blurred): CC-BY-4.0 or CDLA-Permissive-2.0 with an explicit “this dataset is for research only; do not attempt to re-identify operators” clause. Requires Mariana sign-off.
- Full dataset: NOT released. LBZF-owned. A reviewer-NDA path exists but is gated on Mariana.
- Fine-tuned model checkpoint: released IF Stage B fine-tuning happens AND the model does not memorize identifiable operators (verify via membership inference attack on a sample subset before release). Default conservative: not released without check.
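The release gate on the checkpoint reduces to a single number once an attack has produced scores. The sketch below covers only that decision step, under stated assumptions: the attack itself (for example, shadow models) is not implemented here, the scores are made up, and the 0.6 AUC threshold is the one this doc proposes under “What’s weak in this doc”. Function names are hypothetical.

```python
def attack_auc(member_scores, nonmember_scores):
    """AUC of a membership-inference attack from its raw scores.

    Computed as the probability that a randomly chosen training-set
    ("member") frame scores higher than a randomly chosen held-out
    frame, with ties counted as one half.
    """
    wins = 0.0
    for member in member_scores:
        for nonmember in nonmember_scores:
            if member > nonmember:
                wins += 1.0
            elif member == nonmember:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


def may_release(member_scores, nonmember_scores, threshold=0.6):
    """True only if the attack does no better than the threshold."""
    return attack_auc(member_scores, nonmember_scores) <= threshold


# Illustrative scores only. First case: the attack separates members perfectly.
leaky_auc = attack_auc([0.9, 0.8, 0.7], [0.1, 0.2, 0.3])
# Second case: scores are indistinguishable, so the attack is at chance.
chance_auc = attack_auc([0.5, 0.4], [0.5, 0.4])

print(leaky_auc, may_release([0.9, 0.8, 0.7], [0.1, 0.2, 0.3]))
print(chance_auc, may_release([0.5, 0.4], [0.5, 0.4]))
```

An AUC near 0.5 means the attack cannot tell training frames from unseen ones; the further above 0.5 it rises, the stronger the evidence that the checkpoint retains information about specific frames.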
Reviewer reproduction path
A reviewer who wants to reproduce the paper’s claims should be able to:
```bash
# 1. Clone the repo
git clone https://github.com/sophiamann/lbzfai-cv
cd lbzfai-cv
git checkout paper-v1   # tagged at paper submission

# 2. Set up env
pip install -r requirements.txt   # pinned versions

# 3. Pull the public sample dataset (if released)
make data-sample   # downloads from HuggingFace

# 4. Re-run the validation analysis against aggregate CSVs
python validate_v1.py --inputs data/aggregate/ --output results/

# 5. Compare to the paper's results (-r recurses into subdirectories)
diff -r results/ paper_results/   # files should match
```
This is the test: if diff is empty, the paper’s numbers are reproducible from the released artifacts. Any divergence is a paper bug.
Alternatives considered
| Alt | Why rejected |
|---|---|
| Release everything publicly | Violates LBZF confidentiality; jeopardizes operator consent; legally fraught under Colombian Law 1581/2012. |
| Release nothing publicly | Disqualifies the paper from competitive IEEE venues; defeats the project’s reference-implementation framing. |
| Release only code, no data | Acceptable but weaker — reviewers can’t independently verify any data-dependent claim. The aggregate-CSV-and-public-sample middle path is better. |
| Use Roboflow Public for the dataset | Same Roboflow-public-license issue as training-and-finetuning.md §B.1 — incompatible with LBZF confidentiality without re-licensing. |
| HuggingFace gated dataset (request access) | Reasonable for the full dataset if LBZF allows. Default for now: full dataset not released even gated; revisit. |
| Self-host artifact storage on the lbzfai.com Cloudflare worker | Cute but Cloudflare R2/Workers are not durable-DOI venues. Zenodo / OSF / HuggingFace are the right homes for paper-cited artifacts. |
Open questions
- OPEN[Mariana via Agent E, by 2026-08-01]: Approve the open-source posture. Specifically: (a) public release of code, (b) public release of a face-blurred 100-frame sample, (c) NDA path for the full dataset for reviewers.
- OPEN[Andrew, by 2026-06-01]: License choice for the code — MIT vs Apache 2.0. (Apache 2.0 has the patent grant which is mildly relevant if this is ever commercialized by LBZF.)
- OPEN[Sophia, by 2026-06-15]: Reserve a Zenodo DOI for the aggregate-data archive before paper submission so the paper can cite a durable URL.
- OPEN[ITBA, by 2026-06-15]: ITBA’s twin install may add its own dataset and metrics. Coordinate whether ITBA’s data is in the same Zenodo deposit or separate. Co-authorship implications.
- OPEN[Sophia, by 2026-09-01]: Membership-inference attack on the fine-tuned model — does the model leak operator identity? If yes, do not release the checkpoint.
- OPEN[Agent C]: TensorRT engine versioning is JetPack-pinned; what’s the right “minimum-supported JetPack version” claim for the paper?
Cross-bucket dependencies
- Agent A (frontend): footer of the dashboard surfaces model_version + roi_version + commit SHA so any user can ask “which version produced this number.” Five-line UI change but essential for reproducibility.
- Agent B (backend): cycle_events.model_version, cycle_events.roi_version, cycle_events.code_sha columns. Already requested in cycle-event-detection.md and roi-calibration.md.
- Agent C (hardware): TensorRT engine is non-portable; the artifact set documents how to rebuild it on a Jetson but does not include the binary. Confirm with Agent C the rebuild recipe is robust enough that a reviewer can follow it.
- Agent E (business/legal): LBZF approval; dataset license drafting; reviewer-NDA template.
What’s weak in this doc
- The “public sample” is conditional on Mariana’s approval and the approval has not been requested. Without it, the paper falls back to “code-only reproducibility” which is weaker. The ask has to be teed up well before paper submission; the conversation with Mariana should happen before the Pereira trip, not after.
- No formal data-management plan (DMP). Funder DMPs (NSF, EU) are not strictly required here (no funder), but IEEE Access reviewers increasingly look for one. A 1-page DMP would close the gap.
- Membership-inference checks on the fine-tuned model are listed as a release gate but no procedure is specified. Real procedure: hold out 50 known-training and 50 known-not-training operator frames, run a shadow-model-based inference attack; if AUC > 0.6, do not release. This needs to be a step in the training pipeline.
- No story for retracting an artifact once released. If LBZF revokes consent or an operator withdraws after publication, what’s the takedown procedure? HuggingFace and Zenodo both support takedowns but the process is manual; not specified.
- GitHub LFS for model checkpoints has a free-tier cap of 1 GB storage and 1 GB/month bandwidth. YOLOv8n’s .pt is ~6 MB, which is fine, but if Phase II’s bigger models go in the same repo, the cap bites. Plan to host model artifacts on HuggingFace Hub from day 1 to dodge the issue.
- Reproducibility-via-diff assumes deterministic analysis. Bootstrap CIs depend on RNG seed; the analysis script must pin seeds and the validation doc must say which seed. A reviewer running with a different seed sees ε-level disagreement and may flag it. Specify the canonical seed.
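The seed issue raised in the last bullet can be illustrated with a percentile bootstrap that owns its random generator. This is a sketch under assumptions, not the validation script: `CANONICAL_SEED` is a hypothetical value standing in for whatever seed the validation doc ends up naming, and the cycle times are made up.

```python
import random
import statistics

CANONICAL_SEED = 20260915  # hypothetical; the real seed is still to be specified


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=CANONICAL_SEED):
    """Percentile bootstrap confidence interval for the mean.

    A local random.Random(seed) is used so the result depends only on
    the inputs and the seed, never on global RNG state.
    """
    rng = random.Random(seed)
    n = len(values)
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Illustrative cycle times in seconds, not LBZF data.
cycle_times = [41.2, 39.8, 43.1, 40.5, 42.0, 38.9, 44.3, 40.1]

first_run = bootstrap_ci(cycle_times)
second_run = bootstrap_ci(cycle_times)  # same seed, so the interval is identical
print(first_run, second_run)
```

With the seed fixed inside the function's defaults, two runs produce the same interval, which is what a byte-level comparison of result files requires. If results are written as text, the number formatting would also need to be fixed for the comparison to stay stable.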
Rollout
| Date | Gate |
|---|---|
| 2026-05-25 | sophiamann/lbzfai-cv repo created with skeleton + LICENSE. |
| 2026-06-01 | model_version / roi_version columns in DB; inference code logs both per cycle. |
| 2026-06-15 | First version of validate_v1.py running against synthetic data. |
| 2026-07-15 | First version of validate_v1.py running against real Pereira data. |
| 2026-08-01 | Mariana approval ask for the public-sample dataset; Zenodo DOI reserved. |
| 2026-09-01 | Paper-v1 tag in repo; artifact set frozen. |
| 2026-09-15 | Paper submitted. |
Paper alignment
- Data availability section of the paper points to the Tier-0 artifacts and to the reviewer-NDA path for Tier-1.
- Reproducibility checklist (some venues require it; if so, this doc is the checklist).
- Methods cite the model_version / roi_version / dataset_id triple — makes the experimental setup unambiguous.
- A footnote: “Code at https://github.com/sophiamann/lbzfai-cv, tagged paper-v1.”
- An acknowledgement: “Data made available under the terms of an institutional research agreement with Louis Barton Zona Franca SA.”