Experiment · 2026-01-14 · 7 min read

Experiment 001: We automated a whole job function. Here's what happened.

We replaced a reporting analyst's complete weekly workflow with three AI agents. Build time: eight weeks. Weekly hours: from 14 to 2. One significant incident in 11 months of operation. The job still exists. Here is exactly what we did, what the human does now, what we got wrong, and the edge case that nearly broke everything. No spin.

The function we automated

A reporting analyst at a mid-sized logistics company spent roughly 14 hours per week: pulling data from six internal systems, normalising formats, identifying anomalies, writing narrative summaries, and distributing reports to eight stakeholders. Well-defined, repetitive, data-rich. The analyst was good at it and found it genuinely tedious. They asked us to automate it. We did.

The agent architecture

Three agents. The Collector pulls from all six source systems on schedule, handling authentication, format differences, and transient failures without human intervention. The Processor normalises the data, validates it against expected ranges, and identifies anomalies using configurable thresholds. The Reporter generates narrative summaries in a consistent voice, formats them for each stakeholder's preferences, and distributes via email and Slack. Agents communicate through a lightweight message queue. Build time: eight weeks. Hardening: three additional weeks.
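The pipeline above can be sketched in a few lines. This is a minimal in-process stand-in, not the production system: the source dictionary, the value ranges, and the use of Python's `queue` module in place of the real message broker are all illustrative assumptions.

```python
import queue

def collector(sources):
    """Pull rows from each source system; yield one message per source."""
    for name, fetch in sources.items():
        yield {"source": name, "rows": fetch()}

def processor(msg, low, high):
    """Validate rows against expected ranges; flag out-of-range values."""
    anomalies = [r for r in msg["rows"] if not (low <= r["value"] <= high)]
    return {**msg, "anomalies": anomalies}

def reporter(msg):
    """Render a one-line narrative summary for distribution."""
    return f"{msg['source']}: rows={len(msg['rows'])}, anomalies={len(msg['anomalies'])}"

# Stand-in for one of the six source systems.
sources = {"erp": lambda: [{"value": 10}, {"value": 999}]}

q = queue.Queue()          # lightweight queue between Collector and the rest
for msg in collector(sources):
    q.put(msg)
while not q.empty():
    report = reporter(processor(q.get(), low=0, high=100))
    print(report)          # → erp: rows=2, anomalies=1
```

The real agents run on schedules and retry transient failures; the decoupling through a queue is what lets each stage fail and recover independently.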

Weekly hours: 14 → 2
Build time: 8 weeks
Source systems: 6
Agents: 3

What the human does now

The analyst reviews automated output for approximately two hours per week. They handle exceptions: cases the anomaly detector flagged but could not resolve, stakeholder requests that fell outside the standard report format, and quarterly reviews that require business context the agents don't have. They own the configuration: adjusting anomaly thresholds as the business changes, updating report templates when requirements shift. The work is different. The role is not gone.

What we got wrong

Three things. First: one source system — a legacy ERP — outputs dates in four different formats depending on which module you access. We didn't discover this until week three. The Collector needed additional handling for each format variant. Second: our initial anomaly thresholds were too sensitive. The Processor flagged 40% of records as anomalous in week one. We recalibrated over two weeks. Third: stakeholders were uncomfortable receiving reports that didn't identify themselves as AI-generated. Adding a clear attribution header and a reply-to address resolved the trust issue entirely.
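The date-format fix amounts to trying each known variant in turn and failing loudly on anything new. The four format strings below are placeholders; the post doesn't list the actual variants the legacy ERP emits.

```python
from datetime import datetime

# Illustrative format variants; the real ERP's four formats are not
# documented in the post.
DATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%m-%d-%Y", "%d %b %Y"]

def parse_erp_date(raw):
    """Try each known format in turn; raise on anything unrecognised."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date format: {raw!r}")
```

Raising on an unknown format, rather than guessing, is deliberate: a fifth variant should surface as an exception for the analyst, not slip through as a misparsed date.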

The edge case

Silent failure — three days of blank reports

In week six, a source system was migrated by the vendor with no advance notice. The Collector began pulling empty datasets. The Processor didn't flag it — zero rows technically passed validation. The Reporter generated and distributed blank reports for three days before anyone noticed. Fix: we added a minimum data volume check and an explicit alert if any source falls below 50% of its expected row count. The check has fired twice since, and both times it caught the problem before it became a user-visible incident.
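The volume check itself is simple. The expected-row baselines and source names below are invented for illustration; the 50% threshold is the one from the fix.

```python
# Illustrative per-source baselines; the real values come from observed
# historical pull sizes.
EXPECTED_ROWS = {"erp": 1200, "wms": 800}

def check_volume(source, row_count, threshold=0.5):
    """Return an alert string if a pull falls below 50% of its baseline."""
    expected = EXPECTED_ROWS[source]
    if row_count < expected * threshold:
        return (f"ALERT: {source} returned {row_count} rows "
                f"(expected ~{expected}); halting report generation")
    return None  # volume looks plausible; proceed
```

A zero-row pull, which previously passed validation, now trips the alert and stops the Reporter before a blank report goes out.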

The honest numbers

Twelve hours per week recovered. Zero missed reports since hardening. One significant incident in 11 months of operation. ROI positive at month four. The analyst's assessment: 'I don't miss the data pulling. I do occasionally miss having a simple answer to what I actually did this week.' That last sentence is worth sitting with. Automation changes what it means to have contributed. That is worth designing for explicitly — not as an afterthought.

Guidance

When this model works — and when it doesn't

This works on well-defined, repetitive, data-rich workflows with stable inputs and measurable outputs. It does not eliminate the role — it transforms it toward exception handling, configuration, and judgment. If the workflow involves significant human judgment at every step, frequently changing requirements, or unclear success criteria, the economics are different. Ask us to scope it honestly before you commit.