A static generator is a system that takes a prompt and produces an output, and then forgets. A closed loop is a system that takes a prompt, produces an output, watches what happens to the output in the world, and uses that signal to produce better outputs next time. The two systems look identical at the surface. They behave differently on a timescale of months. One degrades. The other improves.

Almost every AI project we have seen ships as a static generator. Almost every AI project we have rebuilt has the closed loop bolted on later, at significantly higher cost than if it had been built in. This is the architectural piece most teams get wrong, and the reason most AI systems are flat after twelve months instead of compounding.

What a static generator looks like

The default architecture for an enterprise AI feature: a prompt goes in, the model returns a response, the response is shown to the user. The user does something with the response — accepts it, edits it, ignores it, copies a sentence and pastes it elsewhere — and the system records none of this. The next time a similar prompt comes in, the same output (give or take temperature) is produced.

This is a static generator. The model's capability is fixed at the day of deployment. The prompt's quality is fixed at the day someone last edited it. The system's understanding of what works is also fixed — usually at zero, because nobody set up the channel to find out.

A static generator can be very good at launch. It can also be very good a year later, in absolute terms — the underlying model hasn't gotten worse. What it can't be is better. The world moved on, the use cases shifted, the users developed habits the system hasn't observed. The output drifts from what would be useful, and nobody can quite name why.

What a closed loop looks like

The architecture is not complicated. Three additions to the static generator make it a loop.

1. Outcome capture. Every output is followed by an observable outcome. The user accepted the draft as-is. The user rewrote 30% of it. The user discarded it entirely. The user accepted it but the downstream click-through rate was below benchmark. These outcomes are written to a structured log keyed to the original prompt-output pair.

2. Signal aggregation. The outcome log is aggregated on a regular cadence, daily or weekly, into patterns. "Outputs for segment X are being rewritten 60% of the time. Outputs for segment Y are accepted at 95%." The aggregation is the diagnostic.

3. Generation update. The patterns feed back into how the system generates next time. The mechanism varies: it could be a prompt update, a fine-tune, a swap to better-performing retrieved examples, a router rule, or a model selection change. The key is that the update is structured: driven by the signal, not by a designer's intuition.

The system now compounds. Each month's outputs are better than the previous month's because the system has seen what worked and what didn't, and the generation has shifted accordingly.
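As a rough illustration of how the three additions fit together, here is a minimal Python sketch. The `Outcome` record, the field names, the action labels, and the 0.5 threshold are all assumptions made for the example, not a description of any particular production system.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Outcome:
    output_id: str  # hypothetical ID linking the outcome to a prompt-output pair
    segment: str
    action: str     # assumed labels: "accepted", "edited", or "discarded"


def acceptance_by_segment(log):
    """Step 2: aggregate the outcome log into a per-segment acceptance rate."""
    totals = defaultdict(int)
    accepted = defaultdict(int)
    for outcome in log:
        totals[outcome.segment] += 1
        if outcome.action == "accepted":
            accepted[outcome.segment] += 1
    return {seg: accepted[seg] / totals[seg] for seg in totals}


def segments_needing_update(rates, threshold=0.5):
    """Step 3: flag segments whose acceptance rate falls below the threshold."""
    return sorted(seg for seg, rate in rates.items() if rate < threshold)


# Step 1: outcomes captured as structured rows (illustrative data only).
log = [
    Outcome("o1", "X", "edited"),
    Outcome("o2", "X", "discarded"),
    Outcome("o3", "X", "accepted"),
    Outcome("o4", "Y", "accepted"),
    Outcome("o5", "Y", "accepted"),
]
rates = acceptance_by_segment(log)
flagged = segments_needing_update(rates)
print(rates, flagged)
```

In a real system the flagged segments would feed whichever update mechanism the team has chosen; the sketch only shows that the update is triggered by the log rather than by intuition.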

Why most teams skip the loop

Three reasons, ordered by how often we see them.

The outcome is invisible. The first reason is the most boring and the most common. The team can't measure outcomes because the outcome signal is not in the system. The draft gets sent to email. The email gets read or not. The read doesn't get reported back. The team only sees the click-through rate at the campaign level, not at the per-draft level. The signal is there but it isn't connected.

Fix: instrument the boundary. Every output gets a unique ID. Every downstream event keys to that ID. The infrastructure work is small — a couple of weeks for most teams — and it is the highest-leverage work the team can do.
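A minimal sketch of what instrumenting the boundary could look like, assuming in-memory dictionaries stand in for whatever store the team actually uses. The function names and event labels are hypothetical.

```python
import uuid
from collections import defaultdict

outputs = {}                # output_id -> generation record
events = defaultdict(list)  # output_id -> downstream events, in arrival order


def record_output(prompt, text):
    """Assign a unique ID to an output at the moment it is generated."""
    output_id = str(uuid.uuid4())
    outputs[output_id] = {"prompt": prompt, "text": text}
    return output_id


def record_event(output_id, event_type):
    """Attach a downstream event to the output that caused it."""
    if output_id not in outputs:
        raise KeyError(f"unknown output_id: {output_id}")
    events[output_id].append(event_type)


def outcomes_for(output_id):
    """Join an output with every event recorded against its ID."""
    return {**outputs[output_id], "events": list(events[output_id])}


# Usage: the ID travels with the draft, so downstream systems can report back.
oid = record_output("brief for segment X", "draft email body")
record_event(oid, "email_opened")
record_event(oid, "link_clicked")
print(outcomes_for(oid))
```

The important design choice is that the ID is created at generation time and carried through every downstream system, so the join is a lookup rather than a guess.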

The aggregation pipeline doesn't exist. The second reason is also infrastructure. Even when the events get captured, there is no system that turns them into actionable signal. The logs sit in a warehouse and nobody queries them. The team's "analytics" is a once-a-quarter slide deck instead of a continuous diagnostic.

Fix: a small daily job that produces a one-page summary — per-segment acceptance rates, anomaly flags, week-over-week deltas. Sent to the team channel. Read in standup. The pipeline doesn't need to be sophisticated. It needs to exist.
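The summary job can be very small. The sketch below assumes the inputs are per-segment `(accepted, total)` counts for the current and previous week, and that a drop of 10 percentage points or more counts as an anomaly; both the input shape and the threshold are illustrative choices.

```python
def summarise(this_week, last_week, drop_threshold=0.10):
    """Build one summary line per segment from (accepted, total) counts."""
    lines = []
    for segment in sorted(this_week):
        accepted, total = this_week[segment]
        rate = accepted / total if total else 0.0
        prev = last_week.get(segment)
        if prev and prev[1]:
            # Week-over-week change in acceptance rate, in percentage points.
            delta = rate - prev[0] / prev[1]
            flag = " ANOMALY" if delta <= -drop_threshold else ""
            lines.append(f"{segment}: {rate:.0%} ({delta * 100:+.0f} pts WoW){flag}")
        else:
            lines.append(f"{segment}: {rate:.0%} (no prior week)")
    return lines


# Illustrative counts only.
this_week = {"X": (40, 100), "Y": (95, 100)}
last_week = {"X": (60, 100), "Y": (94, 100)}
for line in summarise(this_week, last_week):
    print(line)
```

Posting these lines to the team channel is left out; the point is that the whole diagnostic fits in a function short enough to read in standup.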

The update mechanism is informal. The third reason is more cultural. Even when the signal exists and is read, the team's response is ad hoc. Someone notices a drop in segment X and writes a Slack message. A few days later, someone updates the prompt. The change is not measured. The loop is half-closed.

Fix: turn the update into a regular ritual. Weekly review of the signal, formal experiments queued against the eval harness, A/B comparisons run when the cost justifies it. The team starts shipping prompt and model updates the same way the engineering team ships code changes: with a record, a reason, and a measurable result.
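One way to make "a measurable result" concrete is a standard two-proportion z-test on acceptance counts before and after a prompt change. This is a generic statistical sketch using only the Python standard library, not a description of the eval harness mentioned above; the counts are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Return (z, two-sided p-value) for the difference in rates, B minus A."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    if pooled in (0.0, 1.0):
        raise ValueError("test is undefined when all outcomes are identical")
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Illustrative counts: old prompt accepted 400 of 1000, new prompt 460 of 1000.
z, p_value = two_proportion_z_test(400, 1000, 460, 1000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

Logging the test result next to the change and the reason for it gives the update the same paper trail a code change would have.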

The architectural shape, in concrete terms

For the creative automation system we built for a leading Indian fashion marketplace, the loop looks like this.

Generation — a brief comes in (segment, channel, brand input). The system produces 100+ variants of a campaign, tagged with the generation parameters used.

Selection — the client's CRM picks which variants to send to which sub-segments, also tagged.

Capture — every send produces a row in the outcome log: variant ID, segment, channel, open rate, click rate, conversion rate, return rate.

Aggregation — weekly, the outcome log is rolled up into per-variant-class performance. "Editorial-toned variants for segment 24F-metro-premium are converting at 1.4× the baseline. Promotional-toned variants for the same segment are at 0.7×."

Update — the next generation cycle pulls from this signal. The few-shot examples in the prompt for that segment-class are shifted toward the patterns that performed. Underperforming variant classes are deprioritised in the generation distribution.
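The aggregation and update steps can be sketched as two small functions: one that converts per-class conversion counts into a lift over baseline, and one that turns those lifts into a sampling distribution for the next generation cycle. The function names, the floor value, and the counts are assumptions chosen to reproduce the 1.4× and 0.7× figures quoted above; the production system may well work differently.

```python
def lift_by_class(conversions, sends, baseline_rate):
    """Conversion rate of each variant class relative to the baseline rate."""
    return {
        cls: (conversions[cls] / sends[cls]) / baseline_rate
        for cls in sends
        if sends[cls] > 0
    }


def generation_weights(lifts, floor=0.05):
    """Normalise lifts into sampling weights, keeping a small floor per class
    so underperformers are deprioritised rather than removed outright."""
    raw = {cls: max(lift, floor) for cls, lift in lifts.items()}
    total = sum(raw.values())
    return {cls: value / total for cls, value in raw.items()}


# Illustrative counts for one segment, with an assumed 2% baseline conversion.
sends = {"editorial": 1000, "promotional": 1000}
conversions = {"editorial": 28, "promotional": 14}
lifts = lift_by_class(conversions, sends, baseline_rate=0.02)
weights = generation_weights(lifts)
print(lifts, weights)
```

With these numbers, editorial variants would receive roughly two thirds of the next cycle's generation budget and promotional variants roughly one third.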

Six months in, the system's per-variant conversion is materially higher than at launch. The model didn't change. The wiring did.

Where the loop breaks down

Two failure modes worth naming.

Reward hacking. If the loop optimises naively for the easiest outcome signal — say, click-through rate — the system can converge on a degenerate strategy that looks good on the metric and is bad for the business. Clickbait headlines work for CTR and not for brand. The mitigation is to feed multiple outcome signals into the loop and to gate the update on a composite score that includes the slower signals (return rate, complaint volume, brand quality).
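A composite gate of this kind might look like the sketch below. The metric names, the weights, and the guardrail ceiling are hypothetical; in practice they would be set by the business, not by the code.

```python
def composite_score(metrics, weights):
    """Weighted sum of outcome signals; harmful signals carry negative weights."""
    return sum(weights[name] * metrics[name] for name in weights)


def should_promote(candidate, incumbent, weights, guardrails):
    """Promote a candidate only if it stays under every guardrail ceiling
    and beats the incumbent on the composite score."""
    for name, max_allowed in guardrails.items():
        if candidate[name] > max_allowed:
            return False
    return composite_score(candidate, weights) > composite_score(incumbent, weights)


# Illustrative numbers only.
weights = {"ctr": 1.0, "return_rate": -0.5, "complaint_rate": -5.0}
guardrails = {"return_rate": 0.15}
incumbent = {"ctr": 0.040, "return_rate": 0.10, "complaint_rate": 0.001}
clickbait = {"ctr": 0.060, "return_rate": 0.18, "complaint_rate": 0.004}
balanced = {"ctr": 0.045, "return_rate": 0.09, "complaint_rate": 0.001}

print(should_promote(clickbait, incumbent, weights, guardrails))
print(should_promote(balanced, incumbent, weights, guardrails))
```

The clickbait candidate wins on click-through rate alone but is blocked by the return-rate guardrail, while the balanced candidate is promoted despite a smaller CTR gain.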

Feedback narrowness. If the system only generates within a narrow band that matches past success, it stops exploring and gets stuck on a local optimum. The mitigation is to allocate a fixed fraction (say, 10%) of generation to deliberate exploration — new tones, new compositions, new framings — and to measure their outcomes alongside the proven ones. The system retains the ability to find better solutions over time.
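The exploration budget can be enforced at the point where the generation plan is drawn up. The sketch below assumes proven classes come with sampling weights and exploratory classes are sampled uniformly; the class names and the `allocate_generation` helper are made up for illustration.

```python
import random


def allocate_generation(n_variants, proven, exploratory, explore_fraction=0.10, rng=None):
    """Return a list of variant classes to generate, reserving a fixed
    fraction of the budget for exploratory classes."""
    rng = rng or random.Random()
    n_explore = round(n_variants * explore_fraction)
    n_exploit = n_variants - n_explore
    classes = list(proven)
    class_weights = [proven[c] for c in classes]
    # Exploit: sample proven classes in proportion to their weights.
    plan = rng.choices(classes, weights=class_weights, k=n_exploit)
    # Explore: sample untested classes uniformly.
    plan += rng.choices(exploratory, k=n_explore)
    return plan


proven = {"editorial": 0.67, "promotional": 0.33}
exploratory = ["playful", "minimal"]
plan = allocate_generation(100, proven, exploratory, rng=random.Random(0))
n_exploratory = sum(1 for cls in plan if cls in exploratory)
print(len(plan), n_exploratory)
```

Because exploratory variants pass through the same outcome capture as the proven ones, a class that performs well can graduate into the proven set on the next cycle.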

When a static generator is fine

One last thing. There are use cases where a static generator is the right call. Low-volume work where the per-output cost of being slightly suboptimal is small. One-shot tasks where there is no meaningful outcome signal to feed back. Compliance-bound work where any model behaviour drift is itself a risk.

For the rest — high-volume content, recommendation, drafting, classification, routing, summarisation across long horizons — the loop is the architecture. Building it second is more expensive than building it in.

Closing

The AI systems that compound are the ones where every output is also an input. The AI systems that decay are the ones where every output is a dead end. The structural difference is in two or three pieces of infrastructure that most teams treat as nice-to-haves. The teams that treat them as load-bearing are the ones whose AI systems are better at twelve months than they were at one.

Build the loop. The output gets better while you sleep.