How to measure design: a working framework

Design without measurement is decoration. That sentence will offend most designers. It is not meant to. It is meant to push past the false choice between craft and accountability that has stuck for two decades. The good studios do both. They ship work that is beautifully made and they can tell you, in business terms, why it works. The studios that can't make the second case usually lose the first one too. The board cuts the budget; the next brief goes to someone cheaper; the craft is gone before the spreadsheet remembers it ever existed.

What follows is the framework we use to measure design at Two Words. It is the one we wish we had been handed early. It tries to take the question seriously without flattening it into a single number. It treats design as a system that does work — and asks, like any other system, whether the work is getting done.

Start with the business question, not the metric

Most measurement programs start the wrong way. Someone reads about NPS or task success rate, then builds a dashboard, then waits for the design team to feel guilty when the number doesn't move. The dashboard is busy and the team is unhappy and nothing has been learned.

The better order is the opposite. Begin with the business question the project is actually trying to answer. Then choose the metric that would tell you, honestly, whether the answer is yes. The metric is downstream of the question. The dashboard is downstream of the metric. The dashboard is the last thing you build, not the first.

A wealth platform we worked on had a clear business question: can we onboard customers fast enough that they convert during the same session? The metric followed: time-to-first-confirmed-account. Everything we shipped (copy, layout, validation rules) could be judged against that single number. It also let us say no to changes that didn't move it.
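A metric like time-to-first-confirmed-account can be sketched from an event log. This is a minimal illustration, not the pipeline used on that project: the event names, the log shape, and the sample data are all assumptions.

```python
from datetime import datetime
from statistics import median

# Hypothetical event log: (user_id, event_name, ISO timestamp).
# Event names are assumptions for illustration, not a real schema.
EVENTS = [
    ("u1", "onboarding_started", "2024-03-04T09:00:00"),
    ("u1", "account_confirmed", "2024-03-04T09:07:30"),
    ("u2", "onboarding_started", "2024-03-04T10:00:00"),
    ("u2", "account_confirmed", "2024-03-04T10:12:00"),
    ("u3", "onboarding_started", "2024-03-04T11:00:00"),  # never confirmed
]


def minutes_to_first_confirmed_account(events):
    """Return {user_id: minutes from first start to first confirmation}."""
    started, confirmed = {}, {}
    for user_id, name, ts in events:
        when = datetime.fromisoformat(ts)
        if name == "onboarding_started":
            started[user_id] = min(when, started.get(user_id, when))
        elif name == "account_confirmed":
            confirmed[user_id] = min(when, confirmed.get(user_id, when))
    # Only users with both events have a measurable duration.
    return {
        user: (confirmed[user] - started[user]).total_seconds() / 60
        for user in confirmed
        if user in started
    }


durations = minutes_to_first_confirmed_account(EVENTS)
median_minutes = median(durations.values())
# Share of all users seen in the log who reached a confirmed account.
confirmed_share = len(durations) / len({user for user, _, _ in EVENTS})
```

Reporting the median alongside the share of users who confirmed at all keeps the number honest: a fast median means little if most users never reach confirmation.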

The three altitudes of design measurement

Different design decisions sit at different altitudes. Trying to measure them all with the same tool is what produces the bad dashboards.

Product altitude — task and flow

At the lowest altitude, design decisions are about specific tasks. Can a user complete signup. Can they find a setting. Can they recover from an error. These are testable in the strict sense. The signals you care about are conversion rate, task completion rate, time-on-task, error rate, support ticket volume tagged to the flow.

Numbers move fast here. If you change the layout of an investor onboarding flow on Monday, you can see the conversion impact by Friday. This is the part of design measurement that most resembles a science.
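The product-altitude signals are simple enough to compute directly from session records. The sketch below assumes a hypothetical `Session` record with three fields; a real analytics export will look different.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Session:
    completed: bool   # did the user finish the task?
    seconds: float    # time spent on the task
    errors: int       # validation or system errors shown


def flow_metrics(sessions):
    """Summarise one flow: completion rate, median time-on-task, error rate."""
    if not sessions:
        raise ValueError("need at least one session")
    finished = [s for s in sessions if s.completed]
    return {
        "completion_rate": len(finished) / len(sessions),
        # Time-on-task is taken over completed sessions only, so that
        # abandoned sessions do not distort it.
        "median_seconds": median(s.seconds for s in finished) if finished else None,
        "errors_per_session": sum(s.errors for s in sessions) / len(sessions),
    }


# Illustrative data only.
sample = [
    Session(True, 120, 0),
    Session(True, 180, 1),
    Session(False, 300, 3),
    Session(True, 150, 0),
]
metrics = flow_metrics(sample)
```

Running the same function on the week before and the week after a layout change gives the Monday-to-Friday comparison described above.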

System altitude — the design system itself

One altitude up, design decisions are about the system that produces the product. How fast can a new feature ship. How consistent does it look across three teams. How many components exist as duplicates in three places. Here the metrics are about velocity, consistency, and durability rather than task success.

The good ones to track: time from design brief to first production prototype, number of distinct primary buttons in the codebase, accessibility coverage across components, the ratio of design-team-resolved tickets to engineering-resolved ones for visual issues. These move on the order of months. They compound.
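Counting distinct primary buttons is one of the easier system metrics to automate. The sketch below assumes a component audit has already produced one row per button implementation; the row fields and style values are hypothetical.

```python
from collections import defaultdict

# Hypothetical audit rows: one per button implementation found in the codebase.
# The fields and values are illustrative assumptions, not a real export format.
AUDIT = [
    {"role": "primary-button", "team": "onboarding", "style": ("#0B5FFF", "8px", "600")},
    {"role": "primary-button", "team": "payments", "style": ("#0B5FFF", "8px", "600")},
    {"role": "primary-button", "team": "reporting", "style": ("#0A4FD6", "4px", "500")},
    {"role": "secondary-button", "team": "payments", "style": ("#FFFFFF", "8px", "500")},
]


def distinct_variants(audit, role):
    """Group implementations of one role by visual signature."""
    teams_by_style = defaultdict(set)
    for row in audit:
        if row["role"] == role:
            teams_by_style[row["style"]].add(row["team"])
    return dict(teams_by_style)


variants = distinct_variants(AUDIT, "primary-button")
variant_count = len(variants)  # ideally this trends toward one
```

Tracked month over month, `variant_count` shows whether the system is consolidating or drifting, and the team sets show where the drift lives.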

Brand altitude — the long horizon

At the top altitude, design decisions are about how the company is perceived. Brand strength, recall, willingness to pay, the price premium a recognisable mark commands. These move on the order of years. Asking a quarterly dashboard whether the brand work is succeeding is a category error. You measure these with longer-form instruments — tracking studies, share-of-voice analysis, willingness-to-pay surveys, recall tests against competitors.

A pharma launch we worked on lived at this altitude. The conversion metric was, in a sense, regulatory approval — but the design work was being judged by whether prescribers and patients understood the new therapy as the obvious option within three years. You don't put that on a Tuesday dashboard. You put it on the right one.

Quant alone is a trap

Numbers are easy to defend in a meeting. They are also easy to misread. A drop in completion rate could mean the design is worse — or it could mean a new fraud filter is rejecting more sign-ups, or that an upstream marketing campaign brought in a less qualified cohort, or that the analytics implementation broke on Tuesday and nobody noticed.

Qualitative signals are not a soft alternative. They are the thing that lets you correctly interpret the quant. The pattern that works:

Five user conversations per quarter, recorded, with notes synthesised into themes. The numbers tell you what happened. The conversations tell you why.
Support ticket tagging by flow and component. The volume tells you the magnitude of the problem. The verbatim text tells you what to fix.
Live observation of at least one full session per major flow per quarter. There is no substitute for watching someone struggle with a thing you made.
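The ticket-tagging item in the list above lends itself to a small script. This sketch assumes tickets arrive already tagged with a flow and a component; the field names and sample tickets are invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical tagged tickets; the fields are assumptions, not a helpdesk API.
TICKETS = [
    {"flow": "signup", "component": "id-upload", "text": "Photo upload fails on my phone."},
    {"flow": "signup", "component": "id-upload", "text": "It rejected my passport twice."},
    {"flow": "signup", "component": "address-form", "text": "Postcode field won't accept spaces."},
    {"flow": "settings", "component": "notifications", "text": "Can't find where to mute emails."},
]


def summarise_tickets(tickets):
    """Return ticket volume and verbatim text, both keyed by (flow, component)."""
    volume = Counter()
    verbatims = defaultdict(list)
    for ticket in tickets:
        tag = (ticket["flow"], ticket["component"])
        volume[tag] += 1                       # magnitude of the problem
        verbatims[tag].append(ticket["text"])  # what to fix, in users' words
    return volume, dict(verbatims)


volume, verbatims = summarise_tickets(TICKETS)
worst_tag, worst_count = volume.most_common(1)[0]
```

Keeping the verbatim text next to the count is the point: the count ranks the problems, and the text for `worst_tag` is what the designer reads first.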

What to ignore

Some metrics look like measurement and aren't. They tell you something — but the something they tell you isn't worth the meeting it takes to discuss it.

Page views, by themselves. They count motion, not progress.
Time on site, by itself. A confused user lingers longer than a confident one.
NPS as a single number. The breakdown matters more than the score. A 45 with a clear theme is more actionable than a 65 with no signal.
Likert-scale satisfaction surveys as the main feedback loop. They average out the disasters and the delights into beige.
Dribbble / award counts as a proxy for design quality. Useful for hiring; almost useless for product impact.
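On the NPS point in the list above: the standard score is the percentage of promoters (ratings of 9 or 10) minus the percentage of detractors (ratings of 0 to 6). A minimal sketch that reports the breakdown alongside the headline number, using invented survey responses:

```python
def nps_breakdown(scores):
    """Return the NPS and the promoter/passive/detractor counts behind it."""
    if not scores:
        raise ValueError("need at least one response")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    passives = len(scores) - promoters - detractors
    return {
        "nps": round(100 * (promoters - detractors) / len(scores)),
        "promoters": promoters,
        "passives": passives,
        "detractors": detractors,
    }


# Illustrative responses on the usual 0-10 scale.
responses = [10, 9, 9, 8, 7, 6, 3, 10, 9, 2]
result = nps_breakdown(responses)
```

Two very different response sets can produce the same score, which is why the three counts, and the comments attached to the detractors, carry more signal than the single number.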

Talking about it in the boardroom

A design team gets cut when the business doesn't understand what it does. The same team gets a budget increase when the business understands and can argue for it. The difference is rarely the design — it is whether the leader can tell the story in the operator's language.

The translation table that works for us:

"We improved usability" → "We cut support tickets for this flow by 40% and reduced training time for new analysts by half a day."
"We refreshed the brand" → "Unaided brand recall in our segment moved from 14% to 22% in twelve months. Three RFPs cited the new identity by name."
"We built a design system" → "New feature shipping time is down from six weeks to three. Engineering rework on visual issues is down 70%."

The trick is that the right-hand side is true. You can't translate what you didn't measure. Which is why the framework starts where it does — with the business question, picked early, in plain language, before anyone draws a screen.

When the data lies

Two failure modes worth watching for.

The local optimum trap: a metric improves because you optimised for it, but the broader outcome got worse. A signup flow with a higher conversion rate that also produces customers who churn in week two. A homepage with longer dwell time made up of confused readers. The cure is to measure at least one rung up from where you are optimising — and have the discipline to listen when it disagrees.
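Measuring one rung up can be made routine with a guardrail check. The sketch below is one possible shape for it, under assumptions: the function name, the tolerance, and the example figures (signup conversion as the optimised metric, week-two retention as the guardrail) are all hypothetical.

```python
def guardrail_verdict(optimised_before, optimised_after,
                      guardrail_before, guardrail_after, tolerance=0.01):
    """Compare the metric being optimised with a metric one rung up.

    All inputs are rates between 0 and 1. `tolerance` is how much the
    guardrail may drop before the win is treated as suspect.
    """
    optimised_up = optimised_after > optimised_before
    guardrail_down = (guardrail_before - guardrail_after) > tolerance
    if optimised_up and guardrail_down:
        return "suspect local optimum"
    if optimised_up:
        return "win holds one rung up"
    return "no win to check"


# Signup conversion rose from 24% to 31%, but week-two retention
# fell from 85% to 78%: the pattern described above.
verdict = guardrail_verdict(0.24, 0.31, 0.85, 0.78)
```

The tolerance is a judgment call. Setting it before the change ships is what gives the team the discipline to listen when the guardrail disagrees.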

The laundered correlation: a metric moved at the same time you shipped a thing, but the metric moved because of something else entirely. Seasonality, a price change, a marketing campaign, a competitor pulling out. The cure is to be more sceptical of small wins than of small losses. Losses tend to be honest. Wins are easy to believe and worth checking twice.
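One way to check a small win twice is to compare against a concurrent holdout group that did not get the change, so that seasonality or a campaign affects both groups equally. The article does not prescribe this method; the sketch below is a standard two-proportion z-test, with invented numbers, offered as one option among several.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_holdout, n_holdout, conv_new, n_new):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_holdout = conv_holdout / n_holdout
    p_new = conv_new / n_new
    # Pooled rate under the assumption that the change made no difference.
    pooled = (conv_holdout + conv_new) / (n_holdout + n_new)
    se = sqrt(pooled * (1 - pooled) * (1 / n_holdout + 1 / n_new))
    if se == 0:
        raise ValueError("no variation in outcomes; test is undefined")
    z = (p_new - p_holdout) / se
    # Standard normal CDF via the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - cdf)


# Illustrative: holdout converts 200 of 1000, new design 230 of 1000.
z, p_value = two_proportion_z_test(200, 1000, 230, 1000)
```

With these figures a three-point lift does not clear the conventional 0.05 threshold, which is the kind of small win worth a second look before it goes on a slide.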

The compounding effect

A team that measures the right things gets a compounding effect that an unmeasured team doesn't. Every project teaches the next one. Decisions get faster because the reasoning is reusable. The argument for the design budget makes itself in the next planning cycle. The craft survives the spreadsheet, because the spreadsheet is now on its side.

That is the case for measurement. Not because design needs to prove itself — it doesn't, the world is full of evidence — but because the studios that take the question seriously get to keep doing the work.
