MYTH BUSTING

What are DORA metrics, and do they still matter?

Brad Hipps

5-28-2024

Over the past decade, one of the most popular frameworks for assessing software team efficiency has been the one created by the DevOps Research and Assessment (DORA) program. Here we recap these metrics, then take stock of their place for modern software teams.

What are the DORA metrics?

DORA metrics are a set of four markers recommended to software teams to evaluate their deployment process. Each metric is designed to be a meaningful measurement of the health and success of a team at implementing features or building products.

Deployment Frequency (DF)

Deployment frequency measures how often a team or organization deploys or releases code to end users. It’s effectively an indicator of how steadily teams deliver value to customers.

By prioritizing deployment frequency, teams are encouraged to break work into the smallest atomic units possible. This should drive a few things:

  • Greater velocity of delivery. Rather than lumbering, monolithic tasks, teams can burn through bite-sized ones.

  • Less risk in deployment. It’s much easier to understand the impact of a small change, and much simpler to roll it back as needed.

Depending on the size and scale of your team, and the size of the tasks you take on, you will have a different idea of what your deployment frequency should look like. DORA has its own benchmarks for deployment frequency:

  • Elite performers deploy multiple times per day;

  • High performers deploy between once per day and once per week;

  • Medium performers deploy between once a week and once a month;

  • Low performers deploy somewhere between once a month and once every six months.

Of course, these guidelines are just that: guidelines. Plenty of projects, especially ones built on experimental technology, won’t hit elite or even high performer frequency. There’s always a level of intuition and pragmatism required when deciding your deployment frequency target.
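
To make these tiers concrete, here’s a minimal sketch of how you might bucket your own cadence, assuming you can export production-deploy timestamps from your CI/CD system. The function name, the 90-day window, and the threshold arithmetic are illustrative choices, not an official DORA implementation.

```python
from datetime import datetime

# Illustrative only: map a count of production deploys over a reporting window
# onto DORA's deployment frequency tiers.
def deployment_frequency_tier(deploy_times: list[datetime], window_days: int = 90) -> str:
    deploys_per_day = len(deploy_times) / window_days
    if deploys_per_day > 1:
        return "Elite: multiple deploys per day"
    if deploys_per_day >= 1 / 7:
        return "High: between once per week and once per day"
    if deploys_per_day >= 1 / 30:
        return "Medium: between once per month and once per week"
    return "Low: less than once per month"
```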

Lead Time for Changes (LT)

Lead time for changes is the amount of time it takes for a code change to go from merge to production deployment. Naturally, the shorter the lead time, the better: a short lead time means there’s little bottleneck between the moment code is ready for deployment and the deployment itself.

Where deployment frequency is concerned with quantity (throughput), lead time for changes is concerned with speed. As with deployment frequency, DORA provides targets for lead time for changes:

  • Elite performers take less than one day. 

  • High performers take between one day and one week.

  • Medium performers take between one week and one month.

  • Low performers take more than one month.

Since these are generalized benchmarks, all the usual caveats apply. Your mileage may vary, especially in highly regulated industries.
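
As a rough illustration of how this is measured, here’s a minimal sketch that computes lead time from (merged_at, deployed_at) pairs. The pairs are assumed to come from your VCS and deployment tooling, and the use of the median here is a choice, not a DORA requirement.

```python
from datetime import datetime, timedelta
from statistics import median

# Illustrative only: lead time for changes, measured from merge to production deploy.
def lead_time_for_changes(changes: list[tuple[datetime, datetime]]) -> timedelta:
    lead_times = [deployed_at - merged_at for merged_at, deployed_at in changes]
    # The median keeps one long-stuck change from skewing the figure.
    return median(lead_times)
```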

Change Failure Rate (CFR)

The change failure rate is the percentage of production deployments that result in a failure requiring an immediate fix, such as a hotfix or rollback. The formula is simple: failed deployments divided by total deployments for the period in question.

Where deployment frequency and lead time for changes are measures of throughput and speed, respectively, change failure rate measures the quality of your throughput and speed. DORA’s benchmarks for change failure rate:

  • Elite performers: 0-15%

  • High performers: 16-30%

  • Medium performers: 31-45%

  • Low performers: 46%+
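
As a minimal sketch of the formula above (the function name and the example numbers are purely illustrative):

```python
# Illustrative only: change failure rate = failed deployments / total deployments.
def change_failure_rate(failed_deployments: int, total_deployments: int) -> float:
    if total_deployments == 0:
        return 0.0  # nothing deployed in the period, so nothing to score
    return failed_deployments / total_deployments

# e.g. 3 failed deployments out of 40 in the quarter:
print(f"{change_failure_rate(3, 40):.1%}")  # 7.5%, within the elite band
```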

Mean Time to Recovery (MTTR)

Mean time to recovery (MTTR) measures how long it takes your team to recover from a failure in production. This includes the time it takes to detect the issue, to fix it, and even to deploy the fix.

The formula here is total downtime due to failures divided by number of failures for the period in question. DORA’s benchmarks for this are:

  • Elite performers: Less than one hour

  • High performers: Less than one day

  • Medium performers: Less than one week

  • Low performers: More than one week

Note that this is about the downtime that is experienced due to failures. If you run a project where the idea of downtime makes little sense—some blockchains, for example—then this metric may have a slightly musty feel to it.
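
For teams where downtime does apply, a minimal sketch of the calculation above might look like this, assuming you can pull per-incident downtime durations from your incident tracker (the function name is illustrative):

```python
from datetime import timedelta

# Illustrative only: MTTR = total downtime due to failures / number of failures.
def mean_time_to_recovery(downtimes: list[timedelta]) -> timedelta:
    if not downtimes:
        return timedelta(0)  # no production failures in the period
    return sum(downtimes, timedelta(0)) / len(downtimes)

# e.g. three incidents lasting 20, 45, and 90 minutes:
incidents = [timedelta(minutes=20), timedelta(minutes=45), timedelta(minutes=90)]
print(mean_time_to_recovery(incidents))  # 0:51:40, well inside the elite band
```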

DORA’s focus is a feature and a bug

Given DORA’s roots in DevOps, it’s natural that the metrics focus on the “last mile” of software delivery: how code makes its way into production. This is plainly material to software teams, and having a way to measure deployment efficiency matters. The popularity of DORA speaks for itself.

If there’s a mistake sometimes made with DORA, it’s in confusing the part for the whole. DORA measures the efficacy of a specific slice of software delivery: deployment. But there’s a lot of work that precedes the merge-to-release phase, and software teams (as well as our business stakeholders) want to understand the health of the overall picture.

What does this whole picture look like? Well, at its most elemental, engineering’s job is to take ideas and turn them into working software. Ideas come in, and as quickly and efficiently as possible, we want to render these ideas in an experience that delights users.

This means at the macro we want to understand:

  1. Demand: how many new work requests are coming in, and how that demand is changing over time.

  2. Speed: how long it takes us to deliver against these requests, not just the deployment piece but the full span from the time we start work.

  3. Throughput: deployment frequency matters here, of course, but so does the overall quantity of work delivered, and how that changes over time.

For each of these, there are underlying diagnostic metrics that help explain what we’re seeing. The DORA metrics contribute to our understanding of what may need attention in our speed and throughput. But there are other diagnostics that matter too. For instance, our speed is impacted as much or more by our flow efficiency as by our lead time for changes.

(It’s also essential to know how these metrics change over time. For example, knowing your average speed (i.e., cycle time) is, say, nine days is one thing. But what you really want to know is: how does that compare to our history? If last quarter our cycle time was 18 days, we feel pretty good: we’ve cut it in half! If last quarter the average cycle time was four days, that produces a different kind of feeling…)

This whole-picture reality is exactly what the Trends capability in Socratic was built to reflect. What DORA measures is helpful and useful, as long as we remember the bigger picture is exactly that.


Related reading: Software Metrics Power Rankings