Stop estimating. Start shipping.

Brad Hipps


Ron Jeffries, co-founder of XP, originator of story points, has some regrets.

Story points, he writes, “were originally invented to obscure the time aspect, so that management wouldn’t be tempted to misuse the estimates. (I know: I was there when they were invented. I may actually have invented Points. If I did, I’m sorry now.)”

The aim was just. Engineering turns ideas into software, and those ideas require a good deal of discovery-via-development before their true size can be known. Picking dates in advance of any real development is best left to fortune tellers. Despite this, bad feelings from stakeholders are rife when those dates go wrong.

But there aren’t many “dateless” organizations. Not at scale. Most of us work in places that need some semblance of a delivery date, a reasonable date range at least, for major initiatives. “Obscuring the time aspect” isn’t a realistic option.

The result?

  1. Few teams have any choice but to generate a SWAG (scientific wild-ass guess) for how long something will take;

  2. Having SWAGed it, we all then still go away to torture ourselves over “durationless” point allocations.

If this strikes you as a worst-of-both-worlds situation, you’re not alone.

Punting on points

Why do we carry on like this? Why do story points, or planning poker, or Fibonacci numbers, etc. etc., persist? Why do we burn acres of engineering time debating coded numbers—numbers that, divorced from time, remain utterly mysterious to the business? (And maybe just as mysterious to us. I don’t know how many times I’ve had a “point” explained to me, and the explanation is always different.)

In talking with hundreds of engineering leaders over the course of building Socratic, one answer emerged again and again: it's just the default now. We don't know what else to do.

Woody Zuill, coiner of the #NoEstimates hashtag, puts it this way: estimates persist because they give “a feeling of control. [Estimates] make it easy, not right.”

With Socratic, our feeling is pretty simple. A work system should know how long work takes. It should know what my historical actuals are. And it should use that data to tell me how long, within some reasonable probabilistic range, something is likely to take.
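As a sketch of what "a reasonable probabilistic range" could look like, here is one way to bracket an estimate with quartiles of historical durations. The sample data and the choice of quartiles are invented for illustration; this is not Socratic's actual model.

```python
from statistics import quantiles

# Invented historical durations (in days) for tasks "like" this one.
past_durations_days = [2, 3, 3, 4, 4, 5, 6, 8, 9, 13]

# Quartiles give a point estimate (median) plus a likely range,
# rather than a single date plucked from thin air.
q1, median, q3 = quantiles(past_durations_days, n=4)
print(f"Likely {q1:.1f}-{q3:.1f} days (median {median:.1f})")
```

The point of the range is communication: a stakeholder hears "probably 3 to 8 days," not a falsely precise single number.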

As we built our model for automatic estimates, customers had questions. Did all tasks need to be roughly the same size? What if multiple people work the same task? How did weekend work factor into historical actuals?

When you hear that a model will produce a number, the instinct, rightly, is to interrogate that model. So here’s ours.

A model for automatic estimates

Socratic automatically generates a personalized average duration for every task, based on historical actuals. This means knowing, on average, by person and type of work, how long it will take a task to move from its first work phase to its last.

For each task in Socratic, you just choose whether you think the effort for the task is Average, Less, or More. Think of this as simply indicating how this task compares to others—what is it most like?

With your effort set, Socratic shows a total projected time to complete the task. This projection is based on the historical actuals for the assignee. If there's no task assignee (or if the assignee is new to the organization), we use the average for the workstream.
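The projection logic above can be sketched in a few lines. Everything here is an assumption for illustration (the function name, the effort multipliers, the simple mean, the fallback rule); it is not Socratic's actual implementation.

```python
from statistics import mean

# Assumed multipliers for the three effort levels.
EFFORT_FACTOR = {"Less": 0.5, "Average": 1.0, "More": 2.0}

def projected_duration(durations_by_person, workstream_durations,
                       assignee, effort="Average"):
    """Project a task's duration in days from historical actuals.

    Falls back to the workstream average when the assignee is unknown
    or has no history yet.
    """
    history = durations_by_person.get(assignee) or workstream_durations
    return mean(history) * EFFORT_FACTOR[effort]

# Example: Ada's last few tasks took 2, 3, and 4 days.
history = {"ada": [2.0, 3.0, 4.0]}
workstream = [5.0, 5.0, 8.0]

print(projected_duration(history, workstream, "ada", "Average"))  # 3.0
print(projected_duration(history, workstream, "ada", "More"))     # 6.0
print(projected_duration(history, workstream, None))              # 6.0
```

The last call shows the fallback: with no assignee, the workstream average (6.0 days) is used.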

As to some of the most frequent questions about this model...

Does this mean all tasks must be an equivalent size or complexity?

No. Why? The law of large numbers (LLN). Over the course of enough tasks, the inevitable differences in size/complexity among them basically come out in the wash.
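A toy simulation makes the LLN argument concrete. Task "sizes" here are drawn from a made-up, highly uneven distribution; the early average is noisy, but the long-run average settles near the true mean.

```python
import random

random.seed(42)
sizes = [1, 2, 3, 5, 8, 13]                  # wildly different task sizes
durations = [random.choice(sizes) for _ in range(10_000)]

true_mean = sum(sizes) / len(sizes)          # ~5.33 days
after_5 = sum(durations[:5]) / 5             # noisy early average
after_10k = sum(durations) / len(durations)  # converges toward true_mean

print(f"true={true_mean:.2f}  after 5 tasks={after_5:.2f}  after 10k={after_10k:.2f}")
```

Individual tasks still vary sixfold; it's the average that stabilizes, which is all the model needs.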

What if multiple people work the same task?

If a task's assignee changes as the task moves across work phases (say, from a developer in a "Doing" phase to a tester in a "Testing" phase), we attribute the time spent in each phase to the respective assignee.
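A minimal sketch of this attribution, assuming phase history is available as (phase, assignee, days) records. The record shape and names are hypothetical, not Socratic's data model.

```python
from collections import defaultdict

def time_by_assignee(phase_log):
    """Sum each assignee's elapsed days across the phases they owned."""
    totals = defaultdict(float)
    for phase, assignee, days in phase_log:
        totals[assignee] += days
    return dict(totals)

log = [
    ("Doing", "dev", 3.0),    # developer owns the build phase
    ("Testing", "qa", 1.0),   # tester owns the test phase
    ("Doing", "dev", 0.5),    # bounced back to the developer for a fix
]
print(time_by_assignee(log))  # {'dev': 3.5, 'qa': 1.0}
```

Each person's historical actuals then reflect only the phases they actually worked.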

How does weekend work factor into historical actuals?

In our current model, we’re concerned with total elapsed calendar time. That is, if a task begins life on a Monday, and completes the following Monday, the total effort (i.e. elapsed duration) was seven days.

But assuming no work was performed over the weekend, isn’t this “inflating” the effort by two days?

Maybe. Our initial theory was that here again, over enough tasks, any weekend work would come out in the wash. Meaning, sometimes weekend work happens, other times it doesn’t; by looking at total elapsed time you keep things simple. But it may be that weekend work is rare enough (fingers crossed) to exclude those days from historical actuals.
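To make the two accounting choices concrete, here is a toy comparison of total elapsed calendar days versus weekdays only. The weekday rule is one assumed way weekend exclusion might work, not a statement of Socratic's model.

```python
from datetime import date, timedelta

def elapsed_days(start, end):
    """Total elapsed calendar days, the current model's measure."""
    return (end - start).days

def weekday_elapsed(start, end):
    """Count only Mon-Fri between start (exclusive) and end (inclusive)."""
    days = 0
    d = start
    while d < end:
        d += timedelta(days=1)
        if d.weekday() < 5:  # weekday() 0-4 are Mon-Fri
            days += 1
    return days

start = date(2024, 1, 1)  # a Monday
end = date(2024, 1, 8)    # the following Monday

print(elapsed_days(start, end))     # 7
print(weekday_elapsed(start, end))  # 5
```

The Monday-to-Monday example from above yields 7 calendar days but only 5 weekdays: exactly the two-day "inflation" in question.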

People and work evolve. Your estimates should too.

Aside from the time saved by having estimates created automatically, there’s another benefit to this model: it’s responsive as people and work change.

Consider the example of a new hire. In most cases, the average time it takes that person to complete a task in Month 1 is going to be different from the average in Month 10. They've become more familiar with the people, processes, technologies, and so forth: they're an integrated team member. These kinds of improvements and maturations simply can't be accounted for manually with story points.

Of course, no amount of data science removes the real hard work in estimating—breaking a business request into its logical technical parts. Figuring out all the moving pieces. But this is where you want your engineering brainpower spent. Not in story point debates.

The best predictor of how long something is going to take is knowing how long things like it took previously. And while it's true that past performance is no guarantee of future results, it beats the hell out of guessing.