It's popular in the agile world, and even more popular in the No Estimates paradigm, to use the term empirical data as a substitute for estimating future outcomes. My favorite meme, one that further confuses the conversation, is:
Probabilistic forecasting will outperform estimation every time
This, of course, is a case of "It is not only not right, it is not even wrong."† Probabilistic forecasting IS estimating. Estimating is about the past, present, and future. Forecasting is estimating about the future. I'll save the embarrassment by not saying the name of the #NoEstimates person posting this.
First a definition. Empirical means originating in or based on observation or experience. But we should all know that the data needs to properly represent two sides of the problem, the past and the future.
Let's look at some flawed logic in this empirical data paradigm:
- The past - we took 18 samples from the start of the project until now, calculated the average value, and will use that as a representative number for the future.
- The future - is the past a proper representation - statistically - of the future?
- It's taken 45 minutes from the driveway to the airport garage the last 5 times I left on Monday afternoon to the remote site.
- What's the probability it will take 45 minutes today?
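The gap between an average and a probability can be sketched in a few lines. The 18 samples below are hypothetical values invented for illustration, not data from any project. The point is only that a mean carries no information about spread, and even the observed spread says nothing about whether the future will look like the past.

```python
import statistics

# Hypothetical throughput samples (work items finished per iteration).
# These 18 values are made up for illustration only.
samples = [4, 7, 5, 9, 3, 6, 8, 5, 4, 10, 6, 7, 2, 5, 9, 6, 4, 8]

def summarize(data):
    """Return the mean, the sample standard deviation, and the fraction
    of past observations that met or exceeded the mean."""
    mean = statistics.mean(data)
    stdev = statistics.stdev(data)
    at_or_above = sum(1 for x in data if x >= mean) / len(data)
    return mean, stdev, at_or_above

mean, stdev, at_or_above = summarize(samples)
print(f"mean={mean}, stdev={stdev:.2f}, "
      f"fraction of past samples at or above the mean={at_or_above:.2f}")
```

Even in this tidy made-up series, a little over half of the past iterations reached the average. Planning on the average alone is close to a coin flip, and that is before asking whether the future resembles the past at all.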
One more technical detail.
- Flow or Kanban style processes depend on a critical concept - the random variables that are always present in our project must be independent and identically distributed (i.i.d.).
- This means each random variable has the same probability distribution as the others, and all of them are mutually independent.
- This CAN be the case in some situations, but when we are developing software in an emergent environment - not production line - it is unlikely.
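A crude first look at whether past data is even plausibly i.i.d. might go like the sketch below. It computes the lag-1 autocorrelation (a rough independence check) and compares the means of the first and second halves of the series (a rough stationarity check). The function names and the cycle-time values are hypothetical, and neither check is a formal statistical test; they only flag data that obviously fails the assumption.

```python
import statistics

def lag1_autocorrelation(xs):
    """Correlation between each observation and the one before it.
    Values near 0 are consistent with independence; values near +1 or -1
    suggest each sample depends on the previous one."""
    mean = statistics.mean(xs)
    num = sum((a - mean) * (b - mean) for a, b in zip(xs, xs[1:]))
    den = sum((x - mean) ** 2 for x in xs)
    return num / den

def half_means(xs):
    """Means of the first and second halves of the series.
    A large gap suggests the distribution is drifting over time."""
    mid = len(xs) // 2
    return statistics.mean(xs[:mid]), statistics.mean(xs[mid:])

# Hypothetical cycle times (days) that drift upward as the work gets harder.
cycle_times = [3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 12]

print("lag-1 autocorrelation:", round(lag1_autocorrelation(cycle_times), 2))
print("first-half mean, second-half mean:", half_means(cycle_times))
```

For this made-up drifting series the autocorrelation is strongly positive and the two half means differ widely, so treating the samples as i.i.d. draws from one distribution would not be justified.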
So Now Some Issues Of Using Just Empirical Data
The future is emergent in most development work. If it's a production line, and software development is not production, then past performance is a good indicator of future performance. So let's ask some questions before using this past empirical data:
- Is the data in the past properly assessed for variance, stability - stationarity, independence?
- If it is none of these, what are the statistical parameters? Especially for independence. The independence in agile's INVEST criteria cannot be assumed without a test.
- Is the future going to be stable, stationary, independent and represented by the past?
- What's the uncertainty in the future events?
- What was the uncertainty in the past that was not recognized and influenced the statistics but was not represented?
- What are the irreducible uncertainties in the future - the naturally occurring variances that will need margin?
- What are the reducible uncertainties in the future that must be brought down or have management reserve?
Don't have the answers to these and working a non-trivial project? Then our empirical data is not worth much, because it doesn't actually represent the future. Might as well guess, and stop using the term empirical as a substitute for "we don't know much of anything about the future."
With those answers we can build a credible model of the future, with interdependencies between the work and probability distribution functions for the statistical behaviors of the work elements, and start asking the killer question:
What's the probability of completing on or before the need date for the work we are producing?
This answer only tells us the probability, not the exact date. So here's the most important point.
- When we have a model, we can test if there is an acceptable probability of success.
- That's all we can do, model, test, assess, model some more.
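One common way to build and test such a model is Monte Carlo simulation. The sketch below is a minimal illustration, not a prescribed method: the task names, the three-point duration estimates, the triangular distributions, and the small dependency network are all assumptions made up for the example. It also samples each task independently, which is itself an assumption that would need testing on a real project.

```python
import random

# Hypothetical work elements with three-point duration estimates in days:
# (optimistic, most likely, pessimistic). All values are illustrative.
TASKS = {
    "design": (5, 8, 15),
    "build_a": (10, 14, 30),
    "build_b": (8, 12, 25),
    "integrate": (4, 6, 12),
}

def simulate_once(rng):
    """Draw one possible project duration from the model."""
    d = {
        name: rng.triangular(low, high, mode)
        for name, (low, mode, high) in TASKS.items()
    }
    # Assumed interdependencies: design comes first, the two builds run in
    # parallel, and integration cannot start until both builds are done.
    return d["design"] + max(d["build_a"], d["build_b"]) + d["integrate"]

def probability_on_or_before(need_date_days, trials=10_000, seed=1):
    """Fraction of simulated outcomes finishing on or before the need date."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(simulate_once(rng) <= need_date_days for _ in range(trials))
    return hits / trials

for need in (30, 35, 40, 45):
    print(f"P(done within {need} days) = {probability_on_or_before(need):.2f}")
```

The output is a probability for each candidate need date, not a date. If the probability is not acceptable, the model shows which inputs (margin, reserve, scope, or the need date itself) have to change before running the test again.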
All decisions about future outcomes in the presence of uncertainty need estimates that are placed in the model and assessed for their applicability.
This is called Closed Loop Statistical Process Control. And that's how non-trivial projects are managed. When the value at risk is low, no one cares if you estimate or not.
† Which, by the way, is the situation with most #NoEstimates conjectures, starting with the willful ignorance of the microeconomics of decision making as an opportunity cost process. What will it cost us if we decide between multiple choices in the presence of an uncertain future? That question can't be answered without making an estimate of that opportunity cost.