The book How To Lie With Statistics (Darrell Huff, 1954) should be on the bookshelf of everyone who spends other people's money, for a very simple reason.
Everything on every project is part of an underlying statistical process. Those expecting any number associated with any project in any domain to be a single point estimate will be sorely disappointed, after reading the book, to find out that is not the case.
As well, those expecting to make decisions about how to spend other people's money will be disappointed to learn that statistical information is needed to determine the impact of the decision: the cost of the decision, the cost and the value obtained from the decision, the impact on the schedule of the work needed to produce that value, and even the statistical distribution of the benefits produced by making that decision.
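To make that concrete, here is a minimal sketch of treating project duration as a distribution rather than a point value. The task count, distribution choice, and parameters are illustrative assumptions only, not data from the book or from any project discussed below.

```python
# Minimal sketch: project duration as a statistical process, not a point value.
# The task count, means, and spreads below are illustrative assumptions only.
import random
import statistics

random.seed(7)

def simulate_project(n_tasks=20, trials=5000):
    """Monte Carlo over task durations drawn from a right-skewed (log-normal) distribution."""
    totals = []
    for _ in range(trials):
        # Each task has a median duration of roughly 5 days with right-skewed risk.
        totals.append(sum(random.lognormvariate(mu=1.6, sigma=0.35) for _ in range(n_tasks)))
    return totals

durations = sorted(simulate_project())
p50 = durations[len(durations) // 2]
p80 = durations[int(len(durations) * 0.80)]
print(f"Median (50% confidence) completion: {p50:.1f} days")
print(f"80% confidence completion:          {p80:.1f} days")
print(f"Spread (std dev):                   {statistics.stdev(durations):.1f} days")
```

The point of the sketch is that "the" duration of the project is a range with a confidence level attached, never a single number.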
One prime example of How To Lie (although likely not a Lie, just a poor application of statistical processes) is Todd Little's "Schedule Estimation and Uncertainty Surrounding the Cone of Uncertainty." In this paper the following figure is illustrative of the How to Lie paradigm.
This figure shows 106 sampled projects, their actual completion and their ideal completion. First, let's start with another example of Bad Statistics - the Standish Report - often referenced when trying to sell the idea that software projects are always in trouble. Here's a summary of posts about the Standish Report, which speaks to a few Lies in the How to Lie paradigm.
- The samples are self-selected, so we don't get to see the correlation between the sampled projects and the larger population of projects at the firms (a short simulation after this list sketches how self-selection skews the headline numbers).
- Those returning the Standish survey, whether they reported problems or not, can't be compared to those not returning the survey, and can't be compared to the larger population of IT projects that was never sampled.
- This is a Huff example - limit the sample space to those examples that support your hypothesis.
- The credibility of the original estimate is not stated or even mentioned.
- Another good Huff example - there is no way to test what the root cause of the trouble was, so there is no way to draw a statistical inference from the suggested solution to the possible corrected outcome.
- The Root Cause of the over budget, over schedule, and less-than-promised delivery of features is not investigated, nor are any corrective actions suggested, other than hire Standish.
- Maybe the developers at these firms are not very good at their job, and can't stay on cost and schedule.
- Maybe the sampled projects were much harder than first estimated, and the initial estimate was not updated with a new estimate to complete when this was discovered.
- Maybe management forced the estimate onto the development team, so the project was doomed from day one.
- Maybe those making the estimate had no estimating process, skills, or experience in the domain they were asked to estimate for.
- Maybe a few dozen other Root Causes were in place to create the Standish charts, but these were not separated from the statistical samples so the underlying data could be examined.
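As a concrete illustration of the self-selection problem in the first bullet above, here is a small simulation. All the rates and probabilities in it are invented for illustration; they are not the Standish numbers.

```python
# Illustrative sketch of self-selection bias: if troubled projects are more
# likely to answer the survey, the reported trouble rate is inflated.
# All probabilities below are made-up assumptions for illustration.
import random

random.seed(42)

population = 10_000
true_trouble_rate = 0.30      # assumed share of projects actually in trouble
respond_if_troubled = 0.60    # troubled projects are more eager to respond
respond_if_fine = 0.20        # healthy projects mostly ignore the survey

responses = []
for _ in range(population):
    troubled = random.random() < true_trouble_rate
    responds = random.random() < (respond_if_troubled if troubled else respond_if_fine)
    if responds:
        responses.append(troubled)

reported_rate = sum(responses) / len(responses)
print(f"True trouble rate in the population:  {true_trouble_rate:.0%}")
print(f"Trouble rate among survey responders: {reported_rate:.0%}")
```

With these assumed response rates, roughly 56% of responders report trouble even though only 30% of the population is actually in trouble - the sample, not the projects, produces the alarming number.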
So let's look at Mr. Little's chart.
There is likely good data at his firm, Landmark Graphics, for assessing the root cause of the projects finishing above the line in the chart. But the core issue is that the line is not calibrated. It represents the ideal data - that is, using the original estimate, what did the project do? As stated on page 49 of the paper:
"For the Landmark data, the x-axis shows the initial estimate of project duration, and the y-axis shows the actual duration that the projects required."
There is no assessment of the credibility of the initial estimate for the project. This initial estimate might accurately represent the projected time and cost, with a confidence interval. Or this initial estimate could be completely bogus: a guess, made up by uninformed estimators, or worse yet, an estimate that was cooked in all the ways possible, from bad management to bad math.
So if the baseline we make comparisons from is bogus from the start, it's going to be hard to draw any conclusion from the actual data on the projects. Both initial estimates and actual measurements must be statistically sound if credible decisions are to be made about the Root Cause of the overage and about any Corrective Actions that could prevent these unfavorable outcomes.
This is classic How To Lie - let me present a bogus scale or baseline, then show you some data that supports my conjecture that something is wrong.
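One way to test whether a baseline like this is calibrated is to look at the distribution of actual-to-estimate ratios. The sketch below uses hypothetical (estimate, actual) pairs, since the Landmark data is not reproduced here; a calibrated baseline would show a median ratio near 1.0 with a narrow spread.

```python
# Sketch of a calibration check on an estimate baseline. The (estimate, actual)
# pairs are hypothetical; the Landmark data from the paper is not reproduced here.
import math
import statistics

projects = [
    # (initial_estimate_months, actual_months) - invented for illustration
    (3, 4.5), (6, 9.0), (4, 4.2), (12, 20.0), (8, 7.5),
    (5, 9.5), (10, 14.0), (2, 2.1), (7, 12.0), (9, 11.0),
]

ratios = [actual / estimate for estimate, actual in projects]
log_ratios = [math.log(r) for r in ratios]

print(f"Median actual/estimate ratio: {statistics.median(ratios):.2f}")
print(f"Geometric mean ratio:         {math.exp(statistics.mean(log_ratios)):.2f}")
print(f"Log-ratio std dev (spread):   {statistics.stdev(log_ratios):.2f}")
# A calibrated baseline would show a median ratio near 1.0 with a narrow spread;
# a median well above 1.0 says the baseline itself is biased, not just the projects.
```

Without this kind of check, a line drawn from uncalibrated initial estimates tells us nothing about whether the projects or the baseline are the problem.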
In the case of the #NoEstimates approach, that conjecture starts with the Twitter clip below, which can be interpreted as saying we can make decisions without having to estimate the independent and dependent variables that go into those decisions.
So if estimates are the smell of dysfunction, as the popular statement goes, what is the dysfunction? Let me count the ways:
- The estimates in many software development domains are bogus to start. That'll cause management to be unhappy with the results and lower the trust in those making the estimates, which in turn creates distrust between those providing the money and those spending the money - a dysfunction.
- The management in these domains doesn't understand the underlying statistical nature of software development and has an unfounded desire for facts about the cost, duration, and probability of delivering the proper outcomes in the absence of the statistical processes driving those outcomes. That'll cause the project to be in trouble from day one.
- The insistence that estimating is somehow the source of these dysfunctions, and that the corrective action is to Not Estimate, is a false trade-off - in the same way as the Standish Report saying "look at all these bad IT projects, hire us to help you fix them." This will cause the project to fail from day one as well, since those paying for the project have little or no understanding of what they are going to get in the end, or for what estimated cost, if there is one.
So next time you hear estimates are the smell of dysfunction, or we can make decisions without estimating:
- Ask if there is evidence of the root cause of the problem.
- Ask to see - in simple bullet-point examples - some of these alternatives, so you can test them in your domain.
- Ask in what domains not estimating would be applicable. There are likely some. I know of some. Let's hear some others.
- Ask to be shown how Not Estimating is the corrective action for the dysfunction.