Todd Little and Steve McConnell use a charting method that collects data from projects and plots it in the following way: for Little's data, it's the initial estimated duration versus the actual duration, and for McConnell's data, it's the estimated completion date versus the actual completion date.
So Where's the Rub?
These charts show that project performance exceeds the ideal estimate on a number of projects - the sampled projects. If we were sitting in a statistics class in an engineering, physics, chemistry, or biology course, here are some questions that need answers.
- If you draw the ideal line - the perfect line where forecast equals actual - you don't know WHY the plotted points fall where they do relative to it.
- Next, you don't know WHY the samples above the line - the over-budget projects - have the values you observed.
Several things are missing.
- Of the 570 projects in Little's data set, he picked 120. What does the graph look like for the other 450?
- What is the root cause of the cost overrun or delivery delay?
- The observations can't be connected to their root causes:
- Was the original estimate bad?
- Was the project poorly managed?
- How many times was the project rebaselined?
- Were the requirements stable?
- Were there unaddressed risks that emerged?
- And a list of another 30 questions.
- Were the projects the same complexity?
- Did they have similar risks?
- Were all managed using the same processes?
- Were these processes applied equally effectively?

In both charts, these and other questions are not addressed. So the charts simply show data gathered - possibly self-selected data - and plotted as Ordinal numbers.
The Core Issue with Using Past Numbers
- Learning to forecast future performance from past performance starts with selecting a representative set of numbers from the past, so your Reference Class is applicable to the current work.
- This reference class needs to be calibrated and normalized so it holds Cardinal numbers rather than Ordinal numbers. These numbers need to be free of any root causes for their values other than the singular measure. This means if duration is being measured, the reason for that duration must not be coupled to some other hidden cause (a sketch of this idea follows the list below). Otherwise the explanations look like:
- We overran on cost because our original cost estimate was wrong
- We overran on cost because we didn't manage the work well
- We overran on cost because we encountered problems we didn't put into our baseline estimate
- All these issues fail to separate the independent variables from the dependent variables. The result is a really pretty graph based on data, but no real information about future performance - just a collection of past performance.
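As a minimal sketch of what a calibrated, normalized reference class looks like - using hypothetical numbers and assuming the sampled projects have already been screened for similar complexity, risk, and process - the ratio of actual to estimated duration becomes the Cardinal measure the forecast is built from:

```python
import numpy as np

# Hypothetical reference class: (estimated, actual) durations in months
# for past projects already screened for similar size, risk, and process.
reference_class = np.array([
    [6, 8], [12, 15], [9, 9], [18, 26], [4, 5],
    [10, 13], [24, 30], [8, 10], [14, 16], [7, 11],
])

# The Cardinal measure: ratio of actual to estimated duration.
ratios = reference_class[:, 1] / reference_class[:, 0]

# The empirical distribution of these ratios is the calibration.
p50, p80 = np.percentile(ratios, [50, 80])

# Apply it to a new project's 12-month estimate.
new_estimate = 12.0
print(f"50% confidence forecast: {new_estimate * p50:.1f} months")
print(f"80% confidence forecast: {new_estimate * p80:.1f} months")
```

The forecast is only as good as the screening: if the ratios still carry hidden causes, the percentiles calibrate the wrong population.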
What To Do Next?
The first thing to do is go out to the bookstore and get a book on statistical forecasting or statistical estimating that has actual math in it. Next is to ask some hard questions:
- Is your data self-selected?
- Is your data separate from the root causes of its value? That is, is the data space normalized?
- Is the data collected from similar projects? If not, has it been normalized for this condition? One quick check is sketched below.
Then read all you can find on reference class forecasting and statistical inference. Data is not information. Correlation is not causation.
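One way to test that last question - again a sketch with hypothetical numbers - is to compare the overrun-ratio distributions of two classes of projects before pooling them into a single reference class. A two-sample Kolmogorov-Smirnov test is one standard check:

```python
import numpy as np
from scipy.stats import ks_2samp

# Hypothetical actual/estimate ratios from two classes of projects.
small_projects = np.array([1.05, 1.10, 0.95, 1.20, 1.00, 1.15, 1.08])
large_projects = np.array([1.40, 1.65, 1.25, 1.90, 1.55, 1.35, 1.70])

# If the distributions differ, pooling them into one reference class
# mixes populations and the forecast inherits a hidden cause.
stat, p_value = ks_2samp(small_projects, large_projects)
print(f"KS statistic = {stat:.2f}, p-value = {p_value:.3f}")
if p_value < 0.05:
    print("Distributions differ - normalize or keep the classes separate.")
else:
    print("No evidence of difference - pooling may be defensible.")
```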
- Why does the data look like this? The two charts show a number of projects that are over the ideal estimates.
- Were the estimates credible to start with?
- Were the development conditions held constant?
- Were the requirements stable?
- Were all the drivers of project performance normalized?
- What are the drivers of project performance in your domain?
- What are the statistical behaviors of these drivers?
There's really no way out of this. Spending other people's money, at least money they are not willing to lose, means having some process for estimating the probability of success.
The Final Thought
Plot cost and schedule for your projects as a Joint Probability. Below is a Monte Carlo Simulation of the Joint Cost and Schedule for a program. A similar chart is needed, but using a collection of projects. Take Little's and McConnell's sample projects and plot both cost and schedule. There may be correlations between original cost and actual cost, versus original schedule and actual schedule. Big projects have higher risk - restating the obvious, by the way. Higher-risk projects may have wider variances in performance - also restating the obvious.
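As a minimal sketch of what such a simulation might look like - hypothetical distributions, not any actual program's model - correlated cost and schedule can be sampled from a bivariate lognormal and the joint probability of meeting both targets read off directly:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Hypothetical model: cost ($M) and schedule (months) are lognormal
# and positively correlated - schedule slips tend to drive cost growth.
mean_log = np.array([np.log(10.0), np.log(24.0)])  # medians: $10M, 24 months
sigma = np.array([0.25, 0.20])                     # log-space spreads
rho = 0.6                                          # cost-schedule correlation
cov = np.array([
    [sigma[0]**2,               rho * sigma[0] * sigma[1]],
    [rho * sigma[0] * sigma[1], sigma[1]**2],
])

# Draw 100,000 correlated (cost, schedule) samples.
samples = np.exp(rng.multivariate_normal(mean_log, cov, size=100_000))
cost, schedule = samples[:, 0], samples[:, 1]

# Joint probability of finishing at or under both targets.
cost_target, schedule_target = 11.0, 26.0
joint = np.mean((cost <= cost_target) & (schedule <= schedule_target))
print(f"P(cost <= ${cost_target}M) = {np.mean(cost <= cost_target):.2f}")
print(f"P(schedule <= {schedule_target} mo) = {np.mean(schedule <= schedule_target):.2f}")
print(f"Joint probability of both = {joint:.2f}")
```

The reason for plotting the joint distribution rather than two separate marginals is that the probability of meeting both targets together is always at or below the probability of meeting either one alone.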
But these one-dimension - one independent variable - plots of cost overrun versus original cost estimate just show uncalibrated, un-normalized, non-root-cause data. It's just a chart, of little value for taking corrective action.