It is popular to use the quote "Lies, Damned Lies, and Statistics" to counter some statement about a probabilistic behavior. The origin of this phrase remains unknown. It has been attributed to Sir Charles Dilke, and it is often attributed, almost certainly falsely, to Benjamin Disraeli, Earl of Beaconsfield (1804–1881), an attribution Mark Twain helped popularize.
Let's start with the book that should be found on every project manager's shelf (along with a long list of other books). Darrell Huff's How to Lie with Statistics shows how statistics can be easily misused, misunderstood, and sometimes manipulated to show something that just isn't true. The book is still in print in paperback.
For us project managers, we need to start by understanding that probability and statistics are the lifeblood of our profession. All the numbers we encounter on projects are in fact random numbers, generated by the underlying stochastic processes of how projects work. Projects are collections of interacting work activities. These activities are connected with each other and with the externalities of the project. These externalities start with people. People are random processes looking for something to interact with. Some might say people are random processes looking for something to disrupt. Forecasting the behavior of people is a very sporty business. This is one motivation for process. Processes guide or bound the random behavior of people. For now let's exclude the random behavior of people from the conversation.
You see processes creating bounds every day: speed limits that create safety bounds, processes for filling out your application to college or for a car loan. There are also processes for developing products. These usually start with a simple paradigm - you give me some money, I spend it to give you something back. You assess the value of that product and either give me more money to continue or stop giving me money.
These processes involve a few simple variables: people, time, and money. Of course there is also the technology, but for now let's ignore that too and assume all the technology is working, non-variable, and not part of the processes we're interested in.
How to Actually Lie with Statistics
Stephen Ross is associate professor of professional practice at Columbia's Graduate School of Journalism. Ross describes seven lies that are used daily, all of which we can encounter on projects, from people who work on projects or people who write about projects. Hopefully I'm not one of them:
- Non-response bias or non-representative bias - this is the self-selection bias. What this is really called is missing things on purpose. This is what Standish does: tell us about all the problems you've had on IT projects. Their sampling doesn't say what the total population of IT projects is, or most importantly how many successes there have been. This is the newspaper reporting of project problems. DOD IT projects overrun by $1B. On how much total budget? How much did they overrun as a percentage? What was the total value of the projects that overran by $1B? They don't say.
This is a sampling problem. There are simple mathematical processes for determining how big the sample has to be, compared to the total population, to produce a given confidence level from those samples. The more serious problem here is that the sample is too small. To get a credible probability distribution we need to know how many samples are needed. There's a formula for that (see the sketch after this list), and for good or bad IT projects we need roughly 20 to 30 samples for a population of 100 projects. These are all projects, not just projects that answered the call to "tell us about your failed project."
It may be a sample of one: in my experience I see that estimating doesn't work. Or: in my experience and the experience of the coffee club of similar people I see that agile is the best approach for enterprise IT.
- Mistaking statistical association for causality - this is a common mistake. Connecting processes with the outcomes of those processes requires the statistical tests of correlation and causality. This starts with a pre-defined hypothesis stating what we should see, usually the null hypothesis H0 - in this instance, a statement that can be tested with evidence about the causality of the process's impact on the outcome.
The recent debacle of the Affordable Care Act web site brought out lots of voices on how the problems could have been avoided. Since rarely were any of these voices actually involved in government procurement contracts, nor did they have any actual connection with the project, it's hard to make a connection with a cause of the problems. In the end, root cause analysis is needed to determine what the actual source of the problem was beyond the obvious. And even then, we'll have to wait for the GAO to write its report. GAO, RAND, and IDA write reports on root causes of large program failures. Hopefully the ACA report will come soon.
- Poisoned control - the epidemiology of project failures does not exist. Project failure analysis is dominated by opinion and conjecture, many times by firms, or even individuals, selling the solution. This is a serious failing in the profession of project management. Internally, many firms have assessment processes built around Six Sigma or Lean Six Sigma. In the absence of a framing assumption and a governance framework it is difficult to sort out opinion from fact.
If you adopt this process (my process, actually), you'll improve the probability of success. There are some obvious approaches. I am the author of one. Earned Value Management is another. But even then, research is needed to confirm the connection between a process and increased success. The Software Engineering Institute conducts surveys of success versus maturity. I'm involved in an assessment, through a DOD office, of connecting Technical Performance Measures to Earned Value Management to provide a better view of performance management.
- Data enhancement - "400 killed on highways over the holidays." 65% of all projects (sampled by a self-selected process) overrun their budget by 50%. These are examples of data enhancement. Extrapolation is another source of data enhancement. We see big problems in IT projects in this domain, so there must be similar problems in all domains. Or the inverse of the extrapolation: I work in a 3-man shop at a commercial landscaping equipment manufacturer, so what I have found works for me will surely work on your $500M ERP rollout project.
- Absoluteness - the use of overwhelming data is a source of amazement to the casual observer. When very complex situations are reduced to a single number we are being fooled by the data, in exactly the same way we may be fooled by randomness. Many times the uncertainty, range, and complexity of project performance data cannot be separated from the root cause of success or failure. When an assessment is reduced to a single number - like the Standish Report, with no variance intervals or confidence on the measurement - the result is unusable.
- Partiality - favorable outcomes are presented by owners of the idea. This is called selling. Independent assessments of the data that support the conjecture are needed before any conclusion can be drawn from the salesman's pitch.
- Bad Measuring Stick - a dollar overrun of $500M on a $5B project is small. Big numbers, but a small percentage: it's a 10% overrun. If you can get to the end of the project with a 10% cost overrun or a 10% schedule overrun, you're a Project Management God. Never listen to the absolutes. Only listen to the percentages. And more importantly, the percentages compared to the population variances.
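Here's the sample-size sketch promised above, to make the non-response point concrete. This is a minimal illustration of the standard calculation (Cochran's formula with a finite-population correction); the 95% confidence level, margins of error, and population of 100 projects are illustrative assumptions, not figures from any particular survey.

```python
import math

def required_sample_size(population, z=1.96, margin=0.15, p=0.5):
    """Cochran's sample-size formula with finite-population correction.

    population -- total number of projects in the population
    z          -- z-score for the confidence level (1.96 is ~95%)
    margin     -- acceptable margin of error on the estimated proportion
    p          -- assumed proportion (0.5 is the most conservative choice)
    """
    n0 = (z ** 2) * p * (1 - p) / margin ** 2  # infinite-population sample size
    n = n0 / (1 + (n0 - 1) / population)       # correct for the finite population
    return math.ceil(n)

# For a population of 100 IT projects at 95% confidence, margins of
# error between 15% and 20% land in the 20-30 sample range cited above.
print(required_sample_size(100, margin=0.15))  # -> 31
print(required_sample_size(100, margin=0.20))  # -> 20
```

The formula only helps if the samples are drawn at random from all projects. No sample-size arithmetic repairs a self-selected sample of projects that answered a call for failure stories.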
In the end it's all about discovering the variances in everything we do. No work process is steady. All work processes have built-in variances. The naturally occurring uncertainty about cost, time, and technical performance is called aleatory uncertainty. It is irreducible. This means you can't do anything about it; you have to have margin to protect your project. The other uncertainty on the project is epistemic, which means we can learn more about the uncertainty and reduce it with this new knowledge.
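Here's a minimal sketch of how irreducible variance turns into margin. It simulates the naturally occurring spread in a single task's duration and sets margin as the gap between the plan (the median outcome) and a chosen confidence level. The triangular distribution and all the numbers are illustrative assumptions.

```python
import random
import statistics

random.seed(1)

# Aleatory uncertainty: the task's duration varies irreducibly around its
# plan. A triangular distribution is a common stand-in for this spread.
durations = [random.triangular(8, 20, 10) for _ in range(10_000)]  # low, high, mode (days)

q = statistics.quantiles(durations, n=100)
p50, p80 = q[49], q[79]  # 50th and 80th percentile outcomes

# Margin to protect the plan at 80% confidence is the gap between the
# median plan and the 80th percentile outcome.
print(f"P50 plan: {p50:.1f} days, P80: {p80:.1f} days, margin: {p80 - p50:.1f} days")
```

No amount of learning shrinks this spread; the only protection is carrying enough margin to absorb it. That is the practical difference between aleatory and epistemic uncertainty.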
If we're going to forecast what it will cost, when it will be done, and the probability that it will work when we arrive at done, then understanding both these uncertainties is critical. The notion that breaking things down into small chunks and doing the work in a serial manner somehow removes the variances is not reality, at least not in the reality of non-trivial projects. Three people in the same room working on a list of sticky notes on the wall - maybe. Much beyond that and the laws of statistics are going to come into play.
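A minimal sketch of that last point, with made-up numbers: the sum of many small uncertain durations is still uncertain, so decomposing the work into chunks doesn't make the variance go away. The chunk count and distribution here are illustrative assumptions.

```python
import random
import statistics

random.seed(2)

def project_duration(n_chunks: int) -> float:
    """Total duration of n_chunks small serial tasks, each with its own aleatory spread."""
    return sum(random.triangular(0.8, 2.0, 1.0) for _ in range(n_chunks))  # low, high, mode (days)

totals = [project_duration(50) for _ in range(10_000)]
q = statistics.quantiles(totals, n=100)
p50, p90 = q[49], q[89]  # 50th and 90th percentile totals

# Decomposition didn't remove the variance: the 50-chunk plan still needs
# real margin between the median and the 90% confidence level.
print(f"P50: {p50:.1f}, P90: {p90:.1f}, margin at 90%: {p90 - p50:.1f} days")
```

The spread of the total narrows relative to its mean as chunks are added, but in absolute days it keeps growing, which is exactly the margin a non-trivial project has to carry.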
To be credible project managers we need to understand how the underlying statistics impact the probability of success of our projects. Ignoring this doesn't mean it goes away. It just means we'll be surprised by the underlying behavior created by these stochastic processes.