I'll admit up front I'm hugely biased toward statistical thinking. I was trained in physics and the mathematics that goes with physics, and in Systems Engineering and the math that goes with that, so thinking about statistics is what we do in our firm. We work programs with cost and schedule development, do triage on programs for cost and schedule, guide the development of technology solutions using probabilistic models, and assess risk to cost, schedule, and technical performance using probability and statistics. We build business cases, performance models, Estimates To Complete, and Estimates At Completion. We estimate the probability of program success, the probability of a proposal win, the probability that the Go-Live date will occur on or before the need date at or below the planned cost, and the probability that the mandatory needed capabilities will be there as well.
We use probability and statistics not because we want to, but because we have to. Many intelligent, trained, and educated people in our domain - software-intensive systems and the management of projects - find themselves frozen in fear when confronted by any mathematical problem beyond the level of basic arithmetic, especially in the software development domain. The algorithm writers on the flight control systems we work with are not software developers in the common sense, but control system engineers who implement their algorithms in Handel-C - so they don't count, at least not in this sense.
We have to deal with probability and statistics for a simple reason - every variable on a project is a random variable. Only accountants deal with Point Numbers. The balance in your checking account is not subject to a statistical estimate. The price of General Electric stock in your 401(k) is a random variable. All the world is a non-stationary, stochastic process, and many times a non-linear, non-stationary, stochastic process.
Stochastic processes are everywhere. They are time series subject to random fluctuations. Examples include your heartbeat, the stock market, the productivity of your software team, the stability of technical requirements, the performance of the database server, and the number of defects in the code you write.
In our software development domain there is an overwhelming need to predict the future, though not for the reason you may think. At the same time, there is a movement underway to Not Estimate the future. But it turns out that predicting is a need, not necessarily a desire. The need to predict - to have some sense of what is going to happen - is based on a few very simple principles of microeconomics:
- It's not our money. If it were, we could do with it as we please.
- Those providing the money have a finite amount of money. They also have a finite amount of time in which to exchange that money for value produced by us, the development team.
- If there were a non-finite amount of money and time, we wouldn't have to talk about things like estimates of when we'll be done, or how much it will cost, or the probability that the produced outcomes will meet the needs of the users.
Our natural tendencies are to focus on observation - empirical data - rather than the statistical data that drives the probabilistic aspects of our work.
This approach - the statistical processes and probabilistic outcomes - requires that we know something about our underlying processes: our capacity for work, the generated defect rate, the defect fix rate. Without that knowledge the probabilistic answers aren't forthcoming, and if they are forced out by Dilbert-style management, they'll be bogus at best and downright lies at worst.
Let's stop here for some critically important points:
- If we don't have some sense of the underlying processes driving our project, we're in much bigger trouble than we think - we don't know what done looks like in units of measure meaningful to the decision makers.
  - When will we be done? Approximately? - I don't know - then we're late before we start.
  - How much will this cost? Approximately? - I don't know - then we're over budget before we start.
  - What's the probability we'll be able to deliver all the needed features - minimal features or mandatory features - for a cost and schedule goal? - I don't know - then this project is going to be a Death March project before we run out of time and money.
- If we don't know our capacity for work, which should be developed from empirical data, we can't make duration and cost estimates.
  - Knowing this once the project is going is fine. But by then it's likely too late to make the business decisions needed to start the project.
- The very naive assumption that all the work can be broken down into same-sized chunks has no broad evidence behind it, and is likely to be highly domain dependent.
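To make the point about capacity for work concrete, here is a minimal sketch of turning empirical throughput data into a probabilistic duration estimate. It is only an illustration, not a method from our practice: the weekly history, the backlog size, and the function names `simulate_weeks_to_finish` and `forecast` are all made up for the example, and it assumes that future weeks will look like a random draw from past weeks.

```python
import random


def simulate_weeks_to_finish(backlog_items, weekly_throughput_history, rng):
    """Run one trial: draw past weekly throughputs at random until the backlog is done."""
    remaining = backlog_items
    weeks = 0
    while remaining > 0:
        remaining -= rng.choice(weekly_throughput_history)
        weeks += 1
    return weeks


def forecast(backlog_items, weekly_throughput_history, trials=10_000, seed=1):
    """Return the 50th, 80th, and 95th percentile of weeks-to-finish over many trials."""
    if not weekly_throughput_history or min(weekly_throughput_history) < 0:
        raise ValueError("history must be non-empty and non-negative")
    if max(weekly_throughput_history) <= 0:
        raise ValueError("history needs at least one week with completed work")

    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    outcomes = sorted(
        simulate_weeks_to_finish(backlog_items, weekly_throughput_history, rng)
        for _ in range(trials)
    )

    def percentile(p):
        index = min(len(outcomes) - 1, int(p * len(outcomes)))
        return outcomes[index]

    return {"p50": percentile(0.50), "p80": percentile(0.80), "p95": percentile(0.95)}


# Hypothetical history: items completed in each of the last ten weeks.
history = [4, 7, 3, 6, 5, 2, 8, 5, 4, 6]
result = forecast(backlog_items=60, weekly_throughput_history=history)
print(result)
```

The answer is a range with a confidence attached ("80% of the trials finished in this many weeks or fewer"), not a Point Number. Notice also that nothing here assumes same-sized chunks of work; it only assumes the past throughput is representative, which is itself an assumption to be tested in your domain.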
So let's look at the core problem of estimating
As humans we are poor at estimating. Fine, does that mean we should not estimate? Hardly. We need to become aware of our built-in problems and deal with them. The need for estimating in business is not going away. It is at the heart of business itself and core to all decision making.
Here's an example from Daniel Kahneman's "Thinking, Fast and Slow":
Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations.
Which of these is more probable?
1. Linda is a bank clerk.
2. Linda is a bank clerk and an active member of a feminist movement.
In Kahneman's studies, 90% of all respondents selected the latter option. Why? Because the description of Linda and the word feminism intuitively raise an idea of compatibility. Being outspoken and an activist are not usually associated with the job of a bank clerk. Thinking quickly, being a bank clerk seems more plausible if she is at least also a feminist. However, the second option is horribly wrong, because the probability can only go down as more conditions must hold at the same time. Every bank clerk who is an active feminist is also a bank clerk, so the second group is a subset of the first and can never be the more probable one.
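The reasoning can be written down in one line. The symbols are mine, not Kahneman's: let B be "Linda is a bank clerk" and F be "Linda is an active member of a feminist movement".

```latex
P(B \cap F) = P(B)\,P(F \mid B) \le P(B), \qquad \text{since } 0 \le P(F \mid B) \le 1 .
```

The two options are equally probable only if every bank clerk who fits Linda's description is an active feminist, that is, if P(F | B) = 1. In every other case option 1 is strictly more probable.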
So Now What, We've Confirmed We're Bad At Making Estimates
How do we make it better? First, come to realize that good - meaning credible - estimates are part of good business. Knowing the cost of the value delivered is at the core of all business success. Second, look for the root causes of poor estimating outcomes. These come in many sizes. But the Dilbert excuse is not an excuse. It's a cartoon of bad management. So let's drop that right away. If you work for a Dilbert boss or have a Dilbert boss for a customer, good estimating is not going to do much for you. So let's dispense with the charade of trying to.
So what are some root causes of poor estimates?
- Poor understanding of what done looks like - what capabilities do we need and when do we need them. Without this understanding building a list of requirements has no home and the project becomes an endless series of emerging work efforts in an attempt to discover the needed capabilities.
  - Capabilities Based Planning is a start
  - Systems Engineering is another framework
- Inability to create a model of past performance with statistical behaviour
  - All project variables are random variables
  - Discover these variables, use the capabilities delivery
- Lack of experience and knowledge in the basic processes of software estimating
- The mistaken belief that the future cannot be discovered before it arrives
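As one illustration of what a model of past performance with statistical behaviour might look like, here is a small sketch that turns a point estimate into a range using the ratio of actual to estimated duration on earlier projects. Everything in it is an assumption made for the example: the `past_projects` numbers are invented, the names `growth_ratios` and `adjusted_range` are hypothetical, and the choice of the 20th, 50th, and 80th percentiles is arbitrary.

```python
import statistics

# Hypothetical (estimated_weeks, actual_weeks) pairs from past projects.
past_projects = [
    (10, 12), (8, 8), (20, 31), (6, 7),
    (15, 14), (12, 18), (9, 11), (30, 39),
]


def growth_ratios(projects):
    """Ratio of actual to estimated duration for each past project."""
    return [actual / estimated for estimated, actual in projects]


def adjusted_range(point_estimate, projects):
    """Scale a point estimate by the 20th, 50th, and 80th percentile growth ratios."""
    # quantiles(n=10) returns nine cut points: index 1 is the 20th percentile,
    # index 4 the median, and index 7 the 80th percentile.
    cuts = statistics.quantiles(growth_ratios(projects), n=10)
    return {
        "low": point_estimate * cuts[1],
        "likely": point_estimate * cuts[4],
        "high": point_estimate * cuts[7],
    }


# A new piece of work estimated at 10 weeks.
estimate_range = adjusted_range(10, past_projects)
print({name: round(weeks, 1) for name, weeks in estimate_range.items()})
```

With only a handful of past projects the percentiles are rough, but even a rough range built from your own history says more about the future than a single number with no history behind it.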