There is a suggestion that only the final target of a project's performance is needed to steer toward success. This target can be a budget, a finish date, or the number of stories or story points in an agile software project. But with the target and the measure of performance to date, collected at each sample point, there is still a missing piece needed to guide the project.
With the target and the samples, no error signal is available to make intermediate corrections to arrive on target. With the target alone, any variances in cost, schedule, or technical performance can only be discovered when the project arrives at the end. With the target alone, this is an Open Loop control system.
Irreducible Uncertainty can only be handled with Margin: cost margin, schedule margin, technical margin. This is the type of margin you use when you drive to work. The GPS navigation system says it is 23 minutes to the office. It's NEVER 23 minutes to the office. Something always interferes with our progress.
Reducible Uncertainty is handled in two ways: spending money to buy down the risk that results from this uncertainty, and holding Management Reserve (budget reserve and schedule contingency) to pay for the fix when something goes wrong - when the uncertainty turns into reality.
The next figure (page 28) shows how to manage in the presence of these uncertainties, by measuring actual performance against the desired performance at each step along the way.
In this figure, we measure at each assessment point the progress of the project against the desired progress - the planned progress, the needed progress. This planned, desired, or needed progress is developed by looking at the future effort, duration, risk, and uncertainty - the stochastic processes that drive the project - and determining what the progress should be at this point in time to reach our target on or before the need date, at or below the needed cost, and with the needed confidence that the technical capabilities can be delivered along the way. This is closed loop control.
The planned performance - the needed performance, the desired performance - is developed early in the project. Maybe on day one; more likely after actual performance has been assessed to calibrate future performance. This is called Reference Class Forecasting. With this information, estimates of the needed performance can then be used to establish steering targets along the way to completing the project. These intermediate reference - or steering - points provide feedback along the way toward the goal. They provide the error signal needed to keep the project on track. They are the basis of Closed Loop control.
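A minimal sketch of what reference class calibration can look like in practice. The project history and the new project's raw estimate here are made-up numbers for illustration only; the idea is to scale a new estimate by the actual-to-estimate ratios observed in similar past projects, and to read off a conservative steering target from the upper percentiles of those ratios.

```python
# Reference Class Forecasting sketch: calibrate a new estimate against the
# actual/estimate ratios of similar past projects (data is made up).
import statistics

# Historical (estimate, actual) durations in weeks for the reference class
history = [(10, 13), (20, 22), (8, 12), (15, 21), (12, 14)]

ratios = sorted(actual / est for est, actual in history)

raw_estimate = 16  # the new project's uncalibrated estimate, in weeks

# Median ratio gives a calibrated point forecast;
# the 80th-percentile ratio gives a more conservative steering target.
median_ratio = statistics.median(ratios)
p80_ratio = ratios[int(0.8 * (len(ratios) - 1))]

print(f"calibrated (50%): {raw_estimate * median_ratio:.1f} weeks")
print(f"calibrated (80%): {raw_estimate * p80_ratio:.1f} weeks")
```

A real calibration would use many more reference projects and adjust for the variables that make projects comparable, but even this toy version produces the intermediate reference points the text describes.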
In the US, many highways have rumble strips cut into the asphalt to signal that you are nearing the edge of the road on the right. They make a loud noise that tells you - hey get back in the lane, otherwise you're going to end up in the ditch.
This is the purpose of the intermediate steering targets for the project. When the variance between planned and actual exceeds a defined threshold, this says hey, you're not going to make it to the end on time, on budget, or with your needed capabilities if you keep going like this.
Kent Beck's quote is...
Optimism is the disease of software development. Feedback is the cure.
This feedback must have a reference to compare against if it is to be of any value in steering the project to a successful completion. Knowing it's going to be late, over budget, and doesn't work when we arrive at late, over budget, and not working is of little help to the passengers of the project.
These discussions usually start by quoting something from a summary of Little's Law or the Central Limit Theorem.
A critical element of both Little's Law and the CLT is the notion of Independent and Identically Distributed (IID) random variables. These variables describe the arrivals to a service - stories selected from the backlog for development, or someone arriving in line at the bank to deposit a check.
Let's start with some math. In probability theory, the central limit theorem (CLT) says,
Using a grocery store check-out line or bank teller window example, Little's Law gives the relation between the mean number of customers in the system, E(L), the mean transit time through the system, E(S), and the average number of customers entering the system per unit time, λ, as E(L) = λE(S).
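The relation E(L) = λE(S) can be checked with a small simulation. Below is a sketch of a single-server queue with exponential inter-arrival and service times (the M/M/1 model), with rates chosen arbitrarily for illustration; the observed average number in the system should match the observed arrival rate times the observed time in the system.

```python
# Check Little's Law, E(L) = λ·E(S), with a single-server (M/M/1) queue:
# Poisson arrivals, exponential service times, first-in-first-out.
import random

random.seed(1)
lam, mu = 0.7, 1.0           # arrival rate, service rate (λ < μ keeps the queue stable)
n = 200_000                  # customers to simulate

t = 0.0                      # arrival clock
server_free = 0.0            # time at which the server next becomes free
total_time_in_system = 0.0
last_departure = 0.0

for _ in range(n):
    t += random.expovariate(lam)             # next customer arrives
    start = max(t, server_free)              # wait if the server is busy
    server_free = start + random.expovariate(mu)
    total_time_in_system += server_free - t  # this customer's time in system
    last_departure = server_free

W = total_time_in_system / n                 # observed mean time in system, E(S)
effective_lambda = n / t                     # observed arrival rate, λ
L = total_time_in_system / last_departure    # time-average number in system, E(L)

print(f"λ·W = {effective_lambda * W:.2f}, L = {L:.2f}")  # the two should be close
```

The agreement holds because of the conditions the text goes on to emphasize: arrivals and service times here are genuinely independent and identically distributed.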
An Actual Example In Preparation for Developing Software From The Story Queue
Let's pretend we're in line at the grocery store. We'll call the check-out line the resource and the people lining up at the check-out line the customers. If the clerk manning the check-out station is busy checking out customers, a queue will form in the line waiting to check out.
The population of customers that can use the store is usually finite, but this makes the problem harder, so let's assume for the moment the population of customers is infinite. The number of check-out lines can be one or many, but we'll want to assume for the moment that they are identical in their service as well. Let's define the capacity of the store as the number of people that can wait in line, plus the person being served by the clerk. In most stores there will be a finite number of people in the queue at check-out, but again, assuming this number is infinite makes the analysis easier.
We need another simplifying assumption. The distribution of the amount of time each customer spends at check-out (once they arrive) is Independent and Identically Distributed (IID). The time between customers arriving at check-out is also an IID variable. This distribution is usually taken to be exponential, which is memoryless: the chance a customer shows up at the check-out stand in the next minute is the same no matter how long you've already waited.
These are critical assumptions for what follows about Little's Law. If the above conditions are not met, Little's Law is not applicable to the problem being described.
So let's have a quick summary of Little's Law:
Mean number of people in line at check-out = Arrival Rate of customers × Check-Out Time
This law can be applied if some other conditions exist:
Now for Software Development
Instead of assuming Little's Law can be applied to software development, let's first ask whether the conditions are right to apply the law:
This means: do the jobs - stories - arriving at the service - development (or some other process) - behave like IID variables? That is, they have no knowledge of each other, they are indistinguishable from each other, and when serviced (meaning developed, tested, installed, etc.) they cannot be distinguished from the other work serviced.
Let's look at an actual project, a simple one. We want to fly to the moon for the first time, land, and come home.
Doing work on a development project is not the same as work arriving in the queue of a service. It is a network of dependencies, with interconnections, and - most importantly, most critically actually - the durations of the work, the times spent in the service, are not independent, identically distributed random variables. A network of work looks like this (notionally).
So does Little's Law apply? Nope!
These networks of work are called Stochastic Networks and are not subject to Little's Law. The process in the Little's Law condition can be stochastic, but there has to be independence between the work elements, and they have to have identically distributed probability distributions.
Production queues of parts going down an assembly line are. Cards being pulled from the tray in a Kanban furniture manufacturing system are. The notional Kanban system in agile development is - but If and Only If (IFF) the work pulled from the wall is independent of all other work, and the durations of that work are identically distributed random variables as well.
If you can find a project where all the features are independent of each other, and their work efforts are identical, independently distributed random variables, then you'll be able to apply Little's Law.
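To see why a network of dependent work behaves differently from a queue of independent jobs, here is a Monte Carlo sketch of a small, entirely hypothetical five-task network (A feeds B and C, D waits on both B and C, E follows D) with uncertain triangular durations. A merge point like D makes finish times depend on each other, which is exactly the non-IID behavior the text describes.

```python
# Monte Carlo of a small stochastic network of work: tasks have uncertain
# durations AND precedence constraints, so finish times are not IID.
# Hypothetical 5-task network: A -> B, A -> C, (B, C) -> D, D -> E.
import random

random.seed(42)

def duration(low, mode, high):
    # random.triangular takes (low, high, mode)
    return random.triangular(low, high, mode)

def one_run():
    a = duration(2, 4, 9)                 # A finishes
    b = a + duration(3, 5, 12)            # B starts only after A
    c = a + duration(1, 2, 6)             # C starts only after A
    d = max(b, c) + duration(2, 3, 8)     # D waits on BOTH B and C: a merge,
    return d + duration(1, 2, 4)          # not an arrival to an idle server

finishes = sorted(one_run() for _ in range(20_000))
p50, p80 = finishes[10_000], finishes[16_000]
print(f"50% of runs finish by {p50:.1f}, 80% by {p80:.1f}")
```

The `max()` at the merge point is what breaks the queueing picture: D's start time is correlated with everything upstream, so no single arrival-rate and service-time pair describes the flow.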
Little's Law applies to software development work that looks like production flow - like the assembly line at Toyota, or the office furniture production line we designed at a factory in Idaho.
But those types of software projects must be intentionally designed to have no dependencies between the work performed, and to have the duration of the work in the service cycle (development) have no dependency on the prior work.
This is the condition of Independent and Identical Distribution (IID) needed for Little's Law as well as the Central Limit Theorem. So before anyone says Little's Law applies to software development, they need to show these conditions exist.
One Final Observation
The slicing proposed by some in the agile community might create the conditions for Little's Law to work. But the effort to slice the stories into equal sized - or at least Independent and Identically Distributed - work sizes for the entire project duration seems like a lot of work. Especially when there are much easier ways of estimating the total work, total duration, and total cost.
But since this slicing paradigm appears to be anecdotal and untested across a wide variety of projects, domains, and sizes, the population sample size condition is unlikely to be met as well.
More research, based on actual analysis, needs to be done - and that research reviewed and tested - before the notion of mathematically slicing has much use outside of anecdotal examples.
Resources (of many)
Obviously not every decision we make is based on mathematics, but when we're spending money, especially other people's money, we'd better have some good reason to do so. Some reason other than gut feel, for any significant value at risk. This is the principle of Microeconomics.
All Things Considered is running a series on how people interpret probability. From capturing a terrorist to the probability it will rain at your house today, the world lives on probabilistic outcomes. These probabilities are driven by underlying statistical processes. These statistical processes create uncertainties in our decision making processes.
Both Aleatory and Epistemic uncertainty exist on projects. These two uncertainties create risk. This risk impacts how we make decisions. Minimizing risk while maximizing reward is a project management process, as well as a microeconomics process. By applying statistical process control we can engage project participants in the decision making process. Making decisions in the presence of uncertainty is sporty business, and many examples of poor forecasts abound. The flaws of statistical thinking are well documented.
When we encounter the notion that decisions can be made in the absence of statistical thinking, there are some questions that need to be answered. Here's one set of questions and answers from the point of view of the mathematics of decision making using probability and statistics.
The book opens with a simple example.
Here's a question. We're designing airplanes - during WWII - in ways that will prevent them getting shot down by enemy fighters, so we provide them with armor. But armor makes them heavier. Heavier planes are less maneuverable and use more fuel. Armoring planes too much is a problem. Too little is a problem. Somewhere in between is optimum.
When the planes came back from a mission, the number of bullet holes was recorded. The damage was not uniformly distributed, but followed this pattern:

- Engine - 1.11 bullet holes per square foot (BH/SF)
- Fuselage - 1.73 BH/SF
- Fuel System - 1.55 BH/SF
- Rest of plane - 1.8 BH/SF

The first thought was to provide armor where the need was the highest. But after some thought, the right answer was to provide armor where the bullet holes aren't - on the engines.
The question was "where are the missing bullet holes?" The answer was on the missing planes. The total number of planes leaving minus those returning was the number of planes that were hit in a location that caused them not to return - the engines.
The mathematics here is simple. Start by setting a variable to zero. This variable is the probability that a plane that takes a hit in the engine manages to stay in the air and return to base. The result of this analysis (pp. 5-7 of the book) can be applied to our project work.
This is an example of the thought processes needed for project management and the decision making processes needed for spending other people's money. The mathematician's approach is to ask: what assumptions are we making? Are they justified? The first assumption - the erroneous assumption - was that the planes returning represented a random sample of all the planes. Only if that were true could the conclusions be drawn.
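The survivorship effect in the bomber story can be made concrete with a simulation. The loss probabilities below are invented purely for illustration, not taken from the book: hits are spread uniformly over the plane, but engine hits are far more likely to down it, so the planes that return show the fewest engine holes.

```python
# Survivorship-bias sketch of the WWII bomber analysis. Hits are uniform
# over four sections, but an engine hit is most likely to down the plane,
# so engine holes are under-represented among RETURNING planes.
# The loss probabilities are illustrative assumptions, not historical data.
import random

random.seed(7)
sections = ["engine", "fuselage", "fuel system", "rest of plane"]
loss_prob = {"engine": 0.6, "fuselage": 0.1, "fuel system": 0.2, "rest of plane": 0.05}

returned_hits = {s: 0 for s in sections}
planes, lost = 10_000, 0

for _ in range(planes):
    hits = [random.choice(sections) for _ in range(random.randint(1, 6))]
    if any(random.random() < loss_prob[h] for h in hits):
        lost += 1                      # downed planes are never observed at base
    else:
        for h in hits:
            returned_hits[h] += 1      # only survivors get counted

print(f"planes lost: {lost}")
print(returned_hits)  # engine count is lowest even though hits were uniform
```

Reading the survivors' hole counts as if they were a random sample of all planes is exactly the erroneous assumption the text calls out.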
In The End
Show me the numbers. "Numbers talk, BS walks" is the crude phrase, but true. When we hear some conjecture about the latest fad, think about the numbers. But before that, read Beyond the Hype: Rediscovering the Essence of Management, Robert Eccles and Nitin Nohria. This is an important book that lays out the processes for sorting out the hype - and untested and likely untestable conjectures - from the testable processes.
The presentation Dealing with Estimation, Uncertainty, Risk, and Commitment: An Outside-In Look at Agility and Risk Management has become a popular message for those suggesting we can make decisions about software development in the absence of estimates.
The core issue starts with the first chart. It shows the actual completion of a self-selected set of projects versus the ideal estimate. This chart is now in use by the #NoEstimates paradigm as to why estimating is flawed and should be eliminated. How to eliminate estimates while making decisions about spending other people's money is not actually clear. You'll have to pay €1,300 to find out.
But let's look at this first chart. It shows the self-selected projects, the vast majority completed above the initial estimate. What is this initial estimate? In the original paper, the initial estimate appears to be the estimate made by someone for how long the project would take. It's not clear how that estimate was arrived at - the basis of estimate - or how the estimate was derived. We all know that raw subject matter expertise is the least desirable basis, and past performance, calibrated for all the variables, is the best.
So Herein Lies the Rub - to Misquote from Shakespeare's Hamlet
The ideal line is not calibrated. There is no assessment of whether the original estimate was credible or bogus. If it was credible, what was the confidence of that credibility, and what was the error band on that confidence?
This is a serious - some might say egregious - error in statistical analysis. We're comparing actuals to a baseline that is not calibrated. This means the initial estimate is meaningless in the analysis of the variances without an assessment of its accuracy and precision. To then construct a probability distribution chart is nice, but measured against what? Against bogus data.
This is harsh, but the paper and the presentation provide no description of the credibility of the initial estimates. Without that, any statistical analysis is meaningless. Let's move to another example in the second chart.
The second chart - below - is from a calibrated baseline. The calibration comes from a parametric model, where the parameters of the initial estimate are derived from prior projects - the reference class forecasting paradigm. The tool used here is COCOMO. There are other tools based on COCOMO and Larry Putman's and other methods that can be used for similar calibration of the initial estimates. A few we use are QSM, SEER, Price.
One place to start is Validation Method for Calibrating Software Effort Models. But this approach started long ago with An Empirical Validation of Software Cost Estimation Models. All the way to the current approaches of ARIMA and PCA forecasting for cost, schedule, and performance using past performance. And current approaches, derived from past research, of tuning those cost drivers using Bayesian Statistics.
The issues of software management and estimates of software cost, time, and performance abound. We hear about them every day. Our firm works on programs that have gone Over Target Baseline. So we walk the walk every day.
But when bad statistics are used to sell solutions to complex problems, that's when it becomes a larger problem. To solve this nearly intractable problem of project cost and schedule overrun, we need to look to the root cause. Let's start with a book, Facts and Fallacies of Estimating Software Cost and Schedule. From there let's look to some more root causes of software project problems. Why Projects Fail is a good place to move to, with their 101 common causes. Like the RAND and IDA Root Cause Analysis reports, many are symptoms rather than root causes, but good information all the same.
So in the end, when it is suggested that the woes of project success can be addressed by applying
Ask a simple question - is there any tangible, verifiable, externally reviewed evidence for this? Or is this just another self-selected, self-reviewed, self-promoting idea that violates the principles of microeconomics as applied to software development, where:
Economics is the study of how people make decisions in resource-limited situations. This definition of economics fits the major branches of classical economics very well.
Macroeconomics is the study of how people make decisions in resource-limited situations on a national or global scale. It deals with the effects of decisions that national leaders make on such issues as tax rates, interest rates, and foreign and trade policy, in the presence of uncertainty.
Microeconomics is the study of how people make decisions in resource-limited situations on a personal scale. It deals with the decisions that individuals and organizations make on such issues as how much insurance to buy, which word processor to buy, what features to develop in what order, whether to make or buy a capability, or what prices to charge for their products or services, in the presence of uncertainty. Real Options is part of this decision making process as well.
Economic principles underlie the structure of the software development life cycle, and its primary refinements of prototyping, iterative and incremental development, and emerging requirements.
If we look at writing software for money, it falls into the microeconomics realm. We have limited resources, limited time, and we need to make decisions in the presence of uncertainty.
In order to decide about the future impact of any one decision - making a choice - we need to know something about the future, which is itself uncertain. The tool for making these decisions about the future in the presence of uncertainty is called estimating. Lots of ways to estimate. Lots of tools to help us. Lots of guidance - books, papers, classrooms, advisers.
But asserting we can in fact make decisions about the future in the presence of uncertainty without estimating is mathematically and practically nonsense.
So now is the time to learn how to estimate, using your favorite method, because to decide in the absence of knowing the impact of that decision is counter to the stewardship of our customers' money. And if we want to keep writing software for money, we need to be good stewards first.
When there are charts showing an Ideal line, or a chart of samples of past performance - say, software delivered - in the absence of a baseline for what the performance of the work effort or duration should have been, was planned to be, or even could have been, this is called Open Loop control.
The issue of forecasting the Should, Will, Must cost problem has been around for a long time. This work continues in DOD, NASA, Heavy Construction, BioPharma, and other high risk, software intensive domains.
When we see graphs where the baselines to which the delays or cost overages are compared are labeled Ideal (like the chart below), it's a prime example of How to Lie With Statistics, Darrell Huff, 1954. This can be overlooked in an un-refereed opinion paper in an IEEE magazine, or a self-published presentation, but a bit of homework will reveal that charts like the one below are simply bad statistics.
This chart is now being used as the basis of several #NoEstimates presentations, which further propagates the misunderstandings of how to do statistics properly.
Todd does have other papers that are useful - Context Adaptive Agility is one example from his site. But this often used and misused chart is not an example of how to properly identify problems with estimates.
Here are some core issues:
Here's where the process goes in the ditch - literally.
We can use the ne plus ultra put-down of theoretical physicist Wolfgang Pauli: "This isn't right. It's not even wrong." As well, the projects were self-selected, and like the Standish Report, self-selected statistics can be found in the How to Lie book.
It's time to look at these sorts of conjectures in the proper light. They are Bad Statistics, and we can't draw any conclusion from the data, since the baseline to which the sampled values are compared isn't right - it's not even wrong. We have no way of knowing why the sampled data has a variance from the bogus ideal.
So it's time to stop using these charts and start looking for the Root Causes of the estimating problem.
A colleague (a former NASA cost director) has three reasons for cost, schedule, and technical shortfalls.
Only the 2nd is a credible reason for project shortfalls in performance.
Without a credible, calibrated, statistically sound baseline, the measurements and the decisions based on those measurements are Open Loop.
You're driving your car with no feedback other than knowing you ran off the road after you ran off the road, or you arrived at your destination after you arrived at your destination.
I'll admit up front I'm hugely biased toward statistical thinking. As one trained in physics and the mathematics that goes with it, and in Systems Engineering and the math that goes with that, thinking about statistics is what we do in our firm. We work programs with cost and schedule development, do triage on programs for cost and schedule, guide the development of technology solutions using probabilistic models, and assess risk to cost, schedule, and technical performance using probability and statistics. We build business cases, performance models, Estimates To Complete, Estimates At Completion, the probability of program success, the probability of a proposal win, and the probability that the Go-Live date will occur on or before the need date, at or below the planned cost, with the mandatory needed capabilities there as well.
We use probability and statistics not because we want to, but because we have to. Many intelligent, trained, and educated people in our domain - software intensive systems and the management of projects - find themselves frozen in fear when confronted by any mathematical problem beyond the level of basic arithmetic, especially in the software development domain. The algorithm writers on flight control systems we work with are not actually software developers in the common sense, but control system engineers who implement their algorithms in Handel-C - so they don't count, at least not in this sense.
We have to deal with probability and statistics for a simple reason - every variable on a project is a random variable. Only accountants deal with Point Numbers. The balance in your checking account is not subject to a statistical estimate. But the price of General Electric stock in your 401(K) is a random variable. All the world is a non-stationary stochastic process, and many times a non-linear, non-stationary stochastic process.
Stochastic processes are everywhere. They are time series subject to random fluctuations. Your heart beat, the stock market, the productivity of your software team, the stability of technical requirements, the performance of the database server, the number of defects in the code you write.
In our software development domain there is an overwhelming need to predict the future. Not for the reason you may think - because at the same time there is a movement underway to Not Estimate the future. It turns out this is a need, not necessarily a desire. The need to predict - to have some sense of what is going to happen - is based on a few very simple principles of microeconomics.
Our natural tendencies are to focus on observation - empirical data - rather than the statistical data that drives the probabilistic aspects of our work.
This approach - statistical processes and probabilistic outcomes - requires that we know something about our underlying processes: our capacity for work, the generated defect rate, the defect fix rate. Without that knowledge the probabilistic answers aren't forthcoming, and if they are forced out in the Dilbert style of management, they'll be bogus at best, and downright lies at worst.
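Knowing our own capacity for work is enough to produce a probabilistic answer instead of a forced point number. A minimal sketch, assuming a made-up history of weekly throughput and a made-up backlog size: resample our own past performance to get a distribution of completion times, then read off the 50% and 80% confidence answers.

```python
# A probabilistic forecast built from knowledge of our own process:
# bootstrap weekly throughput samples to estimate weeks to finish a backlog.
# The throughput history and backlog size are illustrative, not real data.
import random

random.seed(3)
weekly_throughput = [4, 6, 3, 7, 5, 4, 8, 2, 6, 5]  # stories finished per week
backlog = 60                                         # stories remaining

def weeks_to_finish():
    remaining, weeks = backlog, 0
    while remaining > 0:
        remaining -= random.choice(weekly_throughput)  # resample our history
        weeks += 1
    return weeks

runs = sorted(weeks_to_finish() for _ in range(10_000))
print(f"50% confidence: {runs[5_000]} weeks, 80% confidence: {runs[8_000]} weeks")
```

The answer is only as good as the throughput history behind it, which is the point of the paragraph above: without knowledge of the underlying process, there is nothing credible to resample.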
Let's stop here for some critically important points:
So let's look at the core problem of estimating
As humans we are poor at estimating. Fine - does this mean we should not estimate? Hardly. We need to become aware of our built-in problems and deal with them. The need for estimating in business is not going away; it is at the heart of business itself and core to all decision making.
Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations.
Which of these is more probable?
1. Linda is a bank clerk.
2. Linda is a bank clerk and an active member of a feminist movement.
In Kahneman's studies, 90% of all respondents selected the latter option. Why? Because the description of Linda and the word feminism intuitively raise an idea of compatibility. Being outspoken and an activist are not usually associated with the job of a bank clerk, so, thinking quickly, the bank clerk option seems more plausible if she is at least a feminist. However, the second option is necessarily wrong, because the probability of a conjunction can never exceed the probability of either of its parts. The target group shrinks as conditions are added - there are more bank clerks in total than bank clerks who are also active feminists.
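The conjunction rule behind the Linda problem - P(A and B) ≤ P(A) for any events A and B - can be made concrete by counting in a synthetic population. The membership probabilities below are arbitrary illustrative numbers.

```python
# The conjunction rule behind the Linda problem: for any events A and B,
# P(A and B) <= P(A). Counting in a synthetic population makes it concrete.
# The 5% and 30% membership rates are arbitrary illustrative assumptions.
import random

random.seed(11)
N = 100_000
population = [
    {"bank_clerk": random.random() < 0.05,
     "feminist":   random.random() < 0.30}
    for _ in range(N)
]

clerks = sum(p["bank_clerk"] for p in population)
feminist_clerks = sum(p["bank_clerk"] and p["feminist"] for p in population)

print(f"P(clerk)              ≈ {clerks / N:.3f}")
print(f"P(clerk and feminist) ≈ {feminist_clerks / N:.3f}")  # never exceeds P(clerk)
```

Every feminist bank clerk is also a bank clerk, so the conjunction count is a subset count - which is why option 2 can never be the more probable answer.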
So Now What, We've Confirmed We're Bad At Making Estimates
How do we make it better? First, come to realize that good - meaning credible - estimates are part of good business. Knowing the cost of the value delivered is at the core of all business success. Second, look for the root causes of poor estimating outcomes. These come in many sizes. But the Dilbert excuse is not an excuse. It's a cartoon of bad management. So let's drop that right away. If you work for a Dilbert boss, or have a Dilbert boss for a customer, not much good estimating is going to do for you. So let's dispense with the charade of trying, too.
So what are some root causes of poor estimates:
The numbers that appear in projects - cost, schedule, performance - are all random variables drawn from an underlying statistical process. This process is formally called a non-stationary stochastic process. It has several important behaviours that create problems for those trying to make decisions in the absence of understanding how these processes work in practice.
The first issue is that all point estimates for projects are wrong, in the absence of a confidence interval and an error band on that confidence.
"How long will this project take?" is a common question asked by those paying for the project. The technically correct answer is: there is an 80% confidence of completing on or before some date, with a 10% error on that confidence. This is a cumulative probability number, collecting all the possible completion dates and describing the cumulative probability - the 80% - of an on-or-before date, since the project can complete before that final probabilistic date as well.
Same conversation for cost. The cost of the project will be at or below some amount with an 80% confidence.
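The "at or below some amount with 80% confidence" answer comes straight out of a cumulative distribution. A minimal sketch, using invented triangular cost spreads for four hypothetical work packages (in $K): sum the uncertain costs many times and read the 80th percentile.

```python
# "At or below some amount with 80% confidence": sum uncertain work-package
# costs and read the 80th percentile of the resulting distribution.
# The four (low, most-likely, high) spreads in $K are invented for illustration.
import random

random.seed(5)
work_packages = [(40, 55, 90), (20, 25, 45), (60, 80, 150), (10, 12, 30)]

def total_cost():
    # random.triangular takes (low, high, mode)
    return sum(random.triangular(lo, hi, mode) for lo, mode, hi in work_packages)

costs = sorted(total_cost() for _ in range(20_000))
point_sum = sum(mode for _, mode, _ in work_packages)

print(f"sum of most-likely values (a point estimate): ${point_sum}K")
print(f"80% confidence: at or below ${costs[16_000]:.0f}K")
```

Note that the 80% confidence number sits well above the sum of the most-likely values, because right-skewed spreads pull the distribution upward - one reason single point estimates mislead.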
The performance of products or services is the third random variable. Technical performance means anything and everything that is not cost or schedule. This is the wrapper term for the old concept of scope. But in modern terms there are two general purpose categories of Performance with one set of parameters.
These measures are all random numbers with confidence intervals and error bands.
So What's The Point?
When we hear you can't forecast the future, that's not true. The person saying that didn't pay attention in their high school statistics class. You can forecast the future. You can make estimates of anything. The answers you get may not be useful, but they are estimates all the same. If it is unclear how to do this, here's a reading assignment for the books we use nearly every month to make our estimates at completion and estimates to complete for software intensive projects, starting with the simplest:
While on the topic of books, here are some books that should be on your shelf that put those probability and statistics to work.
There are several tools that make use of these principles and practices:
Here's the End
And that's a start in fixing the dysfunction of bad estimating when writing software for money. Start with the person who can actually make a change - You.
The book How To Lie With Statistics, Darrell Huff, 1954, should be on the bookshelf of everyone who spends other people's money, for a very simple reason.
Everything on every project is part of an underlying statistical process. Those expecting any number associated with any project in any domain to be a single point estimate will be sorely disappointed to find out, after reading the book, that this is not the case.
As well, those expecting to make decisions about how to spend other people's money will be disappointed to learn that statistical information is needed to determine the impact of the decision: the cost of the decision, the cost of the value obtained by the decision, the impact on the schedule of the work needed to produce that value, and even the statistical outcomes of the benefits produced by making that decision.
One prime example of How To Lie (although likely not a lie, but just a poor application of statistical processes) is Todd Little's "Schedule Estimation and Uncertainty Surrounding the Cone of Uncertainty." In this paper the following figure is illustrative of the How to Lie paradigm.
This figure shows 106 sampled projects, their actual completion and their ideal completion. First let's start with another example of Bad Statistics - the Standish Report - often referenced when trying to sell the idea that software projects are always in trouble. Here's a summary of posts about the Standish Report, which speaks to a few Lies in the How to Lie paradigm.
So let's look at Mr. Little's chart
There is likely good data at his firm, Landmark Graphics, for assessing the root cause of the projects finishing above the line in the chart. But the core issue is that the line is not calibrated. It represents the ideal data - that is, using the original estimate, what did the project do? - as stated on page 49 of the paper.
For the Landmark data, the x-axis shows the initial estimate of project duration, and the y-axis shows the actual duration that the projects required.
There is no assessment of the credibility of the initial estimate for the project. This initial estimate might accurately represent the projected time and cost, with a confidence interval. Or this initial estimate could be completely bogus: a guess, made up by uninformed estimators, or worse yet, an estimate that was cooked in all the ways possible, from bad management to bad math.
So if our baseline for making comparisons is bogus from the start, it's going to be hard to draw any conclusion from the actual data on the projects. Both initial estimates and actual measurements must be statistically sound if any credible decisions are to be made about the Root Cause of the overage and any possible Corrective Actions that can be taken to prevent these unfavorable outcomes.
This is classic How To Lie - let me present a bogus scale or baseline, then show you some data that supports my conjecture that something is wrong.
In the case of the #NoEstimates approach, that conjecture starts with the Twitter clip below, which can be interpreted as we can make decisions without having to estimate the independent and dependent variables that go into that decision.
So if estimates are the smell of dysfunction, as the popular statement goes, what is the dysfunction? Let me count the ways:
So next time you hear estimates are the smell of dysfunction, or we can make decisions without estimating:
When it is said that we can't forecast or estimate, it brings a smile, since in fact forecasting and estimating are done all the time. Not always correctly, and not always properly used once the estimate is made, but done all the same - every day in some domains, every week and every month in the domains where I work.
In our domains the Estimate At Complete is submitted to the customer every month, and the Estimate At Completion quarterly on most projects we work. These are software-intensive projects and sometimes software-only projects. All innovative development - sometimes never been done before, sometimes inventing new physics.
Some of these estimates are very formal, using tools, reference class forecasting, Autoregressive Integrated Moving Average (ARIMA) projections of risk-adjusted past performance, and compliance with System Engineering Measures of Effectiveness (MOE) and Performance (MOP), traceable to Technical Performance Measures (TPM) and Key Performance Parameters (KPP). Some are simple linear projections of what it will cost given a few parameters - the is it bigger than a bread box type of estimate. Here's how to estimate any software deliverable in an informal way.
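The reference class forecasting idea mentioned above can be sketched in a few lines: take the ratio of actual duration to initial estimate from completed projects in the same class, and apply that empirical distribution to a new estimate. This is only a sketch under assumed data - the `past_ratios` values are made up for illustration, not drawn from any real program.

```python
import statistics

# Hypothetical reference class: ratio of actual duration to initial
# estimate for ten completed projects in the same domain (illustrative).
past_ratios = [1.1, 0.9, 1.4, 1.2, 1.0, 1.6, 1.3, 1.1, 1.8, 1.2]

def reference_class_forecast(initial_estimate_weeks, ratios):
    """Adjust a new estimate using the empirical distribution of past
    overruns; return the median forecast and an 80th-percentile forecast."""
    ordered = sorted(ratios)
    median_ratio = statistics.median(ordered)
    p80_ratio = ordered[int(0.8 * (len(ordered) - 1))]
    return initial_estimate_weeks * median_ratio, initial_estimate_weeks * p80_ratio

median_wk, p80_wk = reference_class_forecast(20, past_ratios)
print(f"median forecast: {median_wk:.1f} weeks, 80th percentile: {p80_wk:.1f} weeks")
```

The point is that the uncalibrated initial estimate is never used naked; it is corrected by the observed behaviour of its reference class.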
At last week's ICEAA conference, where a colleague and I presented two papers - Cure for Cost and Schedule Growth and Earned Value Management Meets Big Data, along with the briefing deck - we were introduced to this book. It is what its name says: you can measure anything.
Chapter 2 opens with a powerful quote:
Success is a function of persistence and doggedness and the willingness to work hard for twenty-two minutes to make sense of something that most people would give up on after thirty seconds - Malcolm Gladwell, Outliers: The Story of Success.
That chapter and others speak to making estimates about the things we want to measure, along with Monte Carlo Simulation - another powerful estimating tool we use on our programs. A process now entering our domain (space and defense) is Bayesian estimating - adding to what we already know.
The instinctive Bayesian approach is very simple
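A minimal sketch of that instinct in code: start with a prior belief, fold in observed evidence, read off an updated belief. The Beta-Binomial conjugate pair keeps the arithmetic to two additions; the prior and the sprint counts below are illustrative assumptions, not data from any real program.

```python
def beta_binomial_update(prior_a, prior_b, successes, trials):
    """Update a Beta(prior_a, prior_b) belief after observing `successes`
    out of `trials` (e.g., sprints that met their commitment)."""
    post_a = prior_a + successes
    post_b = prior_b + (trials - successes)
    return post_a, post_b, post_a / (post_a + post_b)

# Weak prior: roughly half our sprints meet commitment. Then we observe
# 7 of 10 sprints delivering what they committed to.
a, b, mean = beta_binomial_update(prior_a=2, prior_b=2, successes=7, trials=10)
print(f"posterior Beta({a},{b}), mean = {mean:.3f}")
```

Each new sprint's result simply feeds the posterior back in as the next prior - adding to what we already know.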
So if we hear, we can't forecast the future, estimates are a waste, we can't know anything about the future until it arrives — stop, think about all the estimating and forecasting activities you interact with every day, from the weather, to the stock market, to your drive to work, to the estimated cost of the repainting of your house, or the estimated cost of a kitchen remodel.
Anything can be estimated or forecast. All that has to happen is the desire to learn how. Since the purpose of estimates is to improve the probability of success for the project, the estimates start by providing information to those paying for the project. This is an immutable principle of business:
Value is exchanged for the cost of that value. We can't know the value of something until we know its cost. From the kitchen cabinets, to the garden upgrade, to the software for Medicaid enrollment. It's this simple:
ROI = (Value — Cost) / Cost
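As a one-line illustration of that formula (the dollar figures below are invented for the example):

```python
def roi(value, cost):
    """Return on investment: net value produced per unit of cost."""
    return (value - cost) / cost

# A kitchen remodel costing $40,000 that delivers $55,000 of value:
print(f"ROI = {roi(55_000, 40_000):.1%}")  # net $15,000 on $40,000 spent
```

Neither term on the right-hand side is knowable without an estimate; both value and cost are random variables before the work is done.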
Probability theory is nothing but common sense reduced to calculation — Pierre-Simon Laplace (1749-1827)
So when you hear we can't forecast the future, or estimates are evil, or we can't know what we need to do until we start doing, focus on the last part of the quote — if you don't apply probability theory and its partner statistics, they are correct and missing that basic common sense. If we apply basic statistical thinking to project management issues, we can calculate the probability of anything. The resulting probability may not have sufficient confidence levels - but we can calculate it nonetheless.
When you hear we can make decisions without estimating the cost, schedule, or capability impacts of those decisions, consider Laplace and the nonsense of that notion. And for a recent example of how to do the math for forecasting the future behaviour of a project in a specific domain, see Earned Value Meets Big Data and the annotated briefing of the same paper.
It is remarkable that a science which began with the consideration of games of chance should become the most important object of human knowledge — Pierre-Simon Laplace, 1812.
The notion that all project variables are Random Variables is not well understood in many instances, especially in the agile community and those suggesting that estimating cost, schedule, and performance are not needed to make business decisions.
In some agile paradigms, fixed duration sprints mortgage the future by pushing unfinished or un-started features to future sprints resulting in a Minimal Viable Features outcome rather than the Needed Capabilities for the business case or mission success.
While possibly useful in some domains, in many domains the minimum features are the same as the required features. Without all the required features, the system is non-functional. Management Reserve, schedule margin, and cost margin are needed to protect those required features, their cost, and their schedule from the random behaviours shown below.
When developing products or services using other people's money, it is incumbent on us to have some understanding of how these random variables behave on their own and interact with each other. This knowledge provides the basis for making decisions about how that money will result in value to those providing it. In the absence of that knowledge, those providing the money have no way of knowing when the project will complete, how much it will cost when it is complete, and what the probability is that the capabilities produced will meet the needed business, technical, or mission goals. And most importantly, how to make decisions based on those behaviours and interactions. They are deciding without the needed information about the consequences of their decisions.
To continue to assume otherwise ignores Laplace.
I'm speaking at the ICEAA conference here in Denver (not on travel for once) on forecasting the Estimate At Completion (EAC) for large complex programs using the Box-Jenkins algorithm, and on the Cure for Unanticipated Growth in EAC. The notion of making estimates of the cost, schedule, and performance of something goes back to the beginning of all projects - from the Egyptians to modern times. Projects that bend metal, projects that develop new life forms, projects that write software in exchange for money.
Each and every project on the planet today has three variables. These variables are not independent; they are coupled in some way. Usually this coupling is non-linear, non-stationary (meaning it is evolving), and many times unknown. These variables are cost, schedule, and delivered capabilities.
If these variables on the project are UNKNOWABLE (Black Swans), your project is in the ditch before it starts. So let's skip that excuse for not estimating the three variables of the project. Another excuse we can dispose of is our clients don't know what they want. If this is the case, someone has to pay to find out what DONE looks like in units of measure meaningful to the decision makers. Don't have this information in some form, any form, that can be used to further the conversation? Your project is a Death March project on day one.
Modeling Random Variables on Projects
If we are managing a project, or a participant on a project, we should know something about the work we have been asked to do, the outcomes of that work, the relationships between the elements of the work - the hull size of a ship and the cost of the propulsion system. Or, the size of the hardware needed to handle the number of users on the systems.
But the first and most important thing to know is that all variables on projects are random variables. If you hear we can't estimate because we can't know exactly what the cost, schedule, or capabilities will be, that person may be unaware of the underlying statistical processes of all projects. Projects are not accounting. They are decision-making processes - decision making in the presence of uncertainty. Anyone seeking certainty - a manager, a customer, a provider - is going to be seriously disappointed when they discover all the project variables are random variables, with a Mode (Most Likely), a Standard Deviation, and higher-order Moments describing the shape of the Probability Distribution Function.
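One way to see that a project variable has a Mode, a Standard Deviation, and a shape, not a single number, is to sample a three-point estimate. The sketch below uses a triangular distribution (a common stand-in for best case / most likely / worst case); the duration figures are assumptions for illustration only.

```python
import random
import statistics

# A task's three-point estimate in days: best case, most likely, worst case.
# These numbers are illustrative, not from any real project.
low, mode, high = 10.0, 14.0, 28.0

random.seed(42)  # reproducible illustration
samples = [random.triangular(low, high, mode) for _ in range(100_000)]

mean = statistics.fmean(samples)
stdev = statistics.stdev(samples)
p80 = sorted(samples)[int(0.8 * len(samples))]  # 80th-percentile duration

print(f"most likely {mode}, mean {mean:.1f}, std dev {stdev:.1f}, 80th pct {p80:.1f}")
```

Note the mean is well above the most likely value because the distribution is skewed right - quoting the Mode alone systematically understates the work.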
So here's a simple and straightforward approach to modeling our project in Excel. Let's start with some urban myths about estimating anything:
In The End, It Is This Simple
If you're building products or providing services using someone else's money, you are professionally obligated to provide some sort of understanding of the cost, schedule, and probability of showing up with the needed capabilities. Since each of those is interconnected in some non-linear, non-stationary manner, with unknown but knowable correlations between the lowest-level work elements, assuming that a fixed budget and a set of past performance measurements of work will provide a credible forecast of future performance is naive at best.
This branch of mathematics [probability] is the only one, I believe, in which good writers get results entirely erroneous - Charles Sanders Peirce
When it is said you can't estimate the future, or we don't know the total cost, think of Mr. Peirce. All things project management are probabilistic, driven by the underlying statistical processes of irreducible and reducible uncertainty. Rarely, if ever, are these uncertainties Unknowable in the mathematical sense.
Projects are composed of three fundamental elements: Cost, Schedule, and Technical outcomes. The Technical Outcomes go far beyond the PMI-style scope terms. In this paradigm, the technical outcomes are at the end of a chain. Here are examples of that chain: Capabilities, Measures of Effectiveness, Measures of Performance, Key Performance Parameters (there are 5 in our domain), and Technical Performance Measures. The TPM level is where things like quality live, traceable to the KPPs.
These three elements are coupled in dynamic ways. Their connections are springy, in that changes in one have an impact on the other two. But rarely is this impact linear and rigid. The Iron Triangle notion is really a Three Body problem, in which all three elements impact each other and at the same time respond to that impact.
All projects have these three elements, coupled in this way. Changes in one impact the other two. Changes in two impact each other and the third. Without knowing the dynamics of cost, schedule, and technical performance, we can't have any credible understanding of these variables.
Three Body Problem
The three body problem determines the possible motions of three point masses m1, m2, and m3, which attract each other according to Newton's law of inverse squares. It started with the perturbative studies of Newton himself on the inequalities of the lunar motion. In the 1740s there was a search for solutions (or at least approximate solutions) of a system of ordinary differential equations by the works of Euler, Clairaut and d'Alembert (with an explanation by Clairaut of the motion of the lunar apogee).
Developed further by Lagrange, Laplace, and their followers, the mathematical theory entered a new era at the end of the 19th century with the works of Poincaré, and since the 1950s with the development of computers. While the two-body problem is integrable and its solutions completely understood, solutions of the three-body problem (Java 7 in a 64-bit browser needed) may be of arbitrary complexity and are very far from being completely understood.
The forces between the bodies can be self-attractive, or they can be a central force - the restricted three body problem - or a combination of the two. This is the basis of complex systems, where multiple forces are applied to objects, which in turn change the forces. As an aside, the double pendulum and the three body problem are used as examples of complex systems, without acknowledging that the underlying mathematics is deterministic - the Java example above draws its lines from an algorithm.
This is a common mistake by those unable to do the math, or who want to suggest the problems of the day are beyond solution.
Three Body Problem and Three Elements of Project Management
The three body problem uses gravity as the force between the masses. There is a simpler example: three masses connected by three springs. This model is found in chemistry and biology at the molecular level. The force at work is of course not gravity but the electromagnetic force.
Consider a simplified model for the vibrations of an ozone molecule consisting of three equal oxygen atoms. The atoms are represented by three equal point masses in equilibrium positions at the vertices of an equilateral triangle. They are connected by equal springs of constant k that lie along the arcs of the circle circumscribing the triangle. Mass points and springs are constrained to move on the circle, so that, e.g., the potential energy of a spring is determined by the arc length covered.
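The normal modes of that spring-coupled system fall out of a small eigenvalue problem. Below is a sketch, assuming unit mass and spring constant for simplicity: the stiffness matrix couples each mass to its two neighbours, and its eigenvalues are the squared mode frequencies in units of k/m.

```python
import numpy as np

# Stiffness matrix for three equal masses on a circle joined by three
# equal springs (k = m = 1 for illustration). Each mass is pulled by its
# two neighbouring springs.
k, m = 1.0, 1.0
K = (k / m) * np.array([[ 2.0, -1.0, -1.0],
                        [-1.0,  2.0, -1.0],
                        [-1.0, -1.0,  2.0]])

# Eigenvalues of the (symmetric) stiffness matrix = squared mode frequencies.
omega_squared = np.linalg.eigvalsh(K)
print(np.round(omega_squared, 6))  # one zero mode (rigid rotation) plus a degenerate pair
```

The zero eigenvalue is the rigid rotation of all three masses together; the degenerate pair are the genuine vibrational modes - the coupling, not any single spring, determines the system's behaviour, which is the point the project analogy leans on.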
Now to Projects
Assume for the moment that cost, schedule, and technical performance are dynamic variables, with the forces between them described by functional equations. In our functional equations the forces between them are not constant, but are relationships like this:
The interaction between the three core elements (cost, schedule, technical) is a two-way interaction, so the spring analogy is not quite correct, since a spring force doesn't know which end it is pushing or pulling.
It's not the Iron Triangle, it's a springy triangle. The connections are non-linear and most importantly they are probabilistically driven by the underlying statistical processes of the project. Let's start with the picture below.
All project processes are probabilistic. They have behaviours that are not fixed. The notion that you can slice work into same-sized chunks and execute those chunks with the same effort would violate the basic aleatory uncertainties of all work processes. An understanding of the statistical processes, driven by either aleatory or epistemic uncertainties, is followed by asking probabilistic questions: what's the probability that we'll complete on or before a date, or what's the probability we'll complete at or below a cost?
With a probability and statistics foundation, we can now put together a credible plan driven by the underlying stochastic processes. All work is connected in dependent ways. The work effort, its duration, and its outcomes are also statistically driven. This picture is typical of such a project.
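Asking the probabilistic question for a chain of dependent work can be sketched with a small Monte Carlo run. The four tasks and the deadline below are invented for illustration; each task carries a three-point estimate, and we count how often the sequential chain finishes on or before the need date.

```python
import random

# Three-point estimates (low, most likely, high) in days for a chain of
# four sequential, dependent tasks. All numbers are illustrative.
tasks = [(5, 8, 15), (10, 12, 20), (4, 6, 12), (8, 10, 18)]
deadline = 45  # the need date, in days from start

def simulate_total(tasks):
    """One Monte Carlo trial: total duration of the sequential chain."""
    return sum(random.triangular(lo, hi, ml) for lo, ml, hi in tasks)

random.seed(7)  # reproducible illustration
trials = 50_000
on_time = sum(simulate_total(tasks) <= deadline for _ in range(trials))
print(f"P(finish on or before day {deadline}) is about {on_time / trials:.2f}")
```

The output is a confidence figure, not a date - exactly the form of answer the probabilistic question demands.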
In The End
So we can:
Todd Little and Steve McConnell use a charting method that collects data from projects and then plots it in the following way. For Little's data, it's the initial estimated duration versus the actual duration, and for McConnell's data, it's the estimated completion date versus the actual completion date.
So Where's the Rub?
These charts show that project estimates exceed some ideal estimate on a number of projects - the sampled projects. If we were sitting in a statistics class in an engineering, physics, chemistry, or biology course, here are some questions that need answers.
What's missing are several things.
The Core Issue with Using Past Numbers
What To Do Next?
The first thing to do is go out to the book store and get a book on statistical forecasting or statistical estimating that has actual math in it. Next, ask some hard questions:
Then read all you can find on reference class forecasting and statistical inference. Data is not information. Correlation is not causation.
There's really no way out of this. Spending other people's money - at least money they are not willing to lose - means having some process for estimating the probability of success.
The Final Thought
Plot cost and schedule for your projects as a Joint Probability. Below is a Monte Carlo Simulation of the Joint Cost and Schedule for a program. A similar chart is needed, but using a collection of projects. Take Little's and McConnell's sample projects and plot both cost and schedule. There may be correlations between original cost and actual cost, versus original schedule and actual schedule. Big projects have higher risk - restating the obvious, by the way. Higher-risk projects may have wider variances in performance - also restating the obvious.
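A joint cost-schedule Monte Carlo can be sketched by drawing correlated deviates, so cost overruns tend to arrive with schedule overruns, and counting how often both targets are met. The means, spreads, correlation, and targets below are assumptions for illustration, and normal marginals are a simplification of real cost and schedule distributions.

```python
import math
import random

# Illustrative program parameters (all assumed, not real data).
cost_mean, cost_sd = 10.0, 1.5      # $M
sched_mean, sched_sd = 24.0, 3.0    # months
rho = 0.6                           # assumed cost-schedule correlation
cost_target, sched_target = 11.0, 26.0

random.seed(11)  # reproducible illustration
trials = 100_000
both_met = 0
for _ in range(trials):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    cost = cost_mean + cost_sd * z1
    # Correlate the schedule draw with the cost draw via Cholesky weights.
    sched = sched_mean + sched_sd * (rho * z1 + math.sqrt(1 - rho**2) * z2)
    if cost <= cost_target and sched <= sched_target:
        both_met += 1

print(f"P(cost and schedule targets both met) is about {both_met / trials:.2f}")
```

The joint probability is lower than either marginal alone - which is why one-dimensional overrun plots understate the real exposure.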
But these one-dimensional - one independent variable - plots of cost overrun versus original cost estimate just show uncalibrated, un-normalized, non-root-cause data. It's just a chart, of little value for taking corrective action.
There is a post that references a concept I've come to use that puts uncertainty into three classes. That post is not exactly what I said, so let me clarify it a bit.
First, some background. I work on an engagement that provides advice to an office inside the Office of the Secretary of Defense (OSD). This office is responsible for determining the Root Cause of program performance for ACAT1 (Acquisition Category 1) programs.
These are large programs - larger than $5B. In most domains outside the ACAT1s this number is ridiculously large, but inside the circle of large defense programs, $5B is really not that much money. For the Joint Strike Fighter, Congressional Quarterly and the Government Accountability Office indicated a "total estimated program cost now $400B," nearly twice the initial cost. DDG-1000 is $21,214 million - yes, that's $21,214,000,000.
No IT or software development project would come within a millionth of that. If you're interested, there are reports at RAND and IDA on the current issues. There are certainly multi-million dollar IT projects. The ACA web site is probably going to be in the range of $85M to several hundred million. The facts are still coming in, so anyone who says they know and doesn't work directly on the program probably doesn't know and is making up numbers. GAO will get to the real numbers soon, we hope.
Principles Rule, Practices Follow, Everything Else is BS
The principles of cost and schedule estimating, and assessment of the related technical and programmatic gaps, are the same in all domains at every scale, from small to billions. Why? Because it's the same problem no matter the scale.
The soliloquy in the movie makes a good point - handling the truth is actually very difficult for almost everyone outside the domain, in many instances.
We want the simple answer. We want it all to be fine. We really don't want to do the heavy lifting needed to come up with an answer. Many times we don't want an answer at all; we want to just do our job and ignore the fiduciary responsibility to tell others what the cost and schedule impacts are, or even to do our job of discovering what DONE looks like before we start spending other people's money.
So here's the way out of the trap of at least (1) and (2)
But the words used in the original post that referenced my post are not my intent, nor are they part of any process I work in.
Here's a list of other posts on this topic. It's a critically important topic - one that deserves detailed analysis, and one that we're obligated to know and use when it's not our money we're spending. It's called Governance.
Here's some more discussion on Estimating for fun and profit.
Order of Apparent Chaos - I know of scarcely anything so apt to impress the imagination as the wonderful form of cosmic order expressed by the "Law of Frequency of Error." The law would have been personified by the Greeks and deified, if they had known of it. It reigns with serenity and in complete self-effacement amidst the wildest confusion. The huger the mob, and the greater the apparent anarchy, the more perfect is its sway. It is the supreme law of Unreason - Francis Galton, Natural Inheritance (1889).
When there is mention that the future cannot be forecast, or that estimates of past, present, and future cannot be made, careful consideration must be given to the speaker's lack of understanding of basic statistics. One place to start is Principles of Statistics, M. G. Bulmer.
Probability and statistics rule our project world. We must treat all aspects of project work - technical, cost, and schedule - as random variables drawn from an underlying probability distribution, either discrete or continuous. Without considering the random nature of these project elements and their behaviours, our decision-making capabilities are severely limited. When we ignore them, fail to consider them, and proceed in their presence, we will be disappointed with the outcomes.
It's that easy and it's that hard. If you don't have a handle on what risks are going to impact your project, those risks will still be there, you just won't know it.
The first step in increasing the probability of project success is to have some notion of what is going to prevent that success. This means asking what can go wrong, rather than what can go right. In order to answer the question what can go wrong, we need to know what we are doing. What is the project about? What are we trying to produce? When do we need to produce it? How much money will we need to spend to produce this thing called DONE?
Let's start with some obvious risks that we have to handle for any hope of success of the project. These are obvious because they occur on every project, in every domain, using any project management method.
In Concepts of Modern Mathematics by Ian Stewart, there is a math joke that goes like this...
An astronomer, a physicist, and a mathematician (it is said) were holidaying in Scotland. Glancing from a train window, they observed a black sheep in the middle of the field.
How interesting observed the astronomer, all Scottish sheep are black!
To which the physicist responded, No, no some Scottish sheep are black!
The mathematician gazed heavenward in supplication, and then intoned, In Scotland there exists at least one field, containing at least one sheep, at least one side of which is black.
Pick your role here. When I hear words like this can't be done, this has never been done, doing this is evil, doing this is a waste, or this is always done - or any other absolute statement containing never or always - offered in the absence of a domain, a context in that domain, or tangible evidence that the statement is effective beyond a single person's observation, with the speaker insisting I've told you this many times over, citing some evidence from somewhere else untested beyond opinion, or worse, stating it just because it sounds like a good idea - then, as Dilbert has observed in the past, it looks like it's going to be a long day.