This branch of mathematics [probability] is the only one, I believe, in which good writers get results entirely erroneous - Charles Sanders Peirce

When it is said - you can't estimate the future - or we don't know the total cost, think of Mr. Peirce. All things project management are probabilistic, driven by the underlying statistical processes of irreducible and reducible uncertainty. Rarely, if ever, are these uncertainties Unknowable, in the mathematical sense.

Projects are composed of three fundamental elements: Cost, Schedule, and Technical outcomes. The Technical Outcomes go far beyond the PMI-style scope terms. In this paradigm, the technical outcomes are at the end of a chain. Here is an example of that chain - Capabilities, Measures of Effectiveness, Measures of Performance, Key Performance Parameters (there are 5 in our domain), and Technical Performance Measures. The TPM level is where things like quality live, traceable to the KPPs.

These three elements are coupled in dynamic ways. Their connections are springy, in that changes in one have an impact on the other two, but rarely is this impact linear and rigid. The Iron Triangle notion is really a Three Body problem, in which all three elements impact each other and at the same time respond to that impact.

All projects have these three elements, coupled in this way. Changes in one impact the other two. Changes in two impact each other and the third. Without knowing the dynamics of cost, schedule, and technical performance, we can't have any credible understanding of these variables.

Three Body Problem

The three body problem determines the possible motions of three point masses m_{1}, m_{2}, and m_{3}, which attract each other according to Newton's law of inverse squares. It started with the perturbative studies of Newton himself on the inequalities of the lunar motion. In the 1740s there was a search for solutions (or at least approximate solutions) of a system of ordinary differential equations by the works of Euler, Clairaut and d'Alembert (with an explanation by Clairaut of the motion of the lunar apogee).

Developed further by Lagrange, Laplace, and their followers, the mathematical theory entered a new era at the end of the 19th century with the works of Poincaré, and since the 1950s with the development of computers. While the two-body problem is integrable and its solutions completely understood, solutions of the three-body problem (Java 7 in a 64-bit browser needed) may be of arbitrary complexity and are very far from being completely understood.
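To see why three-body solutions get arbitrarily complex, a minimal numerical sketch helps: integrate Newton's inverse-square attraction for three point masses. The masses, starting triangle, and velocities below are illustrative toy values (G set to 1), not any particular physical system.

```python
import math

# Illustrative planar three-body integrator (velocity Verlet).
# Units are toy units with G = 1; masses and starting state are
# invented for illustration.
G = 1.0
m = [1.0, 1.0, 1.0]

def accelerations(pos):
    """Newtonian inverse-square attraction between all pairs."""
    acc = [[0.0, 0.0] for _ in pos]
    for i in range(3):
        for j in range(3):
            if i == j:
                continue
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            r = math.hypot(dx, dy)
            acc[i][0] += G * m[j] * dx / r**3
            acc[i][1] += G * m[j] * dy / r**3
    return acc

def step(pos, vel, dt):
    """One velocity-Verlet step."""
    a0 = accelerations(pos)
    new_pos = [[p[0] + v[0]*dt + 0.5*a[0]*dt*dt,
                p[1] + v[1]*dt + 0.5*a[1]*dt*dt]
               for p, v, a in zip(pos, vel, a0)]
    a1 = accelerations(new_pos)
    new_vel = [[v[0] + 0.5*(a[0] + b[0])*dt,
                v[1] + 0.5*(a[1] + b[1])*dt]
               for v, a, b in zip(vel, a0, a1)]
    return new_pos, new_vel

# An equilateral triangle given a near-circular spin (close to the
# Lagrange rotating solution, which is itself unstable).
pos = [[1.0, 0.0], [-0.5, math.sqrt(3)/2], [-0.5, -math.sqrt(3)/2]]
vel = [[0.0, 0.76], [-0.658, -0.38], [0.658, -0.38]]
for _ in range(5000):
    pos, vel = step(pos, vel, 0.001)
```

Tiny changes to the starting velocities send these orbits into qualitatively different motions - the sensitivity the text refers to.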

The forces between the bodies can be mutually attractive, or there can be a central force - the restricted three body problem - or a combination of the two. This is the basis of complex systems, where multiple forces are applied to objects, which in turn change the forces. As an aside, the double pendulum and the three body problem are often used as examples of complex systems, without acknowledging that the underlying mathematics is deterministic - the Java example above draws its lines from an algorithm.

This is a common mistake by those unable to do the math, or who want to suggest the problems of the day are beyond solution.

Three Body Problem and Three Elements of Project Management

The three body problem uses gravity as the force between the masses. There is a simpler example of three masses connected with three springs. This model is found in chemistry and biology at the molecular level. Gravity is not the force in effect there, of course, but the electromagnetic force.

Consider a simplified model for the vibrations of an ozone molecule consisting of three equal oxygen atoms. The atoms are represented by three equal point masses in equilibrium positions at the vertices of an equilateral triangle. They are connected by equal springs of constant k that lie along the arcs of the circle circumscribing the triangle. Mass points and springs are constrained to move on the circle, so that, e.g., the potential energy of a spring is determined by the arc length covered.
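The ozone-like model above can be sketched numerically: for three equal masses on a circle joined by three equal springs, the normal-mode frequencies come from the eigenvalues of the stiffness matrix. The values of k and m below are illustrative, not physical ozone constants.

```python
import numpy as np

# Three equal masses on a circle, joined by three equal springs of
# constant k (the ozone-like model above). For displacements along
# the circle the stiffness matrix is the circulant matrix below;
# its eigenvalues give the normal-mode frequencies. k and m are
# illustrative values, not real molecular constants.
k, m = 1.0, 1.0
K = k * np.array([[ 2, -1, -1],
                  [-1,  2, -1],
                  [-1, -1,  2]], dtype=float)

eigvals = np.linalg.eigvalsh(K / m)           # sorted ascending
freqs = np.sqrt(np.clip(eigvals, 0, None))    # mode frequencies
# One zero mode (all three masses sliding around the circle together)
# and a doubly degenerate vibrational mode at sqrt(3k/m).
```

The zero mode is the rigid rotation the constraint allows; the degenerate pair is the springy vibration that carries over to the project analogy below.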

This class of problems is called soft body dynamics. The visible outcome is the rendering of graphical objects that are flexible in movies and games - like three dimensional garments.

Now to Projects

Assume for the moment that cost, schedule, and technical performance are dynamic variables, with the forces between them described by functional equations. In these functional equations the forces between them are not constant, but are relationships like this:

Cost is a function of schedule - Time is money in this case.

Schedule is a function of cost - We can buy time, shorten schedule, but at what cost?

Cost is a function of technical performance - Want to go fast? How much will that cost?

Schedule is a function of technical performance - This is not a 1:1 exchange for the schedule question.

Technical performance as a function of cost - What can we afford to do?

Technical performance as a function of schedule - When can we have fast?

The interaction between the three core elements (cost, schedule, technical) is a two-way interaction, so the spring analogy is not quite correct, since a spring force doesn't know which end it is pushing or pulling.
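The asymmetric, two-way coupling described above can be sketched as a toy relaxation model. Every coefficient below is invented for illustration; the point is only that a shock to one variable moves the other two, and not symmetrically.

```python
# Toy model of the "springy triangle": cost, schedule, and technical
# performance as coupled state variables. All coupling coefficients
# are invented for illustration.
state = {"cost": 1.0, "schedule": 1.0, "tech": 1.0}
baseline = {v: 1.0 for v in state}

# coupling[a][b]: how strongly a deviation in b pulls on a.
# Asymmetric on purpose - cost reacts to schedule more strongly
# than schedule reacts to cost.
coupling = {
    "cost":     {"schedule": 0.6, "tech": 0.4},
    "schedule": {"cost": 0.2, "tech": 0.5},
    "tech":     {"cost": 0.3, "schedule": 0.3},
}

def relax(state, steps=50, rate=0.1):
    """Propagate deviations from baseline through the couplings."""
    for _ in range(steps):
        new = {}
        for a in state:
            pull = sum(c * (state[b] - baseline[b])
                       for b, c in coupling[a].items())
            new[a] = state[a] + rate * pull
        state = new
    return state

state["schedule"] = 1.3   # a 30% schedule shock...
state = relax(state)
# ...and cost and tech both drift off their baselines in response.
```

Because the coupling matrix is not symmetric, the response of cost to a schedule shock differs from the response of schedule to a cost shock - the point the spring analogy misses.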

The Point

It's not the Iron Triangle, it's a springy triangle. The connections are non-linear and, most importantly, they are probabilistically driven by the underlying statistical processes of the project. Let's start with the picture below.

All project processes are probabilistic. They have behaviours that are not fixed. The notion that you can slice work into same-sized chunks and execute those chunks with the same effort violates the basic aleatory uncertainties of all work processes. An understanding of the statistical processes, driven by either aleatory or epistemic uncertainties, is followed by asking probabilistic questions: What's the probability that we'll complete on or before a date? What's the probability we'll complete at or below a cost?
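The on-or-before question above can be answered with a small Monte Carlo sketch. The five tasks and their (optimistic, most-likely, pessimistic) durations below are invented for illustration, as is the deadline.

```python
import random

# A minimal sketch of the probabilistic question: what's the
# probability we finish on or before a deadline? Five tasks in
# series, each with (optimistic, most-likely, pessimistic) duration
# in days. All numbers are invented for illustration.
random.seed(7)
tasks = [(3, 5, 10), (2, 4, 9), (5, 8, 15), (1, 2, 4), (4, 6, 12)]
deadline = 30.0

trials = 20000
hits = 0
for _ in range(trials):
    total = sum(random.triangular(low, high, mode)
                for low, mode, high in tasks)
    if total <= deadline:
        hits += 1

p_on_time = hits / trials   # "what's the probability we'll complete
                            # on or before day 30?"
```

The answer is a probability, not a date - which is the whole point of the paragraph above.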

With a probability and statistics foundation, we can now put together a credible plan, driven by the underlying stochastic processes. All work is connected in dependent ways. The work effort, its duration, and its outcomes are also statistically driven. The picture below is typical of such a project.

In The End

We need to know all three properties.

We need to know the underlying statistical behaviours. If not, go find out or pick a reference class.

We need to speak about confidence intervals, not deterministic values (single-point values).

We need to know something about cost. If you don't, you can't know value or ROI. It's that simple. It's basic financial management.

So we can:

Learn to estimate.

Understand that those with the money need to know how much money is needed and when to release that money.

Those waiting for the value need to know when they will receive that value.

Those consuming other people's money to produce that value need to have some notion of how time, money, and the production of value take place outside their narrow world of it's all about me.

Todd Little and Steve McConnell use a charting method that collects data from projects and then plots it in the following way. For Little's data it's the initial estimated duration versus the actual duration,

and for McConnell's data it's the estimated completion date versus the actual completion date.

So Where's the Rub?

These charts show that project estimates exceed some ideal estimate on a number of projects - the sampled projects. If we were sitting in a statistics class in an engineering, physics, chemistry, or biology course, here are some questions that need answers.

If you draw the ideal line, or the perfect line where forecast equals actual, you don't know WHY this is the case.

Next, you don't know WHY the samples above the line - over budget - are the values you have observed.

What's missing are several things.

Of the sample projects picked by Little: there are 570 projects and he picked 120. What does the graph look like for the other 450?

What is the root cause of the over budget or delivery delay?

The observations can't be connected to their root causes.

Was the original estimate bad?

Was the project poorly managed?

How many times was the project rebaselined?

Were the requirements stable?

Were there unaddressed risks that emerged?

And a list of another 30 questions.

In both charts, these and other questions are not addressed. So the charts simply show data gathered - possibly self-selected data - and plotted as Ordinal numbers.

Were the projects the same complexity?

Did they have similar risks?

Were all managed using the same processes?

Were these processes applied equally effectively?

The Core Issue with Using Past Numbers

Learning to forecast future performance from past performance starts with selecting a representative set of numbers from the past, so your Reference Class is applicable to the current work.

This reference class needs to be calibrated and normalized so its values are Cardinal numbers rather than Ordinal numbers. These numbers need to be absent any root causes for their values other than the singular measure. This means if the duration is being measured, the reason for this duration must not be coupled to some other hidden cause.

We overran on cost because our original cost estimate was wrong

We overran on cost because we didn't manage the work well

We overran on cost because we encountered problems we didn't put into our baseline estimate

All these issues fail to separate the independent variables from the dependent variables. The result is a really pretty graph based on data, but no real Information about future performance. Just a collection of past performance.

What To Do Next?

The first thing to do is go out to the book store and get a book on statistical forecasting or statistical estimating that has actual math in it. Next is to ask some hard questions:

Is your data self-selected?

Is your data separate from the root causes of its value? That is, is the data space normalized?

Is the data collected from similar projects? If not, did you normalize for this condition?

Then read all you can find on reference class forecasting and statistical inference. Data is not information. Correlation is not causation.

Why does the data look like this? The two charts show a number of projects that are over the ideal estimates.

Were the estimates credible to start with?

Were the development conditions held constant?

Were the requirements stable?

Were all the drivers of project performance normalized?

What are the drivers of project performance in your domain?

What are the statistical behaviours of these drivers?

There's really no way out of this. Spending other people's money - at least money they are not willing to lose - means having some process for estimating the probability of success.

The Final Thought

Plot cost and schedule for your projects as a Joint Probability. Below is a Monte Carlo Simulation of the Joint Cost and Schedule for a program. A similar chart is needed, but using a collection of projects. Take Little's and McConnell's sample projects and plot both cost and schedule. There may be correlations between original cost and actual cost, versus original schedule and actual schedule. Big projects have higher risk - restating the obvious, by the way. Higher risk projects may have wider variances in performance - also restating the obvious.
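A joint cost-schedule Monte Carlo can be sketched in a few lines. The distributions, targets, and the shared "stress" factor that induces the correlation are all invented for illustration.

```python
import random

# Sketch of a joint cost-schedule Monte Carlo: sample correlated
# cost and duration (a shared "program stress" factor induces the
# correlation), then ask the joint question - what's the probability
# of being at or under BOTH targets? All numbers are invented.
random.seed(11)
trials = 20000
joint_hits = 0
for _ in range(trials):
    stress = random.gauss(0.0, 1.0)          # shared risk driver
    cost = 100 * (1 + 0.15 * stress + 0.10 * random.gauss(0, 1))
    sched = 24 * (1 + 0.12 * stress + 0.08 * random.gauss(0, 1))
    if cost <= 110 and sched <= 26:
        joint_hits += 1

p_joint = joint_hits / trials
# The joint probability is lower than either marginal probability
# alone - the usual surprise when cost and schedule are put on one
# chart instead of two.
```

This is the chart the one-dimensional overrun plots cannot produce, because they never sample cost and schedule together.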

But these one-dimension - one independent variable - plots of cost overrun versus original cost estimates just show uncalibrated, un-normalized, non-root-cause data. It's just a chart, of little value for taking corrective action.

There is a post that references a concept I've come to use that puts uncertainty into three classes. That post is not exactly what I said, so let me clarify it a bit.

First some background. I work on an engagement that provides advice to an office inside the Office of the Secretary of Defense (OSD). This office is responsible for determining the Root Cause of program performance for ACAT1 (Acquisition Category 1) programs.

These are large programs. Larger than $5B. In most domains outside the ACAT1s this number is ridiculously large. But inside the circle of large defense programs, $5B is really not that much money. For the Joint Strike Fighter, Congressional Quarterly and the Government Accountability Office indicated a "Total estimated program cost now $400B," nearly twice the initial cost. DDG-1000 is $21,214 Million - yes, that's $21,214,000,000.

No IT or software development project would come within a millionth of that. If you're interested, there are reports at RAND and IDA on the current issues. There are certainly multi-million dollar IT projects. The ACA web site is probably going to be in the range of $85M to several hundred million. The facts are still coming in. So anyone who says they know, and doesn't work directly in the program, probably doesn't know and is making up numbers. GAO will get to the real numbers soon, we hope.

Principles Rule, Practices Follow, Everything Else is BS

The principles of cost and schedule estimating, and assessment of the related technical and programmatic gaps, are the same in all domains at every scale. From small to multi-billion. Why? Because it's the same problem no matter the scale.

We didn't know

We didn't do our homework

We ignored what others have told us

We ignored the past performance in the same domain

We ignored the past performance in other domains

We just weren't listening to what people were telling us

Our models of cost and schedule growth were bogus, unsound, did not consider the risks, or we just made them up

We couldn't know

We didn't have enough time to do the real work needed to produce a credible estimate

We didn't have sufficient skills and experience to produce a credible estimate

We didn't understand enough about the problem to have our estimate represent reality

We choose not to ask the right questions

We choose not to listen

We choose not to do our homework, or worse, choose not to do our job

Since we're spending other people's money, we've decided it's not our job to know something about how much and when we'll be done to some level of confidence. We'll let someone else do that for us and we'll use their estimates in our work.

We didn't want to know

"You can't handle the truth," as Jack Nicholson's character Col. Nathan Jessep so clearly states in the clip below from A Few Good Men.

As the political risk and consequences of the project increase, this process becomes more common.

The soliloquy in the movie makes a good point - handling the truth is actually very difficult for almost everyone outside the domain, in many instances.

We want the simple answer. We want it all to be fine. We really don't want to do the heavy lifting needed to come up with an answer. Many times we don't want an answer at all; we want to just do our job and ignore the fiduciary responsibility to tell others what the cost and schedule impacts are, or even to do our job of discovering what DONE looks like before we start spending other people's money.

So here's the way out of the trap of at least (1) and (2)

We didn't know

Do your homework. Look for reference classes for the work you're doing.

Come up with an estimate based on credible processes. Wide Band Delphi, 20 questions - there are lots of ways out there to narrow the gap on the upper and lower bounds of the estimate.
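One way to narrow those bounds can be sketched as follows: combine each expert's (low, likely, high) answers from a Wide Band Delphi round with the classic PERT weighting. The experts and numbers below are invented for illustration.

```python
import statistics

# Sketch of aggregating a Wide Band Delphi round with PERT weights.
# Experts and their (low, likely, high) answers are invented.
rounds = {
    "expert_a": (10, 14, 30),
    "expert_b": (12, 16, 26),
    "expert_c": (9, 15, 34),
}

def pert_mean(low, likely, high):
    """Classic PERT expected value: (low + 4*likely + high) / 6."""
    return (low + 4 * likely + high) / 6

def pert_sigma(low, likely, high):
    """Classic PERT spread: (high - low) / 6."""
    return (high - low) / 6

means = [pert_mean(*e) for e in rounds.values()]
sigmas = [pert_sigma(*e) for e in rounds.values()]
estimate = statistics.fmean(means)   # group central estimate
spread = statistics.fmean(sigmas)    # average uncertainty
# Successive Delphi rounds should shrink `spread`; if it isn't
# shrinking, the group hasn't converged and the estimate isn't ready.
```

The output is a central estimate with an explicit spread - the upper and lower bounds the text asks for, not a single point.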

We couldn't know

Bound the risks with short cycle deliverables.

This is called agile

It's also called good engineering as practiced in many domains, from DOD 5000.02 to small team agile development

We don't want to know

Well there's no way out of those short of being King.

But the words used in the original post that referenced my post are not my intent, nor are they part of any process I work in.

We don't pretend, we can't pretend, we must not pretend to know about the future. Instead of pretending, we use well developed and field proven statistical estimating processes. These are documented in guidance developed through professional societies (also listed in the link). These are calibrated with Cardinal values that are themselves statistically adjusted.

You can only not know the future if it is unknowable. This may be the case. When it is the case, you need a fail-safe approach. Incremental development is one. Fail-safe systems are another once the machine has been produced. Fault-Tolerant System Reliability in the Presence of Imperfect Diagnostic Coverage is a small component of a much larger body of knowledge of how to build systems that are robust and do not fail to danger.

The political I don't want to know is far above my pay grade.

Here's a list of other posts on this topic. It's a critically important topic. One that deserves detailed analysis. One that we're obligated to know and use when it's not our money we're spending. It's called Governance.

Here's some more discussion on Estimating for fun and profit.

Order of Apparent Chaos - I know of scarcely anything so apt to impress the imagination as the wonderful form of cosmic order expressed by the "Law of Frequency of Error." The law would have been personified by the Greeks and deified, if they had known of it. It reigns with serenity and in complete self-effacement amidst the wildest confusion. The huger the mob, and the greater the apparent anarchy, the more perfect is its sway. It is the supreme law of Unreason. - Francis Galton, Natural Inheritance (1889)

When there is mention that the future cannot be forecast, or estimates of past, present, and future cannot be made, careful consideration must be given to the speaker's lack of understanding of basic statistics. One place to start is Principles of Statistics, M. G. Bulmer.

Probability and statistics rule our project world. We must treat all aspects of project work - technical, cost, and schedule - as random variables drawn from an underlying probability distribution, either discrete or continuous. Without considering the random nature of these project elements and their behaviours, our decision making capabilities are severely limited. When we ignore them, fail to consider them, and proceed in their presence, we will be disappointed with the outcomes.

It's that easy and it's that hard. If you don't have a handle on what risks are going to impact your project, those risks will still be there, you just won't know it.

The first step in increasing the probability of project success is to have some notion of what is going to prevent that success. This means asking what can go wrong, rather than what can go right. In order to answer the question what can go wrong, we need to know what we are doing. What is the project about? What are we trying to produce? When do we need to produce it? How much money will we need to spend to produce this thing called DONE?

Let's start with some obvious risks that we have to handle for any hope of success of the project. These are obvious because they occur on every project, in every domain, using any project management method.

Do you have any notion of what DONE looks like in units of measure meaningful to the decision makers? This is the starting point. If we don't have any notion, in meaningful units of measure, of what done looks like, we won't be able to recognize DONE when it comes through the door, other than we ran out of time and money and what is left has got to be called done.

To start to recognize done, let's write down some capabilities that will be produced by the project for those providing us the money for our work.

A capability is the ability to do something. This something should make an impact on those using the capability. They can get work done. They can learn something new. They can prevent something from happening. Something will change as a result of possessing the capability.

If we know what capabilities we'd like to possess as a result of the project, can we know how much it will cost and how long it will take to produce these capabilities for those who want to use them?

I'm not a big fan of biblical verses applied to project management, but here's one that provides good advice to anyone suggesting we just get started. For which of you, desiring to put up a tower, does not first give much thought to the price, if he will have enough to make it complete? Luke 14:28

So let's test this advice with the inverse. We desire to put up a tower, but let's give no thought to the price of doing that work. Doesn't sound like the basis of success, does it?

What's the confidence we'll need on the cost and schedule for our work at the beginning of the project? The simple answer is: how much do we have at risk? If we start and it turns out that we don't know what DONE looks like, how much are we willing to risk?

We could spend money to find this out. That'd be a good approach. But that expense needs to be included in the total expense for the project, and those paying for the project need to acknowledge this. This is the essential basis of Agile projects. The customer is paying you to discover what DONE looks like. If they don't know, someone has to know. And someone has to pay to know.

With that out of the way, what resources are we going to need to produce the capabilities, in the planned time, for the planned amount of money? Notice we have a plan. The plan is a strategy. The strategy is a hypothesis. The hypothesis needs to be tested, just like we were taught in High School science class.

Plans are meant to be tested. Plans are not carved in stone.

Schedules show us the order of the work needed to implement the plan.

The order of the work might - and many times should - test that the plan is credible

The notion that plans are somehow fixed, that schedules can't be changed, that commitments are immutable is simply bad project management.

Changes must take place for the project to succeed. Changes must be managed in the same way all elements of the project are managed - with full understanding of the impact of the change.

With our capabilities, the resources needed to implement the capabilities, and the strategy for the implementation, we are ready to explore what can go wrong. This is the Risk Management part of the project processes.

Let's start with Tim Lister's quote: Risk Management is How Adults Manage Projects. In Tim's presentation he has several other quotes. The purpose of risk management is to make decisions, not sit around and admire the risks. This can be extended to not sitting around and admiring the dysfunctions of the project. If you see dysfunction, do something about it. Name the dysfunction and name the possible ways of taking corrective action.

So start with Tim's presentation on page 19 and see if there are things that you've heard in the past that simply make no sense and end with page 29. From this you should see that all project work is probabilistic.

Probability and statistics are at the heart of risk management for the key project elements - cost, schedule, and technical performance.

We have to learn how to manage in the presence of the uncertainty that creates the risks to our project success. To ignore these uncertainties or to attempt to wish them away will not work. They are always there.

With our list of capabilities, the plan to deliver them, the resources needed to make this happen, and the risks that impact our ability to deliver, we now need a way to measure our progress. The simplest and best way is to state what physical percent complete we plan to be on some day in the future and, when that day comes, measure the actual physical percent complete against our planned physical percent complete.

Technically this is the basis of Earned Value Management.

If you're not a fan of EVM, then the process of measuring planned versus actual physical percent complete should still be used. It's simple common sense.

Tell me what you're going to do on the day you plan on doing it, for the cost you plan to spend. Then measure what you actually did.

These plans and measures have statistical variances of course, like all project elements. So set the boundaries on the plans and the measures to be within your tolerance for uncertainty. Not 50% - you can get that by flipping a coin - but somewhere in the range of 75% early in the project, and tighter as you move left to right.
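A minimal sketch of that tightening tolerance band, applied to planned versus actual physical percent complete. All the status points and band parameters below are invented for illustration.

```python
# Sketch of measuring planned vs. actual physical percent complete
# with a tolerance band that tightens as the project moves left to
# right. All status points and band parameters are invented.
status = [
    # (month, planned %, actual %)
    (1, 10, 7),
    (3, 30, 26),
    (6, 55, 50),
    (9, 80, 77),
]

def tolerance(month, horizon=12, start=0.25, end=0.05):
    """Allowed fractional slip: ~25% early, tightening toward 5%."""
    frac = min(month / horizon, 1.0)
    return start + (end - start) * frac

flags = []
for month, planned, actual in status:
    allowed = planned * tolerance(month)
    if planned - actual > allowed:
        flags.append(month)   # outside the band: corrective action
```

Anything landing in `flags` is a variance outside the stated tolerance - the trigger for corrective action rather than a reason for panic.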

An astronomer, a physicist, and a mathematician (it is said) were holidaying in Scotland. Glancing from a train window, they observed a black sheep in the middle of the field.

How interesting observed the astronomer, all Scottish sheep are black!

To which the physicist responded, No, no some Scottish sheep are black!

The mathematician gazed heavenward in supplication, and then intoned, In Scotland there exists at least one field, containing at least one sheep, at least one side of which is black.

Pick your role here. When I hear words like this can't be done, this has never been done, doing this is evil, doing this is a waste, this is always done, or any other absolute statement that contains never or always, in the absence of a domain, a context in that domain, tangible evidence that the statement is effective outside of a single person's observation, the insistence from the speaker that I've told you this many times over, some evidence from somewhere else, untested beyond opinion, or worse just stated because it sounds like a good idea - as Dilbert has mentioned in the past it looks like it's going to be a long day.

When managing projects that are funded by other people's money, we are obligated to know something about the probabilistic outcomes of our work. Without the ability to make forecasts and estimates of these outcomes, we are relegated to the category of labor.

As Project Managers, we must learn to estimate and forecast to some level of confidence. This is not guessing, that is child's play. Choosing not to estimate is also child's play. We must learn to make estimates based in sound statistical and probabilistic principles. Without that, the very notion of Return on Investment is intentionally ignored.

Return on Investment = (Value produced - Cost to Produce) / (Cost to Produce)

It's that simple and it's that hard. The cost and value variables are probabilistic estimates and must be treated in that way. But without knowing either of these, to some degree of confidence, we cannot make informed management decisions.
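Treating the ROI formula's variables probabilistically can be sketched with a Monte Carlo. The cost and value distributions below are invented for illustration; the point is reporting a confidence, not a single number.

```python
import random

# ROI = (Value - Cost) / Cost, with both value and cost treated as
# random variables, per the text. Distributions are invented.
random.seed(3)
rois = []
for _ in range(20000):
    cost = random.triangular(80, 160, 100)    # (low, high, mode)
    value = random.triangular(90, 260, 150)
    rois.append((value - cost) / cost)

rois.sort()
p_positive = sum(r > 0 for r in rois) / len(rois)
roi_p80 = rois[int(0.80 * len(rois))]   # 80th-percentile ROI
```

Instead of "the ROI is 47%," the answer becomes "there's an N% chance the ROI is positive, and an 80% confidence value of X" - the form a decision maker can actually use.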

That is the primary role of project management or engineering development -

Make Decisions based on evidence in the presence of uncertainty

All project elements are actually random variables. Funding, productivity, quality, efficiency, cost, schedule, risk, reliability - all the ilities, actually. All the independent and dependent variables of any project. The project can be a construction project - pouring concrete and welding pipe. It can be a software project - developing a capability used by an internal IT customer, all the way to a game used by millions on their smart phones. Or a project flying to Mars with intensive technologies of hardware, software, and even inventing new physics.

Walt's paper speaks to how statistical process control can improve the software development process. This process does not assume which development method you are using.

Since all the elements of the project interact in some way, many times in a non-linear way, most times stochastically (statistical dependencies), we need to not just acknowledge this but be able to manage in the presence of the uncertainties that result from these underlying stochastic processes. As well these processes may not be stationary, they may change as a function of time, as a function of other changes, or just randomly change - stuff happens.

So what does this mean in practice:

No single point estimate can be credible without its variance being known.

No estimates can be credible without the knowledge of how one estimate is coupled to other elements of the project.

All forecasts of future behavior are probabilistic, driven by the underlying statistical processes of the mechanics of the project. These mechanics can be physical mechanics, or the people, processes, and politics of the participants.

Since there are connections between all the elements, the first connection - the one most interesting to those paying for the project - is the cost and technical risk connection. I know you may think the produced products are the most important - the production of working software for example. But that software appears from the project through the expenditure of time and money. And without knowing how much time and money is needed to produce that outcome - to some level of confidence - none of the modern product development techniques are going to help. Why, you say? 'Cause you've got no money.

In the end, software intensive projects - enterprise class software intensive projects - are complex systems. Projects where all project participants are in the same room, with a shared vision of the outcome - not so complex. But that's not where the bulk of the IT spend lives. As well, the notion - perhaps naive - that complex projects can be broken down into simple projects ignores the core principle of all complex systems: the coupling and cohesion of the system elements may not be known upfront, and in fact may not be knowable until it is too late. If we look at the primary driver of project failure - the people problem - we can see that coupling and cohesion is the primary source of difficulties.

So In The End

To increase our probability of project success we must learn to manage in the presence of uncertainty. This uncertainty is created by either the lack of knowledge (Epistemic uncertainty) or the statistical uncertainty created by the naturally occurring variances in the processes (Aleatory uncertainty).

Both these uncertainties create risk to the project variables - cost, schedule, technical performance. These uncertainties directly participate in the estimating and forecasting processes designed to seek out possible future behaviour. Attempting to control the behaviour that results from aleatory uncertainty is fruitless. Muda. A waste of time. The only way to protect against the aleatory uncertainties is with margin. Cost margin, schedule margin, technical margin.
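Sizing that margin can be sketched with the same Monte Carlo machinery: run the duration model, then set margin as the gap between a confidence level and the deterministic plan. The tasks and the 80% confidence choice below are illustrative.

```python
import random

# Sizing schedule margin against aleatory uncertainty: set margin as
# the gap between the 80% confidence completion and the deterministic
# (most-likely) plan. Task numbers are invented for illustration.
random.seed(5)
tasks = [(8, 10, 18), (4, 6, 11), (12, 15, 28)]  # (low, mode, high)
plan = sum(mode for _, mode, _ in tasks)          # deterministic plan

totals = sorted(
    sum(random.triangular(lo, hi, mode) for lo, mode, hi in tasks)
    for _ in range(20000)
)
p80 = totals[int(0.80 * len(totals))]
margin = p80 - plan   # schedule margin protecting the 80% date
```

The margin isn't padding; it's the explicit price of the aleatory variance the plan cannot remove.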

The epistemic uncertainty can be addressed by spending money to buy knowledge. This is the definition of epistemic - Epistemology. We can spend money upfront to reduce the uncertainty - this is the basis of all iterative and incremental development systems, be they Scrum or DOD 5000.02. Or we can hold money, time, and capacity in reserve to address problems when they come true probabilistically.

Pay me now or pay me later.

So Now What?

To have any hope of success, we must have some level of understanding of all the probabilistic processes, driven by their underlying statistical processes. Blindly pulling work off a queue - without knowing the arrival rate to the queue, the throughput of the processes servicing the queue, the quality of the products produced by that servicing, and the acceptance rate of the resulting products by the consumer - is naive at best and simply bad management at worst.
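The queue parameters named above are tied together by Little's Law and, under the simplest assumptions, the M/M/1 formulas. The arrival and service rates below are invented for illustration.

```python
# The queue quantities the text says you must know, tied together by
# Little's Law (L = lambda * W) and, assuming a simple M/M/1 queue,
# the formulas below. Rates are invented for illustration.
arrival_rate = 4.0      # work items arriving per week (lambda)
service_rate = 5.0      # items the process can complete per week (mu)

rho = arrival_rate / service_rate          # utilization, must be < 1
avg_in_system = rho / (1 - rho)            # L: items queued + in work
avg_cycle_time = avg_in_system / arrival_rate   # W, via Little's Law
# At 80% utilization the average item spends a full week in the
# system even though service alone takes only 0.2 weeks - blindly
# pulling work off the queue hides exactly this.
```

Push utilization toward 100% and the cycle time grows without bound - which is why "just pull the next item" is not a management strategy.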

Let's start with the book that should be found on every project manager's shelf (along with a long list of other books). Huff's book shows how statistics can be easily misused, misunderstood, and sometimes manipulated to show something that just isn't true. This book is still in print in paperback.

For us project managers, we need to start by understanding that probability and statistics are the lifeblood of our profession. All the numbers we encounter on projects are in fact random numbers. They are generated by the underlying stochastic processes of how projects work. Projects are collections of interacting work activities. These activities are connected with each other and with the externalities of the project. These externalities start with people. People are random processes looking for something to interact with. Some might say people are random processes looking for something to disrupt. Forecasting the behavior of people is a very sporty business. This is one motivation for process. Processes guide or bound the random behavior of people. For now let's exclude the random behaviour of people from the conversation.

You see processes creating bounds every day. Speed limits create safety bounds. There are processes for filling out your application to college or for a car loan. There are also processes for developing products. These usually start with a simple paradigm - you give me some money, I spend it to give you something back. You assess the value of that product and either give me more money to continue or stop giving me money.

These processes involve several simple variables. People, time, and money. Of course there is the technology, but for now let's also ignore this and assume all the technology is working, non-variable, and not part of the processes we're interested in.

How to Actually Lie with Statistics

Stephen Ross is associate professor of professional practice at Columbia's Graduate School of Journalism. Ross describes 7 Lies that are used daily, all of which we can encounter on projects, from people who work projects or people who write about projects. Hopefully I'm not one of them:

Non-response bias or non-representative bias - this is the self-selection bias. What this really amounts to is missing things on purpose. This is what Standish does. Tell us about all the problems you've had on IT projects. Their sampling doesn't say what the total population of IT projects is, or most importantly how many successes there have been. This is the newspaper reporting of project problems. DOD IT projects overrun by $1B. On how much total budget? How much did they overrun as a percentage? What was the total value of the projects that overran by $1B? They don't say.

This is a sampling problem. There are simple mathematical processes for determining how big the sample has to be, compared to the total population, to produce a stated confidence level from those samples. The more serious problem here is that the sample is too small. To get a credible probability distribution we need to know how many samples are needed. There's a formula for that, and for good or bad IT projects we need roughly 20 to 30 samples for a population of 100 projects. These are all projects, not just the projects that answered the call to tell us about your failed project.
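The "formula for that" is, in one common form, Cochran's sample-size calculation with a finite-population correction. Here is a minimal sketch, assuming a 95% confidence level and a ±15% margin of error (both parameters are my assumptions for illustration, not from the text above):

```python
import math

def sample_size(N, confidence_z=1.96, margin=0.15, p=0.5):
    """Cochran's sample-size formula with finite-population correction.

    N            -- total population size (e.g. 100 IT projects)
    confidence_z -- z-score (1.96 corresponds to ~95% confidence)
    margin       -- acceptable margin of error, as a proportion
    p            -- assumed population proportion (0.5 is the worst case)
    """
    n0 = (confidence_z ** 2) * p * (1 - p) / margin ** 2
    n = n0 / (1 + (n0 - 1) / N)   # correct for the small population
    return math.ceil(n)

print(sample_size(100))   # 31 - right at the top of the 20-30 range
```

Tightening the margin of error drives the required sample size up sharply, which is exactly why a handful of self-selected responses cannot stand in for the population.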

Or it may be a sample of one. "In my experience, I see that estimating doesn't work." Or "In my experience, and the experience of the coffee club of similar people, I see that agile is the best approach for enterprise IT."

Mistaking statistical association for causality - this is a common mistake. Connecting processes with the outcomes of those processes requires the statistical tests of correlation and causality. This starts with a pre-defined hypothesis stating what we should see - usually the null hypothesis H_0 - in this instance a statement that can be tested with evidence of the causal impact of the processes on the outcome.

The recent debacle of the Affordable Care Act web site brought out lots of voices on how the problems could have been avoided. Since rarely were any of these voices actually involved in government procurement contracts, nor did they have any actual connection with the project, it's hard to make a connection with a cause of the problems. In the end, root cause analysis is needed to determine what the actual source of the problem was, beyond the obvious. And even then, we'll have to wait for the GAO to write its report. GAO, RAND, and IDA write reports on Root Causes of large program failures. Hopefully the ACA report will come soon.

Poisoned control - the epidemiology of project failures does not exist. Project failure analysis is dominated by opinion and conjecture, many times by firms selling the solution, or even individuals selling the solution. This is a serious failing in the profession of project management. Internally, many firms have assessment processes built around Six Sigma or Lean Six Sigma. In the absence of a framing assumption and a governance framework it is difficult to sort out opinion from fact.

"If you adopt this process (my process actually), you'll improve the probability of success." There are some obvious approaches - I am the author of one, Earned Value Management is another - but even then research is needed to confirm the connection between a process and increased success. The Software Engineering Institute conducts surveys of success versus maturity. I'm involved in an assessment, through a DOD office, of connecting Technical Performance Measures to Earned Value Management to provide a better view of performance management.

Data enhancement - "400 killed on highways over the holidays." "65% of all projects (sampled by a self-selected process) overrun their budget by 50%." These are examples of data enhancement. Extrapolation is another source of data enhancement. We see big problems in IT projects in this domain, so there must be similar problems in all domains. Or the inverse of the extrapolation: I work in a 3-man shop at a commercial landscaping equipment manufacturer, so what I have found works for me will surely work on your $500M ERP roll-out project.

Absoluteness - the use of overwhelming data is a source of amazement to the casual observer. When very complex situations are reduced to a single number, we are being fooled by the data, in exactly the same way we may be fooled by randomness. Many times the uncertainty, range, and complexity of project performance data cannot be separated from the root cause of success or failure. When an assessment is reduced to a single number - like the Standish Report, with no variance intervals or confidence on the measurement - the result is unusable.

Partiality - favorable outcomes are presented by owners of the idea. This is called selling. Independent assessments of the data that support the conjecture are needed before any conclusion can be drawn from the salesman's pitch.

Bad measuring stick - the dollar overrun of $500M on a $5B project is small. Big numbers, but a small percentage - it's a 10% overrun. If you can get to the end of the project with a 10% cost overrun or a 10% schedule overrun, you're a Project Management God. Never listen to the absolutes. Only listen to the percentages - and more importantly, the percentage compared to the population variances.

In the end it's all about discovering the variances in everything we do. No work process is steady; all work processes have built-in variances. The naturally occurring uncertainty about cost, time, and technical performance is called aleatory uncertainty. It is irreducible - you can't do anything about it; you have to have margin to protect your project. The other uncertainty on the project is epistemic, which means we can learn more about the uncertainty and reduce it with this new knowledge.

If we're going to forecast what it will cost, when it will be done, and the probability that it will work when we arrive at done, then understanding both these uncertainties is critical. The notion that breaking things down into small chunks and doing the work in a serial manner somehow removes the variances is not reality - at least not in the reality of non-trivial projects. Three people in the same room working off a list of sticky notes on the wall - maybe. Much beyond that and the laws of statistics are going to come into play.
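A quick Monte Carlo sketch shows why irreducible (aleatory) variance demands margin even when every task estimate is "right." All numbers here are hypothetical: 10 serial tasks, each estimated at 5 days most likely, with a triangular spread of 4 days best case and 8 days worst case:

```python
import random

random.seed(1)
TRIALS = 10_000

# total project duration for each trial: sum of 10 task draws
totals = sorted(
    sum(random.triangular(4, 8, 5) for _ in range(10))
    for _ in range(TRIALS)
)

deterministic = 10 * 5              # the sticky-note answer: 50 days
p80 = totals[int(0.80 * TRIALS)]    # 80th-percentile finish date

print(f"80% confidence finish: {p80:.1f} days")
print(f"margin needed over the point estimate: {p80 - deterministic:.1f} days")
```

Because the distribution is skewed right (things go a little better or a lot worse), the 80th-percentile finish sits several days above the deterministic sum - that gap is the schedule margin the aleatory uncertainty requires.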

To be credible project managers we need to understand how the underlying statistics impact the probability of success of our project. Ignoring this doesn't mean it goes away. It just means we'll be surprised by the underlying behaviour created by these stochastic processes.

Statistics rule our lives. Traffic patterns for our commute to work. Probabilistic weather anywhere you live. Your 401(k) earnings report that comes every month. For projects, all the elements are statistically driven: the productivity of the engineers or developers, the partial testing coverage of software, hardware, and integrated systems, the performance of any product, the efficacy of the product when it is in use, the forecasted cost to develop the products or provide the services, the total duration to produce the needed outcomes of the project. It's all statistics, all the time.

To work on or manage projects with any hope of success, we need to know about statistics and the probabilistic outcomes that result. Let's start with a simple picture. All projects - at least ones beyond the simple single team working off a single set of work activities - look like this. There is a collection of activities, connected with each other in a network. The durations are random numbers. The completed outcomes are randomly compliant with the needed quality, functionality, or capability. Even with high-coverage testing, when we assemble parts into a whole, new things happen - things we didn't think would happen. The new system has a probability of working the first time, and that probability is not 100%. (We'll get to the quote on this idea.)
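A tiny simulation makes the "network of random durations" point concrete. The network and numbers are hypothetical: activity A feeds B and C in parallel, and both must finish before D starts. Even when the plan date is built from each activity's most-likely duration, the network has only a small probability of meeting it, because D must wait for the later of the two parallel paths:

```python
import random

def simulate():
    # each duration drawn from triangular(best, worst, most-likely)
    a = random.triangular(2, 5, 3)
    b = random.triangular(4, 9, 5)
    c = random.triangular(4, 9, 5)
    d = random.triangular(1, 3, 2)
    return a + max(b, c) + d    # D waits for the LATER of B and C

random.seed(7)
runs = [simulate() for _ in range(10_000)]

plan = 3 + 5 + 2                # the most-likely single-point plan: 10
p_on_time = sum(t <= plan for t in runs) / len(runs)
print(f"P(finish within the {plan}-day plan) = {p_on_time:.0%}")
```

The merge of the two parallel paths is what kills the single-point plan: the maximum of two random durations is biased above either one's most-likely value.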

Many in the agile community - especially the Kanban community - assume work can be divided into same-sized chunks and that the people doing the work can process these chunks in same-sized time frames. Where I come from, that would be considered naive at best. Steady arrivals, steady processing, steady exiting of finished products doesn't even happen on the Toyota assembly line, let alone in the product development business.

So Now For The Quote

One of the great myths of science is that it is rigorous. More to the point is that the scientific method chops up any problem into small pieces that can be comprehended by the human mind, and an above-average mind at that. Some of the pieces are more rigorous than others, and the reassembly of the whole always requires letting something fall through the cracks. - The Art of Modeling Dynamic Systems, Foster Morrison, Wiley, 1991.

So now let's think about the parts that have been decomposed and reassembled. Are they correlated in some way? Does a change in one actually make a change in the other - causation? How can we tell? Start by assuming there is no causation between the parts, even when there is correlation.

So let's look at one more thing about the probability and statistics of projects. The picture below is critical to sorting out many of the misconceptions around how projects behave. Statistical processes drive the behaviour of projects. All the elements of a project are subject to these stochastic processes.

If we know something about a process - maybe by observing it - we can state things about the statistical behaviour of that process. Once we know about the underlying processes, we can make probabilistic forecasts of future behaviours. For both the statistical and the probabilistic measurements we also need to know the variance on those numbers. This allows us to make estimates of past and current behaviours and forecasts of future behaviours, both with uncertainty bounds.

Vasco Duarte has a nice presentation about his notion of No Estimates. It's clear and concise and answers the mail for what the heck No Estimates is all about. The answer is - these are good ideas for flow-based projects where the work chunks are similarly sized and the arrival rate equals the exit rate of the service provider (the development engine). This is the basis of Little's Law (link below). Can't go wrong here. This, by the way, is how many processes in many domains work: Work Packages with internal activities performed in the order needed to produce outcomes, planned and executed by the Work Package team, with the Work Packages on baseline for a 6-9 month Rolling Wave, and each Work Package crossing only one accounting period. This of course is not the domain Vasco works in - not sure what that is - but much shorter cycles can be found in many places where agile processes are also found.

Everything in the talk is more or less viable in many domains - decomposing the work into same-sized chunks, putting this work in a queue, servicing the workload in a steady manner. Assuming the capacity for work is constant, AND the number of arrivals to the queue is the same as the number leaving the queue, then Little's Law holds and you can forecast how long it will take to empty the queue at any point in time.

Notice you can ESTIMATE how long it will take to finish the work in the queue, knowing the length of the queue, the arrival rate, and the exit rate. Some in the #NE community assert they only use Little's Law to forecast throughput, but they're leaving 2 of the 3 attributes off the table. Maybe because if they used Little's Law to forecast WIP, exit rates, and variances in service times, they'd be doing estimating, and that's not allowed in the No Estimates community. Smiley face goes here!

Little's Law says:

Consider a queuing system consisting of discrete objects (stories). Objects arrive at some rate to the system, form a queue (the backlog), receive service (development of the stories), and exit (when the story is complete and 100% working).

Little's Law says that under steady-state conditions, the average number of objects in a queuing system equals the average rate at which objects leave the system multiplied by the average time each object spends in the system.
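In symbols, Little's Law is L = λW: average items in the system equal the average throughput rate times the average time in system. A minimal sketch, with a hypothetical team's numbers:

```python
# Little's Law: L = lam * W  (steady state)
#   L   -- average number of items in the system (queue + in service)
#   lam -- average arrival rate = average departure rate (throughput)
#   W   -- average time an item spends in the system

def wip(arrival_rate, time_in_sys):
    """L = lam * W"""
    return arrival_rate * time_in_sys

def time_in_system(wip_count, throughput):
    """W = L / lam"""
    return wip_count / throughput

# A hypothetical team finishes 3 stories/week and carries 12 stories
# of work in process. Little's Law ESTIMATES the average cycle time:
print(time_in_system(12, 3))   # 4.0 weeks per story
```

Any one of the three quantities is an estimate derived from the other two - which is the point being made about the word "estimate" above.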

This is one of the obvious statements that can be used to estimate how long it takes to do something.

If we know the service rate, which Vasco shows later, and the arrival rate, which may or may not be under our control, we can know how much Work In Progress there is. And we can forecast how long it will take to complete all the objects waiting in the queue.

The Statistical Process Control notion mentioned in the talk is, by its name, statistical in nature, meaning the arrival rate and the exit rate (the result of the work being done) each have a probability distribution. As well, SPC assumes - and this must be the case - that a conforming outcome is produced. That is, no rework. Or any rework goes back on the queue, and since it is likely to be of a different size, it will spoil the uniformity. Agile does this well, since working software is one of the conditions for success. But agile doesn't speak to the impact of rework or failed quality, since that work simply goes back on the queue as another story. This dilutes the performance measures, and is one of the reasons pure agile is not well matched with Earned Value based performance management processes.

With Little's Law - the structure laid out with similarly sized work, steady arrivals, and steady departures - the workbook from my Six Sigma course says one of the uses is:

Estimating Waiting Times: If we are in a grocery queue behind 10 persons and estimate that the clerk is taking around 5 minutes per customer, we can calculate that it will take us 50 minutes (10 persons x 5 minutes/person) to start service. This is essentially Little's Law. We take the number of persons in the queue (10) as the "inventory". The inverse of the average time per customer (1/5 customers/minute) gives us the rate of service, or the throughput. Finally, we obtain the waiting time as the number of persons in the queue divided by the processing rate (10/(1/5) = 50 minutes).

Notice the term estimating - again. The math Vasco is describing is used for estimating waiting times, throughputs, and departure rates. Here's the first example of redefining a term in common use and then saying we're not estimating in No Estimates.

The core concepts of the talk are sound, but there are other gaps at the second level:

Measuring the performance of the service in the queuing system (the developers) and the arrival rate of similarly sized work packages ignores the question of how long the project will take IF we don't know that the items in the queue are the "all in" requirements.

If - and this is a big if - you can break down the needed capabilities (I'll avoid calling them requirements, since stories are similar to the elements of the Concept of Operations and Use Cases), then the Little's Law queuing system will be a good forecast - read estimate - of the total time needed to complete.

There is mention of the Standish report, but a failure to connect it to the root cause of the failures. It may or may not have been bad estimates. Since Standish uses self-selected samples, it's statistically weak, as has been discussed many times. But without the root cause, Vasco's statement is just speculation. The notion of self-selected, uncontrolled samples is simply bad statistics - even while Vasco likes to invoke statistics when referring to Deming. It's the Root Cause Analysis that is missing from most populist descriptions of the problem. It is just assumed that the problem can be solved by the solution in hand. In the absence of an actual root cause analysis, your hammer is looking for a nail to hit. This appears to be the role of #NoEstimates: estimate making is poor, so let's not make estimates.

Black swans are misrepresented. Black Swans are NOT statistical variances, but unknown and likely unknowable probabilities of occurrence. If we have the statistics, we don't have a Black Swan. This is a common misunderstanding of statistical processes found in several communities. BTW, black swans are very common in Australia.

When Vasco states "In my view it's quite impossible to get projects on time using estimates," he may want to look for evidence of how those of us who do show up on time, on budget, with working software deal with the uncertainties and variances that are natural and event-based. But that's much like Standish: using self-selected anecdotal examples to make a point. In Standish's case, to sell consulting. In Vasco's case, to sell an idea in the absence of evidence outside his own experience.

Rational Unified is NOT waterfall - another misstatement. Self-selected projects are again simply weak statistics. This is an example of why estimating is done poorly - the very thing Vasco rails against. It's called selection bias. It's in play in talks and in estimates of work effort. It's everywhere. It's a natural process of humans and requires serious effort to avoid. Anchoring and Adjustment is the formal description.

Just as an aside, the mention of Barry Boehm seems a bit off base. Barry is well known in aerospace, defense, and academic circles. That Vasco hasn't heard of him means he doesn't work in those domains. Barry was in Building O6 at TRW at the same time I was. I didn't work with him - I was a lowly software engineer on a radar system - but his reputation echoed in the hallways. Barry moved to SEI at the same time my boss did, when A&D crashed in the 80's. Barry is currently at USC (my alma mater for graduate school - Systems Management), and here's a seminal paper possibly useful for the discussion of estimating in the presence of uncertainty: "Reducing Estimation Uncertainty with Continuous Assessment," Pongtip Aroonvatanaporn, Chatchai Sinthop, Barry Boehm, Center for Systems and Software Engineering, University of Southern California, Los Angeles, CA 90089.

The factual accuracy of the Columbus story of challenging a group to stand an egg on its end is in serious doubt. Fifteen years earlier a similar story was used to describe the construction of the large dome at Santa Maria del Fiore by Filippo Brunelleschi, as told by Martin Gardner (May-June 1996), "The great egg-balancing mystery," Skeptical Inquirer 20(3). By the way, the Skeptical Inquirer is a good place to start when using any popular notion of how something works in any public presentations or writing. We are so conditioned to believe what we have learned, when in fact it is not true, or is a version of the truth that has become truth.

The flat earth analogy is also misinformed. It is a modern misconception that the prevailing cosmological view during the Middle Ages saw the Earth as flat instead of spherical. According to Stephen Jay Gould, "there never was a period of 'flat earth darkness' among scholars (regardless of how the public at large may have conceptualized our planet both then and now). Greek knowledge of sphericity never faded, and all major medieval scholars accepted the Earth's roundness as an established fact of cosmology." This use of wrong analogies undermines the actual usefulness of Little's Law based schedule estimating, which is used in many places. Vasco needs to do a bit more googling before basing the message on this notion.

The software crisis quote comes from a 1968 NATO report used in the early days of computer science. Edsger Dijkstra used the term in his 1972 ACM Turing Award Lecture. At that time there were no development tools, no real high-level languages, no formal processes, and a very weak understanding of the connections between requirements, size, complexity, and poor outcomes. It's like using 45-year-old behaviour in the public health domain - smoking, for example - as the basis of today's policy.

So while these last few quotes and stories serve as a reminder of the failings of human thought processes, they are likely not actually true. Like David Anderson's (Kanban) statement, used in the same manner, about the frog sitting in slowly heating water. The frog will jump out when the water reaches a temperature too hot. He will NOT get boiled. I attribute these approaches to the well-placed need to tell a story everyone can relate to. But the story rarely has any basis in fact.

This is harsh criticism and likely very annoying to those using this approach to make a point, but if you're going to give advice on how to spend other people's money, at least get the underlying basis of the reasons straight.

So what should we think of #NoEstimates as explained here?

Understand a bit about queuing theory, Little's Law, and beyond. If the size of the work has a variance larger than can be absorbed by the service, the work will start to become late.

If the service has a variance larger than the arrival-rate variance of the similarly sized work, the work will start to slow down and its delivery will be delayed.

So Little's Law is very dependent on knowing, with some level of confidence, that the arrival rate, the size of the work, and the capacity for work are capable of sustaining the stability needed in Vasco's talk.

Knowing this requires that we can control the arrival rate and size, and that we can control the throughput of the service. And if not, that we are able to estimate the variances to assure the system will remain stable under changes in these three variables.

The numbers used late in the talk don't have variances on them. It may well be that those variances are within the normal deviation for work of this type and are simply the aleatory uncertainty of the process. This means you can't do anything about them, and you need schedule and cost margin to protect the delivery date. This is completely lost on many of the software-intensive programs we work, especially in ERP. This is why a little bit of math is dangerous.
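The effect of unexamined variance can be sketched with a minimal single-server queue simulation (all rates here are assumed for illustration). Both runs have the identical average arrival rate and the identical average service time - only the service-time variance differs - yet the average wait grows substantially:

```python
import random

def avg_wait(service_time, n=100_000, seed=11):
    """Single-server FIFO queue with Poisson arrivals at 1 item/day.

    service_time -- callable taking an rng, returning a service duration
    """
    rng = random.Random(seed)
    t = free_at = total_wait = 0.0
    for _ in range(n):
        t += rng.expovariate(1.0)       # next arrival time
        start = max(t, free_at)         # wait if the server is busy
        total_wait += start - t
        free_at = start + service_time(rng)
    return total_wait / n

# same average capacity (0.8 days/item, 80% utilization) in both runs
steady = avg_wait(lambda rng: 0.8)                     # zero variance
noisy = avg_wait(lambda rng: rng.expovariate(1/0.8))   # high variance
print(noisy > steady)   # True: same averages, much longer waits
```

This is the Pollaczek-Khinchine result in miniature: waiting time grows with service-time variance even when the mean rates are unchanged, which is why quoting averages without variances is dangerous.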

WE NEED TO ESTIMATE THESE VARIABLES AND THEIR VARIANCES TO HAVE CONFIDENCE THE SYSTEM WILL BE STABLE

Historical data is fine for making the determination of the variables, but then we need to control the arrival rate, size, and capacity for work.

So in the end Vasco's talk is informative, useful, and can be applied in several domains - with the assumptions that arrival rate, size, and work capacity can be defined and their variances known. Good work. But is that the same as Not Making Estimates? Anyone working in the process control business would say either Vasco is redefining the term estimate, or he is pushing the estimating process back upstream to assure the work consumed by the processes meets the constraints of Little's Law.

Here's the Trouble with These Approaches

They're pseudo-mathematical - voodoo math, some might say. They're based on weak if not wrong analogies. They ignore the conditions under which they must perform. And most critically, they ignore the mandatory need to have some not-to-exceed estimate of the funding for the project before it starts. Little's Law based planning systems work very well on production lines, or on development processes that behave like production lines. PayPal, for example. Maintenance systems, for example.

Since there is never a domain, context, or discussion of project authorization processes based on committed funding in governance, we can't really make a determination of the applicability outside of personal anecdotes.

How to Move Beyond the Limitation of No Domain, No Context

The problem of poor software development performance is for the most part a bad management problem. A Dilbert boss has failed to understand the five core tenets of successful projects: (1) what does done look like, (2) do we know how to get to done, (3) do we have enough of what we need to reach done, (4) what impediments will we encounter along the way, and (5) how can we measure physical percent complete.

If we don't know what done looks like, we need to start with something we do know about. This is incremental and iterative development of anything. In software it can be called agile. In building spacecraft (which are software intensive) it is called increasing the maturity of the deliverables through progressive elaboration of those deliverables. In construction there is Lean Construction. In pharma there is progressive development of the efficacy of the drug. Etc., etc.

If we don't know what our past performance was - either through reference class forecasting or actual past performance - we can't calibrate the needed variables for the queuing system. If we don't have some notion of the underlying statistical processes and the resulting probability distributions, we can't have insight into the behaviour of the system and will be surprised by the result. We'll call that a Black Swan, but in fact we were just too lazy to go do our homework.

But we must remember, for the approach suggested by Vasco, arrival rate, work size, and capacity for work must be stable, and the arrival rate cannot exceed the exit rate of the service. When that is the case, the estimate at completion is available from the system.

So if #NoEstimates is Vasco's description in the talk, this is very understandable. But estimating the cost before starting is still a need in many domains, and estimating the cost at completion requires the stability of the queue and service system. Now we can start to find domains and contexts where this is applicable.

The very first thing to establish is that the randomness Taleb speaks about is not the same randomness we find in project work. His randomness is not the same as the randomness we encounter in building things for money. The Black Swans he speaks of may or may not exist in the financial markets. But Black Swans in project work mean you simply have not done your homework. Either because you can't afford to do it - a credible reason. Or you are not capable of doing it - you don't know enough about the domain and context to be making decisions. Or you have simply chosen not to do your homework.

The first case is common. We can't afford to find out what can go wrong - it's actually cheaper to just try it and see. This is a domain I am familiar with in the physics world. It's called experimenting. The second case is really saying we're not qualified to make a decision. The response should be: let's go find someone who is. This is the basis of Reference Class Forecasting. The last case seems to be more common these days. I don't see any value in searching, so I'll just start working and we'll discover what we discover when it comes along. No problem. Your money, spend it as you please. Not your money? Maybe not.

Taleb's book, by the way, is full of rhetoric like this: "Our minds are not quite designed to understand how the world works, but, rather, to get out of trouble rapidly and have progeny." Really? Tell that to the people at CERN, or the engineers building rendezvous and docking software for ISS, or the process control system running the 787 you may have flown on. All stochastic feedback control systems, whose response to random inputs is to stay inside the white lines.

So care is needed when quoting Taleb from a project point of view. Black swans are common in Australia. Just not here. Disruptive events in the project world are usually an indication of lack of knowledge - an epistemic uncertainty. These uncertainties can be reduced with new information. This is the root of the word epistemology.

The counter to Taleb is Popper. Popper was a skeptic. He proposed a rule for scientific inquiries. Nothing can be verified empirically, he said. Science must not rest on any concept of verification. Instead, it must rest on a process of falsification. There are only two kinds of theories: (1) theories known to be wrong; (2) theories that have not yet been proven wrong — not yet falsified. Putting the matter in context, Popper was rebelling against the growth of science.

So when there is some claim made, how can we show the claim to be false? Without that second element - the falsification test - the claim is just blather.

The notion that we can make decisions about the future without somehow knowing the possible behaviours we will encounter in that future is naive at best and doomed to fail at worst. Here's some advice from others on this topic.

The only certainty is uncertainty - Pliny the Elder (Gaius Plinius Secundus) - if we somehow think the future will just emerge and we'll be able to react to it, then the simple finance concept of a nonrecoverable sunk cost will be forced on our projects.

The Public ... demands certainties ... But there are no certainties - Henry Louis Mencken - when we encounter the "Dilbert Boss" and accept that as the norm, then it's time to look for new work, not accept that as our lot in life.

If chance is the antithesis of law, then we need to discover the laws of chance - C. R. Rao - all numbers in project work are random variables. Without knowing the underlying statistical process and the probabilistic outcomes, no credible forecast of future performance is possible. Deciding anything in the absence of probabilistic confidence is simply not possible.

There is only one thing about which I am certain, and this is that there is very little about which one can be certain - W. Somerset Maugham - uncertainty comes in two types: reducible and irreducible. We can only have confidence in the future when we deal with both of these on the project. We can also only have confidence in success for work in the short term - within our planning horizon. Beyond that, uncertainty increases and lowers the probability of success of any decision making.

Obviously, a man's judgement cannot be better than the information on which he bases it - Arthur Hays Sulzberger - to refuse to look into the future, to refuse to make an estimate of the possible outcomes, is to refuse to acknowledge your obligation as a project manager to be the steward of your customer's money.

Lest men suspect your tale untrue, keep probability in view - John Gay, 1688-1732, English poet - all numbers in the project domain are random numbers drawn from a probability distribution. Anyone seeking certainty is fooling themselves. So when Dilbert bosses are quoted, it may be common, but that person is simply uninformed about the processes driving a project. On the other side, anyone claiming the future cannot be forecast is equally uninformed. Both sides are wrong, and both sides fail to understand that the solution is to use the probability and statistics skills they should have learned in high school to make decisions.

A reasonable probability is the only certainty - Edgar Watson Howe, Country Town Sayings (1911) - like the quote above, all project work is probabilistic. Learn to think, act, and make decisions based on probabilistic thinking.

Our wisdom and deliberation for the most part follow the lead of chance - Michel Eyquem de Montaigne, Essays (1580) - chance drives all decision making. But chance is not an unknowable random outcome. Chance can be forecast, drawn from an underlying probability distribution generated by a statistical process. Just learn to do this through probabilistic and statistical decision making.

All uncertainty is fruitful ... so long as it is accompanied by the wish to understand - Antonio Machado, Juan de Mairena (1943) - if you really want to understand the behavior of dynamic systems, study probability and statistics.

Life is the art of drawing sufficient conclusions from insufficient premises - Samuel Butler, "Lord, What Is Man?," Note-Books (1912) - those who naively assert we cannot know something about the future need to study further how to make decisions in the presence of uncertainty.

What Does All This Tell Us?

When we hear words about our inability to predict the future, make a forecast of something in the future, or make a decision in the presence of uncertainty, we must first assume that person has failed to understand the foundations of probability and statistics and how to apply these principles to making decisions. Next we might assume that person doesn't actually want to know how to do this.

When asked to make estimates of future cost, schedule, and technical performance, many wince. Many refuse, saying we don't make no stink'in estimates here. Well, there are a few problems with that approach:

Those with the money need an estimate to get that money, hold on to that money, get more money and explain to those who gave them the money how they are being the proper stewards of that money.

Without an estimate of cost, the value of something cannot be determined with any degree of confidence. This is the old test-driving a car you can't afford problem. I'd sure like to own the new Audi A7, but I can't afford it. Instead, before I go shopping (with the actual intent to buy), I make an estimate of what I can afford, using my credit union's payment calculator, then go looking for cars within my budget.

With the project underway, those with the money need to know how much more money and time will be needed to complete the project. This is the Estimate To Complete (ETC). With the actual costs in hand, that ETC is used to calculate the Estimate At Completion (EAC). That number should be - or better be, in our contracting domain - at or below the Budget At Completion (BAC), which is what the real customer, the Defense Department, provided in the bottom right corner of Standard Form 26, showing the contract award amount.

These numbers are probabilistic estimates - except the contract award value. It's not probabilistic; it's a Not To Exceed (NTE) number.
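As a minimal sketch, the arithmetic connecting these acronyms follows the standard Earned Value formulas; all the dollar figures below are hypothetical:

```python
# Standard Earned Value relationships (hypothetical numbers).
BAC = 10_000_000.0   # Budget At Completion: total authorized budget ($)
ACWP = 4_500_000.0   # Actual Cost of Work Performed to date ($)
BCWP = 4_000_000.0   # Budgeted Cost of Work Performed (earned value, $)

CPI = BCWP / ACWP            # Cost Performance Index (< 1 means over cost)
ETC = (BAC - BCWP) / CPI     # Estimate To Complete: remaining work at current efficiency
EAC = ACWP + ETC             # Estimate At Completion

print(f"CPI = {CPI:.3f}, ETC = ${ETC:,.0f}, EAC = ${EAC:,.0f}")
```

With these numbers the EAC lands above the BAC, which is exactly the conversation the customer wants to have early, not late.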

So when there are objections to making estimates for lots of reasons - some real, some very lame - we need to start with the full understanding that all estimates are probabilistic. In the probabilistic estimating business, the best way to provide a credible number is from prior experience. This is called Reference Class Forecasting. Some might say, well, we've not done this before, so how can we estimate? There are very few things in the commercial software business that haven't been done before in some way. Don't be lazy: look around, ask some questions, do some homework, decompose the system into bite-sized chunks that are recognizable as oh yeah, that looks familiar. This, by the way, is called system architecture.
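A Reference Class Forecast can be as simple as pulling percentiles from the outcomes of similar past work. The reference class below is invented for illustration:

```python
import statistics

# Hypothetical reference class: final costs ($M) of eight similar past projects.
past_costs = [1.8, 2.1, 2.4, 2.6, 3.0, 3.3, 4.1, 5.2]

# The forecast is a distribution, not a point: quote it at confidence levels.
p50 = statistics.median(past_costs)              # 50th percentile
p80 = statistics.quantiles(past_costs, n=10)[7]  # ~80th percentile cut point

print(f"P50 ≈ ${p50:.2f}M, P80 ≈ ${p80:.2f}M")
```

The point is that the quoted number comes with a stated confidence, drawn from what actually happened before, rather than from a single-point guess.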

Below is the Bayesian statistics formula, which essentially says the past is a forecast of the future. Don't wimp out; use common sense. Long ago I got advice from a Booz Allen site partner. We were working a large proposal for a Federal Agency. We had gotten all wrapped around the axle about strategy, capabilities for the customer, our fancy-dancy estimates, and system architecture. He announced to everyone on the proposal team: look boys, it's real simple - our customer (the agency) has money and we want it. That's the strategy. The people with the money get to say if they want an EAC or ETC, not you.
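The Bayesian formula referenced above is presumably Bayes' theorem in its standard form, which updates a prior belief (the past) with new evidence to produce a forecast:

```latex
P(H \mid E) \;=\; \frac{P(E \mid H)\, P(H)}{P(E)}
```

Here \(P(H)\) is the prior (what the reference class and past performance say), \(P(E \mid H)\) is the likelihood of the observed evidence, and \(P(H \mid E)\) is the updated estimate.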

I've made the mistake of engaging the #NoEstimates crowd with questions about the usefulness of their idea. Not a group that likes to get questions, by the way. They use the old Ron Jeffries approach (one he has since moved past) of well, just try it and see if it works for you.

The big question is:

If you've been asked to make a decision about something related to software development - say a make/buy decision - is making an estimate of the comparative costs or comparative durations a good idea?

Turns out that is not a question that can be asked in that crowd. So I learned my lesson again. In the meantime, this diversion caused me to dig out my slightly tattered edition of Time Series Analysis: Forecasting and Control, Box and Jenkins, 1976. I'm working on a presentation for the Fall Integrated Program Management Conference on the use of time series analysis to forecast the Estimate At Completion using the CPI/SPI numbers at the Control Account level, available in the XML submissions of the Contract Performance Report.

The approach currently used to forecast the EAC wouldn't get a D in any statistical analysis class:

- Take all the past performance and accumulate it into a single number
- Take the current period as a single number
- Forecast the future performance as a linear ratio of this number
- Do not adjust this number for the underlying variances of the past
- Do not adjust this number for the assumed underlying variances of the future
- Commit millions, and many times billions, of dollars on the forecast
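The naive steps above can be sketched in a few lines (all numbers hypothetical), which makes plain that no variance information survives the calculation:

```python
# Sketch of the naive point-estimate EAC forecast criticized above.
# Every input is collapsed to a single number; nothing carries variance.

# Step 1: accumulate all past performance into a single number.
cumulative_BCWP = 3_000_000.0   # earned value to date ($)
cumulative_ACWP = 3_500_000.0   # actual cost to date ($)
CPI_cum = cumulative_BCWP / cumulative_ACWP

# Step 2: take the current period as a single number.
period_CPI = 400_000.0 / 500_000.0

# Step 3: forecast the future as a linear ratio of the cumulative index.
BAC = 10_000_000.0
EAC_naive = cumulative_ACWP + (BAC - cumulative_BCWP) / CPI_cum

# Steps 4-5 are what's *missing*: no adjustment for the variance of past
# CPI observations, and none for the assumed variance of future ones.
print(f"Naive EAC: ${EAC_naive:,.0f}")
```

A time series approach would instead treat the monthly CPI observations as a stochastic process and forecast the EAC with a confidence interval, not a bare ratio.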

Along with the Box book I use, there are several books on applying R to time series analysis. Here's a list of resources for those interested in this topic.

So when you're asked how much will this cost, or when will you be done spending my money for the things I asked you to develop, or what's the confidence you'll be ready for the "go live" date - or better yet, the "launch" date of this project - hopefully your answer won't be well, you see, we don't make estimates on our project work, we just develop small increments of functionality till the time and money runs out.

There is a quote from George Box that is used - misused, actually - by many populist bloggers and authors that goes like this...

All models are wrong, some are useful

Of course this quote is used to avoid asking and answering questions around models, forecasting, and assessment of possible future states of systems, a project being a system.

The actual quote is from Science and Statistics, George E. P. Box, Journal of the American Statistical Association, December 1976, pp. 791-799.

So the actual reading would suggest "all models are wrong" ... and the correct answer cannot be obtained by making the model more elaborate. A simple model is needed.

But how simple? That's always the challenge.

The nonsense that #NoEstimates is the answer is of course just as foolish as overparameterized and overelaborated models. This is lost on both sides of the modeling discussion.

There is a popular quote from George Box about all models are wrong, but some models are useful. This quote is many times used by people suggesting some new or innovative approach to a problem that has been around a long time.

While this quote has a pithy ring to it, I'm almost certain those using the quote do so without actually having read the book where it was used.

In the early 1970s econometric models had been constructed for a number of countries using time series data. They were largely static, with responses to change assumed to take place within one period, irrespective of whether it was a year or a quarter.

The time series models of Box and Jenkins stood in stark contrast to these naive static models. Box and Jenkins used a single variable in isolation, and dynamics played the central role. A few studies compared the two approaches and concluded that univariate time series models provided the more accurate short-term forecasts.

This was a turning point because it implied that dynamics were more important for understanding short-run movements than economic relationships as then understood. The emphasis in time series econometrics therefore shifted from modelling large simultaneous systems to taking account of dynamic interactions.

Box and Jenkins were influential not so much because what they said was new, but because they said it well. Their contribution was to show how the dynamic properties of observed series could be matched to those of theoretical models.

The Project Management Point

Models are a critical component of any credible forecast of cost, schedule, and technical performance. Without these models it is actually Guessing as so many critics of estimating are fond of stating. With credible models, forecasting becomes a tool used to increase the probability of project success.

So next time some self-proclaimed expert uses Box's quote, ask if they have his book on the shelf and on what page that quote appears.

There have been several rounds of how to use analogies - and how not to use them - in the past few years.

These involved notions like agile is an untended garden. Actually, an untended garden is called a weed patch.

Or we can't really stop and develop a strategy, because we're always putting out fires. Actually, firemen rarely put out fires. They spend the majority of their time preventing fires through fire safety education and inspections. Their job is to keep fires from ever starting.

And of course the false analogy of the double pendulum as a stand-in for chaos and unpredictability. Since the equations of motion are easily defined - an exercise for any upper-division physics student - and a MATLAB script away, you can plot the path of the double pendulum.
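To make that point concrete - the double pendulum is deterministic, not unknowable - here is a sketch that integrates the standard textbook equations of motion with a hand-rolled RK4 step. Python stands in for MATLAB, and the masses, lengths, and initial angles are my own illustrative choices:

```python
import math

# Standard double pendulum equations of motion, integrated with RK4.
g = 9.81
m1 = m2 = 1.0   # bob masses (kg)
L1 = L2 = 1.0   # rod lengths (m)

def derivs(state):
    """Return d/dt of (theta1, omega1, theta2, omega2)."""
    t1, w1, t2, w2 = state
    d = t1 - t2
    den = 2 * m1 + m2 - m2 * math.cos(2 * d)
    a1 = (-g * (2 * m1 + m2) * math.sin(t1)
          - m2 * g * math.sin(t1 - 2 * t2)
          - 2 * math.sin(d) * m2 * (w2 ** 2 * L2 + w1 ** 2 * L1 * math.cos(d))
          ) / (L1 * den)
    a2 = (2 * math.sin(d)
          * (w1 ** 2 * L1 * (m1 + m2)
             + g * (m1 + m2) * math.cos(t1)
             + w2 ** 2 * L2 * m2 * math.cos(d))
          ) / (L2 * den)
    return (w1, a1, w2, a2)

def rk4_step(state, dt):
    """Classic fourth-order Runge-Kutta step."""
    k1 = derivs(state)
    k2 = derivs(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = derivs(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = derivs(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + e)
                 for s, a, b, c, e in zip(state, k1, k2, k3, k4))

def energy(state):
    """Total mechanical energy; should be conserved by a sound integration."""
    t1, w1, t2, w2 = state
    ke = (0.5 * m1 * (L1 * w1) ** 2
          + 0.5 * m2 * ((L1 * w1) ** 2 + (L2 * w2) ** 2
                        + 2 * L1 * L2 * w1 * w2 * math.cos(t1 - t2)))
    pe = -(m1 + m2) * g * L1 * math.cos(t1) - m2 * g * L2 * math.cos(t2)
    return ke + pe

state = (math.pi / 2, 0.0, math.pi / 2, 0.0)  # both rods horizontal, at rest
e0 = energy(state)
dt = 0.0005
path = []
for step in range(int(5.0 / dt)):             # 5 seconds of motion
    state = rk4_step(state, dt)
    if step % 200 == 0:                       # sample the tip position
        t1, _, t2, _ = state
        path.append((L1 * math.sin(t1) + L2 * math.sin(t2),
                     -L1 * math.cos(t1) - L2 * math.cos(t2)))

drift = abs(energy(state) - e0)
print(f"energy drift after 5 s: {drift:.2e}")
```

The motion is chaotic - tiny changes in initial conditions diverge - but every point on `path` follows directly from the equations. Sensitive is not the same as unknowable.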

And my favorite, the attractor analogy, in which it is presumed that somehow in chaos theory the attractor attracts - without understanding that those pretty pictures of the attractor are the result of the underlying equations of the system.

So with that in mind, there is a new book from one of my favorite authors, Douglas Hofstadter: Surfaces and Essences. In the book, Hofstadter makes the case for analogy as the fuel for creative thinking, using Robert Oppenheimer's quote...

whether or not we talk about discovery or of invention, analogy is inevitable in human thought, because we come to new things in science with what equipment we have, which is how we have learned to think, and above all how we learned to think about the relatedness of things.

But as always, we need to take care to assure that the analogies we use to expand the conversation don't violate the laws of physics, gardening, or mathematics.

Cost estimating methods have been around for a long time. The current processes found in agile use a points system, sometimes with a Fibonacci series to bin the values of the points.

The challenge with this approach is that the estimate in agile is not monetized, so we can't really tell if the Total Allocated Budget (TAB) is sufficient for the project at any point in time, unless the capacity for work and the quality of the outcomes are steady - that is, Level of Effort (LOE).

With the LOE approach, the capacity for work is the critical measurement needed for estimating the cost at completion. As well, continuous updating of this capacity for work is needed - and is correctly done on good agile projects.
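Monetizing such an LOE forecast is straightforward arithmetic; the burn rate, velocity, and backlog below are hypothetical:

```python
# Monetizing a points-based backlog via team burn rate (hypothetical numbers).
burn_rate_per_sprint = 60_000.0   # fully loaded team cost per sprint ($)
velocity = 30.0                   # observed points completed per sprint
backlog_points = 450.0            # points remaining in the backlog

sprints_remaining = backlog_points / velocity
ETC = sprints_remaining * burn_rate_per_sprint  # monetized Estimate To Complete

print(f"{sprints_remaining:.1f} sprints remaining, ETC ${ETC:,.0f}")
```

This at least connects points to dollars, so the ETC can be compared against the TAB. But it is still a point estimate, which leads to the issues below.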

But there are other issues with this LOE approach on larger projects:

- Do we know the variances in the capacity for work and how those variances will impact the final Cost at Completion?
- Do we know the interdependencies between the various work products and how they impact the final cost?
- Do we know the aleatory uncertainty - the natural variability that can't be fixed and has to be handled with margin?
- Do we know the epistemic uncertainty - the event-based risks that need a risk handling plan?
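One way to fold these uncertainties into the estimate is a Monte Carlo sketch: aleatory variance on velocity, plus an epistemic risk event with a modeled cost impact. All the distributions and numbers below are invented for illustration:

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

burn_rate = 60_000.0   # team cost per sprint ($), hypothetical
backlog = 450.0        # points remaining, hypothetical

trials = []
for _ in range(10_000):
    # Aleatory: natural variation in velocity, modeled as a normal draw.
    velocity = max(random.gauss(30.0, 6.0), 5.0)  # floored to keep draws sane
    cost = (backlog / velocity) * burn_rate
    # Epistemic: a discrete risk event with 20% probability and a cost impact.
    if random.random() < 0.2:
        cost += 150_000.0
    trials.append(cost)

trials.sort()
p50 = trials[len(trials) // 2]
p80 = trials[int(len(trials) * 0.8)]
print(f"P50 ≈ ${p50:,.0f}, P80 ≈ ${p80:,.0f}")
```

The answer is now a distribution: the project can be funded at the 80th percentile, with the gap between P50 and P80 becoming visible, explicit margin.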

So examples like the one found at Projects @ Work don't really consider any of the underlying uncertainties in estimating. Without the next level down - statistically adjusted estimates of the work effort, the capacity for work, the quality of that work, and the interdependencies between those work activities and their products - the simple, and maybe even simple-minded, approaches to estimating have limits to scaling.

This is one of those topics where everyone is right in some way, depending on the domain, the context in the domain, and the scale of that domain. As agile enters the larger acquisition community, where we're spending other people's money - maybe hundreds of millions of dollars - care needs to be taken when applying un-monetized, non-probabilistic, non-joint-probability (cross-correlations between work elements), non-stochastic forecasting models. The real world is not that simple.
