The essence of mathematics is to not make simple things complicated, but to make complicated things simple - Stan Gudder

The notion that estimating is hard, and that estimates are a waste because they are always wrong, willfully ignores the basic mathematics of making decisions in the presence of uncertainty. The foundation of all decision-making is Probability and Statistics. Without this understanding there can be no credible information provided to the decision makers.

It's popular in the #NoEstimates community to claim Forecasting is not Estimating. Using English dictionaries, they build a case using logic like this. It's been repeated nearly continuously since the start of the discussion about how to make decisions in the presence of uncertainty without estimating.

Salviati: If you have a number of uniform sized tasks & know velocity you can predict ttc (Time to Complete) Simplicio responds: Indeed...gives us much stronger forecasting ability than estimation

Let's use the Oxford Dictionary of Mathematics, not the High School English Dictionary.

It ain't so much the things we don't know that gets us in trouble. It's the things we know that ain't so - Artemus Ward

Estimate

Point Estimate - A single value of a statistic derived from a sample that is taken as an approximation for the population value of that statistic. For example, the sample mean , x , is a point estimate of the population mean, μ

Interval Estimate - An inference made about the range of values within which a population parameter will lie, drawn from values obtained from a random sample.

Interval Estimate - An interval within which a parameter under study (such as a relative risk ) is stated to lie with a particular degree of confidence, likelihood, or probability based on an analysis of a study or multiple studies. See also confidence interval.

Maximum Likelihood Estimate - The value for an unknown parameter in a model that maximizes the probability of obtaining exactly the data that were observed.

Estimation is used to calculate an unknown value.

Estimation is the calculated approximation of a result.

Forecast

Forecast - Also referred to as prediction. In econometrics, a point forecast is the expected value of the variable of interest, or the dependent variable, conditional on the given values of the exogenous and predetermined variables. An interval forecast is the confidence interval for the point forecast.

Forecast - To predict or estimate a future event or trend.

Forecast - A statistical synthesis of probabilities and expert opinion that attempts to define an outcome either in terms of numbers or actual courses of action.

Sales Forecast - An estimate of future sales volumes and revenue. It is usually based on past trends and takes into account current and future directions.

Forecasting - prediction of future events.

Forecasting is the process of making a forecast or prediction. The terms forecast and prediction are often used interchangeably but sometimes forecasts are distinguished from predictions in that forecasts often provide explanations of the pathways to an outcome.

Forecasting - to calculate or predict (some future event or condition) usually as a result of study and analysis of available pertinent data.

Estimating is about the past, present, and future outcomes of a process, a model, or some external observation. We can estimate the number of leaves that have fallen and need to be raked come fall, to size how many bags have to be bought at Home Depot (past). We need to estimate the number of people on the Pearl Street Mall right now to assess how long the walk will be to Starbucks from our parking spot (present). We can estimate the number of minutes we'll need to reach our favorite parking spot at Denver International Airport from the office parking lot (future).

Forecasting is Estimating about the future. The weather forecast for tomorrow is 70 degrees and a 30% chance of rain in northern Boulder County (where we live). With this forecast, we can estimate what time we need to tee off to have a high probability of getting all 18 holes of golf on our course in before the rain starts.

One of the escape clauses of #NoEstimates is to re-label Forecasting as NOT Estimating: it is forecasting, based on empirical data. Ignore for the moment the empirical data discussion, since ALL data is empirical, otherwise you wouldn't have the data.

Empirical is defined as based on, concerned with, or verifiable by observation or experience rather than theory or pure logic.

And just to beat this horse some more

The origin of the term Empirical starts in the 1560s, from L. empiricus, from Gk. empeirikos "experienced," from empeiria "experience," from empeiros "skilled," from en- "in" + peira "trial, experiment." Originally a school of ancient physicians who based their practice on experience rather than theory.

Empirical also means observed, factual, experimental, experiential, pragmatic, speculative, provisional. For estimating purposes, this is in contrast to data produced by a theoretical model, or by a parametric model built from empirical or theoretical sources. In all cases the data is used to estimate some outcome in the past, present, or future. When that estimate is about the future, it can also be referred to as a Forecast. Weather forecasts, sales forecasts, market forecasts, earnings forecasts. All are estimates about some outcome. The antonym of empirical is theoretical, un-observed (as would be the case for a model), hypothetical, conjectural.

So just to repeat - empirical data is observed, and empirical data can be one of the sources for making estimates about the past, present, or future.

Since Scrum is an empirical process, it also shares its attributes with empirical process control systems. Project management and the management of work is a process control system. This work - writing software for money - is a process. And as a process it needs to be controlled. The control makes use of empirical data. Empirical process control systems have three major attributes. [1]

Visibility - the attributes of the process that affect the outcomes are visible and known to the processes or people involved in the control of the process. You can't have a process control system if you can't see what is being controlled, or what the results of the control inputs and outputs are.

Inspection - the various aspects of the process are inspected - sampled - frequently enough that the variances can be measured in time to keep the desired process under control. Yes, Scrum is a control system for producing the desired software at the desired time.

Adaptation - makes use of the data received from the Inspection to adjust the process under control within the desired range of outcomes.

This type of control is found everywhere, from your home thermostat, to the dynamic flight controller used to autonomously rendezvous and dock a vehicle with the International Space Station, to the corrective actions that keep a project on track toward a planned finish date. It's all the same principle.
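The difference between closed loop and open loop control can be sketched with the thermostat case. This is only an illustration under stated assumptions: the heating rate, heat loss coefficient, outside temperature, and the function name run_heater are all hypothetical, chosen to make the behavior visible rather than to model a real house.

```python
def run_heater(setpoint, closed_loop, steps=200):
    """Simulate room temperature with a simple on/off heater.

    Assumed (hypothetical) physics: the heater adds 1.0 degree per step,
    and the room loses 5% of its difference from the outside temperature.
    """
    temperature = 15.0
    outside = 5.0
    heater_on = True  # open loop: switched on once and never looked at again
    for _ in range(steps):
        if closed_loop:
            # Inspection: sample the temperature. Adaptation: act on the variance.
            heater_on = temperature < setpoint
        heat = 1.0 if heater_on else 0.0
        temperature += heat - 0.05 * (temperature - outside)
    return temperature


closed_loop_temp = run_heater(setpoint=20.0, closed_loop=True)
open_loop_temp = run_heater(setpoint=20.0, closed_loop=False)
print(f"closed loop: {closed_loop_temp:.1f}, open loop: {open_loop_temp:.1f}")
```

With feedback the temperature stays close to the set point. Without it, the room simply drifts to wherever the physics takes it, regardless of what was wanted.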

So Back To Estimating and Forecasting

No matter how many times the #Noestimates advocates make unsubstantiated claims that Forecasting is not estimating, it's not true.

Estimating can be based on empirical data or theoretical models, or better yet, from my accelerator days, theoretical models informed by empirical data. Forecasts are estimates of outcomes in the future.

But forecasts are estimates, period. Anyone claiming otherwise needs to come up with reference materials from control systems books, financial modeling books, statistics books, or something else that shows Forecasts are not Estimates about future outcomes. Even one of the founders of eXtreme Programming makes that cockamamie claim that Forecasts are not estimates. Time to send them back to the High School math class.

[1] Modern Control Systems, Richard Dorf. This is the 5th edition. Mine is the 1st Edition, 1970, from the control systems course needed to write FORTRAN 77 code to control the sampling processes on the particle accelerator for our experiment. Google will find this book and Modern Control Systems, Dorf, in PDF form if you actually want to explore further. In the latter book here's the architecture of a general control system, which can be used to manage software development while spending other people's money, fly a spacecraft to Mars, or fly the Boeing 737 you're riding home on. Jeff Sutherland knows this from his days of driving around in an Air Force F-4, about the same time I was driving around in an Army CH-47, and from his further development of Scrum from John Boyd's Air Force work on the OODA Loop. All project work is all process control all the time. No way out of it, unless your conjectured method is open loop - then you're not controlling anything, you're just watching it fly into the ditch or crash into the ground. Not usually what your customer or commanding officer would find desirable behaviour.

Jurgen Appelo used the double pendulum on page 42 of his Management 3.0 book to explain the differences between simple, complicated, ordered, complex, and chaotic. He defines the term Chaotic as “very unpredictable.” Like many of Jurgen’s definitions, it is localized to suit the needs of the story line.

There is no universally accepted definition of chaos. But almost everyone would agree on the following ingredients:

Chaos is an aperiodic long-term behavior in a deterministic system that exhibits sensitive dependence on initial conditions.

In this context, the phrase aperiodic long-term behavior means that the motion does not settle down to a fixed point or a periodic orbit. Since the double pendulum loses energy to the environment, after some time the motion does become periodic and it eventually stops at a stationary fixed point. In this sense it is only the theoretical double pendulum without energy losses that would really be a chaotic system.

Please read that last sentence again. It is critical. As well the “sensitivity” to initial conditions is a parametric measure in itself. The starting angle of the pendulum is one parameter. Low starting angles result in different “sensitivities” than larger starting angles. This is an exercise for students in the introductory classical mechanics class as an undergraduate in Physics.

A deterministic system means that the system has no random or noisy inputs. The irregular behavior is intrinsic and arises from the system’s non-linearity rather than from any noisy driving forces.

Please read this last sentence again. It is critical to understanding the definitions needed to describe the behavior of the double pendulum.

Sensitive dependence on initial conditions means that nearby trajectories separate exponentially fast, i.e. two identical systems set up in the same way, such that the initial conditions are arbitrarily close together, will have their trajectories rapidly diverge. To make this more concrete, consider two trajectories, where at some time t the trajectories are at positions x(t) and x(t) + d(t). The statement of chaos would then be that d(t) ~ d(0) exp(L t), where the average value of L is called the Lyapunov exponent. If this is positive, the two trajectories are quickly separating from each other.
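A positive Lyapunov exponent can be estimated numerically. The sketch below does not model the double pendulum. As a labeled stand-in it uses the logistic map x -> 4x(1 - x), a deterministic system with no noisy inputs whose exponent is known to be ln 2. The function name and the step counts are illustrative assumptions.

```python
import math


def lyapunov_logistic(r=4.0, x0=0.2, n_steps=100_000, burn_in=1_000):
    """Estimate the Lyapunov exponent of the logistic map x -> r*x*(1-x).

    The exponent is the long-run average of log|f'(x)| along one trajectory,
    i.e. the average rate L in d(t) ~ d(0) exp(L t).
    """
    x = x0
    for _ in range(burn_in):  # discard the initial transient
        x = r * x * (1.0 - x)
    total = 0.0
    for _ in range(n_steps):
        stretch = abs(r * (1.0 - 2.0 * x))      # |f'(x)| at the current point
        total += math.log(max(stretch, 1e-300))  # guard against log(0)
        x = r * x * (1.0 - x)
    return total / n_steps


estimate = lyapunov_logistic()
print(f"estimated L = {estimate:.3f}, ln 2 = {math.log(2):.3f}")
```

The estimate is positive, so arbitrarily close starting points separate exponentially, even though nothing random is driving the system.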

Why is this an issue in the management of agile software projects? Good question.

Management 3.0 - and now #NoEstimates advocates - proffers a solution to a complex problem of managing the development of software. The book, while providing advice to managers on how to manage, mixes pseudo-scientific references and concepts – like the double pendulum – in support of essentially sound staffing and personnel management. I came to the book through Jurgen himself. But on first reading I ran straight into what seemed like a collection of ideas that have no actual basis in fact. The double pendulum is just an example of this approach.

So here's the fix for these conjectures. There's a paper "Distilling Free-Form Natural Laws from Experimental Data," Michael Schmidt and Hod Lipson, Science, Vol. 324 3 April 2009, showing not only the equations of motion for the double pendulum, but a machine that can deduce these equations by observing the double pendulum in motion.

Here’s the core problem. When we can't get the analogies right, what else isn’t right in the foundational principles proposed by those suggesting we can't operate in the presence of uncertainty? If those analogies miss the mark on their underlying principles, are the other suggested approaches equally flawed? Maybe, maybe not, but for someone like me, trained and experienced in the application of approaches to solving complex problems, many of the fundamental approaches used in the book are simply muddled thinking. It’s too bad. A good editor, with experience in the analogies Jurgen uses, could have established that they are just notional analogies, or possibly just anecdotal experiences. Instead Jurgen states them as the foundations of the principles of Management 3.0. In the same way, the original posters of #NoEstimates state their case that decisions CAN be made without estimating, when in fact that violates microeconomics, managerial finance, and several other principles.

And of course, this plays directly into the #NoEstimates conjectures, based on even less credibility than Jurgen's management processes - minus the ill-formed analogies.

There is no principle stated to date by the advocates of #NoEstimates that supports the conjecture that decisions can be made in the presence of uncertainty without estimating the impact of those decisions on the business.

Scrum is an Empirical software development management system

Although the Scrum Guide does not contain that term. Empirical processes by definition contain these elements:

Visibility

Inspection

Adaptation

These attributes did not start with Scrum, they are control system terms

Let's look at a non-software development empirical process control system

We want to design a process control for a hot water system for an industrial process. We have chosen two simple attributes. The first is the level of the hot water feed tank, so we need a pumping system to keep the water at the desired level. This in turn keeps the head pressure at the desired level for the users of the hot water. The second is the temperature of the water when it arrives at the point of need. This type of closed loop control system is seen in many towns across the US, where water towers are used to produce the desired water pressure - usually 65 PSI - for the residents, by pumping the ground water up to the height of the tank and then releasing it on demand. Let's ignore the temperature aspects for the moment and just focus on keeping the tank full.

We would like to model this system before we build it, to avoid spending our customer's money more than once. There are two simple ways of modeling the system.

An empirical method of step testing the process - for the water tank example an empirical model between the controlled and manipulated variables can be developed by collecting and analyzing process data gathered under controlled open loop conditions by stepping through the manipulated variables.

An analytical method using material and energy balances - for the water tank example an analytical model can be built using the Mass In and the Mass Out.

The empirical model looks like this.

The water tank level is held constant by equalizing the water flow out and the water flow in. The flow out is changed in a stepwise manner and the level response is observed. A series of stepwise changes are made and the response is observed. As the level changes - and acts like an integrating or ramp process - the average time delay and gain (in percent of level change per percent of water added) can be calculated to produce the empirical model.
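The step test described above can be sketched in a few lines. This is a minimal illustration, assuming an ideal integrating tank with no time delay. The "true" gain of 0.1, the flow values, and the function name simulate_level are hypothetical, not taken from a real plant.

```python
def simulate_level(flow_in_pct, flow_out_pct, level0=50.0, gain=0.1, dt=1.0):
    """Integrating tank: each step the level moves by gain * (in - out) * dt."""
    level = level0
    levels = []
    for q_in, q_out in zip(flow_in_pct, flow_out_pct):
        level += gain * (q_in - q_out) * dt
        levels.append(level)
    return levels


n_samples = 60
step_at = 20
step_size = 5.0  # percent change in outflow

flow_in = [50.0] * n_samples
# Outflow matches inflow, then is stepped down by step_size at sample 20.
flow_out = [50.0] * step_at + [50.0 - step_size] * (n_samples - step_at)

levels = simulate_level(flow_in, flow_out)

# The level ramps after the step; the ramp slope per unit of step is the gain.
slope = (levels[-1] - levels[step_at]) / (n_samples - 1 - step_at)
estimated_gain = slope / step_size
print(f"estimated gain: {estimated_gain:.3f} % level per % flow per step")
```

On real data the ramp would be noisy and delayed, so the slope would come from a fit over several steps rather than from two points.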

The analytical model looks like this.
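As a minimal sketch of such a mass balance, assume a tank of constant cross-sectional area A holding water of constant density ρ, with level h(t). These symbols, and the volumetric flows q_in and q_out, are assumptions introduced here for illustration. Mass In minus Mass Out then gives the rate of change of the level:

```latex
\frac{d}{dt}\bigl(\rho A\, h(t)\bigr) = \dot{m}_{\text{in}}(t) - \dot{m}_{\text{out}}(t)
\quad\Longrightarrow\quad
A\,\frac{dh}{dt} = q_{\text{in}}(t) - q_{\text{out}}(t)
```

When the flow in equals the flow out, the level is constant. Any imbalance makes the level ramp, which is the same integrating behavior the step test observes.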

What Does All This Mean for Software Estimating?

Managing software development is a Control System. We have a steering target - the tank level - in the form of needed capabilities, a planned delivery date, and a planned budget for the delivery of the needed capabilities at the needed time. If you have no planned budget, planned delivery date, or needed capabilities, you have an open loop control system - just keep spending till the money runs out or the customer tells you to stop. No need for a control system, no need to estimate, just code, someone else will look after the business side of your spending.

But let's also pretend for a moment that those providing the money have a fiduciary obligation to know something about how much it will cost to produce the needed capabilities (Value) and some need to know when those Capabilities will be ready for use, so they can start earning back their investment. This is standard Business 101 time phased Return on Investment. And let's pretend that the development of the software is not a deterministic system. That there are uncertainties in the system. Uncertainties from lots of sources, some you can control and some you can't.

Then what does a model of the control system look like? It can be empirical or analytical, or a combination of both.

It needs to be both for some simple reasons:

Using empirical data in the water tank control system relies on the physics of the system. Pump water to the top, and it builds head pressure that is used to remove water from the tank when the valve is opened. The laws of physics don't change over time for this system.

Using empirical data for managing the development of software - empirical data alone - makes the assumption - A BIG ASS ASSUMPTION - that the future is like the past. Like the water tank example.

This of course is NEVER true in software development. If it were true developing software would be a mechanical process, done by machines. And it's not.

So while empirical data is useful, it's not the only thing needed to manage the project so you can show up at or near the time needed, at or near the planned cost, and at or near the needed capabilities, so the business can fulfill its business model.

An analytical model is also needed, fed by empirical data.

Both are needed, with the full understanding that the future is never like the past - and analytical modeling must include the probabilistic aspects of the future - possibly derived from the past.

So when we hear that Scrum is an empirical process (it is true), think a bit more about what that means. And when we hear that #NoEstimates is an empirical process, it's not, because it's open loop, with no steering target (a set point in the control systems paradigm), and no consideration for the probabilistic aspects of writing code for money - someone else's money.

Ask a simple question - if you use the word empirical - do you have a notion of the control system model in which that empirical data is used? Or is the term empirical data just a word used without any meaning to the problem at hand?

That problem at hand is how to forecast (estimate in the future) in the presence of uncertainty about things meaningful to the business paying for the software. This means cost, schedule, and technical outcomes. How can that be done using empirical data and NOT call that estimating? It can't!

For some software focused background of control systems see this simple starting point

In the business of estimating in the presence of uncertainty, a useful tool is Bayesian analysis of what we know today to make forecasts or estimates of the future. The Bayesian approach to inference, as well as decision-making and forecasting, involves conditioning on what is known to make statements about what is not known.

Bayesian estimating considers a probability of some outcome in the future as a belief or an opinion. This is different from the frequentist approach of estimating, where it is assumed there is a long-run frequency of events. These events could be a cost, an expected completion date, or some possible performance parameter, each with a probability that some value will occur. The frequentist approach is useful when there are long-term frequencies of an occurrence. When that is not the case - for example in project work which may be a unique undertaking - a Bayesian approach to estimating is called for.

Conditioning our decisions on what is known means making use of prior knowledge. This knowledge in the project domain comes from past performance of the parameters of the project. These include cost, schedule, work capacity, technical performance, and other variables involved in the planning and execution of the work of the project.

This information is a distinguishing feature of the Bayesian approach to estimating the future. To do this we first need to fully specify what is known and what is unknown about the past and the future. Then we use what is known to make probabilistic statements about what is unknown.

The Bayesian approach to estimating differs from the traditional approach - frequentist - in that it interprets probability as a measure of believability in an event. That is, how confident are we in an event occurring?

For project work Bayesian estimating asks what's the believability that this project will cost some amount or less. Or what's the believability that this project will complete on some date or before. This belief is based on prior information about the question. The assessment of the question is then a probability based on this prior condition.

Bayes' theorem states:

P(A|B) = P(B|A) P(A) / P(B)

where P(A) and P(B) are the probabilities of A and B without regard to each other, P(A|B) is the conditional probability of observing the event A given that B is true, and P(B|A) is the conditional probability of observing the event B given that A is true.

For project work this can be very useful, given we have prior knowledge of some parameter's behavior and would like to know some probability of that parameter's behavior in the future.
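A small worked example shows how prior knowledge is conditioned on new evidence using P(A|B) = P(B|A) P(A) / P(B). All of the numbers below are hypothetical, chosen only to show the mechanics: A is "the project finishes late" and B is "the first milestone slipped." The function name posterior is likewise an illustrative assumption.

```python
def posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """Return P(A|B) from Bayes' theorem.

    P(B) is expanded with the law of total probability:
    P(B) = P(B|A) P(A) + P(B|not A) P(not A).
    """
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1.0 - prior_a)
    return p_b_given_a * prior_a / p_b


# Hypothetical values, for illustration only.
p_late = 0.30                # P(A): prior belief from past performance
p_slip_given_late = 0.80     # P(B|A): late projects usually slip early
p_slip_given_on_time = 0.20  # P(B|not A): on-time projects sometimes slip

p_late_given_slip = posterior(p_late, p_slip_given_late, p_slip_given_on_time)
print(f"P(late | first milestone slipped) = {p_late_given_slip:.2f}")
```

The belief that the project will be late is revised upward once the slip is observed, which is the sense in which the estimate is conditioned on what is known.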

This is distinctly different from averaging past behavior and projecting the future behavior. It is also distinctly different from assuming that the future behavior is going to be like the past behavior. These two assumptions are, of course, seriously flawed, but at the same time they are often used in naive estimating or forecasting.

This Bayesian approach to forecasting or estimating future outcomes also underlies machine learning methods that use Markov Chain Monte Carlo simulation.

When faced with questions like when will we be done or how much will it cost when we are done - and these are normal everyday questions asked by any business that expects to stay in business - then Bayesian modeling can be useful. Along with frequentist modeling and standard Monte Carlo Simulation of the processes that drive the project.
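A standard Monte Carlo simulation of the "when will we be done" question can be sketched as follows. This is a minimal illustration under loud assumptions: the four work packages and their three-point durations are hypothetical, the packages are treated as independent and performed one after another, and a triangular distribution is simply a convenient choice, not a claim about how real durations behave.

```python
import random

# Hypothetical work packages: (optimistic, most likely, pessimistic) days.
TASKS = [(3, 5, 10), (8, 10, 20), (2, 4, 9), (5, 8, 15)]


def simulate_total_durations(tasks, trials=20_000, seed=42):
    """Draw each task from a triangular distribution and sum the durations."""
    rng = random.Random(seed)
    totals = []
    for _ in range(trials):
        # random.triangular takes (low, high, mode) in that order.
        totals.append(sum(rng.triangular(low, high, mode)
                          for low, mode, high in tasks))
    return sorted(totals)


def percentile(sorted_values, fraction):
    """Value at the given fraction of a sorted sample."""
    index = min(int(fraction * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[index]


totals = simulate_total_durations(TASKS)
p50 = percentile(totals, 0.50)
p80 = percentile(totals, 0.80)
print(f"50% confidence: {p50:.1f} days, 80% confidence: {p80:.1f} days")
```

The answer to "when will we be done" is then a duration with a stated confidence, not a single number.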

A good starting place for the whole topic of estimating software development is ...

So don't let anyone tell you estimates - good estimates - can't be made for software development projects. They may not know how - because they haven't done their homework - but it is certainly possible, and for any non-trivial project, not only possible but important for the success of the project and the business that is funding that project.

Complexity and chaos are not the same. Complexity requires a higher degree of order and works against chaos. To construct a complex system, work is required - resisting or decreasing entropy. Chaos increases entropy and is a natural process of the universe. - Alexander B. Alleman, PhD student, Montana State University

Managing projects in the presence of uncertainty requires energy be put into the system to maintain its equilibrium and stability. How much energy, for how long, in what order, and of what type is the role of management. To know the answers to these questions of how much, when, and where means making estimates of those quantities and of the resulting outcomes that reduce chaos.

In other domains, this principle is called the 2nd Law of Thermodynamics. In project work, this principle is also applicable, since the system of work is a dynamic coupled collection of random processes, interacting with each other in non-linear, non-stationary ways.

The Law of Entropy is expressed in this 2nd law. The law predicts that the natural state of all things, from the tiniest atoms to the largest of galaxies, is that of disorder. This means, without appropriate systems or balances in place, everything wants to fall into chaos. The management of this naturally occurring (statistical) - as well as the probabilistic - set of processes is the role of Risk Management. If there were no uncertainties on your project the 2nd Law would not be applicable. Since all project work is uncertain there is ALWAYS risk associated with those uncertainties. And of course managing in the presence of these uncertainties mandates making estimates about the impact of managerial actions on the outcomes of those decisions.

The entire Universe and everything in it is a collection of 2nd order non-linear partial differential equations, all obeying the 2nd Law of Thermodynamics.

All project work is random work. There are three core random variables on all projects: cost, schedule, and technical performance. There are sub-variables as well as all the ...ilities involved in project work, but let's start with the major three.

Fixing 1, 2, or all 3 of these random variables does NOT make the randomness go away.

These variables are random, and all variables on projects are random, because of uncertainty. This uncertainty (as mentioned on many other blogs) comes from two sources. The first is aleatory uncertainty, the underlying natural randomness of all project activities. This is called irreducible uncertainty. It can't be reduced. Nothing you can do will reduce it. It's there and will always be there. This is a statistical process. The only way to work in the presence of irreducible uncertainty is to have margin. Cost margin, schedule margin, technical margin.

The second is epistemic uncertainty. This is uncertainty that is event based. It's there but can be handled in some ways. Those ways can include buying two of everything in case one breaks, having redundancy in other forms - a backup site for the data center, testing, prototypes, and other activities that provide a Plan B when the probability that something will go wrong becomes true and that thing that went wrong is no longer a probability but has turned into an Issue.

So Here's the Real Problem

When we hear, we don't need to estimate, I can fix time and budget, that doesn't make the randomness go away. It just sets an upper bound on what you CAN spend and when you HAVE TO BE DONE. Those uncertainties that create the randomness are still there. Fixed time and fixed budget plans leave open the technical randomness as well. The time and budget are still random inside the constraints set by the project.
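A short simulation makes the point that a fixed budget is only a target for a random cost. This is a sketch under stated assumptions: the budget of 100 units, the lognormal shape of the cost, and the spread of 0.25 are all hypothetical, with the median cost deliberately set equal to the budget.

```python
import math
import random

BUDGET = 100.0  # hypothetical fixed budget (the target for the cost)
SIGMA = 0.25    # hypothetical spread of the cost

def simulate_costs(trials=20_000, seed=7):
    """Draw actual costs from a lognormal whose median equals the budget."""
    rng = random.Random(seed)
    return sorted(rng.lognormvariate(math.log(BUDGET), SIGMA)
                  for _ in range(trials))


costs = simulate_costs()

# Fraction of simulated outcomes where the actual cost exceeds the budget.
p_over_budget = sum(cost > BUDGET for cost in costs) / len(costs)

# Cost margin needed to cover 80% of the simulated outcomes.
cost_at_80 = costs[int(0.80 * len(costs))]
margin_for_80 = cost_at_80 - BUDGET

print(f"P(cost > budget) = {p_over_budget:.2f}")
print(f"margin for 80% confidence = {margin_for_80:.1f}")
```

Fixing the budget changes none of the draws. It only determines how often the actual cost lands above the line, and how much margin is needed to push that line to an acceptable confidence level.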

There's no getting around this. No matter how often someone says you can. Those someones were asleep in the engineering probability and statistics class. Here's the classic engineering course we were all forced to take as physics grad students: Probability and Statistics. †

This is basic probability and statistics of project work. The probability that something will turn out unfavorable is created by epistemic uncertainty. The statistical variances of everyday life are created by aleatory uncertainty.

Ignoring these uncertainties means it's going to turn out bad for those paying for your work

You need margin to protect from irreducible uncertainty. You need specific actions to protect from reducible uncertainty. So you can in fact fix the cost and schedule IF AND ONLY IF (IFF) you have margin and risk buy down plans. When someone says we've fixed the duration and the budget, two things come to mind.

Budget is NOT cost

Budget is not real dollars

Budget is a target for the cost. The cost is a random variable

Your spend rate - real dollars - is a random variable.

No cost margin, you're going to be over budget before you start, IF you plan on meeting the customer's needs.

You can choose not to meet the customer's needs and just stop when you run out of money.

This is basic Managerial Finance

Duration is cost, since time is money where I live

A fixed duration is a target completion time

Both uncertainties - aleatory and epistemic are present in the duration cost random variable

A third notion is the killer notion

When you fix time and cost, have sufficient risk buy down activities to reduce the epistemic uncertainty that creates the probability of something going wrong to an acceptable level, and have sufficient margin to cover the expected overruns in duration, you still have the technical reducible and irreducible uncertainties: that the things you're building won't work, won't be what the customer wants, or will cause other issues (these are called externalities in the economics of software development), along with other unknowns, possibly unknowable at the beginning of the project.

Add to that all the ...ilities that are involved in the development of a software system and all the ...ilities involved in the system that the software enables to work properly.

When you fix time and/or budget, and don't have protections for reducible and irreducible uncertainty, you're going to be late and over budget, and you have willfully ignored those outcomes. Oh, and by the way, there is a probability your little gadget will not meet the needs of those paying you either.

This condition is very common in our domain, as exhibited by all the programs that have overrun, gone OTB (Over Target Baseline), or had a Nunn-McCurdy breach.

These immutable conditions (aleatory and epistemic uncertainty) are completely ignored in agile development. Agile provides rapid feedback to the risk management processes of software development. But agile is NOT a risk management process in and of itself. That's a topic for another time.

If you think you have no uncertainties - reducible or irreducible - and have fixed the budget and duration and maybe even the outcomes, you're likely on a de minimis project. Good luck with that.

† We had to take a few courses outside our major, and this was another: Classical Electrodynamics. This was an engineering course. We had a foundation of electrodynamics from the physics point of view. In that view everything can be solved through Maxwell's equations, a simple set of partial differential equations describing how electromagnetism works. When asked to give a talk on antenna theory in the engineering course, a friend (I was too afraid at that time) went to the chalkboard (yes, no white boards) and wrote down Maxwell's equations for the reciprocity theorem of antennas in free space. The Professor at the back of the room told him (Steve) to sit down: we're engineers, not physicists; we want to know HOW things work, not WHY things work.

So what the #NoEstimates advocates fail to understand is that in the presence of reducible (epistemic) and irreducible (aleatory) uncertainty, actions must be taken to address these uncertainties and prevent them from unfavorably impacting the project outcomes, as suggested in ...

Irreducible uncertainty can be addressed with Margin. Cost margin, schedule margin, technical performance margin.

Reducible uncertainty can be addressed with redundancy and risk retirement activities that buy down the risk resulting from the uncertainty to an acceptable level.

In both cases managing in the presence of uncertainty means following Tim Lister's advice...

Risk Management is how Adults Manage projects

So when Hofstadter's Law is used without addressing the reducible and irreducible uncertainties and the resulting risk to project success, the result is Hofstadter's Law.

A self-referential, circular logic leading directly to project disappointment.

Phillip Armour has a classic article in CACM titled "Ten Unmyths of Project Estimation," Communications of the ACM (CACM), November 2002, Vol 45, No 11. Several of these Unmyths are applicable to the current #NoEstimates concept. Much of the misinformation about how estimating is the smell of dysfunction can be traced to these unmyths.

Mythology is not a lie ... it is metaphorical. It has been well said that mythology is the penultimate truth - Joseph Campbell, The Power of Myth

Using Campbell's quote, myths are not untrue. They are an essential truth, but wrapped in anecdotes that are not literally true. In our software development domain a myth is a truth that seems to be untrue. This is Armour's origin of the unmyth.

The unmyth is something that seems to be true but is actually false.

Let's look at the three core conjectures of the #NoEstimates paradigm:

Estimates cannot be accurate - we cannot get an accurate estimate of cost, schedule, or probability that the result will work.

We can't say when we'll be done or how much it will cost.

All estimates are commitments - making estimates makes us committed to the number that results from the estimate.

The Accuracy Myth

Estimates are not single numeric values, they are probability distributions. If the Probability Distribution below represents the probability of the duration of a project, there is a finite minimum - some time below which the project cannot be completed.

There is the highest probability, or the Most Likely duration for the project. This is the Mode of the distribution. There is a mid point in the distribution, the Median. This is the value that half of the possible completion times fall below and half fall above. Then there is the Mean of the distribution. This is the probability-weighted average of all the possible completion times. And of course The Flaw of Averages is in effect for any decisions being made on this average value †

“It is moronic to predict without first establishing an error rate for a prediction and keeping track of one’s past record of accuracy” — Nassim Nicholas Taleb, Fooled By Randomness

If we want to answer the question What is the probability of completing ON OR BEFORE a specific date, we can look at the Cumulative Distribution Function (CDF) of the Probability Distribution Function (PDF). In the chart below the PDF has the earliest finish in mid-September 2014 and the latest finish early November 2014.

The 50% probability is 23 September 2014. In most of our work, we seek an 80% confidence level of completing ON OR BEFORE the need date.
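To make the CDF reading concrete, here is a minimal Python sketch. It assumes a triangular distribution of durations in days (40 at best, 50 Most Likely, 90 at worst). Those numbers and the function names are made up for illustration and are not taken from the chart. The sketch samples the distribution and reads off the 50% and 80% confidence values.

```python
import random
import statistics


def simulate_durations(n_trials, low, mode, high, seed=1):
    """Draw project durations (days) from an assumed triangular distribution."""
    rng = random.Random(seed)
    # random.triangular takes its arguments in the order (low, high, mode)
    return [rng.triangular(low, high, mode) for _ in range(n_trials)]


def percentile(samples, p):
    """Value at or below which roughly a fraction p of the samples fall."""
    ordered = sorted(samples)
    index = min(len(ordered) - 1, int(p * len(ordered)))
    return ordered[index]


def probability_on_or_before(samples, target):
    """Empirical CDF: fraction of simulated outcomes finishing by `target`."""
    return sum(1 for s in samples if s <= target) / len(samples)


# Hypothetical inputs: 40 days best case, 50 Most Likely, 90 worst case
durations = simulate_durations(20_000, low=40, mode=50, high=90)
p50 = percentile(durations, 0.50)
p80 = percentile(durations, 0.80)

print(f"mean   = {statistics.mean(durations):.1f} days")
print(f"median = {p50:.1f} days (50% confidence)")
print(f"P80    = {p80:.1f} days (80% confidence)")
print(f"P(done within 50 days) = {probability_on_or_before(durations, 50):.2f}")
```

With these assumed inputs the Most Likely value is 50 days, yet the median lands near 58 days and the 80% confidence value near 70 days. Committing to the Most Likely number would mean accepting a low chance of finishing on or before it.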

The project then MUST have schedule, cost, and technical margin to protect that probabilistic date.

How much margin is another topic.

But projects without margin are late, over budget, and likely don't work on day one. Can't be complaining about poor project performance if you don't have margin, risk management, and a plan for managing both as well as the technical processes.

So where do these charts come from? They come from a simulation of the work. The order and dependencies of the work. And the underlying statistical nature of the work elements.

No individual work element is deterministic.

Each work element has some type of dependency on the previous work element and the following work element.

Even if all the work elements are Independent and sitting in a Kanban queue, unless we have unlimited servers of that queue, being late on the current piece of work will delay the following work.

So what we need is not Accurate estimates, we need Useful estimates. The usefulness of the estimate is the degree to which it helps make optimal business decisions. The process of estimating is Buying Information. The Value of the estimate, like all value, has to be weighed against the cost to obtain that information. The value of the estimate is the opportunity cost, which is the difference between the business decision made with the estimate and the business decision made without the estimate. ‡
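A minimal sketch of that Buying Information idea, in Python. The two options, their probabilities, and the payoffs (in $K) are hypothetical, as is the function name value_of_estimate. The sketch compares the expected value of the choice made with the estimate against the choice that would have been made without it, then subtracts what the estimate cost to produce.

```python
def expected_value(outcomes):
    """Probability-weighted value of (probability, payoff) pairs."""
    return sum(p * payoff for p, payoff in outcomes)


def value_of_estimate(options, uninformed_choice, cost_of_estimating):
    """Net value of estimating before deciding (all inputs are assumptions)."""
    expected = {name: expected_value(o) for name, o in options.items()}
    informed_choice = max(expected, key=expected.get)
    gain = expected[informed_choice] - expected[uninformed_choice]
    return informed_choice, gain - cost_of_estimating


# Hypothetical payoffs in $K
options = {
    "build": [(0.6, 400), (0.4, -300)],  # expected value 120
    "buy": [(1.0, 150)],                 # expected value 150
}

# Without an estimate, assume the team would have gone with "build"
choice, net_value = value_of_estimate(options, "build", cost_of_estimating=10)
print(choice, round(net_value, 1))
```

In this made-up case the estimate changes the decision from build to buy and is worth about $20K after paying $10K to produce it. If the estimate had not changed the decision, its net value would have been negative, which is the sense in which usefulness, not accuracy, is the measure.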

In this book are the answers to all the questions those in the #NoEstimates camp say can't be answered.

The Accuracy Answer

All work is probabilistic.

Discover the Probability Distribution Functions for the work.

If you don't know the PDF, make one up - we use -5% to +15% for everything until we know better.

If you don't know the PDF, go look in databases of past work for your domain. Here are some databases:

http://www.nesma.org/

http://www.isbsg.org/

http://www.cosmicon.com/

If you still don't know, go find someone who does, don't guess.

With this framework - it's called Reference Class Forecasting - that is, making estimates about your project from reference classes of other projects, you can start making useful estimates.
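Here is a minimal Python sketch of the reference class idea. The ten actual-to-estimated ratios are invented for illustration, and reference_class_forecast is a hypothetical helper, not a function from any of the databases listed above. It scales a base estimate by the ratio that past projects stayed at or under for a chosen confidence level.

```python
import math


def reference_class_forecast(base_estimate, past_ratios, confidence=0.80):
    """Scale a base estimate by the actual/estimated ratio seen in past work.

    Uses the nearest-rank percentile of the historical ratios.
    """
    ordered = sorted(past_ratios)
    # round() guards against floating point noise before taking the ceiling
    rank = max(1, math.ceil(round(confidence * len(ordered), 9)))
    return base_estimate * ordered[rank - 1]


# Hypothetical reference class: actual cost / estimated cost for ten past projects
past_ratios = [0.95, 1.00, 1.05, 1.10, 1.10, 1.20, 1.25, 1.30, 1.50, 1.80]

forecast_50 = reference_class_forecast(100.0, past_ratios, confidence=0.50)
forecast_80 = reference_class_forecast(100.0, past_ratios, confidence=0.80)
print(f"50% confidence: {forecast_50:.0f}")
print(f"80% confidence: {forecast_80:.0f}")
```

The hard part in practice is not the arithmetic. It is choosing a reference class whose projects really are comparable to yours, which this sketch simply assumes.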

But remember, making estimates is how you make business decisions with opportunity costs. Those opportunity costs are the basis of Microeconomics and Managerial Finance.

Cone of Uncertainty and Accuracy of Estimating

There is a popular myth that the Cone of Uncertainty prevents us from making accurate estimates. We now know we need useful estimates, but those are not prevented by the cone of uncertainty. Here's the guidance we use on our Software Intensive Systems projects.

Finally in the estimate accuracy discussion comes the cost estimate. The chart below shows how cost is driven by the probabilistic elements of the project. Which brings us back to the fundamental principle that all project work is probabilistic. Modeling the cost, schedule, and probability of technical success is mandatory in any non-trivial project. By trivial I mean a de minimis project, one where if we're off by a lot it doesn't really matter to those paying.

The Commitment Unmyth

So now to the big bugaboo of #NoEstimates. Estimates are evil, because they are taken as commitments by management. They're taken as commitments by Bad Management, uninformed management, management that was asleep in the High School Probability and Statistics class, management that claims to have a Business degree, but never took the Business Statistics class.

So let's clear something up,

Commitment is how Business Works

Here's an example taken directly from ‡

Estimation is a technical activity of assembling technical information about a specific situation to create hypothetical scenarios that (we hope) support a business decision. Making a commitment based on these scenarios is a business function.

The Technical “Estimation” decisions include:

When does our flight leave?

How do we get there? Car? Bus?

What route do we take?

What time of day and traffic conditions?

How busy is the airport, how long are the lines?

What is the weather like?

Are there flight delays?

This kind of information allows us to calculate the amount of time we should allow to get there.

The Business “Commitment” and Risk decisions include:

What are the benefits in catching the flight on time?

What are the consequences of missing the plane?

What is the cost of leaving early?

These are the business consequences that determine how much risk we can afford to take.

Along with these of course is the risk associated with the uncertainty in the decisions. So estimating is also Risk Management and Risk Management is management in the presence of uncertainty. And the now familiar presentation from this blog.

Risk Management is how Adults manage projects - Tim Lister. Risk management is managing in the presence of uncertainty. All project work is probabilistic and creates uncertainty. Making decisions in the presence of uncertainty requires - mandates actually - making estimates (otherwise you're guessing, pulling numbers from the rectal database). So if we're going to have an Adult conversation about managing in the presence of uncertainty, it's going to be around estimating. Making estimates, improving estimates, making estimates valuable to the decision makers.

Estimates are how business works - exploring for alternatives means willfully ignoring the needs of business. Proceed at your own risk

† This average notion is common in the #NoEstimates community. Take all the past stories or story points, find the average value, and use that for the future values. That is a serious error in statistical thinking, since unless the variance is acceptable, that average can be wildly off from the actual future outcomes of the project.
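A small Python sketch of why the average alone misleads. The two story histories are invented: both average 5 days per story, but one is far more variable. The helper forecast_total is hypothetical and uses simple resampling of past story sizes, which itself assumes the future looks like the past.

```python
import random
import statistics


def forecast_total(history, n_future, n_trials=10_000, seed=7):
    """Bootstrap the total effort of n_future stories from past story sizes.

    Returns (mean total, 80th percentile total).
    """
    rng = random.Random(seed)
    totals = sorted(
        sum(rng.choice(history) for _ in range(n_future))
        for _ in range(n_trials)
    )
    return statistics.mean(totals), totals[int(0.80 * n_trials)]


# Two hypothetical teams whose past stories both average 5 days
steady = [4, 5, 5, 5, 6]
erratic = [1, 1, 2, 3, 18]

mean_steady, p80_steady = forecast_total(steady, n_future=20)
mean_erratic, p80_erratic = forecast_total(erratic, n_future=20)
print(f"steady : mean {mean_steady:.0f} days, 80% confidence {p80_steady} days")
print(f"erratic: mean {mean_erratic:.0f} days, 80% confidence {p80_erratic} days")
```

Both forecasts have a mean near 100 days for 20 stories, so a plan built on the average looks identical for the two teams. The 80% confidence totals are far apart, and that difference is exactly what the average hides.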

‡ Unmythology and the Science of Estimation, Corvus International, Inc., Chicago Software Process Improvement Network, C-Spin, October 23, 2013

When confronted with making decisions on software projects in the presence of uncertainty, we can turn to an established and well tested set of principles found in Software Engineering Economics.

Software Engineering Economics is concerned with making decisions within the business context to align technical decisions with the business goals of an organization. Topics covered include fundamentals of software engineering economics (proposals, cash flow, the time-value of money, planning horizons, inflation, depreciation, replacement and retirement decisions); not for-profit decision-making (cost-benefit analysis, optimization analysis); estimation, economic risk and uncertainty (estimation techniques, decisions under risk and uncertainty); and multiple attribute decision making (value and measurement scales, compensatory and non-compensatory techniques).

Engineering Economics is one of the Knowledge Areas for educational requirements in Software Engineering defined by INCOSE, along with Computing Foundations, Mathematical Foundations, and Engineering Foundations.

A critical success factor for all software development is to model the system under development as a holistic, value-providing entity, an approach that has been gaining recognition as a central process of systems engineering. The use of modeling and simulation during the early stages of the system design of complex systems and architectures can:

Document system needed capabilities, functions and requirements,

Assess the mission performance,

Estimate costs, schedule, and needed product performance capabilities

Evaluate tradeoffs,

Provide insights to improve performance, reduce risk, and manage costs.

The process above can be performed in any lifecycle duration. From formal top down INCOSE VEE to Agile software development. The process rhythm is independent of the principles.

This is a critical communication factor - separation of Principles, Practices, and Processes, establishes the basis of comparing these Principles, Practices, and Processes across a broad spectrum of domains, governance models, methods, and experiences. Without a shared set of Principles, it's hard to have a conversation.

Engineering Economics

Developing products or services with other people's money means we need a paradigm to guide our activities. Since we are spending other people's money, the economics of that process is guided by Engineering Economics.

Engineering economic analysis concerns techniques and methods that estimate output and evaluate the worth of products and services relative to their costs. (We can't determine the value of our efforts without knowing the cost to produce that value.) Engineering economic analysis is used to evaluate system affordability. Fundamental to this knowledge area are value and utility, classification of cost, time value of money and depreciation. These are used to perform cash flow analysis, financial decision making, replacement analysis, break-even and minimum cost analysis, accounting and cost accounting. Additionally, this area involves decision making involving risk and uncertainty and estimating economic elements. [SEBoK, 2015]

The Microeconomic aspects of the decision making process are guided by the principles of making decisions regarding the allocation of limited resources. In software development we always have limited resources - time, money, staff, facilities, performance limitations of software and hardware.

If we are going to increase the probability of success for software development projects we need to understand how to manage in the presence of the uncertainty surrounding time, money, staff, facilities, performance of products and services and all the other probabilistic attributes of our work.

To make decisions in the presence of these uncertainties, we need to make estimates about the impacts of those decisions. This is an unavoidable consequence of how the decision making process works.

The opportunity cost of any decision between two or more choices means there is a cost for NOT choosing one or more of the available choices. This is the basis of microeconomics of decision making. What's the cost of NOT selecting an alternative?
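The cost of NOT selecting an alternative can be sketched with a small simulation. This is a minimal Python illustration under stated assumptions: the two alternatives and their triangular payoff ranges (in $K) are invented, and compare_alternatives is a hypothetical helper. It estimates the probability that choosing A turns out to be the right choice, and the average value given up in the cases where B would have been better.

```python
import random


def compare_alternatives(sample_a, sample_b, n_trials=20_000, seed=11):
    """Return (P(A is at least as good as B), expected value lost by choosing A)."""
    rng = random.Random(seed)
    a_wins = 0
    total_regret = 0.0
    for _ in range(n_trials):
        value_a, value_b = sample_a(rng), sample_b(rng)
        if value_a >= value_b:
            a_wins += 1
        else:
            # Opportunity cost realized in this trial: what B would have returned
            total_regret += value_b - value_a
    return a_wins / n_trials, total_regret / n_trials


def option_a(rng):
    # Hypothetical payoff in $K: wide range, higher upside (low, high, mode)
    return rng.triangular(50, 200, 120)


def option_b(rng):
    # Hypothetical payoff in $K: narrow range, lower upside
    return rng.triangular(80, 140, 100)


p_a_right, expected_opportunity_cost = compare_alternatives(option_a, option_b)
print(f"P(A is the right choice) = {p_a_right:.2f}")
print(f"Expected opportunity cost of choosing A = {expected_opportunity_cost:.1f} $K")
```

Neither number can be produced without estimating the payoff distributions of both alternatives, including the one not chosen.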

So when it is conjectured we can make a decision in the presence of uncertainty without estimating the impact of that decision, it's simply NOT true.

That notion violates the principle of Microeconomics

All project work is probabilistic. There is no such thing as a deterministic estimate. OK, there is. But those estimates are wrong, dead wrong, willfully ignorant wrong. All project work is probabilistic. If you're making deterministic estimates, you've chosen to ignore the basic processes of probability and statistics.

There is an important difference between Statistics and Probability. Both are needed when making decisions in the presence of uncertainty.

All projects have uncertainty.

And there are two kinds of uncertainty on all projects. Reducible and Irreducible.

Reducible uncertainty (on the right) is described by the probability of some outcome. There is an 82% probability that we'll be complete on or before the second week in November, 2016. Irreducible uncertainty (on the left) is described by the Probability Distribution Function (PDF) for the underlying statistical processes.

In both cases estimating is required. There is no deterministic way to produce an assessment of an outcome in the presence of uncertainty without making estimates. This is simple math. In the presence of uncertainty, the project variables are random variables, not deterministic variables. If there is no uncertainty, there is no need to estimate, just measure.

Empiricism

When we hear that #NoEstimates is about empirical data used to forecast the future, let’s look deeper into the term and the processes of empiricism.

First, an empiricist rejects the logical necessity for scientific principles and bases processes on observations. [1]

While managing other people’s money in the production of value in exchange for that money, there are principles by which that activity is guided. For the empiricist, principles are not immediately evident. But principles are called principles because they are indemonstrable and cannot be deduced from other premises nor be proved by any formal procedure. They are accepted because they have been observed to be true in many instances and to be false in none.

Second, with empirical data comes two critical assumptions that must be tested before that data has any value in decision making.

The variance in the sampled data is sufficiently narrow to allow sufficient confidence in forecasting the future. A ±45% variance is of little use. Next is the killer problem.

With an acceptable variance, the assumption that the future is like the past must be confirmed. If this is not the case, even acceptably sampled data with an acceptable variance is not representative of the future behavior of the project.
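Both assumptions can be given a first, rough test before the data is used. The Python sketch below is one simple way to do that, not a complete statistical treatment. The thresholds max_cv and max_shift and the two throughput histories are made-up values. It checks the spread with the coefficient of variation, and checks whether the recent half of the history still looks like the older half.

```python
import statistics


def check_empirical_assumptions(samples, max_cv=0.25, max_shift=0.15):
    """First-pass checks on past data (thresholds are illustrative assumptions)."""
    cv = statistics.stdev(samples) / statistics.mean(samples)
    half = len(samples) // 2
    older, recent = samples[:half], samples[half:]
    # Relative change in the mean between the older and the recent half
    shift = abs(statistics.mean(recent) - statistics.mean(older)) / statistics.mean(older)
    return {
        "cv": cv,
        "variance_ok": cv <= max_cv,
        "shift": shift,
        "stable": shift <= max_shift,
    }


# Hypothetical stories completed per sprint, oldest first
stable_team = [10, 11, 9, 10, 12, 10, 11, 9]
drifting_team = [10, 11, 9, 10, 6, 7, 5, 6]

print(check_empirical_assumptions(stable_team))
print(check_empirical_assumptions(drifting_team))
```

For the drifting team the overall average is 8 stories per sprint, a rate the team achieved in none of its sprints and well above what it has delivered recently. A stable past is still no guarantee of a stable future, so passing these checks is necessary, not sufficient.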

Understanding this basis of empiricism is critical to understanding the notion of making predictions in the presence of uncertainty about the future.

Next let’s address the issue of what is an estimate. It seems obvious to all working in the engineering, science, and financial domains that an estimate is a numeric value or range of values for some measure that may occur at some time in the future. Making up definitions for estimate, or selecting definitions from outside engineering, science, and finance, is disingenuous. There is no need to redefine anything.

Estimation consists of finding appropriate values (the estimate) for the parameters of the system of concern in such a way that some criterion is optimized. [2]

The estimate has several elements:

The quantity for the estimate – a numeric value we seek to learn about.

The range of possible values for that quantity

For estimates that have a range of values, the Probability Distribution Function (PDF) of the values in that range. The range of values is described by the PDF, with a Mean, Median, Mode (the Most Likely), and other cumulants – that is, what’s the variance of the variance?

For an estimate that has a probability of occurrence, the single numeric value for that probability and the confidence in that value. For example, there is an 80% confidence of completing the project on or before the second week in November, 2005.

Now when those wanting to redefine what an estimate is, to support their quest to have No Estimates, do things like redefining forecasting as Not Estimating, it becomes clear they are not using any terms found in engineering, science, mathematics, or finance. When they suggest there are many definitions of an estimate and don’t provide any definition, with the appropriate references to that definition, it’s the same approach as saying we’re exploring for better ways to …. It’s a simple, and simple-minded, approach to a well established discipline of making decisions, and it is fundamentally disingenuous. It should not be tolerated.

The purpose of a cost estimate is determined by its intended use, and its intended use determines its scope and detail.

Cost estimates have two general purposes:

To help managers evaluate affordability and performance against plans, as well as the selection of alternative systems and solutions,

To support the budget process by providing estimates of the funding required to efficiently execute a program.

The notion of defining the budget leaves open the other two random variables of all project work – productivity and performance of the produced product or service.

So suggesting that estimating is not needed when the budget is provided ignores these two random variables.

Specific applications for estimates include providing data for trade studies, independent reviews, and baseline changes. Regardless of why the cost estimate is being developed, it is important that the project’s purpose link to the missions, goals, and strategic objectives and connect the statistical and probabilistic aspects of the project to the assessment of progress to plan and the production of value in exchange for the cost to produce that value.

The Need to Estimate

The picture below, with apologies to Scott Adams, is typical of the No Estimates advocates who contend estimates are evil and need to be stopped. Estimates can’t be done. Not estimating results in a ten-fold increase in project productivity or some vague unit of measure.

[1] Dictionary of Scientific Biography, ed. Charles Coulston Gillispie, Scribner, 1973, Volume 2, pp. 604-5

[2] Forecasting Methods and Applications, Third Edition, Spyros Makridakis, Steven C. Wheelwright, and Rob J. Hyndman

Some More Background

Introduction to Probability Models, 4th Edition, Sheldon M. Ross

Random Data: Analysis and Measurement Procedures, Julius S. Bendat and Allan G. Piersol

Advanced Theory of Statistics, Volume 1: Distribution Theory, Sir Maurice Kendall and Alan Stuart

Estimating Software Intensive Systems: Projects, Products, and Processes, Richard D. Stutzke

Probability Methods for Cost Uncertainty Analysis: A Systems Engineering Perspective, Paul R. Garvey

Software Metrics: A Rigorous and Practical Approach, Third Edition, Norman Fenton and James Bieman

The development of software in the presence of uncertainty is a well developed discipline, a well developed academic topic, and a well developed practice with numerous tools, databases, and models in many different SW domains.

Economics is the study of how resources (people, time, facilities) are used to produce and distribute commodities and how services are provided in society. Engineering economics is a branch of microeconomics dealing with engineering related economic decisions. Software Engineering Foundations: A Software Science Perspective, Yingxu Wang, Auerbach Publications.

Software engineering economics is a topic that addresses the elements of software project cost estimation and analysis and project benefit-cost ratio analysis. As well, these costs, and the benefits from expending them, produce tangible and many times intangible value. The time-phased aspects of developing software for money mean we need to understand the scheduling aspects of producing this value.

All three variables in the paradigm of software development for money - time, cost, and value - are random variables. This randomness comes from the underlying uncertainties in the processes found in the development of the software. These uncertainties are always there, they never go away, they are immutable.

Economic Foundations of Software Engineering

There are fundamental principles and methodologies utilized in engineering economics and their applications in software engineering that form the basis of decision making in the presence of uncertainty. These formal economic models include the cost of production, and market models based on fundamental principles of microeconomics. The dynamic values of money and assets, and patterns of cash flows, can be modeled in support of management's need to make decisions in the presence of the constant uncertainties associated with software development.

Economic analysis methodologies for engineering decisions, including project costs, benefit-cost ratio, payback period, and rate of return, can be rigorously described. This is the basis of any formal treatment of economic theories and principles. Software engineering economics is based on elements of software costs, software engineering project cost estimation, economic analyses of software engineering projects, and the software maintenance cost model.

Economics is classified into microeconomics and macroeconomics. Microeconomics is the study of behaviors of individual agents and markets. Macroeconomics is the study of the broad aspects of the economy, for example employment, export, and prices on a national or global scope.

A universal quantitative measure of commodities and services in economics is money.

Engineering economics is a branch of microeconomics. There are some basic axioms of microeconomics and engineering economics.

Demand versus supply. Demand is the required quantities for a product or service. It is also the demand for labor and materials needed to produce those products and services. Demand is a fundamental driving force of market systems and the predominant reason for most economic phenomena. The market response to a demand is called supply.

Supply is the required quantities for a product or service that producers are willing and able to sell at a given range of prices. This also extends to the labor and materials needed to produce the product and services to meet the demand.

Demands and supplies are the fundamental behaviors of dynamic market systems, which form the context of economics. Not enough Java programmers in the area, cost for Java programmers goes up. Demand for rapid production of products, cost of skilled labor, special tools and processes goes up. COBOL programmers in 1998 to 2001 could ask nearly any price for their services. FORTRAN 77 programmers here in Denver could get exorbitant rates to help maintain the Ballistic Missile Defense System when a local defense contractor was awarded the maintenance and support contract for Cobra Dane.

Opportunity Costs are those costs resulting from the loss of potential gain from the alternatives other than the one chosen by the decision maker.

Every time we make a decision involving multiple choices we are making an opportunity cost based decision. Since most of the time these costs are in the future and are uncertain, we need to estimate those opportunity costs as well as the probability that our choice is the right choice to produce the desired beneficial outcomes.

Here's an example from a tool we use, Palisade software's Crystal Ball. There are similar plug-ins for Excel (RiskAmp is affordable for the individual).

Another useful tool in the IT decision making world is Real Options. Here's a simple introduction to RO's and decision making.

In the presence of uncertainty, making decisions about actions today that impact outcomes in the future requires some mechanism for determining those outcomes in the absence of perfect information. This absence of information creates risk. Decision making in the presence of uncertainty and resulting risk means

These decisions typically have one or more of the following characteristics: [1]

The Stakes — There are stakes involved in the decision, such as costs, schedule, delivered capabilities, and their impacts on business success or on meeting the objectives.

Complexity — The ramifications of the alternatives, and the impact of the decision, are difficult to understand without detailed analysis.

Uncertainty — Uncertainty in key inputs creates uncertainty in the outcome of the decision alternatives and points to risks that may need to be managed.

Multiple Attributes — Larger numbers of attributes cause a larger need for formal analysis.

Diversity of Stakeholders — Attention is warranted to clarify objectives and formulate performance measures when the set of stakeholders reflects a diversity of values, preferences, and perspectives.

Reducible and Irreducible Uncertainty

All project work is probabilistic, driven by underlying statistical processes that create uncertainty. [2] There are two types of uncertainty on all projects. Reducible (Epistemic) and Irreducible (Aleatory).

Aleatory uncertainty arises from the random variability related to natural processes on the project - the statistical processes. Work durations, productivity, variance in quality. Epistemic uncertainty arises from the incomplete or imprecise nature of available information - the probabilistic assessment of when an event may occur.
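The two kinds of uncertainty can be combined in one simulation. The Python sketch below is an illustration only: the duration range, the 15-day impact of the risk event, and the two risk probabilities are all invented. Aleatory uncertainty is modeled as the natural spread of the work duration, and epistemic uncertainty as a discrete event with a probability of occurring.

```python
import random


def simulate_durations(risk_probability, n_trials=20_000, seed=3):
    """Simulate a task's duration in days under both kinds of uncertainty."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_trials):
        # Aleatory: natural variability of the work itself (low, high, mode)
        duration = rng.triangular(45, 70, 50)
        # Epistemic: a discrete risk event that may or may not occur
        if rng.random() < risk_probability:
            duration += 15  # hypothetical impact of the event, in days
        samples.append(duration)
    return sorted(samples)


def percentile(sorted_samples, p):
    """Value at or below which roughly a fraction p of the samples fall."""
    index = min(len(sorted_samples) - 1, int(p * len(sorted_samples)))
    return sorted_samples[index]


# Before and after a risk retirement activity buys the event probability down
p80_before = percentile(simulate_durations(risk_probability=0.30), 0.80)
p80_after = percentile(simulate_durations(risk_probability=0.05), 0.80)
print(f"80% confidence duration, risk at 30%: {p80_before:.1f} days")
print(f"80% confidence duration, risk at 5% : {p80_after:.1f} days")
```

Buying down the event probability pulls the 80% confidence duration in by several days in this made-up case. The spread that remains comes from the aleatory variability, which no amount of new information removes and which has to be covered in some other way.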

There is pervasive confusion between these two types of uncertainties when discussing the impacts of these uncertainties on project outcomes, including the estimates of cost, schedule, and technical performance.

All The World's a NonLinear, Non-Stationary Stochastic Process, Described by 2nd Order non-Linear Differential Equations.

In the presence of these conditions - and software development is - we need to understand several things for success. What are the coupled dynamics? What are the probabilistic and statistical processes that drive these dynamics? And how can we make decisions in their presence?

Predictive Analytics of Project Behaviors

In the presence of uncertainty, the need to predict future outcomes is critically important. One of the professional societies I belong to has a presentation on this topic. Here's a small sample of a mature process for estimating future outcomes given past performance. If you back up the URL to http://www.iceaaonline.com/ready/wp-content/uploads/2015/06/ you'll see all the briefings on the topic of cost, schedule, and performance management used in the domains I work.

[1] Risk Informed Decision Making Handbook, NASA/SP-2010-576 Version 1.0 April 2010.

[2] "Risk-informed decision-making in the presence of epistemic uncertainty," Didier Dubois, Dominique Guyonnet, International Journal of General Systems, Taylor & Francis, 2011, 40 (2), pp.145-167.

Probability and statistics are a core business process for decision making in the presence of uncertainty. Uncertainty comes in two types - Irreducible and Reducible.

Making decisions in the presence of these two types of uncertainty requires making estimates about outcomes in the presence of the risks created by the uncertainties.

All decisions involve uncertainty, risk, and trade-offs. This is an immutable principle of all business and technical processes in the presence of uncertainty.

Successful management of software project cost within a limited budget is an important concern in any business. A lack of information and of reliable tools to support the estimating process makes it difficult to produce an estimate during the early project planning stages. Controlling the cost to an acceptable level requires appropriate and accurate measurement of the various project-related variables and an understanding of the magnitude of their effects.

The importance of early estimating to those funding the project or those providing capital to fund products cannot be over emphasized.

Making cost estimates with Bayesian decision processes is a well developed discipline. Here's a recent paper from a colleague in NASA, Christian Smart

The risks created by these uncertainties are always present. If unaddressed, our project is at risk of failure. To perform this Bayesian analysis of program performance probabilities, there are several tools. Here's one example

In the end risk management is about estimating the impacts of reducible and irreducible uncertainty. As Tim Lister says - Risk Management is How Adults Manage Projects

In software development, we almost always encounter situations where a decision must be made when we are uncertain what the outcome might be, or even uncertain about the data used to make that decision.

Decision making in the presence of uncertainty is standard management practice in all business and technical domains, from business investment decisions to technical choices for project work.

Making decisions in the presence of uncertainty means making probabilistic inferences from the information available to the decision maker.

There are many techniques for decision making. Decision trees are common, where the probability of an outcome of a decision is part of a branch of a tree. If I go left in the branch - the decision - what happens? If I go right what happens? Each branch point becomes the decision. Each of the two or more branches becomes the outcomes. The probabilistic aspect is applied to the branches, and the outcomes - which may be probabilistic as well - are assessed for benefits to those making the decision.
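The manual decision tree process can be sketched in a few lines of Python. The tree below, its probabilities, and its payoffs (in $K) are hypothetical, and the dictionary layout and the rollback function are just one convenient way to write it down. Chance nodes take the probability-weighted value of their branches, decision nodes take the best of their options.

```python
def rollback(node):
    """Return (expected value, best option label or None) for a tree node."""
    kind = node["type"]
    if kind == "outcome":
        return node["value"], None
    if kind == "chance":
        # Probability-weighted average over the branches
        ev = sum(p * rollback(child)[0] for p, child in node["branches"])
        return ev, None
    if kind == "decision":
        # Pick the option with the highest expected value
        scored = [(rollback(child)[0], label) for label, child in node["options"]]
        return max(scored)
    raise ValueError(f"unknown node type: {kind}")


# Hypothetical build-versus-buy decision, payoffs in $K
tree = {
    "type": "decision",
    "options": [
        ("build in-house", {
            "type": "chance",
            "branches": [
                (0.7, {"type": "outcome", "value": 300}),   # delivered as planned
                (0.3, {"type": "outcome", "value": -100}),  # overrun
            ],
        }),
        ("buy off the shelf", {"type": "outcome", "value": 150}),
    ],
}

best_value, best_option = rollback(tree)
print(best_option, round(best_value, 1))
```

Every number on the branches is an estimate. Change the assumed 0.7 to 0.6 and building falls to an expected 140, so the preferred option flips to buying, which is why the quality of those estimates matters to the decision maker.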

Another approach is Monte Carlo Simulation of decision trees. Here's a tool we use for many decisions in our domain, Palisade, Crystal Ball. There are others. They work like the manual process in the first picture, but let you tune the probabilistic branching, probabilistic outcomes to model complex decision making processes.

In the project management paradigm of the projects we work, there are networks of activities. Each of these activities has some dependency on prior work, and each activity produces dependencies for follow-on work. These can be modeled with Monte Carlo Simulation as well.

The Schedule Risk Analysis (SRA) of the network of work activities is mandated on a monthly basis in many of the programs we work.
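Here is a minimal Python sketch of a Monte Carlo simulation over a network of activities. It is not the tooling used on those programs. The four-task network, the triangular duration ranges in days, and the function names are assumptions made for illustration. Each trial draws a duration for every task, starts each task when its last predecessor finishes, and records the finish of the whole network.

```python
import random

# Hypothetical network: task -> (predecessors, (low, mode, high) duration in days).
# Tasks are listed so that every predecessor appears before its successors.
NETWORK = {
    "design": ([], (8, 10, 15)),
    "build_api": (["design"], (10, 12, 20)),
    "build_ui": (["design"], (8, 10, 25)),
    "integrate": (["build_api", "build_ui"], (4, 5, 9)),
}


def finish_time(network, draw):
    """Finish time of the network, with draw(low, mode, high) giving each duration."""
    finish = {}
    for task, (predecessors, (low, mode, high)) in network.items():
        start = max((finish[p] for p in predecessors), default=0.0)
        finish[task] = start + draw(low, mode, high)
    return max(finish.values())


def schedule_risk_analysis(network, n_trials=10_000, seed=5):
    """Simulate the network and return the 50% and 80% confidence finish times."""
    rng = random.Random(seed)
    # random.triangular takes (low, high, mode), hence the reordering
    totals = sorted(
        finish_time(network, lambda low, mode, high: rng.triangular(low, high, mode))
        for _ in range(n_trials)
    )
    return {"p50": totals[n_trials // 2], "p80": totals[int(0.80 * n_trials)]}


# The deterministic plan built from Most Likely durations only
most_likely_plan = finish_time(NETWORK, lambda low, mode, high: mode)
result = schedule_risk_analysis(NETWORK)
print(f"Most Likely plan : {most_likely_plan:.0f} days")
print(f"50% confidence   : {result['p50']:.1f} days")
print(f"80% confidence   : {result['p80']:.1f} days")
```

With these assumed ranges, even the 50% confidence finish comes out later than the 27 days obtained by adding up the Most Likely durations. Right-skewed durations and the merge of two parallel paths both push the finish later than the single-point plan suggests.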

Each of these approaches and others are designed to provide actionable information to the decision makers. This information requires a minimum understanding of what is happening to the system being managed:

What are the naturally occurring variances of the work activities that we have no control over - aleatory uncertainty?

What are the event based probabilities of some occurrence - epistemic uncertainty?

What are the consequences of each outcome - decision, probabilistic event, or naturally occurring variance - on the desired behavior of the system?

What choices can be made that will influence these outcomes?

In many cases, the information available to make these choices is in the future. Some is in the past. But that information in the past needs careful assessment.

Past data is Only useful if you can be assured the future is like the past. If not, making decisions using past data without adjusting that data for the possible changes in the future takes you straight into the ditch - see The Flaw of Averages.

In order to have any credible assessment of the impact of a decision using data in the future - where will the system be going in the future? - it is mandatory to ESTIMATE.

It is simply not possible to make decisions about future outcomes in the presence of uncertainty in that future without making estimates.

Anyone who says you can is incorrect. And if they insist it can be done, ask for testable evidence of their conjecture, based on the mathematics of probabilistic systems. No credible, testable data? Then it's pure speculation. Move on.

The False Conjecture of Deciding in Presence of Uncertainty without Estimates

Slicing the work into similar sized chunks, performing work on those chunks, and using that information to produce information about the future makes the huge assumption that the future is like the past.

Recording past performance, making nice plots, and running static analysis for mean, mode, standard deviation, and variance is naive at best. The time series variances are rolled up, hiding the latent variances that will emerge in the future. Time series analysis (ARIMA) is required to reveal the possible values in the dataset from the past that will emerge in the future, provided the system under observation remains the same.

Time series analysis is a fundamental tool for forecasting future outcomes from past data. Weather forecasting - plus complex compressible fluid flow models - is based on time series analysis. Stock market forecasting uses time series analysis. Cost and Schedule modeling uses time series analysis. Adaptive process control algorithms, like the speed control and fuel management in your modern car, use time series analysis.

George E. P. Box, one of the originators of time series analysis and co-author of the seminal book Time Series Analysis: Forecasting and Control, is often seriously misquoted on his statement All Models are Wrong, Some are Useful. Anyone misusing that quote to try to convince you that you can't model the future didn't (or can't) do the math in Box's book and likely got a D in the High School probability and statistics class.

So do the math, read the proper books, gather past data, model the future with dependency networks, Kanban and Scrum backlogs, measure current production, forecast future production based on Monte Carlo Models - and don't believe for a moment that you can make decisions about future outcomes in the presence of uncertainties without estimating that future.

In the world of project management and the process improvement efforts needed to increase the Probability of Project Success, anecdotes appear to prevail when it comes to suggesting alternatives to observed dysfunction.

If we were to pile all the statistics for all the data on the effectiveness, or lack of effectiveness, of all the process improvement methods on top of each other, they would lack the persuasive power of a single anecdote in most software development domains outside of Software Intensive Systems.

Why? Because most people working on small-group, agile development projects - as opposed to Enterprise, Mission Critical, can't-fail projects that must show up on time, on budget, with not just the minimum viable product but the mandatorily needed viable capability - rely on anecdotes to communicate their messages.

I say this not from just personal experience, but from research for government agencies and commercial enterprise firms tasked with Root Cause Analysis, conference proceedings, refereed journal papers, and guidance from those tasked with the corrective actions of major program failures.

Anecdotes appeal to emotion. Statistics, numbers, verifiable facts appeal to reason. It's not a fair fight. Emotion always wins, without acknowledging that emotion is seriously flawed when making decisions.

Anecdotal evidence is evidence where small numbers of anecdotes are presented. There is a large chance - statistically - this evidence is unreliable due to cherry picking or self selection (this is the core issue with the Standish Reports or anyone claiming anything without proper statistical sampling processes).

Anecdotal evidence is considered dubious support of any generalized claim. Anecdotal evidence is no more than a type description (i.e., short narrative), and is often confused in discussions with its weight, or other considerations, as to the purpose(s) for which it is used.

We've all heard the stories: ½ of all IT projects fail. Waterfall is evil. Hell, even estimates are evil, stop doing them cold turkey. They prove the point the speaker is making, right? Actually they don't. I just used an anecdote to prove a point.

If I said The Garfunkel Institute just released a study showing 68% of all software development projects did not succeed because a requirements gathering process failed to define what capabilities were needed when done, I'd have made a fact-based point. And you'd become bored reading the 86 pages of statistical analysis and correlation charts between all the causal factors contributing to the success or failure of the sample space of projects. See, you are bored.

Instead, suppose I said every project I've worked on went over budget and was behind schedule because we were very poor at making estimates. That'd be more appealing to your emotions, since it is a message you can relate to personally - having likely experienced many of the same failures.

The purveyors of anecdotal evidence to support a position make use of a common approach. Willfully ignoring a fact based methodology through a simple tactic...

We all know what Mark Twain said about lies, damned lies, and statistics.

People can certainly lie with statistics; it's done all the time. Start with How to Lie With Statistics. But those types of lies are nothing compared to the ability to script personal anecdotes to support a message. From I've never seen that work, to what, now you're telling me - the person that actually invented this earth shattering way of writing software - that it doesn't work outside my personal sphere of experience?

An anecdote is a statistic with a sample size of one. OK, maybe a sample size of a small group of your closest friends and fellow travelers.

We fall for this all the time. It's easier to accept an anecdote describing a problem and possible solution from someone we have shared experiences with, than to investigate the literature, do the math, even do the homework needed to determine the principles, practices, and processes needed for corrective action.

Don’t fall for manipulative opinion-shapers who use story-telling as a substitute for facts. When we're trying to persuade, use facts, use actual examples based on those facts. Use data that can be tested, not personal anecdotes used to support an unsubstantiated claim without suggesting both the root cause and the testable corrective actions.

How To Lie With Statistics is a critically important book to have on your desk if you're involved in any decision making. My edition is a First Edition, but I don't have the dust jacket, so it's not worth that much beyond the current versions.

The reason for this post is to lay the ground work for assessing reports, presentations, webinars, and other selling documents that contain statistical information.

The classic statistical misuse is the Standish Report, describing the success and failure of IT projects.

Here's my summation on the elements of How To Lie in our project domain

Sample with the Built In Bias - the population of the sample space is not defined. The samples are self selected in that those who respond are the basis of the statistics. No adjustment for all those who did not respond to a survey for example.

The Well Chosen Average - The arithmetical average, Median, and Mode are estimators of the population statistics. Any of these without a variance is of little value for decision making.

Little Figures That Are Not There - the classic is use this approach (in this case #NoEstimates) and your productivity will improve 10X. That's 1000%, by the way. A 1000% improvement. That's unbelievable, literally unbelievable. The actual improvements are not stated, only the percentage. The baseline performance is not stated. It's unbelievable.

Much Ado About Practically Nothing - the probability of being in the range of normal. This is the basis of advertising. What's the variance?

Gee-Whiz Graphs - using graphics and adjustable scales provides the opportunity to manipulate the message. The classic example of this is the estimating errors in a popular graph used by the No Estimates advocates. It's a graph showing the number of projects that complete over their estimated cost and schedule. What's not shown is the credibility of the original estimate.

One Dimensional Picture - using a picture to show numbers, where the picture is not in the scale as the numbers provides a messaging path for visual readers.

Semi-attached Picture - If you can't prove what you want to prove, demonstrate something else and pretend that they are the same thing. In one example, the logic is inverted. Estimating is conjectured to be the root cause of problems. With no evidence of that, the statement becomes: we don't see how estimating can produce success, so not estimating will increase the probability of success.

Post Hoc Rides Again - post hoc causality is common in the absence of a cause and effect understanding. The differences between correlation and causality are many times not understood.
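As a small illustration of The Well Chosen Average item above, here is a hedged sketch in R. The data is invented: `overrun` is a made-up, right-skewed sample standing in for something like cost growth ratios, and it is not taken from any real report.

```r
# Invented right-skewed sample (lognormal), standing in for overrun ratios.
set.seed(3)
overrun <- rlnorm(200, meanlog = 0, sdlog = 1)

avg <- mean(overrun)    # pulled upward by the long right tail
med <- median(overrun)  # the "typical" value
spread <- sd(overrun)

# The same data supports two different "averages";
# neither means much without the spread.
print(c(mean = avg, median = med, sd = spread))
print(quantile(overrun, c(0.10, 0.50, 0.90)))
```

Whoever picks the mean or the median gets to pick the story. Reporting the spread and a few quantiles alongside takes that choice away.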

Here's a nice example of How To Lie

There's a chart from an IEEE Computer article showing the numbers of projects that exceeded their estimated cost. But let's start with some research on the problem. Coping with the Cone of Uncertainty.

There is a graph, popularly used to show that estimates

This diagram is actually MISUSED by the #NoEstimates advocates.

The presentation below shows the follow-on information for how estimates can be improved to increase the confidence in the process and improvements in the business. It also shows the root causes of poor estimates and their corrective actions. Please ignore any use of Todd's chart without the full presentation.

My mistake was doing just that.

So before anyone accepts any conjecture from a #NoEstimates advocate using the graph above, please read the briefing at the link below to see the corrective actions for making poor estimates.

In a recent post of forecasting capacity planning a time series of data was used as the basis of the discussion.

Some static statistics were then presented.

This came with a discussion of the upper and lower ranges of the past data. The REAL question, though, is what are the likely outcomes for data in the future given the past performance data? That is, if we recorded what happened in the past, what is the likely data in the future?

The average and upper and lower ranges from the past data are static statistics. That is all the dynamic behavior of the past is wiped out in the averaging and deviation processes, so that information can no longer be used to forecast the possible outcomes of the future.
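A quick way to see this is to take a series and shuffle it. The sketch below is in R and uses made-up data (a random walk named `trend`), not the numbers from the post being discussed. The shuffled copy has exactly the same values, so the same mean and standard deviation, but the order that carries the dynamic behavior is gone.

```r
# Made-up series: a random walk, where each value depends on the last.
set.seed(1)
trend <- cumsum(rnorm(50))
shuffled <- sample(trend)   # same values, time order destroyed

# Static statistics cannot tell the two apart.
same_mean <- isTRUE(all.equal(mean(trend), mean(shuffled)))
same_sd <- isTRUE(all.equal(sd(trend), sd(shuffled)))

# Lag-1 autocorrelation: how strongly each value follows the previous one.
lag1 <- function(x) acf(x, lag.max = 1, plot = FALSE)$acf[2]
lag1_trend <- lag1(trend)
lag1_shuffled <- lag1(shuffled)

print(c(same_mean = same_mean, same_sd = same_sd))
print(c(lag1_trend = lag1_trend, lag1_shuffled = lag1_shuffled))
```

The autocorrelation is the kind of information a time series model uses and an average throws away.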

This is one of the attributes of The Flaw of Averages and How to Lie With Statistics, two books that should be on every manager's desk. That is, managers tasked with making decisions in the presence of uncertainty when spending other people's money.

We now have a Time Series and can ask the question what is the range of possible outcomes in the future given the values in the past? This can easily be done with a free tool - R. R is a statistical programming language that is free from the Comprehensive R Archive Network (CRAN). In R, there are several functions that can be used to make these forecasts. That is, what are the estimated values in the future from the past, and their confidence intervals.

Let's start with some simple steps:

Record all the data in the past. For example make a text file of the values in the first chart. Name that file NE.Numbers

Start the R tool. Better yet download an IDE for R. RStudio is one. That way there is a development environment for your statistical work. As well there are many Free R books on statistical forecasting - estimating outcomes in the future.

OK, read the Time Series of raw data from the file of values and assign it to a variable

NETS=ts(NE.Numbers)

The ts function converts the Time Series into an object - a Time Series - that can be used by the next function

With the Time Series now in the right format, apply the ARIMA function. ARIMA is Autoregressive Integrated Moving Average, also known as the Box-Jenkins algorithm. This is the George Box of the famously misused and blatantly abused quote all models are wrong some models are useful. If you don't have the full paper where that quote came from and the book Time Series Analysis: Forecasting and Control, Box and Jenkins, please resist re-quoting out of context. That quote has become the meme for those not having the background to do the math for time series analysis, and it becomes a mantra for willfully ignoring the math needed to actually make estimates of the future - forecasting - using time series of the past in ANY domain. ARIMA is the beginning basis of all statistical forecasting in science, engineering, and finance.

The ARIMA algorithm has three parameters - p, d, q. These are the order of the autoregressive part, the degree of differencing, and the order of the moving average part.

With the original data turned into a Time Series and presented to the ARIMA function, we can now apply the forecast function from the forecast package. This package provides methods and tools for displaying and analyzing univariate time series forecasts including exponential smoothing via state space models and automatic ARIMA modelling.

When applied to the ARIMA output we get a Forecast series that can be plotted.

Here's what all this looks like in RStudio:

library(forecast) - load the package that provides forecast()
NETS = ts(NE.Numbers) - convert the raw numbers to a time series
NETSARIMA = arima(NETS, order = c(0, 1, 1)) - make an ARIMA object
NEFORECAST = forecast(NETSARIMA) - make a forecast using that
plot(NEFORECAST) - plot it

Here's the plot, with the time series from the raw data and the 80% and 95% confidence bands (the defaults for forecast) on the possible outcomes in the future.

The Punch Line

You want to make decisions with other people's money when the 80% confidence in a possible outcome is itself a -56% to +68% variance? Really? Flipping coins gets a better probability of an outcome inside all the possible outcomes that happened in the past. The time series is essentially a random series with very low confidence of being anywhere near the mean. This is the basis of The Flaw of Averages.
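A band like the one described above can be computed directly, without reading it off a plot. The sketch below is a hedged, self-contained version in R. It uses base R's `arima()` and `predict()` in place of the forecast package so it runs with no extra installs, and it uses invented numbers in `ne_numbers` in place of the original data, so the percentages it prints will not match the -56% to +68% quoted above.

```r
# Invented stand-in for the recorded past data (not the original numbers):
# a slowly wandering level plus noise.
set.seed(7)
ne_numbers <- 40 + cumsum(rnorm(30, sd = 1.5)) + rnorm(30, sd = 3)

nets <- ts(ne_numbers)                     # convert to a time series object
fit <- arima(nets, order = c(0, 1, 1))     # p = 0, d = 1, q = 1

# Base R predict() returns point forecasts and their standard errors.
fc <- predict(fit, n.ahead = 10)

# 80% band: the central 80% of a normal lies within about 1.28 standard errors.
z80 <- qnorm(0.90)
lower <- fc$pred - z80 * fc$se
upper <- fc$pred + z80 * fc$se

# Band limits as a percentage of the point forecast, per step ahead.
lower_pct <- 100 * (lower - fc$pred) / fc$pred
upper_pct <- 100 * (upper - fc$pred) / fc$pred

print(round(cbind(forecast = fc$pred, lower_pct, upper_pct), 1))
```

For an ARIMA(0,1,1) model the point forecast is a flat line, and the band widens the further ahead you look. That is the point: the single number hides how little confidence sits behind it.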

Where I work this would be a non-starter if we came to the Program Manager with this forecast of the Estimate to Complete based on an Average with that wide a variance.

Possibly, where there is low value at risk, a customer that has little concern for cost and schedule overrun, and maybe where the work is actually an experiment with no deadline or not-to-exceed budget, or any other real constraint. But if your project has a need date for the produced capabilities, a date when those capabilities need to start earning their keep and need to start producing value that can be booked on the balance sheet, a much higher confidence in what the future NEEDS to be is likely going to be the key to success.

The Primary Reason for Estimates

First, estimates are for the business. Yes, developers can use them too. But the business has a business goal: make money at some point in the future on the sunk costs of today - the breakeven date. These sunk costs are recoverable - hopefully - so we need to know when we'll be even with our investment. This is how business works. They make decisions in the presence of uncertainty - not on the opinion of development saying we recorded our past performance as an average and projected that to the future. No, they need a risk adjusted, statistically sound level of confidence that they won't run out of money before breakeven. What this means in practice is a management reserve and cost and schedule margin to protect the project from those naturally occurring variances and those probabilistic events that derail all the best laid plans.

Now developers may not think like this. But someone somewhere in a non-trivial business does. Usually in the Office of the CFO. This is called Managerial Finance and it's how serious money at risk firms manage.

So when you see time series like those in the original post, do your homework and show the confidence of the probability of the needed performance actually showing up. And by needed performance I mean the steering target used in the Closed Loop Control system used to increase the probability that the planned value - that the Agilists so dearly treasure - actually appears somewhere near the planned need date and somewhere around the planned cost, so those paying for your work are not disappointed with a negative Return on Investment and label their spend as underwater.

So What Does This Mean in the End?

Even when you're using past performance - one of the better ways of forecasting the future - you need to give careful consideration to those past numbers. Averages and simple variances - which wipe out the actual underlying time series variances - are not only naive, they are bad statistics used to make bad management decisions.

Add to that the poorly formed notion that decisions can be made about future outcomes in the presence of uncertainty in the absence of estimates about that future, and you've got the makings of management disappointment. The discipline of estimating future outcomes from past behaviors is well developed. The mathematics, and especially the terms used in that mathematics, are well established. Here are some sources we use in our everyday work. These are not populist books, they are math and engineering. They have equations, algorithms, code examples. They are used when the value at risk is sufficiently high that management is on the hook for meeting the performance goals in exchange for the money assigned to the project.

If you work a project that doesn't care too much about deadlines, budget overages, or what gets produced other than the minimal products, then these books and related papers are probably not for you. And most likely Not Estimating the probability that you'll not over spend, show up seriously late, and fail to produce the needed capabilities to meet the Business Plans, will be just fine. But if you are expected to meet the business goals in exchange for the spend plan you've been assigned, these might be a good place to start to avoid being a statistic (dead skunk in the middle of the road) in the next Chaos Report (no matter how poor the statistics are).

This by the way is an understanding I came to on the plane flight home this week. #Noestimates is a credible way to run your project when these conditions are in place. Otherwise you may want to read how to make credible forecasts of what the cost and schedule is going to be for the value produced with your customer's money, assuming they actually care about not wasting it.

I hear all the time estimating is the same as guessing. This is not true mathematically, nor is it true business process wise. Guessing is an approach used by many who don't understand that making decisions in the presence of uncertainty requires we understand the impact of that decision. When that future is uncertain, we need to know that impact in probabilistic terms. And with this comes confidence, precision, and accuracy of the estimate.

What’s the difference between estimate and guess? The distinction between the two words is one of the degree of care taken in arriving at a conclusion.

The word Estimate is derived from the Latin word aestimare, meaning to value. It shares its origin with estimable, which means capable of being estimated or worthy of esteem, and of course esteem, which means regard, as in High Regard.

To estimate means to judge the extent, nature, or value of something - connected to the regard - he is held in high regard, with the implication that the result is based on expertise or familiarity. An estimate is the resulting calculation or judgment. A related term is approximation, meaning close or near.

In between a guess and an estimate is an educated guess, a more casual estimate. An idiomatic term for this type of middle-ground conclusion is ballpark figure. The origin of this American English idiom, which alludes to a baseball stadium, is not certain, but one conclusion is that it is related to in the ballpark, meaning close in the sense that one at such a location may not be in a precise location but is in the stadium.

To guess is to believe or suppose, to form an opinion based on little or no evidence, or to be correct by chance or conjecture. A guess is a thought or idea arrived at by one of these methods. Synonyms for guess include conjecture and surmise, which like guess can be employed both as verbs and as nouns.

We could have a hunch or an intuition, or we can engage in guesswork or speculation. Dead reckoning is often used to mean the same thing as guesswork, although it originally referred to a navigation process based on reliable information. Near synonyms describing thoughts or ideas developed with more rigor include hypothesis and supposition, as well as theory and thesis.

A guess is a casual, perhaps spontaneous conclusion. An estimate is based on intentional thought processes supported by data.

What Does This Mean For Projects?

If we're guessing, we're making uninformed conclusions, usually in the absence of data, experience, or any evidence of credibility. If we're estimating, we are making informed conclusions based on data, past performance, and models - including Monte Carlo models and parametric models.

When we hear decisions can be made without estimates, or all estimating is guessing, we now know that, mathematically and in business process terms, neither of these is true.
