So what the #NoEstimates advocates fail to understand is that in the presence of reducible (epistemic) and irreducible (aleatory) uncertainty, actions must be taken to address these uncertainties and prevent them from unfavorably impacting the project outcomes, as suggested in ...

Irreducible uncertainty can be addressed with Margin. Cost margin, schedule margin, technical performance margin.

Reducible uncertainty can be addressed with redundancy and risk retirement activities that buy down the risk resulting from the uncertainty to an acceptable level.

In both cases managing in the presence of uncertainty means following Tim Lister's advice...

Risk Management is how Adults Manage projects

So when Hofstadter's Law is used without addressing the reducible and irreducible uncertainties and the resulting risk to project success, the result is Hofstadter's Law.

A self-referential, circular logic leading directly to project disappointment.

Phillip Armour has a classic article in CACM titled "Ten Unmyths of Project Estimation," Communications of the ACM (CACM), November 2002, Vol 45, No 11. Several of these Unmyths are applicable to the current #NoEstimates concept. Much of the misinformation about how estimating is the smell of dysfunction can be traced to these unmyths.

Mythology is not a lie ... it is metaphorical. It has been well said that mythology is the penultimate truth - Joseph Campbell, The Power of Myth

Using Campbell's quote, myths are not untrue. They are an essential truth, but wrapped in anecdotes that are not literally true. In our software development domain a myth is a truth that seems to be untrue. This is Armour's origin of the unmyth.

The unmyth is something that seems to be true but is actually false.

Let's look at the three core conjectures of the #NoEstimates paradigm:

Estimates cannot be accurate - we cannot get an accurate estimate of cost, schedule, or probability that the result will work.

We can't say when we'll be done or how much it will cost.

All estimates are commitments - making estimates makes us committed to the number that results from the estimate.

The Accuracy Myth

Estimates are not single numeric values; they are probability distributions. If the probability distribution below represents the probability of the duration of a project, there is a finite minimum - some time below which the project cannot be completed.

There is the highest-probability, or Most Likely, duration for the project. This is the Mode of the distribution. There is a midpoint in the distribution, the Median: half of the possible completion times fall below it and half above. Then there is the Mean of the distribution. This is the average of all the possible completion times. And of course The Flaw of Averages is in effect for any decisions being made on this average value. †

“It is moronic to predict without first establishing an error rate for a prediction and keeping track of one’s past record of accuracy” — Nassim Nicholas Taleb, Fooled By Randomness

If we want to answer the question What is the probability of completing ON OR BEFORE a specific date, we can look at the Cumulative Distribution Function (CDF) of the Probability Distribution Function (PDF). In the chart below the PDF has the earliest finish in mid-September 2014 and the latest finish early November 2014.

The 50% probability is 23 September 2014. In most of our work, we seek an 80% confidence level of completing ON OR BEFORE the need date.
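To make the PDF-to-CDF mechanics concrete, here is a minimal sketch. The triangular distribution and its parameters are invented for illustration (they are not the project dates above); any real project would use its own distribution:

```python
import random
import statistics

random.seed(42)

# Hypothetical example: sample 10,000 possible project durations from a
# triangular distribution (min 60, most likely 75, max 110 working days).
samples = [random.triangular(60, 110, 75) for _ in range(10_000)]
samples.sort()

def percentile(data, p):
    """Return the value at cumulative probability p from sorted data."""
    return data[int(p * (len(data) - 1))]

p50 = percentile(samples, 0.50)   # median completion time
p80 = percentile(samples, 0.80)   # 80% confidence of on-or-before
mean = statistics.mean(samples)   # the Flaw of Averages lives here

print(f"P50 (median): {p50:.1f} days")
print(f"P80 target  : {p80:.1f} days")
print(f"Mean        : {mean:.1f} days")
print(f"Schedule margin to protect P80: {p80 - p50:.1f} days")
```

Sorting the samples and indexing by cumulative probability is exactly reading the CDF; the gap between P50 and P80 is one way to see how much schedule protection a probabilistic date needs.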

The project then MUST have schedule, cost, and technical margin to protect that probabilistic date.

How much margin is another topic.

But projects without margin are late, over budget, and likely don't work on day one. You can't complain about poor project performance if you don't have margin, risk management, and a plan for managing both, as well as the technical processes.

So where do these charts come from? They come from a simulation of the work: the order and dependencies of the work, and the underlying statistical nature of the work elements.

No individual work element is deterministic.

Each work element has some type of dependency on the previous work element and the following work element.

Even if all the work elements are Independent and sitting in a Kanban queue, unless we have unlimited servers of that queue, being late on the current piece of work will delay the following work.
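A toy simulation makes the single-server queue point visible. The touch times below are hypothetical; the only mechanism at work is that the next item cannot start until the current one finishes:

```python
import random

random.seed(7)

# Five similar-sized work items in a Kanban queue with ONE server.
# Each item's touch time varies around a 5-day plan; slips accumulate.
touch_times = [random.triangular(4, 8, 5) for _ in range(5)]

finish = 0.0
finishes = []
for t in touch_times:
    finish += t          # following work waits on the current item
    finishes.append(finish)

planned = [5 * (i + 1) for i in range(5)]   # deterministic plan: 5 days each
for i, (f, p) in enumerate(zip(finishes, planned), 1):
    print(f"item {i}: actual finish {f:5.1f}, plan {p}, slip {f - p:+.1f}")
```

Because the variability is asymmetric (things can be much later than planned but only a little earlier), the slips tend to grow down the queue rather than cancel out.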

So what we need is not Accurate estimates, we need Useful estimates. The usefulness of an estimate is the degree to which it helps make optimal business decisions. The process of estimating is Buying Information. The value of an estimate, like all value, is determined by the cost to obtain that information and by the opportunity cost: the difference between the business decision made with the estimate and the business decision made without it. ‡

In this book are the answers to all the questions those in the #NoEstimates camp say can't be answered.

The Accuracy Answer

All work is probabilistic.

Discover the Probability Distribution Functions for the work.

If you don't know the PDF, make one up - we use -5%/+15% for everything until we know better.

If you don't know the PDF, go look in databases of past work for your domain. Here's some databases:

http://www.nesma.org/

http://www.isbsg.org/

http://www.cosmicon.com/

If you still don't know, go find someone who does, don't guess.

With this framework - it's called Reference Class Forecasting, that is, making estimates about your project from reference classes of other projects - you can start making useful estimates.
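Here is a small sketch of the two ideas together: a reference class of past projects anchors the base estimate, and the -5%/+15% band stands in for the PDF until a better one is known. The reference-class costs are invented for illustration:

```python
import random

random.seed(1)

# Hypothetical reference class: costs ($K) of five completed projects
# in the same domain as the one being estimated.
reference_class_costs = [120, 135, 128, 150, 142]
base = sum(reference_class_costs) / len(reference_class_costs)

# Until we know the real PDF, wrap the base in the -5%/+15% band.
low, high = base * 0.95, base * 1.15
samples = [random.triangular(low, high, base) for _ in range(10_000)]
samples.sort()
p80 = samples[int(0.8 * (len(samples) - 1))]

print(f"reference-class base estimate: {base:.1f} $K")
print(f"80% confidence cost          : {p80:.1f} $K")
```

The asymmetric band reflects the usual reality of project work: there is more room to overrun than to underrun.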

But remember, making estimates is how you make business decisions with opportunity costs. Those opportunity costs are the basis of Microeconomics and Managerial Finance.

Cone of Uncertainty and Accuracy of Estimating

There is a popular myth that the Cone of Uncertainty prevents us from making accurate estimates. We now know we need useful estimates, and those are not prevented by the Cone of Uncertainty. Here's the guidance we use on our Software Intensive Systems projects.

Finally in the estimate accuracy discussion comes the cost estimate. The chart below shows how cost is driven by the probabilistic elements of the project. Which brings us back to the fundamental principle that all project work is probabilistic. Modeling the cost, schedule, and probability of technical success is mandatory in any non-trivial project. By non-trivial I mean anything beyond a de minimis project - one where, if we're off by a lot, it doesn't really matter to those paying.

The Commitment Unmyth

So now to the big bugaboo of #NoEstimates: estimates are evil, because they are taken as commitments by management. They're taken as commitments by Bad Management - uninformed management, management that was asleep in the high school probability and statistics class, management that claims to have a Business degree but never took the Business Statistics class.

So let's clear something up:

Commitment is how Business Works

Here's an example taken directly from ‡

Estimation is a technical activity of assembling technical information about a specific situation to create hypothetical scenarios that (we hope) support a business decision. Making a commitment based on these scenarios is a business function.

The Technical “Estimation” decisions include:

When does our flight leave?

How do we get there? Car? Bus?

What route do we take?

What time of day and traffic conditions?

How busy is the airport, how long are the lines?

What is the weather like?

Are there flight delays?

This kind of information allows us to calculate the amount of time we should allow to get there.
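The technical estimation side of that list can be sketched as a small Monte Carlo sum of uncertain legs. All the durations (in minutes) are made up for illustration; the point is that the legs are random variables and their sum gives a confidence level, not a single number:

```python
import random

random.seed(3)

# "Time to the airport" as a sum of uncertain legs.
def one_trip():
    drive    = random.triangular(35, 90, 45)   # traffic-dependent
    parking  = random.triangular(5, 20, 8)
    security = random.triangular(10, 45, 15)   # line length varies
    return drive + parking + security

trips = sorted(one_trip() for _ in range(10_000))
p95 = trips[int(0.95 * (len(trips) - 1))]
print(f"Leave {p95:.0f} minutes before the gate closes for 95% confidence")
```

This is the technical half of the problem: it produces the time to allow, but not the decision about how much risk of missing the plane is acceptable.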

The Business “Commitment” and Risk decisions include:

What are the benefits in catching the flight on time?

What are the consequences of missing the plane?

What is the cost of leaving early?

These are the business consequences that determine how much risk we can afford to take.
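The business half - the commitment - can be sketched as an expected-cost comparison. The probabilities and dollar figures below are hypothetical; the structure (cost of leaving early versus cost of missing the plane, weighted by probability) is the decision:

```python
# Hypothetical business trade: each extra hour at the gate costs time,
# but lowers the probability of paying the missed-flight penalty.
miss_cost = 800                 # rebooking + lost meeting ($)
hourly_cost_of_waiting = 40     # value of time spent early ($/hour)

options = {
    # leave-early hours : assumed probability of missing the flight
    1.0: 0.20,
    2.0: 0.05,
    3.0: 0.01,
}

expected = {hours: hours * hourly_cost_of_waiting + p_miss * miss_cost
            for hours, p_miss in options.items()}

for hours, cost in expected.items():
    print(f"leave {hours:.0f}h early: expected cost ${cost:.0f}")

best = min(expected, key=expected.get)
print(f"leaving {best:.0f} hours early minimizes expected cost")
```

Notice the commitment decision needs the technical estimate (the probability of missing the flight) as an input; neither half works alone.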

Along with these of course is the risk associated with the uncertainty in the decisions. So estimating is also Risk Management and Risk Management is management in the presence of uncertainty. And the now familiar presentation from this blog.

Risk Management is how Adults manage projects - Tim Lister. Risk management is managing in the presence of uncertainty. All project work is probabilistic and creates uncertainty. Making decisions in the presence of uncertainty requires - mandates, actually - making estimates (otherwise you're guessing, pulling numbers from the rectal database). So if we're going to have an Adult conversation about managing in the presence of uncertainty, it's going to be around estimating: making estimates, improving estimates, making estimates valuable to the decision makers.

Estimates are how business works - exploring for alternatives to estimating means willfully ignoring the needs of the business. Proceed at your own risk.

† This average notion is common in the #NoEstimates community: take all the past stories or story points, find the average value, and use that for the future values. That is a serious error in statistical thinking, since without the variance being acceptable, that average can be wildly off from the actual future outcomes of the project.

‡ Unmythology and the Science of Estimation, Corvus International, Inc., Chicago Software Process Improvement Network, C-Spin, October 23, 2013

When confronted with making decisions on software projects in the presence of uncertainty, we can turn to an established and well tested set of principles found in Software Engineering Economics.

Software Engineering Economics is concerned with making decisions within the business context to align technical decisions with the business goals of an organization. Topics covered include fundamentals of software engineering economics (proposals, cash flow, the time-value of money, planning horizons, inflation, depreciation, replacement and retirement decisions); not-for-profit decision-making (cost-benefit analysis, optimization analysis); estimation, economic risk and uncertainty (estimation techniques, decisions under risk and uncertainty); and multiple attribute decision making (value and measurement scales, compensatory and non-compensatory techniques).

Engineering Economics is one of the Knowledge Areas for educational requirements in Software Engineering defined by INCOSE, along with Computing Foundations, Mathematical Foundations, and Engineering Foundations.

A critical success factor for all software development is to model the system under development as a holistic, value-providing entity, an approach that has been gaining recognition as a central process of systems engineering. The use of modeling and simulation during the early stages of the system design of complex systems and architectures can:

Document the system's needed capabilities, functions, and requirements,

Assess the mission performance,

Estimate costs, schedule, and needed product performance capabilities,

Evaluate tradeoffs,

Provide insights to improve performance, reduce risk, and manage costs.

The process above can be performed in any lifecycle duration, from the formal top-down INCOSE Vee to Agile software development. The process rhythm is independent of the principles.

This is a critical communication factor: separation of Principles, Practices, and Processes establishes the basis for comparing these Principles, Practices, and Processes across a broad spectrum of domains, governance models, methods, and experiences. Without a shared set of Principles, it's hard to have a conversation.

Engineering Economics

Developing products or services with other people's money means we need a paradigm to guide our activities. Since we are spending other people's money, the economics of that process is guided by Engineering Economics.

Engineering economic analysis concerns techniques and methods that estimate output and evaluate the worth of products and services relative to their costs. (We can't determine the value of our efforts without knowing the cost to produce that value.) Engineering economic analysis is used to evaluate system affordability. Fundamental to this knowledge area are value and utility, classification of cost, time value of money and depreciation. These are used to perform cash flow analysis, financial decision making, replacement analysis, break-even and minimum cost analysis, accounting and cost accounting. Additionally, this area involves decision making involving risk and uncertainty and estimating economic elements. [SEBok, 2015]

The Microeconomic aspects of the decision-making process are guided by the principles of making decisions regarding the allocation of limited resources. In software development we always have limited resources: time, money, staff, facilities, and the performance limitations of software and hardware.

If we are going to increase the probability of success for software development projects we need to understand how to manage in the presence of the uncertainty surrounding time, money, staff, facilities, performance of products and services and all the other probabilistic attributes of our work.

To make decisions in the presence of these uncertainties, we need to make estimates about the impacts of those decisions. This is an unavoidable consequence of how the decision making process works.

The opportunity cost of any decision between two or more choices means there is a cost for NOT choosing one or more of the available choices. This is the basis of microeconomics of decision making. What's the cost of NOT selecting an alternative?

So when it is conjectured we can make a decision in the presence of uncertainty without estimating the impact of that decision, it's simply NOT true.

That notion violates the principle of Microeconomics

All project work is probabilistic. There is no such thing as a deterministic estimate. OK, there is. But those estimates are wrong - dead wrong, willfully ignorant wrong. All project work is probabilistic. If you're making deterministic estimates, you've chosen to ignore the basic processes of probability and statistics.

There is an important difference between Statistics and Probability. Both are needed when making decisions in the presence of uncertainty.

All projects have uncertainty.

And there are two kinds of uncertainty on all projects. Reducible and Irreducible.

Reducible uncertainty (on the right) is described by the probability of some outcome. There is an 82% probability that we'll be complete on or before the second week in November, 2016. Irreducible uncertainty (on the left) is described by the Probability Distribution Function (PDF) for the underlying statistical processes.

In both cases estimating is required. There is no deterministic way to produce an assessment of an outcome in the presence of uncertainty without making estimates. This is simple math. In the presence of uncertainty, the project variables are random variables, not deterministic variables. If there is no uncertainty, there is no need to estimate - just measure.

Empiricism

When we hear that #NoEstimates is about empirical data used to forecast the future, let’s look deeper into the term and the processes of empiricism.

First, an empiricist rejects the logical necessity for scientific principles and bases processes on observations. [1]

While managing other people's money in the production of value in exchange for that money, there are principles by which that activity is guided. For the empiricist, principles are not immediately evident. But principles are called principles because they are indemonstrable: they cannot be deduced from other premises nor proved by any formal procedure. They are accepted because they have been observed to be true in many instances and to be false in none.

Second, with empirical data come two critical assumptions that must be tested before that data has any value in decision making.

The variance in the sampled data must be sufficiently narrow to allow sufficient confidence in forecasting the future. A ±45% variance is of little use. Next is the killer problem.

With an acceptable variance, the assumption that the future is like the past must be confirmed. If this is not the case, even acceptably sampled data with an acceptable variance is not representative of the future behavior of the project.
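The first assumption - is the variance narrow enough? - is mechanical to check. Here is a minimal sketch; the throughput sample and the 0.20 coefficient-of-variation threshold are illustrative assumptions, not a standard:

```python
import statistics

# Hypothetical sample of past weekly throughput (stories per week).
weekly_throughput = [12, 14, 9, 15, 11, 13, 10, 14]

mean = statistics.mean(weekly_throughput)
cv = statistics.stdev(weekly_throughput) / mean  # coefficient of variation

print(f"mean {mean:.1f}, coefficient of variation {cv:.2f}")
if cv > 0.20:
    print("variance too wide: this sample alone is a poor forecaster")
else:
    print("variance acceptable: IF the future resembles the past")
```

The second assumption - the future resembles the past - cannot be checked from the sample alone; it has to be confirmed against what is actually changing on the project.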

Understanding this basis of empiricism is critical to understanding the notion of making predictions in the presence of uncertainty about the future.

Next let’s address the issue of what an estimate is. It seems obvious to all working in the engineering, science, and financial domains that an estimate is a numeric value, or range of values, for some measure that may occur at some time in the future. Making up definitions for estimate, or selecting definitions from outside engineering, science, and finance, is disingenuous. There is no need to redefine anything.

Estimation consists of finding appropriate values (the estimate) for the parameters of the system of concern in such a way that some criterion is optimized. [2]

The estimate has several elements:

The quantity for the estimate – a numeric value we seek to learn about.

The range of possible values for that quantity

For estimates that have a range of values, the probability distribution of the values in that range - the Probability Distribution Function for the estimated values. The range of values is described by the PDF, with a Most Likely (Mode), Median, Mean, and higher cumulants – that is, what’s the variance of the variance?

For an estimate that has a probability of occurrence, the single numeric value for that probability and the confidence in that value. There is an 80% confidence of completing the project on or before the second week in November, 2005.

Now, when those wanting to redefine what an estimate is to support their quest to have No Estimates - like redefining forecasting as Not Estimating - it becomes clear they are not using any terms found in engineering, science, mathematics, or finance. When they suggest there are many definitions of an estimate and don’t provide any definition, with the appropriate references to that definition, it’s the same approach as saying we’re exploring for better ways to .... It’s a simplistic approach to a well-established discipline of making decisions, fundamentally disingenuous, and it should not be tolerated.

The purpose of a cost estimate is determined by its intended use, and its intended use determines its scope and detail.

Cost estimates have two general purposes:

To help managers evaluate affordability and performance against plans, as well as the selection of alternative systems and solutions,

To support the budget process by providing estimates of the funding required to efficiently execute a program.

The notion of defining the budget leaves open the other two random variables of all project work – productivity and the performance of the produced product or service.

So suggesting that estimating is not needed when the budget is provided ignores that these two are still random variables.

Specific applications for estimates include providing data for trade studies, independent reviews, and baseline changes. Regardless of why the cost estimate is being developed, it is important that it link to the missions, goals, and strategic objectives, and connect the statistical and probabilistic aspects of the project to the assessment of progress to plan and the production of value in exchange for the cost to produce that value.

The Need to Estimate

The picture below, with apologies to Scott Adams, is typical of the #NoEstimates advocates who contend estimates are evil and need to be stopped: estimates can’t be done, and not estimating results in a ten-fold increase in project productivity, or some equally vague unit of measure.

[1] Dictionary of Scientific Biography, ed. Charles Coulston Gillispie, Scribner, 1973, Volume 2, pp. 604-5

[2] Forecasting: Methods and Applications, Third Edition, Spyros Makridakis, Steven C. Wheelwright, and Rob J. Hyndman

Some More Background

Introduction to Probability Models, 4th Edition, Sheldon M. Ross

Random Data: Analysis and Measurement Procedures, Julius S. Bendat and Allan G. Piersol

Advanced Theory of Statistics, Volume 1: Distribution Theory, Sir Maurice Kendall and Alan Stuart

Estimating Software-Intensive Systems: Projects, Products, and Processes, Richard D. Stutzke

Probability Methods for Cost Uncertainty Analysis: A Systems Engineering Perspective, Paul R. Garvey

Software Metrics: A Rigorous and Practical Approach, Third Edition, Norman Fenton and James Bieman

The development of software in the presence of uncertainty is a well-developed discipline, a well-developed academic topic, and a well-developed practice, with numerous tools, databases, and models in many different software domains.

Economics is the study of how resources (people, time, facilities) are used to produce and distribute commodities and how services are provided in society. Engineering economics is a branch of microeconomics dealing with engineering related economic decisions. Software Engineering Foundations: A Software Science Perspective, Yingxu Wang, Auerbach Publications.

Software engineering economics is a topic that addresses the elements of software project cost estimation and analysis and project benefit-cost ratio analysis. These costs, and the benefits from expending them, produce tangible and many times intangible value. The time-phased aspects of developing software for money mean we need to understand the scheduling aspects of producing this value.

All three variables in the paradigm of software development for money - time, cost, and value - are random variables. This randomness comes from the underlying uncertainties in the processes found in the development of the software. These uncertainties are always there; they never go away; they are immutable.

Economic Foundations of Software Engineering

There are fundamental principles and methodologies utilized in engineering economics, and their applications in software engineering form the basis of decision making in the presence of uncertainty. These formal economic models include the cost of production and market models based on fundamental principles of microeconomics. The dynamic values of money and assets, and patterns of cash flows, can be modeled in support of management's need to make decisions in the presence of the constant uncertainties associated with software development.

Economic analysis methodologies for engineering decisions - project costs, benefit-cost ratio, payback period, and rate of return - can be rigorously described. This is the basis of any formal treatment of economic theories and principles. Software engineering economics is based on elements of software costs, software engineering project cost estimation, economic analyses of software engineering projects, and the software maintenance cost model.

Economics is classified into microeconomics and macroeconomics. Microeconomics is the study of behaviors of individual agents and markets. Macroeconomics is the study of the broad aspects of the economy, for example employment, export, and prices on a national or global scope.

A universal quantitative measure of commodities and services in economics is money.

Engineering economics is a branch of microeconomics. There are some basic axioms of microeconomics and engineering economics.

Demand versus supply. Demand is the required quantities for a product or service. It is also the demand for labor and materials needed to produce those products and services. Demand is a fundamental driving force of market systems and the predominant reason for most economic phenomena. The market response to a demand is called supply.

Supply is the required quantities for a product or service that producers are willing and able to sell at a given range of prices. This also extends to the labor and materials needed to produce the product and services to meet the demand.

Demands and supplies are the fundamental behaviors of dynamic market systems, which form the context of economics. Not enough Java programmers in the area? The cost for Java programmers goes up. Demand for rapid production of products? The cost of skilled labor, special tools, and processes goes up. COBOL programmers from 1998 to 2001 could ask nearly any price for their services. FORTRAN 77 programmers here in Denver can get exorbitant rates to help maintain the Ballistic Missile Defense System, since a local defense contractor was awarded the maintenance and support contract for Cobra Dane.

Opportunity Costs are those costs resulting from the loss of potential gain from the alternatives other than the one chosen by the decision maker.

Every time we make a decision involving multiple choices, we are making an opportunity-cost-based decision. Since most of the time these costs are in the future and are uncertain, we need to estimate those opportunity costs, as well as the probability that our choice is the right one to produce the desired beneficial outcomes.
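A bare-bones sketch of that opportunity-cost comparison, with invented probabilities, payoffs, and costs for two hypothetical alternatives:

```python
# Two hypothetical alternatives; every number here is an estimate,
# which is the point: the comparison cannot be made without them.
alternatives = {
    "build": {"p_success": 0.70, "payoff": 500, "cost": 200},
    "buy":   {"p_success": 0.95, "payoff": 350, "cost": 150},
}

# Expected value of each choice ($K).
ev = {name: a["p_success"] * a["payoff"] - a["cost"]
      for name, a in alternatives.items()}

chosen = max(ev, key=ev.get)
forgone = min(ev, key=ev.get)
print(f"expected values: {ev}")
print(f"choose '{chosen}'; the opportunity cost is the forgone EV of "
      f"'{forgone}' = {ev[forgone]:.1f} $K")
```

Every input above is an estimate of a future, uncertain quantity; remove the estimates and the opportunity-cost comparison simply cannot be made.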

Here's an example from a tool we use, Oracle's Crystal Ball. There are similar plug-ins for Excel (RiskAmp is affordable for the individual).

Another useful tool in the IT decision-making world is Real Options. Here's a simple introduction to ROs and decision making.

In the presence of uncertainty, making decisions today about actions that impact outcomes in the future requires some mechanism for determining those outcomes in the absence of perfect information. This absence of information creates risk, so decisions must be made in the presence of uncertainty and the resulting risk.

These decisions typically have one or more of the following characteristics: [1]

The Stakes — The stakes involved in the decision, such as costs, schedule, and delivered capabilities, and their impacts on business success or on meeting the objectives.

Complexity — The ramifications of the alternatives are difficult to understand without detailed analysis of the impact of the decision.

Uncertainty — Uncertainty in key inputs creates uncertainty in the outcome of the decision alternatives and points to risks that may need to be managed.

Multiple Attributes — Larger numbers of attributes cause a larger need for formal analysis.

Diversity of Stakeholders — Attention is warranted to clarify objectives and formulate performance measures when the set of stakeholders reflects a diversity of values, preferences, and perspectives.

Reducible and Irreducible Uncertainty

All project work is probabilistic driven by underlying statistical processes that create uncertainty. [2] There are two types of uncertainty on all projects. Reducible (Epistemic) and Irreducible (Aleatory).

Aleatory uncertainty arises from the random variability of natural processes on the project - the statistical processes: work durations, productivity, variance in quality. Epistemic uncertainty arises from the incomplete or imprecise nature of available information - the probabilistic assessment of whether an event may occur.

There is pervasive confusion between these two types of uncertainty when discussing the impacts of these uncertainties on project outcomes, including the estimates of cost, schedule, and technical performance.
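The distinction is easier to see in a small simulation. In this sketch (all numbers invented), one task carries aleatory spread as a duration distribution and epistemic uncertainty as a discrete risk event whose probability can be bought down with risk-retirement work:

```python
import random

random.seed(11)

def task_duration(p_risk_event=0.25, risk_impact=10.0):
    # Aleatory: irreducible natural variability of the duration itself.
    aleatory = random.triangular(8, 16, 10)
    # Epistemic: a discrete risk event that may or may not occur.
    epistemic = risk_impact if random.random() < p_risk_event else 0.0
    return aleatory + epistemic

# Buy down the epistemic risk (25% -> 5%); the aleatory spread remains.
before = sorted(task_duration() for _ in range(10_000))
after = sorted(task_duration(p_risk_event=0.05) for _ in range(10_000))

p80 = lambda s: s[int(0.8 * (len(s) - 1))]
print(f"P80 before risk retirement: {p80(before):.1f} days")
print(f"P80 after  risk retirement: {p80(after):.1f} days")
```

Risk retirement moves the P80 because it attacks the epistemic term; no amount of such work removes the triangular spread, which is why the aleatory term must be handled with margin instead.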

All The World's a NonLinear, Non-Stationary Stochastic Process, Described by 2nd Order non-Linear Differential Equations.

In the presence of these conditions - and software development operates in their presence - we need to understand several things for success. What are the coupled dynamics? What are the probabilistic and statistical processes that drive these dynamics? And how can we make decisions in their presence?

Predictive Analytics of Project Behaviors

In the presence of uncertainty, the need to predict future outcomes is critically important. One of the professional societies I belong to has a presentation on this topic. Here's a small sample of a mature process for estimating future outcomes given past performance. If you back up the URL to http://www.iceaaonline.com/ready/wp-content/uploads/2015/06/ you'll see all the briefings on the topic of cost, schedule, and performance management used in the domains I work in.

[1] Risk Informed Decision Making Handbook, NASA/SP-2010-576 Version 1.0 April 2010.

[2] "Risk-informed decision-making in the presence of epistemic uncertainty," Didier Dubois, Dominique Guyonnet, International Journal of General Systems, Taylor & Francis, 2011, 40 (2), pp.145-167.

Probability and statistics are a core business process for decision making in the presence of uncertainty. Uncertainty comes in two types: Irreducible and Reducible.

Making decisions in the presence of these two types of uncertainty requires making estimates about outcomes in the presence of the risks created by the uncertainties.

All decisions involve uncertainty, risk, and trade-offs. This is an immutable principle of all business and technical processes in the presence of uncertainty.

Successful management of software project cost within a limited budget is an important concern in any business. Lack of information, and of reliable tools that support the estimating process, makes it difficult to produce estimates during the early project planning stages. Controlling cost to an acceptable level requires appropriate and accurate measurement of the various project-related variables and an understanding of the magnitude of their effects.

The importance of early estimating to those funding the project, or those providing capital to fund products, cannot be overemphasized.

Making cost estimates with Bayesian decision processes is a well-developed discipline. Here's a recent paper from a colleague at NASA, Christian Smart.

The risks created by these uncertainties are always present. If unaddressed, our project is at risk of failure. To perform this Bayesian analysis of program performance probabilities, there are several tools. Here's one example.

In the end risk management is about estimating the impacts of reducible and irreducible uncertainty. As Tim Lister says - Risk Management is How Adults Manage Projects

In software development, we almost always encounter situations where a decision must be made when we are uncertain what the outcome might be, or even uncertain about the data used to make that decision.

Decision making in the presence of uncertainty is standard management practice in all business and technical domains, from business investment decisions to technical choices for project work.

Making decisions in the presence of uncertainty means making probabilistic inferences from the information available to the decision maker.

There are many techniques for decision making. Decision trees are common, where the probability of an outcome of a decision is part of a branch of a tree. If I go left at the branch - the decision - what happens? If I go right, what happens? Each branch point is a decision. Each of the two or more branches is an outcome. Probabilities are applied to the branches, and the outcomes - which may be probabilistic as well - are assessed for benefits to those making the decision.
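A minimal decision tree can be rolled back by hand. The branch probabilities and payoffs below are invented; the mechanism is the standard one: compute the expected value of each branch and choose the best at the decision node:

```python
# Hypothetical two-branch decision tree: each branch carries
# (probability, payoff) outcome pairs; expected values roll back
# to the decision node.
tree = {
    "left":  [(0.6, 100), (0.4, -20)],
    "right": [(0.3, 250), (0.7, -10)],
}

def expected_value(outcomes):
    # Sanity check: outcome probabilities on a branch must sum to 1.
    assert abs(sum(p for p, _ in outcomes) - 1.0) < 1e-9
    return sum(p * v for p, v in outcomes)

ev = {branch: expected_value(o) for branch, o in tree.items()}
best = max(ev, key=ev.get)
print(f"expected values {ev}; take the '{best}' branch")
```

Larger trees just nest this rollback: the expected value of a downstream decision node becomes the payoff of the branch feeding into it.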

Another approach is Monte Carlo Simulation of decision trees. Here's a tool we use for many decisions in our domain, Oracle's Crystal Ball. There are others. They work like the manual process in the first picture, but let you tune the probabilistic branching and the probabilistic outcomes to model complex decision-making processes.

In the project management paradigm of the projects we work on, there are networks of activities. Each activity has some dependency on prior work, and each activity produces dependencies for follow-on work. These can be modeled with Monte Carlo Simulation as well.
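A tiny Monte Carlo pass over an activity network shows the mechanics. The four-activity network and its triangular durations (in days) are hypothetical; real schedule risk tools do the same thing over thousands of activities:

```python
import random

random.seed(5)

# Hypothetical network: A -> B -> D and A -> C -> D
# (D cannot start until BOTH B and C finish).
activities = {
    "A": (5, 12, 7), "B": (10, 20, 12), "C": (8, 25, 11), "D": (4, 9, 5),
}

def one_run():
    d = {k: random.triangular(lo, hi, ml)
         for k, (lo, hi, ml) in activities.items()}
    finish_a = d["A"]
    finish_b = finish_a + d["B"]
    finish_c = finish_a + d["C"]
    return max(finish_b, finish_c) + d["D"]   # the merge point drives the date

runs = sorted(one_run() for _ in range(10_000))
p80 = runs[int(0.8 * (len(runs) - 1))]
print(f"80% confidence completion: {p80:.1f} days")
```

The `max()` at the merge point is where schedule risk concentrates: whichever parallel path runs long on a given trial drives that trial's completion date.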

The Schedule Risk Analysis (SRA) of the network of work activities is mandated on a monthly basis in many of the programs we work on.

Each of these approaches and others are designed to provide actionable information to the decision makers. This information requires a minimum understanding of what is happening to the system being managed:

What are the naturally occurring variances of the work activities that we have no control over - aleatory uncertainty?

What are the event based probabilities of some occurrence - epistemic uncertainty?

What are the consequences of each outcome - decision, probabilistic event, or naturally occurring variance - on the desired behavior of the system?

What choices can be made that will influence these outcomes?

In many cases, the information available to make these choices is in the future. Some is in the past. But that information in the past needs careful assessment.

Past data is only useful if you can be assured the future is like the past. If not, making decisions using past data without adjusting that data for possible changes in the future takes you straight into the ditch - see The Flaw of Averages.

In order to have any credible assessment of the impact of a decision using data in the future - where will the system be going in the future? - it is mandatory to ESTIMATE.

It is simply not possible to make decisions about future outcomes in the presence of uncertainty in that future without making estimates.

Anyone who says you can is incorrect. And if they insist it can be done, ask for testable evidence of their conjecture, based on the mathematics of probabilistic systems. No credible, testable data? Then it's pure speculation. Move on.

The False Conjecture of Deciding in Presence of Uncertainty without Estimates

Slicing the work into similar sized chunks, performing work on those chunks, and using that information to say something about the future makes the huge assumption that the future is like the past.

Recording past performance, making nice plots, and running static analysis for mean, mode, standard deviation, and variance is naive at best. The time series variances are rolled up, hiding the latent variances that will emerge in the future. Time series analysis (e.g., ARIMA) is required to reveal which possible values in the dataset from the past will emerge in the future, assuming the system under observation remains the same.
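A small illustration of how static statistics roll up and hide time-series structure. The two throughput histories below are invented for illustration: they have the same average, but one is stable and one is trending down, and an average-based forecast cannot tell them apart:

```python
import statistics

# Two hypothetical throughput histories (items finished per week) with the
# SAME average. A static average treats them identically; the trend does not.
stable   = [10, 11, 9, 10, 11, 9, 10, 11, 9, 10]
trending = [14, 13, 12, 11, 10, 10, 9, 8, 7, 6]

print(statistics.mean(stable), statistics.mean(trending))  # both 10

backlog = 40
naive_weeks = backlog / statistics.mean(trending)  # "4 more weeks" by the average

# A simple least-squares trend line over the trending series tells a
# different story: throughput is falling roughly 0.85 items per week.
n = len(trending)
xbar = (n - 1) / 2
ybar = statistics.mean(trending)
slope = sum((i - xbar) * (y - ybar) for i, y in enumerate(trending)) \
        / sum((i - xbar) ** 2 for i in range(n))
next_week = ybar + slope * (n - xbar)  # trend-projected throughput for next week
print(round(slope, 2), round(next_week, 1))  # about -0.85 and 5.3
```

The average says four weeks to burn down the backlog; the trend says throughput next week will be roughly half the average, and falling. This is the rolled-up variance problem in miniature.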

Time series analysis is a fundamental tool for forecasting future outcomes from past data. Weather forecasting - plus complex compressible fluid flow models - is based on time series analysis. Stock market forecasting uses time series analysis. Cost and schedule modeling uses time series analysis. Adaptive process control algorithms, like the speed control and fuel management in your modern car, use time series analysis.

George E. P. Box, one of the originators of time series analysis and author of the seminal book Time Series Analysis: Forecasting and Control, is often seriously misquoted when he said All Models are Wrong, Some are Useful. Anyone misusing that quote to try to convince you that you can't model the future didn't (or can't) do the math in Box's book and likely got a D in the high school probability and statistics class.

So do the math, read the proper books, gather past data, model the future with dependency networks and Kanban and Scrum backlogs, measure current production, forecast future production with Monte Carlo models - and don't believe for a moment that you can make decisions about future outcomes in the presence of uncertainty without estimating that future.

In the world of project management and the process improvement efforts needed to increase the Probability of Project Success anecdotes appear to prevail when it comes to suggesting alternatives to observed dysfunction.

If we were to pile all the statistics for the effectiveness - or ineffectiveness - of all the process improvement methods on top of each other, they would lack the persuasive power of a single anecdote in most software development domains outside of Software Intensive Systems.

Why? Because most people working in small group, agile development projects - as opposed to Enterprise, Mission Critical, can't fail projects that must show up on time, on budget, with not just the minimum viable product but the mandatorily needed viable capability - rely on anecdotes to communicate their messages.

I say this not from just personal experience, but from research for government agencies and commercial enterprise firms tasked with Root Cause Analysis, conference proceedings, refereed journal papers, and guidance from those tasked with the corrective actions of major program failures.

Anecdotes appeal to emotion. Statistics, numbers, and verifiable facts appeal to reason. It's not a fair fight. Emotion always wins, without acknowledging that emotion is seriously flawed when making decisions.

Anecdotal evidence is evidence where a small number of anecdotes is presented. There is a large chance - statistically - that this evidence is unreliable due to cherry picking or self selection (this is the core issue with the Standish Reports, or anyone claiming anything without proper statistical sampling processes).

Anecdotal evidence is considered dubious support for any generalized claim. Anecdotal evidence is no more than a type of description (i.e., a short narrative), and is often confused in discussions with its weight, or other considerations, as to the purpose(s) for which it is used.

We've all heard the stories: ½ of all IT projects fail. Waterfall is evil. Hell, even estimates are evil - stop doing them cold turkey. They prove the point the speaker is making, right? Actually they don't. I just used an anecdote to prove a point.

If I said The Garfunkel Institute just released a study showing 68% of all software development projects did not succeed because the requirements gathering process failed to define what capabilities were needed when done, I'd have made a fact-based point. And you'd become bored reading the 86 pages of statistical analysis and correlation charts between all the causal factors contributing to the success or failure of the sample space of projects. See, you're bored already.

Instead, if I said every project I've worked on went over budget and was behind schedule because we were very poor at making estimates, that'd be more appealing to your emotions, since it is a message you can relate to personally - having likely experienced many of the same failures.

The purveyors of anecdotal evidence to support a position make use of a common approach: willfully ignoring a fact-based methodology through a simple tactic...

We all know what Mark Twain said about lies, damned lies, and statistics.

People can certainly lie with statistics - it's done all the time. Start with How to Lie With Statistics. But those types of lies are nothing compared to the ability to script personal anecdotes to support a message. From I've never seen that work, to what, now you're telling me - the person who actually invented this earth shattering way of writing software - that it doesn't work outside my personal sphere of experience?

An anecdote is a statistic with a sample size of one. OK, maybe a sample size of a small group of your closest friends and fellow travelers.

We fall for this all the time. It's easier to accept an anecdote describing a problem and possible solution from someone we have shared experiences with, than to investigate the literature, do the math, even do the homework needed to determine the principles, practices, and processes needed for corrective action.

Don’t fall for manipulative opinion-shapers who use story-telling as a substitute for facts. When trying to persuade, use facts, and use actual examples based on those facts. Use data that can be tested, rather than personal anecdotes that support an unsubstantiated claim without suggesting both the root cause and the testable corrective actions.

How To Lie With Statistics is a critically important book to have on your desk if you're involved in any decision making. My edition is a First Edition, but I don't have the dust jacket, so it's not worth much beyond the current versions.

The reason for this post is to lay the groundwork for assessing reports, presentations, webinars, and other selling documents that contain statistical information.

The classic statistical misuse is the Standish Report, describing the success and failure of IT projects.

Here's my summation of the elements of How To Lie in our project domain:

Sample with the Built In Bias - the population of the sample space is not defined. The samples are self selected, in that those who respond are the basis of the statistics. No adjustment is made for all those who did not respond to the survey, for example.

The Well Chosen Average - the arithmetic average, median, and mode are estimators of the population statistics. Any of these without a variance is of little value for decision making.

Little Figures That Are Not There - the classic is use this approach (in this case #NoEstimates) and your productivity will improve 10X - that's 1000%, by the way. A 1000% improvement. That's unbelievable, literally unbelievable. The actual improvements are not stated, only the percentage. The baseline performance is not stated.

Much Ado About Practically Nothing - the probability of being in the range of normal. This is the basis of advertising. What's the variance?

Gee-Whiz Graphs - using graphics and adjustable scales provides the opportunity to manipulate the message. The classic example of this is the estimating errors in a popular graph used by the No Estimates advocates. It's a graph showing the number of projects that complete over their estimated cost and schedule. What's not shown is the credibility of the original estimate.

One Dimensional Picture - using a picture to show numbers, where the picture is not in the same scale as the numbers, provides a messaging path for visual readers.

Semi-attached Picture - if you can't prove what you want to prove, demonstrate something else and pretend they are the same thing. In one example, the logic is inverted: estimating is conjectured to be the root cause of problems. With no evidence of that, the statement becomes we don't see how estimating can produce success, so not estimating will increase the probability of success.

Post Hoc Rides Again - post hoc causality is common in the absence of a cause and effect understanding. The differences between correlation and causality are many times not understood.

Here's a nice example of How To Lie

There's a chart from an IEEE Computer article showing the number of projects that exceeded their estimated cost. But let's start with some research on the problem: Coping with the Cone of Uncertainty.

There is a graph popularly used to show that estimates are chronically overrun.

This diagram is actually MISUSED by the #NoEstimates advocates.

The presentation below shows the follow on information for how estimates can be improved to increase confidence in the process and improvements in the business. It also shows the root causes of poor estimates and their corrective actions. Please ignore any use of Todd's chart without the full presentation.

My mistake was doing just that.

So before anyone accepts any conjecture from a #NoEstimates advocate using the graph above, please read the briefing at the link below to see the corrective actions for making poor estimates.

In a recent post on forecasting capacity planning, a time series of data was used as the basis of the discussion.

Some static statistics were then presented.

Along with a discussion of the upper and lower ranges of the past data. The REAL question, though, is what are the likely outcomes for data in the future, given the past performance data? That is, if we recorded what happened in the past, what is the likely data in the future?

The average and upper and lower ranges from the past data are static statistics. That is, all the dynamic behavior of the past is wiped out in the averaging and deviation processes, so that information can no longer be used to forecast the possible outcomes of the future.

This is one of the attributes of The Flaw of Averages and How to Lie With Statistics, two books that should be on every manager's desk. That is, managers tasked with making decisions in the presence of uncertainty when spending other people's money.

We now have a Time Series and can ask the question: what is the range of possible outcomes in the future, given the values in the past? This can easily be done with a free tool - R. R is a statistical programming language available free from the Comprehensive R Archive Network (CRAN). In R, there are several functions that can be used to make these forecasts. That is, what are the estimated values in the future from the past, and their confidence intervals?

Let's start with some simple steps:

Record all the data in the past. For example make a text file of the values in the first chart. Name that file NE.Numbers

Start the R tool. Better yet, download an IDE for R - RStudio is one. That way there is a development environment for your statistical work. As well, there are many free R books on statistical forecasting - estimating outcomes in the future.

OK, read the time series of raw data from the file of values and assign it to a variable:

NE.Numbers = scan("NE.Numbers")   # read the raw values from the file
NETS = ts(NE.Numbers)             # convert them to a time series

The ts function converts the raw data into a Time Series object that can be used by the next function.

With the Time Series now in the right format, apply the ARIMA function. ARIMA is Autoregressive Integrated Moving Average, also known as the Box-Jenkins algorithm. This is the George Box of the famously misused and blatantly abused quote all models are wrong, some models are useful. If you don't have the full paper where that quote came from, and the book Time Series Analysis: Forecasting and Control, Box and Jenkins, please resist re-quoting it out of context. That quote has become the meme for those not having the background to do the math for time series analysis, and a mantra for willfully ignoring the math needed to actually make estimates of the future - forecasting - using time series of the past in ANY domain. ARIMA is the beginning basis of all statistical forecasting in science, engineering, and finance.

The ARIMA algorithm has three parameters: p (the autoregressive order), d (the degree of differencing), and q (the moving average order).

With the original data turned into a Time Series and presented to the ARIMA function, we can now apply the Forecast function. This function provides methods and tools for displaying and analyzing univariate time series forecasts, including exponential smoothing via state space models and automatic ARIMA modelling.
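To see what the exponential smoothing family is doing under the hood, here is the simplest member of that family sketched in Python. The data and the smoothing constant alpha are hypothetical, chosen only to show the mechanics:

```python
# Minimal sketch of simple exponential smoothing: each new observation is
# blended with the running level, and the final level is the
# one-step-ahead forecast.
def ses(series, alpha):
    """Simple exponential smoothing; returns the one-step-ahead forecast."""
    level = series[0]                             # initialize at the first observation
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level   # blend new data with history
    return level

data = [12, 15, 11, 14, 13, 16, 12, 15]           # hypothetical weekly values
print(round(ses(data, 0.3), 2))                   # 13.79 - next-period forecast
```

With alpha near 1 the forecast chases the latest value; with alpha near 0 it barely moves from the first. The state space models in the forecast function generalize this by also estimating trend, seasonality, and the prediction intervals.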

When applied to the ARIMA output we get a Forecast series that can be plotted.

Here's what all this looks like in RStudio:

NETS = ts(NE.Numbers)                        # convert the raw numbers to a time series
NETSARIMA = arima(NETS, order = c(0, 1, 1))  # fit an ARIMA(0,1,1) model
NEFORECAST = forecast(NETSARIMA)             # make a forecast from the fitted model
plot(NEFORECAST)                             # plot it

Here's the plot, with the time series from the raw data and the 80% and 90% confidence bands on the possible outcomes in the future.

The Punch Line

You want to make decisions with other people's money when the 80% confidence interval on a possible outcome is itself a -56% to +68% variance? Really? Flipping coins gets a better probability of an outcome inside all the possible outcomes that happened in the past. The time series is essentially a random series with very low confidence of being anywhere near the mean. This is the basis of The Flaw of Averages.

Where I work this would be a non-starter if we came to the Program Manager with this forecast of the Estimate to Complete based on an Average with that wide a variance.

This might be workable where there is low value at risk, a customer that has little concern for cost and schedule overrun, and maybe where the work is actually an experiment with no deadline, not-to-exceed budget, or any other real constraint. But if your project has a need date for the produced capabilities - a date when those capabilities need to start earning their keep and producing value that can be booked on the balance sheet - a much higher confidence in what the future NEEDS to be is likely going to be the key to success.

The Primary Reason for Estimates

First, estimates are for the business. Yes, developers can use them too. But the business has a business goal: make money at some point in the future on the sunk costs of today - the breakeven date. These sunk costs are recoverable - hopefully - so we need to know when we'll be even with our investment. This is how business works. Businesses make decisions in the presence of uncertainty - not on the opinion of development saying we recorded our past performance as an average and projected that into the future. No, they need a risk adjusted, statistically sound level of confidence that they won't run out of money before breakeven. What this means in practice is a management reserve, and cost and schedule margin, to protect the project from the naturally occurring variances and the probabilistic events that derail all the best laid plans.

Now developers may not think like this. But someone somewhere in a non-trivial business does. Usually in the Office of the CFO. This is called Managerial Finance and it's how serious money at risk firms manage.

So when you see time series like those in the original post, do your homework and show the confidence of the probability that the needed performance will actually show up. And by needed performance I mean the steering target used in the closed loop control system that increases the probability that the planned value - which the Agilists so dearly treasure - actually appears somewhere near the planned need date and somewhere around the planned cost, so the Return on Investment for those paying for your work is not a disappointment - a negative return that labels their spend as underwater.

So What Does This Mean in the End?

Even when you're using past performance - one of the better ways of forecasting the future - you need to give careful consideration to those past numbers. Averages and simple variances - which wipe out the actual underlying time series variances - are not only naive, they are bad statistics used to make bad management decisions.

Add to that the poorly formed notion that decisions can be made about future outcomes in the presence of uncertainty in the absence of estimates about that future, and you've got the makings of management disappointment. The discipline of estimating future outcomes from past behaviors is well developed. The mathematics, and especially the terms used in that mathematics, are well established. Here are some sources we use in our everyday work. These are not populist books; they are math and engineering. They have equations, algorithms, and code examples. They are used when the value at risk is sufficiently high that management is on the hook for meeting the performance goals in exchange for the money assigned to the project.

If you work a project that doesn't care too much about deadlines, budget overages, or what gets produced other than the minimal products, then these books and related papers are probably not for you. And most likely Not Estimating the probability that you'll overspend, show up seriously late, and fail to produce the needed capabilities to meet the Business Plans will be just fine. But if you are expected to meet the business goals in exchange for the spend plan you've been assigned, these might be a good place to start to avoid being a statistic (dead skunk in the middle of the road) in the next Chaos Report (no matter how poor its statistics are).

This, by the way, is an understanding I came to on the plane flight home this week. #NoEstimates is a credible way to run your project when those conditions are in place. Otherwise you may want to read how to make credible forecasts of what the cost and schedule are going to be for the value produced with your customer's money, assuming they actually care about not wasting it.

I hear all the time that estimating is the same as guessing. This is not true mathematically, nor is it true business-process wise. Guessing is an approach used by many who don't understand that making decisions in the presence of uncertainty requires we understand the impact of that decision. When that future is uncertain, we need to know that impact in probabilistic terms. And with this comes the confidence, precision, and accuracy of the estimate.

What’s the difference between estimate and guess? The distinction between the two words is one of the degree of care taken in arriving at a conclusion.

The word Estimate is derived from the Latin word aestimare, meaning to value. The term is the origin of estimable, which means capable of being estimated or worthy of esteem, and of course esteem, which means regard, as in High Regard.

To estimate means to judge the extent, nature, or value of something - connected to the regard - he is held in high regard, with the implication that the result is based on expertise or familiarity. An estimate is the resulting calculation or judgment. A related term is approximation, meaning close or near.

In between a guess and an estimate is an educated guess, a more casual estimate. An idiomatic term for this type of middle-ground conclusion is ballpark figure. The origin of this American English idiom, which alludes to a baseball stadium, is not certain, but one conclusion is that it is related to in the ballpark, meaning close in the sense that one at such a location may not be in a precise location but is in the stadium.

To guess is to believe or suppose, to form an opinion based on little or no evidence, or to be correct by chance or conjecture. A guess is a thought or idea arrived at by one of these methods. Synonyms for guess include conjecture and surmise, which like guess can be employed both as verbs and as nouns.

We could have a hunch or an intuition, or we can engage in guesswork or speculation. Dead reckoning is often used to mean guesswork, although dead reckoning originally referred to a navigation process based on reliable information. Near synonyms describing thoughts or ideas developed with more rigor include hypothesis and supposition, as well as theory and thesis.

A guess is a casual, perhaps spontaneous conclusion. An estimate is based on intentional thought processes supported by data.

What Does This Mean For Projects?

If we're guessing we're making uninformed conclusions usually in the absence of data, experience, or any evidence of credibility. If we're estimating we are making informed conclusions based on data, past performance, models - including Monte Carlo models, and parametric models.

When we hear decisions can be made without estimates, or all estimating is guessing, we now know - mathematically and business-process wise - that neither of these is true.

Decision making is hard. Decision making is easy when we know what to do. When we don't know what to do there are conflicting choices that must be balanced in the presence of uncertainty for each of those choices. The bigger issue is that important choices are usually ones where we know the least about the outcomes and the cost and schedule to achieve those outcomes.

Decision science evolved to cope with decision making in the presence of uncertainty. This approach goes back to Bernoulli in the early 1700s, but remained an academic subject into the 20th century, because there was no satisfactory way to deal with the complexity of real life. Just after World War II, the fields of systems analysis and operations research began to develop. With the help of computers, it became possible to analyze problems of great complexity in the presence of uncertainty.

In 1938, Chester Barnard, author of The Functions of the Executive, brought the term “decision making” from the lexicon of public administration into the business world. This term replaced narrower descriptions such as “resource allocation” and “policy making.”

Philosophy - uncertainty is a consequence of our incomplete knowledge of the world. In some cases, uncertainty can be partially or completely resolved before decisions are made and resources committed. In many important cases, complete information is not available or is too expensive (in time, money, or other resources) to obtain.

Decision framework - decision analysis provides concepts and language to help the decision-maker become aware of the adequacy or inadequacy of the decision basis:

Decision-making process - provides a step-by-step procedure that has proved practical in tackling even the most complex problems in an efficient and orderly way.

Decision making methodology - provides a number of specific tools that are sometimes indispensable in analyzing a decision problem. These tools include procedures for eliciting and constructing influence diagrams, probability trees, and decision trees; procedures for encoding probability functions and utility curves; and a methodology for evaluating these trees and obtaining information useful to further refine the analysis.

Each level focuses on different aspects of the problem of making decisions. And it is decision making that we're after. The purpose of the analysis is not to obtain a set of numbers describing decision alternatives. It is to provide the decision-maker the insight needed to choose between alternatives. These insights typically have three elements:

What is important to making the decision?

Why is it important?

How important is it?

Now To The Problem at Hand

It has been conjectured ...

The key here and the critical unanswered question is how can a decision about an outcome in the future, in the presence of that uncertain future, be made in the absence of estimating the attributes going into that decision?

That is, if we have less than acceptable knowledge about a future outcome, how can we make a decision about the choices involved in that outcome?

Dealing with Uncertainty

All project work operates in the presence of uncertainty. The underlying statistical processes create probabilistic outcomes for future activities. These activities may be probabilistic events, or the naturally occurring variances of the processes that make up the project.

Clarity of discussion through the language of probability is one of the bases of decision analysis. The reality of uncertainty must be confronted and described, and the mathematics of probability is the natural language for describing uncertainty.

When we don't have that clarity of language - when redefining mathematical terms or misusing mathematical terms enters the conversation - agreeing on the ways (and there are many ways) of making decisions in the presence of an uncertain future becomes bogged down in approaches that can't be tested in any credible manner. What remains is personal opinion, small sample anecdotes, and attempts to solve complex problems with simple and simple minded approaches.

For every complex problem there is an answer that is clear, simple, and wrong. H. L. Mencken

Project work is random. Most everything in the world is random: the weather, commuter traffic, the productivity of writing and testing code. Few things actually take as long as they are planned. Cost is less random, but there are variances in the cost of labor and the availability of labor. Mechanical devices have variances as well.

The exact fit of a water pump on a Toyota Camry is not the same for each pump. There is a tolerance in the mounting holes and in the volume of water pumped. This is a variance in the technical performance.

Managing in the presence of these uncertainties is part of good project management. But there are two distinct paradigms of managing in the presence of these uncertainties.

We have empirical data of the variances. We have samples of the hole positions and sizes of the water pump mounting plate for the last 10,000 pumps that were installed. We have samples of how long it took to write a piece of code and the attributes of the code that are correlated to that duration. We have empirical measures.

We have a theoretical model of the water pump in the form of a 3D CAD model, with the materials modeled for expansion, drilling errors of the holes, and other static and dynamic variances. We have to model the duration of work using a Probability Distribution Function and a Three-Point estimate of the Most Likely, Pessimistic, and Optimistic durations. These can be derived from past performance, but we don't have enough actual data to produce the PDF with a low enough sampling error for our needs. We don't have empirical measures, but we have a Model.

In the first case, we have empirical data. In the second case, we don't. There are two approaches to modeling what the system will do in terms of cost and schedule outcomes.

Bootstrapping the Empirical Data

With samples of past performance and the proper statistical assessment of those samples, we can resample them to produce a model of future performance. This bootstrap resampling shares the principle of the second method - Monte Carlo Simulation - but with several important differences.

The researcher - and we are researching what the possible outcomes might be from our model - does not know, nor have any control over, the Probability Distribution Function that generated the past samples. You take what you got.

As well we don't have any understanding of Why those samples appear as they do. They're just there. We get what we get.

This last piece is critical because it prevents us from defining what performance must be in place to meet some future goal. We can't tell what performance we need because we have no model of the needed performance, just samples from the past.

This results from the statistical condition that there is a PDF for the process that is unobserved. All we have is a few samples of this process.

With these few samples, we're going to resample them to produce a modeled outcome. This resampling locks in any behavior of the future using the samples from the past, which may or may not actually represent the true underlying behavior. This may be all we can do because we don't have any theoretical model of the process.

This bootstrapping method is quick, easy, and produces a quick and easy result. But it has issues that must be acknowledged.

There is a fundamental assumption that the past empirical samples represent the future. That is, the samples contained in the bootstrapped list and their resampling are also contained in all the future samples.

Said in a more formal way

If the sample of data we have from the past is a reasonable representation of the underlying population of all samples from the work process, then the distribution of parameter estimates produced from the bootstrap model on a series of resampled data sets will provide a good approximation of the distribution of that statistics in the population.

With this sample data and its parameters (statistical moments), we can make a good approximation of the future.
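The bootstrap described above can be sketched directly. The observed task durations are hypothetical past samples, and the resampled totals approximate the distribution of future outcomes:

```python
import random

# Hedged sketch of the bootstrap: resample hypothetical observed task
# durations (days) with replacement to model the total duration of 20
# similar future tasks.
random.seed(7)

observed = [3, 5, 4, 6, 2, 8, 5, 4, 7, 3]   # past samples - all we have

def bootstrap_total(samples, n_tasks):
    """Total duration of n_tasks future tasks, resampling the past with replacement."""
    return sum(random.choice(samples) for _ in range(n_tasks))

totals = sorted(bootstrap_total(observed, 20) for _ in range(5_000))
p10, p50, p90 = (totals[int(q * len(totals))] for q in (0.1, 0.5, 0.9))
print(p10, p50, p90)  # a spread of plausible outcomes, not a single number
```

Notice what the method can and cannot do: it gives a distribution of totals, but every value in that distribution is built only from the ten numbers we happened to observe - exactly the locked-in-past limitation discussed above.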

There are some important statistical behaviors, though, that must be considered, starting with the assumption that the future samples are statistically identical to the past samples:

Nothing is going to change in the future

The past and the future are identical statistically

In the project domain that is very unlikely

With all these conditions, for a small project with few, if any, interdependencies and a static work process, a little bootstrapping is a nice quick and dirty approach to forecasting (estimating the future) based on the past.

Monte Carlo Simulation

This approach is more general and removes many of the restrictions to the statistical confidence of bootstrapping.

Just as a reminder: in principle, both the parametric and the nonparametric bootstrap are special cases of Monte Carlo simulation used for a very specific purpose - to estimate some characteristic of the sampling distribution. But like all principles, in practice there are larger differences when modeling project behaviors.

In the more general approach of Monte Carlo Simulation, the algorithm repeatedly creates random data in some way, performs some modeling with that random data, and collects some result. For example:

The duration of a set of independent tasks.

The probabilistic completion date of a series of tasks connected in a network (schedule), each with a different Probability Distribution Function evolving as the project moves into the future.

A probabilistic cost correlated with the probabilistic schedule model. This is called the Joint Confidence Level. Both cost and schedule are random variables with time evolving changes in their respective PDFs.

In practice, when we hear Monte Carlo simulation we are talking about a theoretical investigation, e.g. creating random data with no empirical content - or from reference classes - used to investigate whether an estimator can represent known characteristics of this random data. The (parametric) bootstrap, by contrast, refers to an empirical estimation and is not necessarily a model of the underlying processes - just a small sample of observations independent from the actual processes that generated that data.

The key advantage of MCS is we don't necessarily need past empirical data. MCS can be used to advantage if we do, but we don't need it for the Monte Carlo Simulation algorithm to work.

This approach could be used to estimate some outcome, as in the bootstrap, but also to theoretically investigate some general characteristic of a statistical estimator (cost, schedule, technical performance) that is difficult to derive from empirical data.

MCS removes the roadblock heard in many critiques of estimating - we don't have any past data on which to estimate. No problem: build a model of the work and the dependencies between that work, assign statistical parameters to the individual or collected PDFs, and run the MCS to see what comes out.
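Here is a minimal Python sketch of that idea: no past data, just a model of the work. The three sequential tasks and their triangular (min, most-likely, max) durations are hypothetical expert judgments, and the 30-day deadline is an assumed target:

```python
import random

# Hypothetical task network: three sequential tasks, each with an
# expert-judged triangular duration (min, most-likely, max) in days.
tasks = [(5, 8, 15), (3, 5, 12), (8, 10, 20)]

def simulate(tasks, trials=20_000, seed=7):
    """Draw a duration for each task from its PDF and sum the chain."""
    rng = random.Random(seed)
    return sorted(
        sum(rng.triangular(lo, hi, mode) for lo, mode, hi in tasks)
        for _ in range(trials)
    )

durations = simulate(tasks)
p80 = durations[int(0.80 * len(durations))]   # 80% confidence finish
deadline = 30
confidence = sum(d <= deadline for d in durations) / len(durations)
print(f"80th percentile finish: {p80:.1f} days")
print(f"Probability of finishing within {deadline} days: {confidence:.0%}")
```

A real schedule model would use a network with branching and merging paths, not a simple serial chain, but the mechanism - sample each PDF, propagate through the model, collect the distribution of outcomes - is the same.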

This approach has several critical advantages:

The first is a restatement - we don't need empirical data, although it will add value to the modeling process.

This is the primary purpose of Reference Classes

They are the raw material for defining possible future behaviors from the past.

We can make judgments of what the future will be like - or, most importantly, what the future MUST be like to meet our goals - run the simulation, and determine if our planned work will produce the desired result.

So Here's the Killer Difference

Bootstrapping models make several key assumptions, which may not be true in general. So they must be tested before accepting any of the outcomes.

The future is like the past.

The statistical parameters are static - they don't evolve with time. That is, the future is like the past, an unlikely prospect on any non-trivial project.

The sampled data is identical to the population data both in the past and in the future.

Monte Carlo Simulation models provide key value that bootstrapping can't.

Different Probability Distribution Functions can be assigned to the work as it progresses through time.

The shape of that PDF can be defined from past performance, or defined from the needed performance. This is a CRITICAL capability of MCS

The critical difference between Bootstrapping and Monte Carlo Simulation is that MCS can show what the future performance has to be to stay on schedule (within variance), on cost, and have the technical performance meet the needs of the stakeholder.

When the process of defining the needed behavior of the work is done, a closed loop control system is put in place. This needed performance is the steering target. Measures of actual performance compared to needed performance generate the error signals for taking corrective actions. Just measuring past performance and assuming the future will be the same is Open Loop control. Any non-trivial project management method needs a closed loop control system.
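The error-signal idea can be sketched minimally. The per-period targets and measured outputs below are hypothetical numbers, just to show the steering-target comparison:

```python
# Closed-loop sketch: compare needed vs actual performance each period
# and generate an error signal for corrective action.
needed = [10, 10, 10, 10]   # steering target: planned output per period
actual = [9, 8, 11, 7]      # measured past performance per period

errors = [n - a for n, a in zip(needed, actual)]   # error signals
for period, e in enumerate(errors, start=1):
    action = "corrective action needed" if e > 0 else "on or ahead of plan"
    print(f"Period {period}: variance {e:+d} -- {action}")
```

Open loop control, by contrast, would record `actual` and simply project it forward with no comparison against `needed`.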

Bootstrapping can only show what the future will be like if it is like the past, not what it must be like. In bootstrapping, this future MUST be like the past. In MCS we can tune the PDFs to show what performance has to be in order to manage to that plan. Bootstrapping is reporting yesterday's weather as tomorrow's weather - just like Steve Martin in LA Story. If tomorrow's weather turns out not to be like yesterday's weather, you're gonna get wet.
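"Tuning the PDFs" can be sketched as a simple search: scale the planned durations down until the simulated plan reaches the needed confidence. The task durations, 28-day target, and 80% confidence threshold below are all hypothetical:

```python
import random

def p_finish_by(tasks, target, trials=10_000, seed=3):
    """Probability the serial task chain finishes by `target` days."""
    rng = random.Random(seed)
    hits = sum(
        sum(rng.triangular(lo, hi, mode) for lo, mode, hi in tasks) <= target
        for _ in range(trials)
    )
    return hits / trials

# Hypothetical plan: three tasks, triangular (min, most-likely, max) days.
plan = [(5, 8, 15), (3, 5, 12), (8, 10, 20)]
target = 28

# Search for the uniform duration scale factor needed to make the plan
# 80% confident -- i.e. what future performance MUST look like.
scale = 1.0
while p_finish_by([(lo * scale, m * scale, hi * scale) for lo, m, hi in plan],
                  target) < 0.80:
    scale -= 0.01

print(f"Durations must shrink to {scale:.0%} of plan for 80% confidence")
```

The output of this search is the steering target: not a report of what throughput was, but a statement of what it has to become.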

MCS can forecast tomorrow's weather by assigning PDFs to future activities that are different from past activities; then we can make any needed changes in that future model to alter the weather to meet our needs. This is in fact how weather forecasts are made - with much more sophisticated models, of course, like those at the National Center for Atmospheric Research in Boulder, CO.

This forecasting (estimating the future state) of possible outcomes, and the alteration of those outcomes through management actions - changing dependencies, adding or removing resources, providing alternatives to the plan (on ramps and off ramps of technology, for example), buying down risk, applying management reserve, assessing the impacts of rescoping the project, etc. - is what project management is all about.

Bootstrapping is necessary but far from sufficient for any non-trivial project to show up on or before the need date (with schedule reserve), at or below the budgeted cost (with cost reserve), and have the product or service provide the needed capabilities (with technical performance reserve).

Here's an example of that probabilistic forecast of project performance from a MCS (Risky Project). This picture shows the probability for cost, finish date, and duration. But it is built on time evolving PDFs assigned to each activity in a network of dependent tasks, which models the work stream needed to complete as planned.

When that future work stream is changed - to meet new requirements, to correct unfavorable past performance, or to reflect changes in any or all of the underlying random variables - the MCS can show us the expected impact on key parameters of the project so management intervention can take place, since Project Management is a verb.

The connection between the Bootstrap and Monte Carlo simulation of a statistic is simple.

Both are based on repetitive sampling and then direct examination of the results.

But there are significant differences between the methods (hence the difference in names and algorithms). Bootstrapping uses the original, initial sample as the population from which to resample. Monte Carlo Simulation uses a data generation process, with known values of the parameters of the Probability Distribution Function; a common algorithm for correlating the sampled variables in MCS is Lurie-Goldberg. Monte Carlo is used to test that the results of the estimators produce desired outcomes on the project - and if not, it allows the modeler and her management to change those estimators and then manage to the changed plan.

Bootstrap can be used to estimate the variability of a statistic and the shape of its sampling distribution from past data. Assuming the future is like the past, it can make forecasts of throughput, completion dates, and other project variables.

In the end, the primary difference (and again the reason for the difference in names) is that Bootstrapping is based on unknown distributions - sampling and assessing the shape of the distribution in bootstrapping adds no value to the outcomes - while Monte Carlo is based on known or defined distributions, usually from Reference Classes.

In the business of building software intensive systems - estimating, performance forecasting, and management - closed loop control in the presence of uncertainty for all variables is the foundation needed for increasing the probability of success.

This means math is involved in planning, estimating, measuring, analysis, and corrective actions to Keep the Program Green.

When we have past performance data, here's one approach...

And the details of the math in the Conference paper

The world of projects, project management, and the products or services produced by those projects is uncertain. It's never certain. Seeking certainty is not only naive, it's simply not possible.

Making decisions in the presence of this uncertainty is part of our job as project managers, engineers, and developers, on behalf of those paying for our work.

It's also the job of the business, whose money is being spent on the projects to produce tangible value in exchange for that money.

From the introduction of the book to the left...

Science and engineering, our modern ways of understanding and altering the world, are said to be about accuracy and precision. Yet we best master the complexity of our world by cultivating insight rather than precision. We need insight because our minds are but a small part of the world. An insight unifies fragments of knowledge into a compact picture that fits in our minds. But precision can overflow our mental registers, washing away the understanding brought by insight. This book shows you how to build insight and understanding first, so that you do not drown in complexity.

So what does this mean for our project world?

The future is uncertain. It is always uncertain. It can't be anything but uncertain. Assuming certainty is a waste of time. Managing in the presence of uncertainty is unavoidable. To do this we must estimate. This is unavoidable. To suggest otherwise willfully ignores the basis of all management practices.

This uncertainty creates risk to our project - risks to cost, to schedule, and to the delivered capabilities of the project or product development. To manage with a closed loop process, estimates are needed. This is unavoidable as well.

Uncertainty is either reducible or irreducible

Reducible uncertainty can be reduced with new information. We can buy down this uncertainty.

Irreducible uncertainty - the natural variations in what we do - can only be handled with margin.
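How much margin? One common way to size it is the gap between the single-point "most likely" plan and a chosen confidence level on the simulated distribution. The triangular duration parameters and 80% confidence level below are hypothetical:

```python
import random

# Hypothetical work package with aleatory (irreducible) variation,
# modeled as a triangular duration in days (min, most-likely, max).
lo, mode, hi = 20, 25, 40

rng = random.Random(11)
samples = sorted(rng.triangular(lo, hi, mode) for _ in range(20_000))
p80 = samples[int(0.80 * len(samples))]   # 80% confidence finish

# Schedule margin protects the plan date against natural variation:
# the gap between the most-likely plan and the 80% confidence date.
margin = p80 - mode
print(f"Plan at most-likely: {mode} days; 80% confidence: {p80:.1f} days")
print(f"Schedule margin needed: {margin:.1f} days")
```

The same arithmetic applies to cost margin and technical performance margin, with the appropriate PDFs.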

In both these conditions we need to get organized in order to address the underlying uncertainties. We need to put structure in place in some manner. Decomposing the work is a common way in the project domain. From a Work Breakdown Structure to simple sticky notes on the wall, breaking problems down into smaller parts is a known successful way to address a problem.

With this decomposition, now comes the hard part. Making decisions in the presence of this uncertainty.

A well known agile thought leader made a statement today

I support total chaos in every domain

This is unlikely to result in sound business decisions in the presence of uncertainty. Although there may be domains where chaos might produce usable results, when we need some degree of confidence that the money being spent will produce the needed capabilities, on or before the need date, at or below the budget needed to be profitable, and with the collection of all the needed capabilities to accomplish the mission or meet the business case, we're going to need to know how to manage our work to achieve those outcomes.

So let's assume - with a high degree of confidence - that we need to manage in the presence of uncertainty and have little interest in encouraging chaos. Here's one approach.

So In The End

Since all the world's a set of statistical processes producing probabilistic outcomes, which in turn create risk to any expected results when not addressed properly, the notion that decisions can be made in this condition without estimates can only be explained by willful ignorance of the basic facts of the physics of project work.

There is a popular notion in the #NoEstimates paradigm that Empirical data is the basis of forecasting the future performance of a development project. In principle this is true, but the concept is not complete in the way it is used. Let's start with the data source used for this conjecture.

There are 12 samples in the example used by #NoEstimates - in this case, stickies per week. From this time series an average is calculated for the future. This is the empirical data used to estimate in the No Estimates paradigm. The average is 18.1667, or just 18 stickies per week.

But we all have read - or should have read - Sam Savage's The Flaw of Averages. This is a very nice populist book. By populist I mean an easily accessible text with little or no mathematics, although Savage's own work is highly mathematical, with his tool set.
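The flaw of averages is easy to demonstrate by simulation. The weekly samples below are hypothetical - the original 12 data points are not reproduced here - but they are chosen to match the 18.1667 average quoted above:

```python
import random

# Hypothetical weekly throughput samples (stickies per week), mean 18.1667.
weekly = [14, 22, 17, 19, 25, 12, 20, 18, 16, 23, 15, 17]
avg = sum(weekly) / len(weekly)

# Flaw of averages: if we promise avg * 8 stickies in 8 weeks,
# how often does resampled throughput actually deliver that?
rng = random.Random(5)
weeks, trials = 8, 20_000
need = avg * weeks
made = sum(
    sum(rng.choice(weekly) for _ in range(weeks)) >= need
    for _ in range(trials)
)
print(f"Average-based plan met in {made / trials:.0%} of simulated futures")
```

A plan built on the average is met only about half the time - which is exactly what "average" means, and nowhere near the confidence most stakeholders assume they are being given.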

There is a simple set of tools that can be applied to Time Series analysis, using past performance to forecast the future performance of the system that created the previous time series. The tool is R, and it is free for all platforms.

Here's the R code for performing a statistically sound forecast to estimate the range of values the past empirical stickies can take on in the future.

Put the time series in an Excel file and save it as TEXT named BOOK1

> library(forecast)       # load the forecast package (not in base R), needed for forecast()
> Book1=read.table("BOOK1.txt")  # read the time series saved from Excel
> SPTS=ts(Book1)          # apply the Time Series function in R to convert this data to a time series
> SPFIT=arima(SPTS)       # apply the simple ARIMA function to the time series
> SPFCST=forecast(SPFIT)  # build a forecast from the ARIMA outcome
> plot(SPFCST)            # plot the results

Here's that plot, showing the 80% and 90% confidence bands for the possible future outcomes from the past performance - empirical data from the past.

The 80% range is 27 to 10 and the 90% range is 30 to 5.

So the killer question.

Would you bet your future on a probability of success with a +65% to -72% range in the cost, schedule, or technical performance outcomes?

I hope not. This is a flawed example, I know - too small a sample, no adjustment of the ARIMA factors, just a quick raw assessment of the data used in some quarters as a replacement for actually estimating future performance. But this assessment shows how empirical data COULD support making decisions about future outcomes in the presence of uncertainty using past time series, once the naive assumptions of sample size and wide variances are corrected.

The End

If you hear you can make decisions without estimating, that's pretty much a violation of all established principles of Microeconomics and statistical forecasting. When the answer comes back we used empirical data, take your time series of empirical data, download R, install all the needed packages, put the data in a file, apply the functions above, and see if you really want to commit to spending other people's money with a confidence range of +65% to -72% of performing like you did in the past. I sure hope not!!

Decision Analysis provides ways to incorporate judgments about uncertainty into the evaluation of alternatives. Cost professionals using these methods can provide more-credible analyses. The foundation calculation is expected value, usually solved by a decision tree or by Monte Carlo simulation. A formal definition is:

Decision analysis is the discipline comprising the philosophy, theory, methodology, and professional practice necessary to address important decisions in a formal manner.

In project management, and especially in the software development project management domain, making decisions is always done in the presence of uncertainty. Uncertainty is always in place for the three core elements of any project, shown below:

In order to make decisions about future outcomes of a project subject to these uncertainties, we need to know not only how these three variables randomly interact, but also how they behave as standalone processes. This behaviour - in the presence of uncertainty - has two types:

Event based behaviour - the probability that something will or will not happen in the future.

Naturally occurring random behaviour - the Probability Distribution Function that describes the possible outcomes.

Both these behaviours are present in all project work. If they are not present, the project is likely simple enough that you can just start working with confidence of completing on time, on budget, and having the outcome work.
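The two behaviour types can be combined in a single simulation. In this hypothetical sketch, a work package has aleatory triangular variation in its duration plus an event-based rework risk with an assumed 25% probability of occurrence and a 10-day impact:

```python
import random

rng = random.Random(9)
trials = 20_000
durations = []
for _ in range(trials):
    # Naturally occurring random behaviour: triangular duration (days).
    d = rng.triangular(15, 30, 20)
    # Event-based behaviour: probability of occurrence, then impact.
    if rng.random() < 0.25:
        d += 10
    durations.append(d)

durations.sort()
p50 = durations[len(durations) // 2]      # median finish
p80 = durations[int(0.80 * trials)]       # 80% confidence finish
print(f"Median finish: {p50:.1f} days, 80% confidence: {p80:.1f} days")
```

The gap between the median and the 80% confidence date is driven by both behaviours at once, which is why they must be modeled together rather than averaged away.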

When the behaviours are not deterministic and the interactions are not deterministic - then to make decisions we can only do one thing.

We need to estimate these behaviours

When it is suggested we can make decisions in the presence of uncertainty, and there is no method provided to make those decisions, it can't be true.

Recording past performance and then taking the average of that performance as an estimate of future performance is seriously flawed, as discussed in How to Avoid Yesterday's Weather Estimating Problem.

So do the math. Random variables abound in our project domain - be it software, hardware, pouring concrete, or welding steel, it's all random. Don't fall prey to the simple minded statement we're exploring ways to make decisions without estimates without asking the direct question: show me how that nonexistent description does not violate the principles of Decision Analysis and the Microeconomics of software development decision making.

The lack of understanding of the underlying probability and statistics of making decisions in the presence of uncertainty continues to plague the discussion of estimating software.

All elements of all projects are statistical in nature. This statistical behaviour - reducible or irreducible stochastic processes - creates uncertainty.

Event based uncertainties have a probability of occurrence and a probability of impact once that uncertainty becomes a reality. These are Epistemic uncertainties - epistemology is the study of knowledge. Epistemic means knowing, or in this case not knowing. We can buy knowledge. This is the core concept of the agile paradigm: we are buying down risk by building software to test the uncertainties of the project deliverables. This is the basis of saying agile is about risk management. But I suspect those saying that without being able to do the math, as we say in our domain, don't realize what they are actually saying.

The naturally occurring variances are aleatory. They are always there; they are irreducible. That is, they can't be fixed. Work effort and duration are aleatory. The ONLY fix for aleatory uncertainty and the resulting risk is margin - cost margin, schedule margin, technical performance margin. You can't buy the fix to aleatory uncertainty.

You can't make decisions in the presence of uncertainty without estimating the outcome of your decision in the future. Using empirical data is preferred. But that empirical data MUST be adjusted for future uncertainty: past variances, sampling errors, poor representations of the actual process, and the plethora of other drivers of uncertainty. Correcting for small, simple samples without variances - and most of all, confirming mathematically, not just announcing, that the past actually does represent the future - is needed for any estimate of the future to have any credibility. Otherwise it's just an uninformed bad guess.

There is a popular notion in agile estimating that a time series of past performance can be used to forecast the future performance of the project. Here's a clip of that time series. It is conjectured that data like this can be used to make decisions about the future. And certainly data about the past CAN be and IS used to make decisions (forecast based decisions), but there are serious flaws in the suggested approach.

But the Flaw of Averages is deeper (pun intended) than the simple cartoon. Any time series of data, like the first chart, can be used to forecast the possible outcomes from past performance. The Autoregressive Integrated Moving Average (ARIMA) function in R can easily show that a time series like the first chart has a huge swing in possible outcomes. Below is an example from a much better behaved time series - in this case, the Schedule Performance Index of a software intensive system.

When we see poor man's solutions to estimating future outcomes, be careful not to become that poor man by spending your customer's money in ways that bankrupt the project.

Buy Savage's book, google software estimating, read up on ARIMA and time series analysis, and download R and all its free books and documents. Here are some more resources on estimating software projects.

Don't fall prey to simple - and many times simple minded - approaches to managing other people's money. Math is required when dealing with the probabilistic processes found on ALL projects. The future is always uncertain and NEVER the same as the past. This is the case even for the production processes found at Toyota. Margin and risk reduction activities are always needed. Knowing how much margin and which risk reduction activities is part of the planning process for any non-trivial project.
