In probability theory, de Finetti's theorem† explains why exchangeable observations are conditionally independent given some latent variable to which an epistemic probability distribution would then be assigned. It is named in honor of Bruno de Finetti.

It states that an exchangeable sequence of Bernoulli random variables is a "mixture" of independent and identically distributed (i.i.d.) Bernoulli random variables – while the individual variables of the exchangeable sequence are not themselves i.i.d., only exchangeable, there is an underlying family of i.i.d. random variables.

Thus, while observations need not be i.i.d. for a sequence to be exchangeable, there are underlying, generally unobservable, quantities which are i.i.d. – exchangeable sequences are (not necessarily i.i.d.) mixtures of i.i.d. sequences.
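Here is a minimal sketch of the mixture idea. It assumes a Beta(2, 3) prior on the latent success probability p; that prior and its parameters are illustrative assumptions, not from the text. Given p, each observation is Bernoulli(p). Integrating p out gives the beta-binomial probability of a whole sequence. That probability depends only on how many 1s appear, so every reordering is equally likely (exchangeable), yet the observations are not independent.

```python
from itertools import permutations
from math import exp, lgamma


def log_beta(a: float, b: float) -> float:
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def sequence_probability(seq, a: float = 2.0, b: float = 3.0) -> float:
    """P(X1=x1, ..., Xn=xn) when p ~ Beta(a, b) and X_i | p are i.i.d. Bernoulli(p).

    Integrating out p gives B(a + k, b + n - k) / B(a, b), where k is the
    number of 1s. Only k matters, so the order of the observations does not.
    """
    k = sum(seq)
    n = len(seq)
    return exp(log_beta(a + k, b + n - k) - log_beta(a, b))


# Exchangeable: every ordering of two successes and two failures is equally likely.
orderings = set(permutations((1, 1, 0, 0)))
probs = [sequence_probability(s) for s in orderings]
print(len(orderings), min(probs), max(probs))

# Not independent: seeing a 1 raises the chance that the next value is 1.
p_first = sequence_probability((1,))   # analytically a / (a + b) = 0.4
p_both = sequence_probability((1, 1))  # analytically 0.2
print(p_first, p_both / p_first)       # P(X2=1 | X1=1), analytically 0.5
```

The i.i.d. structure lives in the latent p. The observed sequence is only exchangeable, which is the theorem's point.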

All of this matters. When we assess risks using probabilistic models built on statistical processes, we need to be very careful to understand the underlying mathematics.

There are four approaches to saying what we mean when we say “probability”:

Logical – weak implications

Propensity – physical properties

Frequency – attributed to sequences of observations

Subjective – personal opinion

† De Finetti's theorem is at the heart of estimating random variables. Cost is a random variable, as are schedule durations and the technical outcomes of the effort funded by that cost and schedule. In the statistical assessment of cost and schedule, Frequentist (counting) statistics is one approach. The second is Bayesian inference (used in most science). The exchangeability of the random variables is critical to building time series of sampled data from the project to forecast future performance.

Galileo Galilei, Letter to the Grand Duchess Christina of Tuscany (1615)

.... Considering the force exerted by logical deductions, they may ascertain that it is not in the power of the professors of demonstrative sciences to change their opinions at will and apply themselves first to one side and then to the other.

There is a great difference between commanding a mathematician or a philosopher and influencing a lawyer or a merchant, for demonstrated conclusions about things in nature or in the heavens cannot be changed with the same facility as opinions about what is or is not lawful in a contract, bargain, or bill of exchange.

To those suggesting we abandon the principles of the Microeconomics of Software Development (decision making in the presence of scarcity, abundance, and economic value)†, and claiming that decisions made today, with their impacts on future outcomes, can be made without probabilistic knowledge of those impacts, that is, without estimating them: think again.

It Just Ain't So

† Software Project Effort Estimation: Foundations and Best Practice Guidelines for Success, May 7, 2014 by Adam Trendowicz and Ross Jeffery

This tomb holds Diophantus. Ah, what a marvel! And the tomb tells scientifically the measure of his life. God vouchsafed that he should be a boy for the sixth part of his life; when a twelfth was added, his cheeks acquired a beard; he kindled for him the light of marriage after the seventh, and in the fifth year of his marriage He granted him a son. Alas! late-begotten and miserable child, when he had reached the measure of half his father's life, the chill grave took him. After consoling his grief by this science of numbers for four years, he reached the end of his life. Greek Mathematical Works II: Aristarchus to Pappus of Alexandria, Loeb Classical Library, Translated by Ivor Thomas, Harvard University Press, 1941.

How old was Diophantus when he died, and how old was his son? Let's assume the phrase half his father's life means half of Diophantus's total life, not half his age at the time of the son's death.

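Assume Diophantus lived to x years. Adding up the stages described in the epitaph gives

$$\frac{x}{6} + \frac{x}{12} + \frac{x}{7} + 5 + \frac{x}{2} + 4 = x$$

The least common denominator of the denominators is 84. Multiplying every term by 84 gives

$$14x + 7x + 12x + 420 + 42x + 336 = 84x$$

Grouping multiples of x on one side and constants on the other gives

$$756 = 84x - 75x = 9x \quad\Rightarrow\quad x = 84$$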

Diophantus was x = 84 years old. He was a boy for 14 years and grew a beard 7 years later (at 21). Twelve years after that he married, at age 33, and had a son 5 years later (at 38). The son died at the age of 42, when Diophantus was 80. Diophantus died at the age of 84.
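As a check on the arithmetic, here is a small sketch that solves the same equation exactly with Python's fractions module and rebuilds the timeline. The function name and event labels are illustrative.

```python
from fractions import Fraction


def solve_epitaph():
    """Solve x/6 + x/12 + x/7 + 5 + x/2 + 4 = x with exact rational arithmetic."""
    boyhood, youth, bachelor, son = Fraction(1, 6), Fraction(1, 12), Fraction(1, 7), Fraction(1, 2)
    fixed_years = 5 + 4  # years from marriage to the son's birth, plus years of grief
    # Collect the x terms: x * (1 - sum of fractions) = fixed years
    x = fixed_years / (1 - (boyhood + youth + bachelor + son))
    timeline = {
        "beard": x * (boyhood + youth),
        "marriage": x * (boyhood + youth + bachelor),
        "son_born": x * (boyhood + youth + bachelor) + 5,
    }
    timeline["son_died"] = timeline["son_born"] + x * son
    timeline["death"] = timeline["son_died"] + 4
    return x, timeline


age, events = solve_epitaph()
print(age, events)
```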

One of my favorite blogs is Critical Uncertainties, where Matthew Squair writes about risk in broad terms. But risk in the narrow sense is just as important and just as critical.

Thanks to Matthew for the picture to the left and for the quote from Lord Thomson before he boarded the airship bound for India: "The R-101 is as safe as a house, except for the millionth chance."

In the software development business, risk results from uncertainty. Risk that we'll overrun our budget. Risk that we'll show up late. Risk that what we produce won't actually work. Risk that what we produce won't be what the customer thought they were getting.

Agile development is often billed as a risk management process, which is both correct and incorrect at the same time. A principle of Agile Software Development (and Agile is a software development method) is the mandatory production of outcomes in short periods of time. These outcomes, working software, can be assessed for compliance with the customer's needs. The short periods of time provide inch pebbles on the path to completing the collection of needed capabilities, and the frequent outcomes provide the feedback needed to take corrective action when those outcomes aren't what the customer wanted. But this feedback is a lagging indicator. It's after the fact: do the work, look at the results, adjust. Anticipating probabilistic future risks, and making the probabilistic choices to take corrective action to reduce a risk or to hold margin against it, is not a core competency of Agile. Something more is needed on top of the feedback process to keep future events and variances from impacting the emergent outcomes of the agile development process.

Agile is not a risk management process per se. Agile doesn't address the core practices of Risk Management, shown to the left.

Agile does not identify risks upfront, analyze those identified risks, plan for their reduction in explicit ways, track them with risk burn-downs, or control them in advance. Agile can respond to a risk when it is encountered, but by that time the risk has turned into an Issue. Changes to the project can then be made to go in a new direction, but only once working software is present can that decision be made.

Risks are probabilistic, and they come from one of two sources. The first is the probability that an event will occur in the future that unfavorably impacts the outcomes of the project. These types of uncertainty come from a lack of knowledge; they are epistemic uncertainties. This missing knowledge can be bought. Agile can buy this knowledge by producing something that can be assessed: money can be spent to develop an incremental outcome that tests an idea, an outcome, or a capability to determine if it is what the customer wants. But doing this work requires time and money, so planning for that time and money has to be part of the development process.

There are also statistical variances in the project that will impact outcomes. We can't buy these down. They are Aleatory. They are Irreducible. The only solution for aleatory (irreducible) uncertainties is margin: cost margin, schedule margin, technical margin.

To determine these margins we need several things:

A model of the underlying statistical processes that produce the irreducible uncertainties - the naturally occurring variances. These models can come from past performance - we've done this many times and the most likely value is X, with the least value being X-15% and the highest value being X+25%.

A parametric model where the observed past values are scaled to match the current model.

In either case, and with other modeling approaches, this type of risk analysis produces probabilistic ranges of values that might occur, rather than a probability of occurrence, which describes an event.
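Here is a minimal sketch of turning the past-performance range above into a margin. It assumes the range is modeled as a triangular distribution (low = X-15%, mode = X, high = X+25%) and that the target confidence is 80%. Both choices are illustrative assumptions. The margin is the gap between the 80th percentile and the most likely value.

```python
import math


def triangular_quantile(p: float, low: float, mode: float, high: float) -> float:
    """Inverse CDF of a triangular distribution on [low, high] peaking at mode."""
    split = (mode - low) / (high - low)  # CDF value at the mode
    if p <= split:
        return low + math.sqrt(p * (high - low) * (mode - low))
    return high - math.sqrt((1 - p) * (high - low) * (high - mode))


def margin_for_confidence(most_likely: float, confidence: float = 0.80,
                          low_pct: float = 0.15, high_pct: float = 0.25):
    """Return (value at the requested confidence, margin above the most likely value)."""
    low = most_likely * (1 - low_pct)
    high = most_likely * (1 + high_pct)
    target = triangular_quantile(confidence, low, most_likely, high)
    return target, target - most_likely


target, margin = margin_for_confidence(100.0)  # e.g. 100 working days most likely
print(f"80% confidence: {target:.1f} days, margin {margin:.1f} days")
```

Because the range is skewed to the right, more probability sits above the most likely value than below it. In this sketch, an 80% confidence target therefore needs roughly 11 days of margin on a 100-day most likely estimate.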

Since both reducible and irreducible risks result from underlying uncertainties, we must estimate both the probabilities of event-based outcomes and the range of possible values from aleatory risks.
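Here is a minimal Monte Carlo sketch that combines both kinds of uncertainty. For illustration only, it assumes a triangular duration for the irreducible variance and a single reducible risk event with a 20% probability of adding 20 days. Neither the numbers nor the structure come from a real risk register.

```python
import random


def simulate_durations(trials: int = 100_000, seed: int = 1):
    """Sample total duration = aleatory variance + possible risk-event impact."""
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        # Irreducible (aleatory): natural variance around a most likely 100 days
        duration = rng.triangular(85.0, 125.0, 100.0)  # low, high, mode
        # Reducible (epistemic) event: occurs with probability 0.2 and adds 20 days
        if rng.random() < 0.2:
            duration += 20.0
        results.append(duration)
    return sorted(results)


samples = simulate_durations()
mean = sum(samples) / len(samples)
p80 = samples[int(0.80 * len(samples))]
print(f"mean {mean:.1f} days, 80th percentile {p80:.1f} days")
```

Buying down the event (for example, with a prototype) removes the +20 day branch. The triangular spread remains, and only margin can cover it.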

So when we revisit the title of this post, Risk Management is How Adults Manage Projects - Tim Lister, we see that estimates are required to manage risk. Risk is always in the future and driven by underlying statistical processes: either emergent (event-based and reducible) or naturally variable (probability distributions, and irreducible).

When we encounter simple answers to complex problems, we need to not only be skeptical, we need to think twice about the credibility of the person posing the solution. A recent example is:

The cost of software is not directly proportional to the value it produces. Knowing cost is potentially useless information.

The first sentence is likely the case. The value of any one feature or capability is not necessarily related to its cost, since cost in software development is nearly 100% correlated with the cost of the labor needed to produce the feature.

But certainly the cost of developing all the capabilities, and the cost of individual capabilities when their interactions are considered, must be related to their value, or the principles of the Microeconomics of Software Development would no longer apply.

Microeconomics is a branch of economics that studies the behavior of individuals and small impacting organizations in making decisions on the allocation of limited resources. Those limited resources include (but are not limited to) Time and Money.

So without knowing the cost or time it takes to produce an outcome, the simple decision-making process of spending other people's money based on the Return on Investment hits a divide-by-zero error:

ROI = (Value - Cost) / Cost
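Here is a small sketch of why the cost term matters. It assumes a fixed value of 110 and a cost known only as a triangular range (80 to 140, most likely 100). These figures are hypothetical. Without any cost estimate the ROI cannot be computed at all. With a cost distribution, we get a probability that the investment pays back.

```python
import random


def roi(value: float, cost: float) -> float:
    """ROI = (Value - Cost) / Cost."""
    if cost <= 0:
        raise ValueError("ROI is undefined without a positive cost")
    return (value - cost) / cost


def probability_of_positive_roi(value: float, trials: int = 100_000, seed: int = 7) -> float:
    """Estimate P(ROI > 0) when cost is uncertain (hypothetical triangular range)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        cost = rng.triangular(80.0, 140.0, 100.0)  # low, high, mode
        if roi(value, cost) > 0:
            wins += 1
    return wins / trials


print(probability_of_positive_roi(110.0))  # analytically 0.625 for these assumptions
```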

Since all elements of a project are driven by statistical processes, the outcomes are always probabilistic. The delivered capabilities are what the customer bought. Cost and Schedule are needed to produce those capabilities. The success of the project in providing the needed capabilities depends on knowing the Key Performance Parameters, the Measures of Effectiveness, the Measures of Performance, and the Technical Performance Measures of those capabilities and the technical and operational requirements that implement them.

The cost and schedule to fulfill all these probabilistic outcomes are themselves probabilistic. When each outcome is produced by a statistical process, it is literally impossible to determine it in a deterministic manner without estimating. The Cost and Schedule elements are also probabilistic, further requiring estimates.

The notion that you can determine the Value of something without knowing its Cost is actually nonsense. Anyone suggesting that is the case has little understanding of business, the microeconomics of software development, or how the world of business treats expenditures of other people's money.

Here's some background to help in that understanding:

Between this last book, the books above, and all the papers, articles, and training about how to manage other people's money when producing value from software systems, you'll hopefully come to see what is wrong with certain notions. These include that we don't need to know the cost, can't know the cost, are poor at making estimates, and should simply start coding and see what comes out. Those notions are not only seriously misinformed, but misinformed with intentional ignorance.

If your project is not using other people's money, if your project has low value at risk, if your project is of low importance to those paying, then maybe, just maybe, they don't really care how much you spend, when you'll be done, or what will result. But that doesn't sound very fulfilling where I live.

I spoke at a workshop this week at The Nexus of Agile Software Development and Earned Value Management, OUSD(AT&L)/PARCA, February 19 – 20, 2015 Institute for Defense Analysis, Alexandria, VA.

This meeting was attended by government and industry representatives to share ideas on how to integrate Agile Software Development on Earned Value Management programs. EVM programs start with awards greater than $20M, so these are non-trivial efforts. The presentations will be available soon, and I'll update this post when they're posted on the PARCA site.

In the meantime, there is existing guidance for starting this process. But first, here's a collection from SEI on the topic.

First Principle - Our highest priority is to satisfy the customer through early and continuous delivery of valuable software.

Second Principle - Welcome changing requirements, even late in development

Many in the Agile community like to use words like Self Organizing, Emergent, Complexity, and Complex Adaptive Systems without actually being able to do the mathematics behind these concepts. They've turned the words into platitudes. This is the definition of popularization: a core idea from science and mathematics (many times physics) without the math.

These popularizations spawned a small industry that uses the words in ways no longer connected with the actual mathematical models of self-organization, complexity, complex adaptive systems, and the emergence of a complex outcome from underlying simplicity.

There is a pop-psychology approach to core mathematics and the physics of complex systems as well.

Self Organization requires several conditions to be in place for it to be observed:

A High Degree of Structure

The capacity for coordinated action

A mechanism for system-wide feedback and amplification

Some means to transform a small event into a larger driving force for the system to organize itself into a coherent system

The key is coordination across boundaries and the capacity for action. This implies, quite explicitly, a deterministic response to external stimuli. The self-organization properties require structured communication channels to be in place for the system to possess this property.

So next time you hear "self-organizing teams are the best," ask to see what structures are in place to provide the channels for coordinated action. What mechanisms are being used for system-wide feedback within that highly structured process framework? And what are the means of transforming small (potentially very small) stimuli into the collective actions of the whole?

In the broader sense, these concepts all live in a world governed in a deterministic manner through...

Feedback - the return of a portion of the output of a process or system to its input. This means modeling the transfer function, usually written G(s), where s is the complex frequency variable of the Laplace transform and G describes how the system transforms its input into its output. Transfer functions describe linear systems; non-linear systems are represented with non-linear differential equations.

System Dynamics is the next level of modeling for the structured, coordinated, system-wide feedback and amplification (both positive and negative).

This involves state-space (or phase space) modeling, in which all possible states of a system are represented in an abstract mathematical space, with each possible state corresponding to one unique point in that space. The dimensions of the state space represent all relevant parameters of the system. For example, the state space of a single particle moving in three dimensions has six dimensions and consists of all possible values of its position and momentum variables.

The Trajectory of the system, describing the sequence of system states as they evolve.

A fixed point in the state space, where the system is in equilibrium and does not change. In complex projects and the systems they represent, this is the steering signal the feedback is compared against. Corrective actions can then be taken by the system to maintain equilibrium and not run off the cliff.

The Attractor, a region of the state space toward which trajectories evolve and where some of them end.

The actual dynamics of the system, the set of functions that encode the movement of the system from one point in the state space to another. This is the foundation of the mechanism for feedback and for structuring the disconnected components of the system. These dynamics are often modeled with sets of differential equations containing the rules for the interactions.
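To make the fixed point and feedback ideas concrete, here is a deliberately simple linear sketch. It is an illustration, not a model of any real project. The state is a single number, the setpoint plays the role of the steering signal, and the correction is proportional to the error. The trajectory converges to the fixed point for moderate gains and diverges when the correction overshoots.

```python
def simulate_feedback(setpoint: float, start: float, gain: float, steps: int = 20):
    """Discrete-time proportional feedback: x[k+1] = x[k] + gain * (setpoint - x[k]).

    The setpoint is the fixed point of this map. The error is multiplied by
    (1 - gain) each step, so the trajectory converges only when 0 < gain < 2.
    """
    trajectory = [start]
    x = start
    for _ in range(steps):
        error = setpoint - x   # compare the feedback with the steering signal
        x = x + gain * error   # corrective action proportional to the error
        trajectory.append(x)
    return trajectory


stable = simulate_feedback(setpoint=100.0, start=60.0, gain=0.5)
unstable = simulate_feedback(setpoint=100.0, start=60.0, gain=2.5)
print(round(stable[-1], 3), round(unstable[-1], 1))
```

Real systems add delays, noise, and non-linear terms. But even this toy shows that feedback alone does not guarantee equilibrium. The structure of the correction matters.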

The complex part of complex systems is a subtle property, poorly understood without the mathematics: a deterministic system can have emergent and very different outcomes because of its sensitivity to starting conditions.

The double pendulum is a nice example. The equations of the double pendulum are a classic problem for second-year physics students. My introduction to FORTRAN 77 was coding the solution to the double pendulum problem in the Dynamics course.
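For readers who want to see sensitivity to initial conditions rather than take it on faith, here is a sketch in Python (not the original FORTRAN 77). It integrates the standard frictionless double pendulum equations with a fourth-order Runge-Kutta step. Unit masses and rod lengths, the horizontal starting position, and the 1e-9 radian offset are illustrative choices. Both runs obey the same deterministic equations, yet their angles separate by many orders of magnitude. Total energy stays essentially constant, which serves as a sanity check on the integration.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def derivatives(state, m1=1.0, m2=1.0, l1=1.0, l2=1.0):
    """Equations of motion for a frictionless double pendulum (point masses, rigid massless rods)."""
    t1, w1, t2, w2 = state
    delta = t1 - t2
    den = 2 * m1 + m2 - m2 * math.cos(2 * delta)
    a1 = (-G * (2 * m1 + m2) * math.sin(t1)
          - m2 * G * math.sin(t1 - 2 * t2)
          - 2 * math.sin(delta) * m2 * (w2 ** 2 * l2 + w1 ** 2 * l1 * math.cos(delta))) / (l1 * den)
    a2 = (2 * math.sin(delta)
          * (w1 ** 2 * l1 * (m1 + m2) + G * (m1 + m2) * math.cos(t1)
             + w2 ** 2 * l2 * m2 * math.cos(delta))) / (l2 * den)
    return (w1, a1, w2, a2)


def rk4_step(state, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = derivatives(state)
    k2 = derivatives(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = derivatives(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = derivatives(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(s + dt / 6 * (d1 + 2 * d2 + 2 * d3 + d4)
                 for s, d1, d2, d3, d4 in zip(state, k1, k2, k3, k4))


def energy(state, m1=1.0, m2=1.0, l1=1.0, l2=1.0):
    """Total mechanical energy (kinetic plus potential, pivot at height zero)."""
    t1, w1, t2, w2 = state
    kinetic = 0.5 * m1 * (l1 * w1) ** 2 + 0.5 * m2 * (
        (l1 * w1) ** 2 + (l2 * w2) ** 2 + 2 * l1 * l2 * w1 * w2 * math.cos(t1 - t2))
    potential = -(m1 + m2) * G * l1 * math.cos(t1) - m2 * G * l2 * math.cos(t2)
    return kinetic + potential


def max_divergence(offset=1e-9, seconds=20.0, dt=0.001):
    """Run two pendulums whose first starting angle differs by `offset` radians."""
    a = (math.pi / 2, 0.0, math.pi / 2, 0.0)  # both arms horizontal, at rest
    b = (math.pi / 2 + offset, 0.0, math.pi / 2, 0.0)
    e0 = energy(a)
    worst = 0.0
    for _ in range(int(seconds / dt)):
        a, b = rk4_step(a, dt), rk4_step(b, dt)
        worst = max(worst, abs(a[0] - b[0]))
    drift = abs(energy(a) - e0)
    return worst, drift


separation, energy_drift = max_divergence()
print(f"max angle separation {separation:.3g} rad, energy drift {energy_drift:.2g} J")
```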

Or maybe that person is just fond of using words whose meaning they don't actually know at the mathematical level. As a classic example, self-organizing is defined (the term was introduced by W. Ross Ashby in 1947) as the ability of a system to autonomously (without being guided or managed by an outside source) increase its complexity. So when we hear that self-organizing is good, the system we're applying it to is getting MORE complex without any external guidance. I wonder if that's what we actually wanted.

In the #NoEstimates conversation, the term empirical data is used as a substitute for Not Estimating. This notion of No Estimates - that is making decisions (about the future) with No Estimates, is oxymoronic since gathering data and making decisions about the future from empirical data is actually estimating.

But that aside for the moment, the examples of empirical data in the No Estimates community are woefully inadequate for any credible decision making. Using 22 or so data samples with a ±30 variance to forecast future outcomes when spending other people's money doesn't pass the smell test where I work.

Here are some sources of actual data for IT projects that can be used to build Reference Classes with better statistics.

The current issue of ORMS Today has resources as well; ORMS Today can be obtained for free. There are also several professional societies that provide guidance for estimating.

As well, I have a colleague, Mario Vanhoucke, who speaks at our Earned Value Management conferences and whose graduate students do research on project performance management. A recent paper, "Construction and Evaluation of Frameworks for Real Life Project Database," is a good source on how to apply empirical data to estimating future outcomes. Mario teaches Economics and Business Administration at Ghent University and is a founder of OR-AS.

All of this is to say, using empirical data is necessary but not sufficient. This is especially true when the sample size is too small, the data is statistically unstable, or, at a minimum, the variances are statistically broad. To be sufficient, we need a few more things:

The correlations between the data samples as they evolve in time. This is Time Series Analysis.

Sample sizes sufficient to assess the variance of future outcomes.

A broader Reference Class basis than just the small number of samples in the current work stream. These small samples can be useful IF the future work represents the same class of work. This would imply the project itself is straightforward, has little emergent risk (reducible or irreducible), and we're confident not much is going to change. Without those assumptions, the statistics from those 20 or so samples should not be used.
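Here is a minimal sketch of the first item, the lag-1 autocorrelation of a series of samples (for example, throughput per iteration). The function is a plain-Python illustration. The two toy sequences in the usage lines only show the extremes; they are not project data. A value near zero is consistent with independent samples. Values far from zero mean the samples are correlated in time, and treating them as independent overstates confidence.

```python
def lag1_autocorrelation(samples):
    """Sample autocorrelation at lag 1: sum of (x_t - m)(x_{t+1} - m) over sum of (x_t - m)^2."""
    n = len(samples)
    if n < 3:
        raise ValueError("need at least three samples")
    mean = sum(samples) / n
    dev = [x - mean for x in samples]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    num = sum(dev[t] * dev[t + 1] for t in range(n - 1))
    return num / denom


alternating = [1, -1] * 5       # swings back and forth every period
trending = list(range(1, 11))   # steadily increasing
print(lag1_autocorrelation(alternating), lag1_autocorrelation(trending))
```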

What's Next?

Starting with empirical samples to make estimates of future outcomes is called Estimating. Labeling it No Estimates seems a bit odd at best.

With the basic understanding that empirical data is needed for any credible estimating process, look further into the principles and practices of probabilistic estimating for project work.

This, hopefully, will result in an understanding of sample size calculations to determine the confidence in the forecast as a start.
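Here is a minimal sketch of that calculation. It assumes approximately normal sampling of the mean and a spread expressed as a coefficient of variation (standard deviation divided by the mean). The 30% spread and the ±10% target precision are illustrative assumptions. For small samples the Student t distribution is more appropriate than the normal used here, so these numbers are slightly optimistic.

```python
import math
from statistics import NormalDist


def required_sample_size(cv: float, relative_error: float, confidence: float = 0.95) -> int:
    """Samples needed so the mean is known within +/- relative_error at the given confidence."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((z * cv / relative_error) ** 2)


def achievable_precision(cv: float, n: int, confidence: float = 0.95) -> float:
    """Relative half-width of the confidence interval for the mean with n samples."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * cv / math.sqrt(n)


print(required_sample_size(cv=0.30, relative_error=0.10))  # 35 samples
print(round(achievable_precision(cv=0.30, n=22), 3))       # about +/-12.5%
```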

There was a post yesterday with the phrase embrace the intellectual honesty of uncertainty and a picture of dice. I interpreted that picture, possibly wrongly, to mean uncertainty is the same as tossing dice and gambling with your project.

While uncertainty is certainly part of project management, it's not gambling, it's not guessing. It's probability and statistics.

So when someone suggests that tossing dice is the same as embracing uncertainty ask a few questions:

Do you have a model of the underlying uncertainties of your project, both the reducible and the irreducible uncertainties?

Do you have reference classes for the past performance of the work you are planning to perform?

Do you have mitigation plans for the reducible uncertainties?

Do you have margin for the irreducible uncertainties?

If the answer to these is NO, then you are in fact tossing the dice for your project's success.

Our mathematical models are far from complete, but they provide us with schemes that model reality with great precision - a precision enormously exceeding that of any description that is free of mathematics - Roger Penrose, "What is Reality?", New Scientist, 2006

Any suggestion that all models are wrong, some models are useful, from a person who does not have George Box's book on the shelf and cannot point to the page that quote is on, or who has not read Penrose, is speaking about something of which he is uninformed. Do not listen.

I found this picture on the web. The OP didn't know where it came from so I have no attribution. It speaks volumes to the gap between knowing and doing. The notion of knowing and doing is at the heart of Engineered solutions to complex problems, versus created solutions. Of course engineered solutions are created - in that creativity is mandatory for the engineered solution to provide the needed capabilities the customer ordered. But the engineered aspects are the framework in which that creativity is performed.

But the inverse may not be true. Engineering is a broad term.

Engineering (from Latin ingenium, meaning "cleverness" and ingeniare, meaning "to contrive, devise") is the application of scientific, economic, social, and practical knowledge in order to invent, design, build, maintain, research, and improve structures, machines, devices, systems, materials and processes.

Software Engineering does not always stand alone; it might be considered part of Systems Engineering. Software in many cases is embedded in a physical object used by people or by other physical objects, or embedded in a physical process used by people. These objects are systems and maybe even Systems of Systems.

Systems Engineering is an interdisciplinary field of engineering that focuses on how to design and manage complex engineering projects over their life cycles. Issues such as reliability, logistics, coordination of different teams (requirements management), evaluation measurements, and other disciplines become more difficult when dealing with large or complex projects. Systems engineering deals with work-processes, optimization methods, and risk management tools in such projects. It overlaps technical and human-centered disciplines such as control engineering, industrial engineering, organizational studies, and project management. Systems engineering ensures that all likely aspects of a project or system are considered, and integrated into a whole.

Coding software, in the absence of Computer Science or Software Engineering, is not likely part of any engineering discipline, but is a skill that turns ideas into code.

Software Engineering is

The study and application of engineering to the design, development, and maintenance of software. Typical formal definitions of software engineering are: the application of a systematic, disciplined, quantifiable approach to the development, operation, and maintenance of software.

So If You're in the Engineered Solutions Business...

If you're in the engineered solutions business, systems engineering, software engineering, business process engineering, would the picture at the top be considered credible by your customers, your professors in college, your business development team, your board of directors?

The picture at the top says several things:

You've not likely done a project like this before. Why would someone give you money for you to learn now? (Remember we're engineering the solution, not conducting research).

You were not paying attention in that probability and statistics class in college (engineering or computer science) when the prof explained all elements of projects are random variables.

You've come to believe that the externalities of microeconomics of software development are not applicable to your project.

You're a sole proprietor with tons of money and don't really care how much it costs, when you'll be done, or what you get when you're done. You just want to produce something meaningful to you.

The picture at the bottom says several things:

You have experience in the development of systems like this and know that uncertainty creates risk, and risk management is how adults manage projects.

You actually know how to apply probability and statistics, have used Bayesian forecasting - not just tossed the word around, but applied actual math - to forecast a probabilistic outcome from your past experiences.

You paid attention in that engineering principles class where they taught you how to move forward in the design and development process with incremental outcomes. Yes this incremental development is the basis of all good engineering practices. The Big Design Up Front straw man went away 40 years ago.

I advise my students to listen carefully the moment they decide to take no more mathematics courses. They might be able to hear the sound of closing doors - James Caballero. "Everyone is a Mathematician," CAIP Quarterly 1989.

Suppose we fail to understand that all elements of projects are random variables that interact with each other in non-linear, non-stationary, and stochastic ways, and that making decisions about projects requires assessing the impact of these random variables on future outcomes. Then we've confirmed that we stopped listening to those mathematics teachers too soon. We are now adrift in a sea of uncertainty with no reference point for avoiding spending our customers' money without knowing how much, when we'll be done, and, most critical of all, whether we'll be able to earn back the Value in exchange for that money.

Just started a new book, The Physics of Wall Street: A Brief History of Predicting the Unpredictable. The seeds of decision making in the presence of uncertainty were planted long ago with Louis Bachelier's A Theory of Speculation.

This work, started in 1892, and published in 1914, lays the groundwork for the mathematics of making choices in the presence of uncertainty.

March 29, 1900, is likely the day mathematical finance was born. On that day a French doctoral student, Louis Bachelier, successfully defended his thesis Theorie de la Speculation at the Sorbonne. The jury, noting the topic was far different from any of those considered by other candidates, appreciated its high degree of originality.

In the book to the left, Bachelier's seminal work is provided in English, with commentary and background. The thesis is a remarkable document. In mathematical terms, Bachelier's achievement was to introduce many of the concepts of what is now known as stochastic analysis. The purpose of his thesis was to provide a theory for the valuation of financial options. He came up with a formula that is both correct on its own terms and surprisingly close to the Nobel Prize-winning solution to the option pricing problem by Fischer Black, Myron Scholes, and Robert Merton in 1973, the first decisive advance since 1900.
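For the curious, here is a sketch of the call price in the Bachelier model as it is usually stated today. It assumes zero interest rates and an underlying price that follows arithmetic Brownian motion with volatility sigma in price units. The notation is the modern textbook form, not Bachelier's own. With d = (S - K) / (sigma * sqrt(T)), the price is (S - K) N(d) + sigma sqrt(T) n(d), where N and n are the standard normal CDF and density.

```python
import math
from statistics import NormalDist


def bachelier_call(spot: float, strike: float, sigma: float, maturity: float) -> float:
    """Call price under arithmetic Brownian motion with zero interest rate."""
    std = sigma * math.sqrt(maturity)  # standard deviation of the price at maturity
    if std == 0:
        return max(spot - strike, 0.0)
    d = (spot - strike) / std
    nd = NormalDist()
    return (spot - strike) * nd.cdf(d) + std * nd.pdf(d)


# At the money the formula reduces to sigma * sqrt(T) / sqrt(2 * pi), about 3.99 here
print(bachelier_call(100.0, 100.0, 10.0, 1.0))
```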

These option theories are the basis of Real Options. Real Options are used in making decisions about future values for investments (many in the IT domain), using probabilistic outcomes to value capital budgeting decisions. A real option itself is the right, but not the obligation, to undertake certain business initiatives, such as deferring, abandoning, expanding, staging, or contracting a capital investment project in the presence of uncertainty.
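Here is a toy sketch of why the right without the obligation has value. It assumes a one-period decision, two equally likely outcomes for the project's value (150 or 70), an investment of 100, and no discounting. All numbers are hypothetical. A real valuation would discount and use proper (for example, risk-neutral) probabilities.

```python
def invest_now(values, probs, investment):
    """Commit today: expected payoff is E[V] - I, good or bad outcome alike."""
    return sum(p * v for v, p in zip(values, probs)) - investment


def defer(values, probs, investment):
    """Wait for the uncertainty to resolve and invest only if it pays: E[max(V - I, 0)]."""
    return sum(p * max(v - investment, 0.0) for v, p in zip(values, probs))


values, probs, cost = [150.0, 70.0], [0.5, 0.5], 100.0
now, later = invest_now(values, probs, cost), defer(values, probs, cost)
print(now, later, later - now)  # the difference is the value of the option to defer
```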

Many in the anti-estimates business may want to read about how making decisions in the presence of uncertainty is core to all business management, investment decisions, any decision about spending money when the outcome of that decision is probabilistic, based on the underlying statistical processes that drive these outcomes. So when we read, ask how can this actually be the case? Time to ask if the basis of mathematical decision making got suspended?

When I hear about a process, a procedure, a tool, a method, an idea, or even an anti-process like #NoEstimates, I first ask: in what domain is this applicable? And when the answer comes back "software," I suspect the speaker hasn't considered much outside his own domain.

A software-intensive system is any system where software contributes essential influences to the design, construction, deployment, and evolution of the system as a whole. [IEEE-Std-1471-2000]

This is observed in EU-NSF-SIS 2004 and in Peter Freeman and David Hart, "A science of design for software intensive systems," Commun. ACM, 47(8):19-21, 2004. (Dr. Freeman was a professor at UC Irvine when I was a grad student in Physics, when we had to take classes outside our major.)

Software has become a key feature of a rapidly growing range of products and services from all sectors of economic activity. Software-intensive systems include:

large-scale heterogeneous systems,

embedded systems for automotive applications,

telecommunications,

wireless ad hoc systems,

business applications with an emphasis on web services etc.

Our daily lives depend on complex software-intensive systems, from banking to communications to transportation to medicine.

Some Application Domains for SIS

Automotive

Characteristics

Hard real-time

Severe resource constraints (high cost pressure)

Highly interconnected

High reliability and safety requirements

User interface critical

Extreme increase in complexity (can be observed and is further expected)

Very long-winded certification procedures (even for small components)

Very conservative approach common

And more for these domains.

Avionics - my personal favorite

Space missions - my second personal favorite

Medicine technique - know a little about this

Industrial automation - don't know a lot about process control systems

Telecommunication - don't know much about this

So when we hear about the next big thing that is going to revolutionize the world of software development, ask: in what domain is that actually going to happen? Does the speaker have any experience in that world? Is there any evidence that this next big thing has been applied in that world with success, and where can we read about it?

There is a tendency, and I do this as well, to focus on our own little corner of the universe. I've learned to recognize when I'm doing this and have been taught by several good Systems Engineering leaders and a senior cost leader to have a touchstone to test ideas against.

A second touchstone is the set of components of project performance management. In this picture, cost, schedule, and delivered capabilities are independent variables coupled with non-linear dynamic connections. When someone says cost of delay, that's cost. When we hear features, that's capabilities. Time, deadlines, and sequence of work: that's schedule.

Jorion (2007) wrote that “Western Europe conquered the world because of a technological revolution that started from the attempts to measure the world.” In the same way, attempts to measure risk - (and its related project performance impacts) - more definitively, realistically, and accurately will surely lead to better project management. - thanks to: "Here, There Be Dragons: Considering the Right Tail in Risk Management," Christian B. Smart, Missile Defense Agency, Redstone Arsenal, Alabama, Journal of Cost Analysis and Parametrics, 5:65–86, 2012.

Making estimates of cost, schedule, and technical performance outcomes, and of their impacts on programmatic and technical risk, without accounting for long tails is venturing into waters where Dragons live.

In the insurance business, and in other financial domains, the conditional tail expectation is applied to mitigate risk.
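To make the conditional tail expectation concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the method used in Smart's paper or by any insurer: the outcomes are simulated or historical cost values where larger is worse, the tail threshold is the empirical value at risk (VaR) at a chosen confidence level, and the CTE is the average of everything at or beyond that threshold. The function names and the lognormal example data are hypothetical.

```python
import math
import random


def value_at_risk(outcomes, level):
    """Empirical VaR: the outcome at the `level` quantile (larger outcome = worse)."""
    ordered = sorted(outcomes)
    # Smallest outcome with at least `level` of the data at or below it.
    idx = min(len(ordered) - 1, max(0, math.ceil(level * len(ordered)) - 1))
    return ordered[idx]


def conditional_tail_expectation(outcomes, level):
    """CTE: the average outcome at or beyond the VaR threshold."""
    threshold = value_at_risk(outcomes, level)
    tail = [x for x in outcomes if x >= threshold]
    return sum(tail) / len(tail)


# Hypothetical right-skewed cost outcomes ($K), e.g. produced by a Monte Carlo model.
rng = random.Random(42)
costs = [rng.lognormvariate(math.log(1000), 0.4) for _ in range(10_000)]
print(f"VaR(95%) = {value_at_risk(costs, 0.95):,.0f}K")
print(f"CTE(95%) = {conditional_tail_expectation(costs, 0.95):,.0f}K")
```

The CTE is always at least as large as the VaR; the gap between the two is a rough indication of how long the right tail is.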

When deciding whether to perform some method of managing a project, when selecting from a variety of features, or when making any decision involving cost, schedule, or performance, first ask: what's the value at risk? The answer is the basis of your decision. And making decisions without estimating these impacts creates even more food for the Dragon.

My value at risk is $27,000. I should spend some time making sure I'm on the right track to produce a benefit from my 6-week, 2-person database integration project. But that time should be short and give me some sense that my efforts will work: maybe 30 minutes looking at the margin I need to raise my confidence of completing inside that 6-week period.

My value at risk is in excess of $2B for a nationwide health care enrollment system used indirectly by all 50 states and directly by a large number of states. I'd better have a deep understanding of the long-tail aspects of the estimates for cost, schedule, and technical performance. So I need some type of modeling (Monte Carlo) connected to the Reference Classes I'm going to use to drive the Probability Distribution Functions for the model, a Risk Register containing the reducible and irreducible risks connected to my model, and a probabilistic critical path analysis of all the blocking factors for this project. With this I can start to understand the probability of success, but I will need to repeat the analysis every time we have a deliverable to make sure we're on track.
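A minimal Monte Carlo sketch of the schedule side of such a model, in Python. It is deliberately far simpler than what the paragraph calls for: it assumes three hypothetical tasks in strict sequence, independent durations, and triangular distributions standing in for reference-class-driven PDFs, with no risk register and no network logic. It reports the P50 and P80 totals; the difference between them is one way to size schedule margin.

```python
import random
import statistics

# Hypothetical tasks in strict sequence: (optimistic, most likely, pessimistic) days.
TASKS = [(10, 15, 30), (5, 8, 20), (20, 25, 45)]


def simulate_total_duration(tasks, trials=10_000, seed=1):
    """Monte Carlo: each trial draws one triangular duration per task and sums them."""
    rng = random.Random(seed)
    totals = []
    for _ in range(trials):
        # Note the argument order: random.triangular(low, high, mode).
        totals.append(sum(rng.triangular(lo, hi, mode) for lo, mode, hi in tasks))
    return totals


def percentile(samples, p):
    """Empirical p-th percentile for integer p in 1..99."""
    return statistics.quantiles(samples, n=100)[p - 1]


totals = simulate_total_duration(TASKS)
p50, p80 = percentile(totals, 50), percentile(totals, 80)
print(f"P50 = {p50:.1f} days, P80 = {p80:.1f} days, margin = {p80 - p50:.1f} days")
```

In a real model the distributions would come from the Reference Classes, discrete risks from the Risk Register would be layered on top, and parallel paths would be merged through the network logic.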

And for all the projects in between, ask: what am I willing to lose if I'm seriously wrong about the cost and schedule? What should I invest to reduce the probability that I'm wrong about my estimates to an acceptable level? That's one of the bases of Value at Risk.

When you hear that we can make decisions about future outcomes in the absence of estimates, think how tasty you'll be to that Dragon when he eats you alive.

This quote is from a DoD contracting surveillance officer on the inability of some managers to use data for decision making.

Making good decisions requires good data. Data about the future. The confidence in that data starts with gathering good data. It then moves to understanding the naturally occurring and event-based uncertainties in that data. With this understanding, decisions can then be based on risk-informed, statistically adjusted impacts to cost, schedule, and technical performance for future outcomes.

No Data? Driving in the Dark with Your Headlights Off. Hoping you don't run off the road.

My favorite, though, is this one: driving in the rear-view mirror.

Confidence intervals are a means of estimating population parameters. A central concern in inferential statistics (making a prediction from a sample of data or from a model of that data) is estimating the population parameter from the sample statistic.

The sample statistic is calculated from the sampled data and the population parameter is estimated from this sample statistic.

Statistics are calculated - the data we are looking at, for example the time series of values from a project, are used in a calculation.

Parameters are estimated - a parameter is then estimated from the statistics calculated on the time series. This estimate has a confidence interval, and from this estimate we can make inferences.

One issue in making inferences (estimating) is sample size determination. How large a sample do we need to make an accurate estimate? This matters because small sample sizes produce very unreliable inferences. For example, sampling 27 stories in an agile project and making an inference about how the remaining stories are going to behave is very sporty business.
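The sample-size question can be made concrete with the textbook formula for estimating a mean, $n = \lceil (z_{1-\alpha/2}\,\sigma / E)^2 \rceil$, where $E$ is the acceptable half-width of the confidence interval. This Python sketch assumes the population standard deviation $\sigma$ is known and that the normal approximation holds (with only 27 observations a Student-t interval would be more appropriate); the σ of 3 days for story durations is a made-up number.

```python
import math
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided standard normal critical value, about 1.96 for 0.95."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def half_width(sigma, n, confidence=0.95):
    """Maximum error E of a z-based confidence interval for a mean."""
    return z_critical(confidence) * sigma / math.sqrt(n)


def required_sample_size(sigma, margin, confidence=0.95):
    """Smallest n whose z-based confidence interval has half-width <= margin."""
    return math.ceil((z_critical(confidence) * sigma / margin) ** 2)


sigma_days = 3.0  # hypothetical standard deviation of story durations
print(f"E with 27 stories: {half_width(sigma_days, 27):.2f} days")
print(f"Stories needed for E = 0.5 days: {required_sample_size(sigma_days, 0.5)}")
```

Under these assumptions, 27 stories give a 95% half-width of about 1.13 days, and narrowing that to 0.5 days would take 139 stories.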

To have a good estimator, that is to make good estimates from sampled or simulated data the estimator must be:

Unbiased - the expected value of the estimator must equal the value of the parameter

Consistent - the value of the estimator approaches the value of the parameter as the sample size increases

Relatively Efficient - the estimator has the smallest variance of all estimators which could be used.
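A quick simulation can show what unbiased and consistent mean in practice. This Python sketch is an illustration using normally distributed data with a known variance of 4; it compares two estimators of the population variance: dividing the sum of squared deviations by n, and dividing by n - 1. The second is unbiased. The first is biased low by the factor (n - 1)/n, yet that bias shrinks as n grows, so it is still consistent. Relative efficiency is not demonstrated here.

```python
import random
import statistics


def variance_estimators(sample):
    """Return (divide-by-n, divide-by-(n - 1)) estimates of the population variance."""
    n = len(sample)
    mean = sum(sample) / n
    ss = sum((x - mean) ** 2 for x in sample)  # sum of squared deviations
    return ss / n, ss / (n - 1)


def average_estimates(n, trials=5_000, sigma=2.0, seed=7):
    """Average both estimators over many samples of size n drawn from N(0, sigma^2)."""
    rng = random.Random(seed)
    by_n, by_n_minus_1 = [], []
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        a, b = variance_estimators(sample)
        by_n.append(a)
        by_n_minus_1.append(b)
    return statistics.mean(by_n), statistics.mean(by_n_minus_1)


# The true variance is sigma^2 = 4.0.
for n in (5, 50):
    biased, unbiased = average_estimates(n)
    print(f"n={n:>2}: divide by n -> {biased:.3f}, divide by n - 1 -> {unbiased:.3f}")
```

Theory predicts averages near 3.2 versus 4.0 at n = 5, and near 3.92 versus 4.0 at n = 50.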

A point estimate differs from the population parameter because of sampling error, and there is no way to know how close it is to the actual parameter. Because of this, statisticians give an interval estimate: a range of values used to estimate the parameter.

What's the cost of this project going to be when we're done with all our efforts, given the work we've done so far?

The confidence interval is an interval estimate with a specific level of confidence. The level of confidence is the probability that the interval estimate will contain the parameter. The level of confidence is $1 - \alpha$, the area that lies within the confidence interval. The maximum error of the estimate, $E$, is half the width of the confidence interval.

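For a symmetric distribution, the confidence interval says that the true population parameter lies between the point estimate minus the maximum error of the estimate and the point estimate plus the maximum error of the estimate:

$$\hat{\theta} - E < \theta < \hat{\theta} + E$$

where $\hat{\theta}$ is the point estimate and $\theta$ is the true population parameter.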
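The interval above translates directly into code. This Python sketch takes the sample mean as the point estimate and computes $E$ as a critical value times the standard error. By default it uses a normal critical value, which is a large-sample assumption; for small samples, pass a Student-t critical value instead (about 2.365 for 95% confidence with n = 8, i.e. 7 degrees of freedom). The function name and the sample data are illustrative.

```python
import math
import statistics
from statistics import NormalDist


def mean_confidence_interval(sample, confidence=0.95, critical=None):
    """Return (lower, point, upper) for the population mean: point estimate +/- E."""
    n = len(sample)
    point = statistics.mean(sample)   # point estimate (the sample statistic)
    s = statistics.stdev(sample)      # sample standard deviation, n - 1 denominator
    if critical is None:
        # Normal approximation; replace with a t value when n is small.
        critical = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    e = critical * s / math.sqrt(n)   # maximum error of the estimate
    return point - e, point, point + e


data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative observations
print(mean_confidence_interval(data))
print(mean_confidence_interval(data, critical=2.365))  # t with 7 degrees of freedom
```

The t-based interval is wider, which is the honest price of estimating from only eight observations.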

An Example from Actual Observations

While staying at the Yellowstone Lodge during the Millennium (year 2000), our kids got sick with some type of flu going around the lodge. My wife lay in bed, tending them all night long and passed the time recording data about Old Faithful erupting outside our bedroom window.

Eruptions is the duration of the eruption of Old Faithful, and Waiting is the waiting time before the next eruption. There is a correlation between these pieces of data. This is due to the physical processes of expelling water at high temperature and of refilling the caverns below the surface.

If we use R as our analysis tool, we can get a sense of what is happening statistically with Old Faithful. (R code below)

```r
attach(faithful)                       # attach the data frame
eruption.lm <- lm(eruptions ~ waiting) # fit eruption duration against waiting time
```

Then we create a new data frame that sets the waiting time to 80 minutes.

```r
newdata <- data.frame(waiting = 80)
```

We now apply the predict function and set the predictor variable in the newdata argument. We also set the interval type to "confidence" and use the default 0.95 confidence level.

```r
predict(eruption.lm, newdata, interval = "confidence")
#      fit    lwr    upr
# 1 4.1762 4.1048 4.2476
detach(faithful)  # clean up
```

The 95% confidence interval of the mean eruption duration for a waiting time of 80 minutes is between 4.1048 and 4.2476 minutes.
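For readers who want to see what sits behind `interval="confidence"`, here is a Python sketch of the standard formula for a confidence interval on the mean response of a simple least-squares line, $\hat{y}_0 \pm t\, s \sqrt{1/n + (x_0 - \bar{x})^2 / S_{xx}}$. It is an illustration, not R's implementation, and the Old Faithful data are not reproduced here. The caller supplies the Student-t critical value with n - 2 degrees of freedom (about 1.969 for the 272 observations in R's `faithful` data at 95%).

```python
import math


def mean_response_interval(x, y, x0, t_crit):
    """Return (fit, lower, upper) for the mean response at x0 of a least-squares line."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(r * r for r in residuals) / (n - 2))  # residual standard error
    fit = intercept + slope * x0
    half = t_crit * s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)
    return fit, fit - half, fit + half


x = [1, 2, 3, 4]  # illustrative data, not the Old Faithful observations
y = [1, 3, 2, 4]
print(mean_response_interval(x, y, x0=2.5, t_crit=4.303))  # t for 2 df at 95%
```

The interval is narrowest at $x_0 = \bar{x}$ and widens as the prediction point moves away from the center of the observed data.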

Now to a Project Example

In the graph below, the black line on the left is the historical data for a parameter I want to estimate from its past values. But I need to give the customer an 80% and a 95% confidence interval for the values this parameter will take on in the future. From the time series of past values, we can see both the 80% and the 95% confidence bands for the possible values the parameter can take on in the future.
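The post does not say which forecasting model produced the bands in that graph. As a hedged illustration only, this Python sketch uses the simplest possible model, a naive (random-walk) forecast: the point forecast is the last observed value, the h-step-ahead standard deviation grows as $\sigma\sqrt{h}$, and normal quantiles turn that into 80% and 95% bands. Here $\sigma$ is estimated as the root-mean-square of the one-step changes, and the history values are made up.

```python
import math
from statistics import NormalDist


def naive_forecast_bands(history, horizon, levels=(80, 95)):
    """Naive forecast with normal prediction bands for steps 1..horizon."""
    diffs = [b - a for a, b in zip(history, history[1:])]
    # The one-step forecast errors of the naive method are the one-step changes.
    sigma = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    last = history[-1]
    forecasts = []
    for h in range(1, horizon + 1):
        sd_h = sigma * math.sqrt(h)  # uncertainty widens with the horizon
        bands = {}
        for level in levels:
            z = NormalDist().inv_cdf(0.5 + level / 200)  # e.g. 80 -> 0.90 quantile
            bands[level] = (last - z * sd_h, last + z * sd_h)
        forecasts.append({"h": h, "point": last, "bands": bands})
    return forecasts


history = [102, 98, 105, 110, 107, 111, 115, 112]  # hypothetical past values
for row in naive_forecast_bands(history, horizon=3):
    lo80, hi80 = row["bands"][80]
    lo95, hi95 = row["bands"][95]
    print(f"h={row['h']}: {row['point']}  80% [{lo80:.1f}, {hi80:.1f}]  95% [{lo95:.1f}, {hi95:.1f}]")
```

A naive forecast ignores trend and seasonality; a model with drift or an ARIMA model would usually fit project data better, but the way the bands widen with the horizon is the same idea.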

What Does This Mean?

It means two things:

When we say we have 80% confidence that a parameter will assume a value, we need to know how that parameter behaved in the past.

When we hear that we are estimating the future from the past, we MUST know about the behaviours of those past values, the size of the population, and the sample size before we can determine the confidence in the possible future outcomes. Having an average value without this data is pretty much useless in our decision-making process.

What Does This Really Mean?

Anyone suggesting we can make decisions about future outcomes in the presence of uncertainty, while at the same time not estimating those outcomes, is pretty much clueless about basic probability, statistics, and random processes.

All project variables (the statistical parameters) are random variables, driven by underlying processes that we must estimate using the statistical processes available in R and our High School Stats book.

Footnote

When someone mentions they use Bayesian statistics, or Real Options, ask if they are using something like the R Tutorial Resource with Bayesian Statistics, and of course ask for the source code for the statistical processes described above. Then ask to see their data. There seem to be a lot of people tossing around words like Bayesian, Real Options, Monte Carlo, and other buzzwords without actually being able to show their work or results that can be tested outside their personal anecdotes. Sad but true.

A sample of the nearly endless materials on how to apply Reference Class Forecasting

So when you hear "we can't possibly estimate this piece of software, it's never been done before," look around a bit to see if someone has done it. Then look some more; maybe they have a source for a Reference Class you can use.
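Reference Class Forecasting is commonly applied by taking the distribution of actual-to-estimated outcomes from similar completed projects and reading off the uplift needed for the confidence level you want. This Python sketch does that with `statistics.quantiles` (inclusive method, which interpolates between observed values). The twelve ratios and the $500,000 base estimate are invented for illustration; a real reference class would need more, and genuinely comparable, projects.

```python
import statistics


def reference_class_uplift(actual_to_estimate_ratios, confidence):
    """Uplift factor at an integer confidence level (1..99) from past projects.

    Each ratio is actual cost / original estimate for one completed project.
    """
    cuts = statistics.quantiles(actual_to_estimate_ratios, n=100, method="inclusive")
    return cuts[confidence - 1]


# Hypothetical reference class: actual/estimated cost ratios for 12 past projects.
ratios = [0.95, 1.02, 1.05, 1.10, 1.12, 1.18, 1.20, 1.25, 1.31, 1.40, 1.55, 1.80]
base_estimate = 500_000  # hypothetical base estimate in dollars
for p in (50, 80):
    factor = reference_class_uplift(ratios, p)
    print(f"P{p}: uplift x{factor:.3f} -> budget ${base_estimate * factor:,.0f}")
```

Moving from P50 to P80 is exactly the "what should I invest to reduce the probability I'm wrong" question, answered with someone else's history instead of hope.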
