Project work is random. Most everything in the world is random. The weather, commuter traffic, productivity of writing and testing code. Few things actually take as long as they are planned. Cost is less random, but there are variances in the cost of labor, the availability of labor. Mechanical devices have variances as well.
The exact fit of a water pump on a Toyota Camry is not the same for each pump. There is a tolerance in the mounting holes, the volume of water pumped. This is a variance in the technical performance.
Managing in the presence of these uncertainties is part of good project management. But there are two distinct paradigms of managing in the presence of these uncertainties.
- We have empirical data of the variances. We have samples of the hole positions and sizes of the water pump mounting plate for the last 10,000 pumps that were installed. We have samples of how long it took to write a piece of code and the attributes of the code that are correlated to that duration. We have empirical measures.
- We have a theoretical model of the water pump in the form of a 3D CAD model with the materials modeling for expansion, drilling errors of the holes and other static and dynamic variances. We have modeling the duration of work using a Probability Distribution Function and a Three Point estimate of the Most Likely Duration, the Pessimistic and Optimistic duration. These can be derived form past performance, but we don't have enough actual data to produce the PDF and have a low enough Sample Error for our needs.
In the first case we have empirical data. In he second case we don't. There are two approaches to modeling what the system will do in terms of cost and schedule outcomes.
Bootstrapping the Empirical Data
With samples of past performance and the proper statistical assessment of those samples, we can re-sample them to produce a model of future performance. This bootstrap resampling shares the principle of the second method - Monte Carlo Simulation - but with several important differences.
- The researcher - and we are researching what the possible outcomes might be from our model - does not know nor have any control of the Probability Distribution Function that generated the past sample. You take what you got.
- As well we don;'t have any understanding of Why those samples appear as they do. They're just there. We get what we get.
- This last piece is critical because it prevents us from defining what performance must be in place to meet some future goal. We can't tell what performance we need because we have not model of the need performance, just samples from the past.
- This results from the statistical conditions that there is a PDF for the process that ius unobserved. All we have is a few samples of this process.
- With these few samples, we're going to resample them to produce a modeled outcome. This resampling locks in any behavior of the future using the samples from the past, which may or may not actually represent the true underlying behavior. This may be all we can do because we don't have any theoretical model of the process.
This bootstrapping method is quick, easy, and produces a quick and easy result. But it has issues that must be acknowledged.
- There is a fundamental assumption that the past empirical samples represent the future. That is the samples contained in the bootstrapped list and their resampling ae also contained in all the future samples.
- Said in a more formal way
- If the sample of data we have from the past is a reasonable representation of the underlying population of all samples from the work process, then the distribution of parameter estimates produced from the bootstrap model on a series of resampled data sets will provide a good approximation of the distribution of that statistics in the population.
- With this sample data and its parameters (statistical moments) we can make a good approximation of the future.
- There are some important statistical behaviors though that must be considered, starting with the future samples are identical to the statistical behaviors of the past samples.
- Nothing is going to change in the future
- The past and the future are identical statistically
- In the project domain that is very unlikely
- With all these condition, for a small project, with few if any interdependencies, a static work process with little valance, boot strapping is a nice quick and dirty approach to forecasting (estimating the future) based on the past.
Monte Carlo Simulation
This approach is more general and removes many of the restriction to the statistical confidence of bootstrapping.
Just as a reminder, in principle both the parametric and the non-parametric bootstrap are special cases of Monte Carlo simulations used for a very specific purpose: estimate some characteristics of the sampling distribution. But like all principles, in practice there are larger differences when modeling project behaviors.
In the more general approach of Monte Carlo Simulation the algorithm repeatedly creating random data in some way, performing some modeling with that random data, and collecting some result.
- The duration of a set independent tasks
- The probabilistic completion date of a series of tasks connected in a network (schedule), each with a different Probability Distribution Function evolving as the project moves into the future.
- A probabilistic cost correlated with the probabilistic schedule model. This is called the Joint Confidence Level. Both cost and schedule are random variance with time evolving changes in their respective PDFs.
In practice when we hear Monte Carlo simulation we are talking about a theoretical investigation, e.g. creating random data with no empirical content - or from reference classes - used to investigate whether an estimator can represent known characteristics of this random data, while the (parametric) bootstrap refers to an empirical estimation and is not necessary a model of the underlying processes, just a small sample of observations independent from the actual processes that generated that data.
The key advantage of MCS is we don't necessarily need past empirical data. MCS can be used to advantage if we do, but we don't need it for the Monte Carlo Simulation algorithm to work.
This approach could be used to estimate some outcome, like in the bootstrap, but also to theoretically investigate some general characteristic of an statistical estimator (cost, schedule, technical performance) which is difficult to derive from empirical data.
MCS removes the road block heard in many critiques of estimating - we don't have any past data on which to estimate. No problem, build a model of the work, the dependencies between that work, and assign statistical parameters to the individual or collected PDFs and run the MCS to see what comes out.
This approach has several critical advantages:
- The first is a restatement - we don't need empirical data, although it will add value to the modeling process.
- This is the primary purpose of Reference Classes
- They are the raw material for defining possible future behaviors form the past
- We can make judgement of what he future will be like, or most importantly what the future MUST be like to meet or goals, run the simulation and determine is our planned work will produce a desired result.
So Here's the Killer Difference
Bootstrapping models make several key assumptions, which may not be true in general. So they must be tested before accepting any of the outcomes.
- The future is like the past.
- The statistical parameters are static - they don't evolve with time. That is the future is like the past, an unlikely prospect on any non-trivial project.
- The sampled data is identical to the population data both in the past and in the future.
Monte Carlo Simulation models provide key value that bootstrapping can't.
- Different Probability Distribution Function can be assigned to work as it progresses through time
- The shape of that PDF can be defined from past performance, or defined from the needed performance.
The critical difference between Bootstrapping and Monte Carlo Simulation is MCS can show what the future performance has to be to stay on schedule (within variance), on cost, and have the technical performance meet the needs of the stakeholder.
Bootstrapping can only show what the future will be like if it like the past, not what it must be like. In Bootstrapping this future MUST be like the past. In MCS we can tune the PDFs to show what performance has to be to manage to that plan. Bootstrapping is reporting yesterday's weather as tomorrow's weather - just like Steve Martin in LA Story. If tomorrow's weather turns out not to be like yesterday's weather, you gonna get wet.
MCS can forecast tomorrows weather, by assigning PDFs to future activities that are different than past activities, then we can make any needed changes in that future model to alter the weather to meet or needs. This is in fact how weather forecasts are made - with much more sophisticated models of course here at the National Center for Atmospheric Research in Boulder, CO
This forecasting (estimating the future state) of possible outcomes and the alternation of those outcomes through management actions to change dependencies, add or remove resources, provide alternatives to the plan (on ramps and off maps of technology for example), buy down risk, apply management reserve, assess impacts of rescoping the project, etc. etc. etc. is what project management is all about.
Bootstrapping is necessary but far from sufficient for any non-trivial project to show up on of before the need date (with schedule reserve), at o below the budgeted cost (with cost reserve) and have the produce or service provide the needed capabilities (technical performance reserve).
Here's an example of that probabilistic forecast of project performance from a MCS (Risky Project). This picture shows the probability for cost, finish date, and duration. But it is built on time evolving PDFs assigned to each activity in a network of dependent tasks, which models the work stream needed to complete as planned.
When that future work stream is changed to meet new requirements, unfavorable past performance and the needed corrective actions, or changes in any or all of the underlying random variables, the MCS can show us the expected impact on key parameters of the project so management in intervention can take place - since Project Management is a verb.
The connection between the Bootstrap and Monte Carlo simulation of a statistic is simple.
Both are based on repetitive sampling and then direct examination of the results.
But there are significant differences between the methods (hence the difference in names and algorithms). Bootstrapping uses the original, initial sample as the population from which to resample. Monte Carlo Simulation uses a data generation process, with known values of the parameters of the Probability Distribution Function. The common algorithm for MCS is Lurie-Goldberg. Monte Carlo is used to test that the results of the estimators produce desired outcomes on the project. And if not, allow the modeler and her management to change those estimators and then mange to the changed plan.
Bootstrap can be used to estimate the variability of a statistic and the shape of its sampling distribution from past data. Assuming the future is like the past, make forecasts of throughput, completion and other project variables.
In the end the primary differences (and again the reason for the name differences) is Bootstrapping is based on unknown distributions. Sampling and assessing the shape of the distribution in Bootstrapping adds no value to the outcomes. Monte Carlo is based on known or defined distributions usually from Reference Classes.