There is a discussion on the LinkedIn #ProjectControls forum. It started with a simple question: what does PERTMaster do? PERTMaster is one of many tools that uses Monte Carlo Simulation to produce results from a schedule, showing the probability of completing on or before a date and the confidence of that forecast. The Monte Carlo method is used to assess project schedules, but more importantly it is used to model a wide variety of physical processes. The term Monte Carlo was coined in the 1940s, but the real algorithm was developed at the casinos in Monte Carlo to determine how much money to have on hand for a fair roulette wheel. This was the origin of 00 on the wheel.
Since then, the notion of analyzing the performance of a schedule and its cost, using random numbers for the durations and assembling a histogram of the completion times to determine the probability of completing on or before a date, has been baked into the government Integrated Master Schedule processes through DI-MGMT-81650.
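As a sketch of what those MCS tools do under the hood, here is a minimal Monte Carlo pass over a hypothetical three-activity network. The activities, durations, and target date are illustrative assumptions, not from any real program:

```python
import random

# Hypothetical network: A and B run in parallel, C follows both.
# Durations (days) are triangular: (optimistic, most likely, pessimistic).
ACTIVITIES = {
    "A": (10, 12, 20),
    "B": (8, 15, 25),
    "C": (5, 6, 10),
}

def simulate_once():
    # Sample a duration for each activity from its triangular distribution.
    d = {k: random.triangular(lo, hi, ml) for k, (lo, ml, hi) in ACTIVITIES.items()}
    # C cannot start until both A and B finish.
    return max(d["A"], d["B"]) + d["C"]

def completion_confidence(target, trials=10_000):
    finishes = sorted(simulate_once() for _ in range(trials))
    on_time = sum(1 for f in finishes if f <= target)
    p80 = finishes[int(0.80 * trials)]   # 80th-percentile finish date
    return on_time / trials, p80

prob, p80 = completion_confidence(target=30)
print(f"P(finish on or before day 30) = {prob:.2f}, 80% confidence date = day {p80:.1f}")
```

The histogram of `finishes` is exactly the picture the MCS tools draw; the 80th percentile is the date you can commit to with 80% confidence.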
Let's start with a core problem statement:
"Despite Using All the Metrics Commonly Employed to Measure Cost, Schedule, Performance and Program Risk, There are Still Too Many Surprises (Poorly Performing or Failing Programs) Being Briefed “Real Time” to Army Senior Leadership"
This is a quote from John Higbee about the problem of using Monte Carlo and leaving it at that. There are several important understandings needed before the MCS tools can be used:
- Your schedule has to be credible. By credible I mean a schedule that is actually executable. By executable I mean a schedule that passes the smell test of increasing the Probability of Success.
- By credible it also means that the activities in the network that make up the schedule are themselves credible. One place to look (actually several) is
Now, with a credible schedule comes the resource loading of that schedule and, of course, the Earned Value Management of the work to forecast future performance. This works because the probability distributions for the activity durations in the schedule, when connected in a well-formed network, can be modeled to produce a picture of the confidence of completing on or before a planned date. Along with this come cost models, if the schedule is resource loaded and Other Direct Costs are added.
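A sketch of how a cost model falls out of a resource-loaded schedule: sample the duration, convert to labor hours and dollars, add ODCs. The labor rate, staffing, and ODC figures below are invented for illustration:

```python
import random

# Assumed resource loading for one activity (all numbers illustrative).
RATE_PER_HOUR = 120.0   # blended labor rate, $/hr
HOURS_PER_DAY = 8
STAFF = 3
ODC = 25_000.0          # material and Other Direct Costs, not simulated here

def cost_at_80pct_confidence(trials=10_000):
    costs = []
    for _ in range(trials):
        duration = random.triangular(20, 45, 30)   # days: low, high, mode
        labor = duration * HOURS_PER_DAY * STAFF * RATE_PER_HOUR
        costs.append(labor + ODC)
    costs.sort()
    return costs[int(0.80 * len(costs))]           # 80th-percentile cost

print(f"80% confidence cost: ${cost_at_80pct_confidence():,.0f}")
```

Because the labor cost is driven by the sampled duration, the cost distribution inherits the schedule's variability; the fixed ODC term just shifts it.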
This approach is mandated for Earned Value Management programs through DI-MGMT-81650 and the latest integration of the Schedule Risk Assessment (SRA) in DI-MGMT-81466B. Both require Schedule Risk Assessment, which is usually done with Monte Carlo Simulation. Method of Moments can be used, but I don't know how to make that work. I have colleagues who do.
But Here's the Real Problem
Suppose you have a Monte Carlo Simulation of your schedule and, let's say, a possibly credible model of the costs. (By the way, most MCS tools build models of cost from the resource-loaded hours, not the material or ODCs.)
First, the MCS is for duration (and maybe cost) "risk." Here the term "risk" means the variability of the durations of the work (and maybe their cost). But there are many other risks besides duration risks: technical risks, operational risks, political risks, environmental risks. All of these can have unfavorable impacts on cost and schedule. So a credible model of the actual Probability of Success for a project needs to include not only the variances in the activity durations and their resulting costs, from labor hours and maybe ODCs, but also the impacts on those hours from other risks.
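To make the distinction concrete, here is a small sketch comparing duration variance alone with duration variance plus a discrete risk event. The probabilities and impacts are assumed for illustration, not taken from any real risk register:

```python
import random

# Baseline duration uncertainty plus one discrete technical risk event.
BASE = (30, 60, 40)          # triangular days: low, high, mode
RISK_PROB = 0.25             # probability the technical risk occurs
RISK_IMPACT = (10, 30, 15)   # extra days if it does: low, high, mode

def simulate(trials=20_000):
    durations_only, with_risk = [], []
    for _ in range(trials):
        d = random.triangular(*BASE)
        durations_only.append(d)
        if random.random() < RISK_PROB:
            d += random.triangular(*RISK_IMPACT)   # risk fired: add its impact
        with_risk.append(d)
    mean = lambda xs: sum(xs) / len(xs)
    return mean(durations_only), mean(with_risk)

no_risk, risked = simulate()
print(f"mean finish, duration variance only: {no_risk:.1f} days")
print(f"mean finish, with risk event:        {risked:.1f} days")
```

A duration-only MCS reports the first number and silently omits the gap between the two, which is exactly the surprise briefed "real time" to senior leadership.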
This problem has several layers:
- What are the risks? Can you name them, analyze them, categorize them, come up with a handling plan for them, monitor them, and essentially retire them?
- How are they related? Are they all independent? Meaning each risk lives alone in its own little world of probability of occurrence and probability of impact, with cost and schedule impacts?
- How do they drive the actual variances in cost and schedule?
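One way to sketch the second question, whether risks are independent, is a toy risk register in which one risk's probability is conditioned on another risk having occurred. The names, probabilities, and impacts are all hypothetical:

```python
import random

# Notional risk register. Each risk has a probability, a schedule impact
# (days), and optionally a parent risk that, if it occurs first, raises
# the conditional probability of this one.
RISKS = {
    "supplier_slip": {"p": 0.20, "impact": 15, "given": None},
    "test_failure":  {"p": 0.10, "impact": 25, "given": None},
    "redesign":      {"p": 0.05, "impact": 40, "given": ("test_failure", 0.50)},
}

def expected_schedule_impact(trials=20_000):
    totals = []
    for _ in range(trials):
        occurred, delay = set(), 0.0
        for name, r in RISKS.items():   # insertion order: parents before children
            p = r["p"]
            if r["given"] and r["given"][0] in occurred:
                p = r["given"][1]        # conditional probability, parent occurred
            if random.random() < p:
                occurred.add(name)
                delay += r["impact"]
        totals.append(delay)
    return sum(totals) / len(totals)     # expected schedule impact, days

print(f"expected schedule impact from risks: {expected_schedule_impact():.1f} days")
```

Treating "redesign" as independent would understate its contribution; the coupling through "test_failure" is precisely what a flat risk list hides.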
BTW, there are many words flying around here, and no real independent dictionary. Many authors use the same words for different things, so care is needed, and I'll try to provide the definitions as I go.
An accessible book on the topic of risk management is Mike Clayton's Risk Happens! It's easy to read, has all the facts right, and can be put to work.
The next step is to determine where all this stuff lives on the program. From my experience it lives in the Systems Engineering domain. These types of discussions (other than the actual scheduling and cost) belong to the Systems Engineers. This is because the design and construction of the thing is directly connected to the ability to actually make the thing on time and on budget. And to do that we need a credible model of not only the thing but of how the thing is made: the cost, schedule, processes, tooling, facilities, testing, validation, etc.
This is called PLANNING and the Plan is the Integrated Master Plan.
So What Does All This Mean and Who Actually Cares?
It means that simple Monte Carlo Simulation of the Integrated Master Schedule is absolutely necessary, but it is far from sufficient to assess the Probability of Program Success (PoPS). What this means is we must discover how all these moving parts interact with each other, how they influence each other, and how a small disruption in one place can cause big trouble somewhere else.
Risk Management is Systems Engineering, and that means we are Managing a System. One of the best populist books (I know, I'm not a recommender of populist books, but this one is a must read) is John Gall's Systems Bible. The best Gall quote is on the cover of my Master's Thesis:
In a complex system, malfunction and even total nonfunction may not be detected for long periods, if ever.
This is the case for programmatic systems as well. Programmatic systems are everything about cost, schedule, technical performance and risk.
So the first thing to do in a Programmatic System is to determine the topology of the system. What are its pieces, how are they connected, what are the dependencies (or couplings), what is the strength of those couplings, and what are the dynamics of the resulting system?
These couplings can be "interdependent" or "coupled" work activities and their probabilistic attributes, including the risks associated with those coupled tasks. Together these form a model of the program: the probabilistic aspects of each task's cost, schedule, and technical performance; physical performance failures; and process failures and their impacts on system or subsystem failure in any dimension. These couplings fall into several classes of dependency:
- A binding dependency is one that acts as a constraint between two dependent tasks.
- A non-binding dependency is one that is not regarded as a constraint between two dependent tasks.
- A critical binding dependency is a binding dependency between tasks with zero slack.
- A non-critical binding dependency is a binding dependency that is not critical binding. Following critical binding dependencies along the tasks, a critical dependency path goes through the entire process hierarchy of a project, from the first level to the last. This path is different from a critical path because it is determined without considering time aspects. However, it provides guidance for process improvement early in planning, when detailed duration data are not available or the probabilistic models fail to provide sufficient confidence intervals for decision making.
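The dependency classes above can be sketched in code. Given tasks with total slack from a critical-path pass and a list of dependencies flagged binding or not (all data illustrative):

```python
# Task -> total slack (days), as produced by a CPM forward/backward pass.
TASKS = {"A": 0, "B": 3, "C": 0, "D": 0, "E": 5}

# (predecessor, successor, binding?) -- binding means it constrains the successor.
DEPS = [("A", "B", True), ("A", "C", True), ("C", "D", True), ("B", "E", False)]

def classify(dep):
    pred, succ, binding = dep
    if not binding:
        return "non-binding"
    # Critical binding: a binding dependency between two zero-slack tasks.
    if TASKS[pred] == 0 and TASKS[succ] == 0:
        return "critical binding"
    return "non-critical binding"

for dep in DEPS:
    print(dep[:2], "->", classify(dep))
```

Chaining the "critical binding" results (A→C→D here) traces the critical dependency path described above, without any reference to durations.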
These are all assembled into a model of the program. A wonderful tool for modeling all these things that make up the Programmatic Architecture is the Design Structure Matrix (DSM). Here's a notional DSM of a project. This could be a real project, so let's use it. This is taken from the handbook we use for this approach.
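For readers new to the notation, a minimal notional DSM looks like this. The tasks and marks are invented for illustration, not taken from the handbook:

```python
# Rows and columns are tasks in execution order. A mark in row i,
# column j means task i needs an input from task j. Marks above the
# diagonal are feedback (iteration) dependencies.
TASKS = ["Req", "Design", "Build", "Test"]
DSM = [
    # Req Design Build Test
    [0,   0,     0,    1],   # Req is reworked from Test feedback
    [1,   0,     0,    0],   # Design depends on Req
    [0,   1,     0,    0],   # Build depends on Design
    [0,   0,     1,    0],   # Test depends on Build
]

def show(dsm, names):
    width = max(len(n) for n in names)
    print(" " * width, *(n[:3].rjust(4) for n in names))
    for name, row in zip(names, dsm):
        print(name.ljust(width), *(("X" if c else ".").rjust(4) for c in row))

show(DSM, TASKS)

# Above-diagonal marks flag coupled work that will iterate.
feedback = [(TASKS[i], TASKS[j]) for i, row in enumerate(DSM)
            for j, c in enumerate(row) if c and j > i]
print("feedback dependencies:", feedback)
```

The single above-diagonal mark is the interesting one: it says requirements will be revisited after test, which no forward-only schedule network shows directly.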
I discovered DSM through Tyson Browning while he was at Lockheed Martin Aero in DFW. There is a whole discipline around DSM. Search for Dr. Browning's work and examples of how DSM is integrated into the Program Controls activities of Boeing, BMW, GE Aircraft Engines, and dozens of other firms who understand that Programmatic Architecture and the Program Controls discipline are Systems Engineering.
One irony is that the 2012 DSM conference is titled "Managing Complexity by Modeling Dependencies."
DSM is one tool that can be applied here. But DSM fits into a broader range of tools around System Dynamics. Since all the programmatic aspects are embedded in the broader system aspects of the program, we can start to see that simple cost and schedule modeling, in the absence of dependencies and in the absence of risk events (which can be called causal risks or causal events), is not enough. All of these need to be integrated with the Master Plan, the Master Schedule, the physical system taxonomy and topology, and of course all the categories of risk.
This lets us start to answer the question: What is the Probability of Program Success?