Trending is a critical tool in forecasting the future performance of a project. "We've been doing this well for a while now, what's going to happen in the future?" is a very common question for project management to ask of the "project controls" staff.
Two indices in Earned Value are the Cost Performance Index (CPI) and Schedule Performance Index (SPI). They are calculated from the Earned Value Management numbers of BCWP (Budgeted Cost of Work Performed), BCWS (Budgeted Cost of Work Scheduled) and ACWP (Actual Cost of Work Performed). This post is not about Earned Value, but about statistical analysis of Earned Value numbers. You can find good tutorials on EVM in several places, including A Gentle Introduction to Earned Value.
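For reference, CPI = BCWP / ACWP and SPI = BCWP / BCWS, so a CPI below 1.0 means the work performed has cost more than planned, and an SPI below 1.0 means less work has been completed than was scheduled.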
What is critical is the forecast of where the project is going, given the information from the past. In the standard approach, static information is used to compute a forecast of the future. This information is a cumulative measure of CPI/SPI for all past periods plus the current period of performance. Together these are used in a linear formula to forecast future performance - a deterministic, non-probabilistic projection.
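A common form of that linear formula is the Independent Estimate At Complete, IEAC = ACWP + (BAC - BCWP) / CPI, where BAC is the Budget At Complete - a single point value with no probability or confidence interval attached to it.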
All of this has led me to think about the probabilistic processes needed to improve the credibility of project performance measures. Through our BioChem son, I discovered the programming system R. R is a functional programming language and system for statistics and probability.
Along with R, the US DOD has rolled out an XML format for capturing the monthly Integrated Program Management Report (IPMR). The time series of CPI and SPI are available monthly in machine-readable form. With this data, forecasting can be done in R and its myriad of time series tools.
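As a rough sketch of getting that data into R - the file name, element names, and XPath below are hypothetical stand-ins, not the actual IPMR schema - the xml2 package can pull the monthly values into vectors:

```r
library(xml2)

# Hypothetical file name and XPath -- the real IPMR XML schema uses
# different element names, so adjust these to the format delivered
# on your program.
ipmr <- read_xml("ipmr_2013_09.xml")
cpi_monthly <- xml_double(xml_find_all(ipmr, "//Period/CPI"))
spi_monthly <- xml_double(xml_find_all(ipmr, "//Period/SPI"))
```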
Using a simple functional command line will produce a nice picture of the time series data with forecast values at defined confidence levels.
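A minimal sketch of what such a command line could look like, assuming the forecast package and a made-up vector of monthly cumulative CPI values:

```r
library(forecast)

# Hypothetical monthly cumulative CPI values -- in practice this would be
# the series pulled from the IPMR data above.
cpi_monthly <- c(0.98, 0.97, 0.95, 0.96, 0.94, 0.93, 0.95, 0.92, 0.93, 0.91)

cpi_ts <- ts(cpi_monthly, frequency = 12, start = c(2013, 1))
fit    <- auto.arima(cpi_ts)                   # let R pick an ARIMA model
plot(forecast(fit, h = 6, level = c(80, 95)))  # 6-month forecast with 80%/95% bands
```

The resulting plot shows the history, the point forecast, and the spread around it - the confidence levels that the deterministic CPI/SPI roll-up never shows.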
The next step is to do this on actual project performance data (using SPI/CPI) and see what can be revealed about future performance.
The problem to be solved is simple:
- The current Earned Value Management reporting hides all the variances at the work package and control account level through cumulative reporting. The data is there; it is just not reported.
- The current period data also cumulates the variances from the work packages and then adds that data to the past reporting.
- There is no assessment of the statistical nature of the underlying data, so the result is a scalar, linear report of an underlying stochastic process, hiding all the real drivers by ignoring the variances and summing to the top - the short sketch after this list shows how much gets hidden. In Darrell Huff's How to Lie with Statistics, this is a recommended practice.
- There is no adjustment of the future forecast for the past probabilistic behaviour, let alone for risk, technical performance, measures of effectiveness, and compliance with Key Performance Parameters (both JROC and program). There is just a linear projection of the future from the rolled up past, hiding the variances and ignoring all the excursions that created those variances.
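A quick numerical illustration of that hiding effect, using made-up cumulative BCWP and ACWP values:

```r
# Hypothetical cumulative BCWP and ACWP by month (in $K)
bcwp_cum <- c(100, 210, 330, 440, 560, 690)
acwp_cum <- c(105, 230, 340, 480, 590, 760)

# Back out the single-period values that the cumulative report rolls up
period_bcwp <- diff(c(0, bcwp_cum))
period_acwp <- diff(c(0, acwp_cum))

cpi_cum    <- bcwp_cum / acwp_cum        # the smooth number that gets reported
cpi_period <- period_bcwp / period_acwp  # the noisy number that gets hidden

sd(cpi_cum)     # small month-to-month variation
sd(cpi_period)  # several times larger for this made-up data
```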
Mathematically this means that what drove the variances in the past is assumed to no longer be driving the future. Those sources of variance are washed out. This may be the case for the Epistemic uncertainties (uncertainty due to lack of knowledge), which corrective actions can address. But the Aleatory uncertainties (uncertainty due to randomness or luck) that drive the variances are not subject to corrective actions. They are baked into the system, and no amount of new knowledge or corrective effort is going to make them go away.
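One simple way to carry that aleatory variance forward is to resample the historical period values rather than project the single cumulative number - a rough bootstrap sketch, reusing the hypothetical period_bcwp and period_acwp values from the example above:

```r
# Simulate the next 6 months by resampling past months (earned value and
# actual cost kept paired), then look at the spread of CPI at completion.
set.seed(42)
n_future <- 6
trials <- replicate(5000, {
  idx <- sample(seq_along(period_bcwp), n_future, replace = TRUE)
  (sum(period_bcwp) + sum(period_bcwp[idx])) /
    (sum(period_acwp) + sum(period_acwp[idx]))
})
quantile(trials, c(0.10, 0.50, 0.90))  # a forecast with confidence levels,
                                       # not a single linear projection
```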
This is simply bad data analysis in any discipline I've ever worked in, except program controls.
Background Materials
This idea started with Eric Ducker's paper and presentation at ICEA titled "Performing Statistical Analysis on Earned Value Data." There are some other sources as well: