There are endless discussions about what went wrong with the Affordable Care Act web site development and deployment. It'll be hard to tell at this early point in the project assessment. But what is clear is this was most likely a failure of project management.
Below is the acquisition life cycle for business systems in the DoD. HHS is not a DoD-style shop, but the paradigm of iterative and incremental development is in place. The release cycles shown here are far too long for something like the ACA site, but the topology of the process is sound.
Looking at this process, there is an obvious starting point: the Business Capability Definition. What is the resulting system supposed to do in terms of capabilities? Not the technical and operational requirements, but what business capabilities will the system provide to the stakeholder when it is in full operation? This is called Initial Operating Capability (IOC).
In our domain, we start defining the capabilities using the Defense Acquisition Guide. Here is where Measures of Effectiveness (MoE) are defined. The Measure of Effectiveness is assigned to a capability. If we want a capability, how effective does it have to be? This measure is not a technical performance or a requirement. It is an effectiveness measure.
An MoE for a UAV program we work would be: "The UAV shall be transportable within a 3,000-mile radius via a C-17/C-141/C-5 package." From the MoE there is a Measure of Performance (MoP). For example, weight is an MoP that enables the MoE to be fulfilled. Lower down are Technical Performance Measures (TPM). For example, the weight of an Electro-Optical / Infrared sensor platform must be under 55 pounds for the UAV to operate properly. It can't be too light or it would disrupt the center of gravity, and it can't be too heavy because the UAV would burn too much fuel to accomplish its mission.
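The flow-down from MoE to MoP to TPM can be sketched in a few lines of Python. This is only an illustration of the hierarchy described above, not how any program office actually records it. The class names, the `unmet_tpms` helper, and the measured weights are hypothetical. Only the 55-pound upper limit comes from the example above. The lower limit is left unset because no number is given for "too light".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TechnicalPerformanceMeasure:
    """A TPM: a measurable attribute with an allowed band."""
    name: str
    unit: str
    upper_limit: float
    # No value for "too light" is stated above, so this stays optional.
    lower_limit: Optional[float] = None

    def within_limits(self, measured: float) -> bool:
        """True when the measured value is strictly inside the allowed band."""
        if measured >= self.upper_limit:
            return False
        if self.lower_limit is not None and measured <= self.lower_limit:
            return False
        return True


@dataclass
class MeasureOfPerformance:
    """An MoP groups the TPMs that enable it."""
    name: str
    tpms: List[TechnicalPerformanceMeasure] = field(default_factory=list)


@dataclass
class MeasureOfEffectiveness:
    """An MoE is assigned to a capability and is enabled by MoPs."""
    statement: str
    mops: List[MeasureOfPerformance] = field(default_factory=list)


def unmet_tpms(moe: MeasureOfEffectiveness,
               measurements: Dict[str, float]) -> List[str]:
    """Names of TPMs under this MoE that are unmeasured or out of limits."""
    failures = []
    for mop in moe.mops:
        for tpm in mop.tpms:
            measured = measurements.get(tpm.name)
            if measured is None or not tpm.within_limits(measured):
                failures.append(tpm.name)
    return failures


# The UAV example from the text, expressed in this hypothetical structure.
sensor_weight = TechnicalPerformanceMeasure(
    name="EO/IR sensor platform weight", unit="lb", upper_limit=55.0)
weight_mop = MeasureOfPerformance(name="Weight", tpms=[sensor_weight])
transport_moe = MeasureOfEffectiveness(
    statement=("The UAV shall be transportable within a 3,000-mile radius "
               "via a C-17/C-141/C-5 package."),
    mops=[weight_mop])

if __name__ == "__main__":
    # 57.5 lb is an invented measurement used only to show a failing TPM.
    print(unmet_tpms(transport_moe, {"EO/IR sensor platform weight": 57.5}))
```

The point of the structure is traceability: a TPM that drifts out of its band can be traced up to the MoE, and therefore the capability, that it puts at risk.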
So for the ACA site, we'd need to know if there were MoE's, MoP's, TPM's defined that enable the Capabilities to be delivered. Here's the Performance Reference Model for federal IT.
Since the ACA site is pretty much all software, I'm going to suggest that this approach of using Capabilities Based Planning, MoE's, MoP's, TPM's has nothing to do with how the software is built. Either traditional or agile methods can be used. Agile is likely faster, but agile can only work in a domain like this if you know what DONE looks like in terms of MoE's, MoP's, and TPM's. This project has a fixed launch date, a fixed set of requirements guided by all the insurance regulations, and hopefully a not-to-exceed budget.
Here's an example of stating Capabilities for an integrated sensor platform program and how these capabilities are phased over the acquisition lifecycle.
It is a common myth that government acquisition is waterfall and big design up front. DoD 5000.02 prescribes an iterative process designed to assess the viability of technologies while simultaneously refining user requirements (p. 16 of 5000.02).
One starting question of the ACA Site would be - did they apply the iterative acquisition process in some form, no matter the fidelity of the iterations?
Here are some other fundamental questions as well:
- Were there exit criteria for each iteration that allowed the stakeholders to proceed without having to revisit the developed capabilities?
- Were there tangible measures of physical percent complete connected to the needed capabilities, flowed down to the technical and operational requirements and then to the packages of work that produced products or services that implemented the requirements?
- Were there specific measures of increasing maturity for all the deliverables?
- Was there a technical and operational risk management plan connected to the tangible deliverables? Did this plan implement a Continuous Risk Management process per SEI guidelines?
- Did the duration and cost baseline plan have a probabilistic risk model based on Reference Class Forecasting from past performance in the same or similar development domain? The site is not rocket science. Just look to Amazon, Target, or any insurance site (Progressive, or my favorite, USAA) for examples.
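To make the Reference Class Forecasting question concrete, here is a minimal sketch in Python. It assumes we have the ratio of actual to planned duration (or cost) for past projects in a similar domain, and it picks the uplift that would have covered a chosen fraction of them. The function names are my own and the ratios are invented purely for illustration. They are not data from any real program, and a real model would need a defensible reference class.

```python
import math
from typing import Sequence


def uplift_at_confidence(ratios: Sequence[float], confidence: float) -> float:
    """Smallest observed actual/planned ratio that covers the requested
    fraction of the reference class (a simple empirical quantile)."""
    if not ratios:
        raise ValueError("the reference class must contain at least one project")
    if not 0.0 < confidence <= 1.0:
        raise ValueError("confidence must be in the interval (0, 1]")
    ordered = sorted(ratios)
    index = math.ceil(confidence * len(ordered)) - 1
    return ordered[index]


def risk_adjusted_estimate(baseline: float,
                           ratios: Sequence[float],
                           confidence: float = 0.8) -> float:
    """Baseline estimate scaled by the reference-class uplift."""
    return baseline * uplift_at_confidence(ratios, confidence)


# HYPOTHETICAL actual/planned ratios for ten past projects. Illustration only.
HYPOTHETICAL_RATIOS = [1.0, 1.1, 1.2, 1.3, 1.5, 1.6, 1.8, 2.0, 2.5, 3.0]

if __name__ == "__main__":
    baseline_months = 12.0  # hypothetical planned duration
    for level in (0.5, 0.8):
        adjusted = risk_adjusted_estimate(
            baseline_months, HYPOTHETICAL_RATIOS, level)
        print(f"{level:.0%} confidence: plan for {adjusted:.1f} months")
```

The design choice here is deliberately crude: it uses the outcomes of past projects directly instead of building the estimate bottom-up, which is the essence of taking the "outside view" of the project.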
If the answer to any of these is no or we don't know, go find out, get project managers who can do this. Otherwise, the probability of project success is reduced. Look at the Probability of Program Success literature for further guidance.
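The question about tangible measures of physical percent complete can also be sketched. The version below is one simple way to do it, assuming each package of work has a budget and a list of exit criteria, and that credit is earned only for criteria actually met. The `WorkPackage` class, the package names, and all the numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class WorkPackage:
    """A package of work with a budget and countable exit criteria."""
    name: str
    budget: float
    criteria_total: int
    criteria_met: int

    def fraction_complete(self) -> float:
        """Share of exit criteria met, not share of time or money spent."""
        if self.criteria_total <= 0:
            raise ValueError(f"{self.name}: no exit criteria defined")
        return self.criteria_met / self.criteria_total


def physical_percent_complete(packages: Sequence[WorkPackage]) -> float:
    """Budget-weighted percent complete across all work packages."""
    total_budget = sum(p.budget for p in packages)
    if total_budget <= 0:
        raise ValueError("total budget must be positive")
    earned = sum(p.budget * p.fraction_complete() for p in packages)
    return 100.0 * earned / total_budget


# HYPOTHETICAL packages and numbers, for illustration only.
EXAMPLE_PACKAGES = [
    WorkPackage("Package A", budget=100.0, criteria_total=4, criteria_met=4),
    WorkPackage("Package B", budget=200.0, criteria_total=4, criteria_met=2),
    WorkPackage("Package C", budget=100.0, criteria_total=4, criteria_met=1),
]

if __name__ == "__main__":
    print(f"{physical_percent_complete(EXAMPLE_PACKAGES):.2f}% complete")
```

A package with no defined exit criteria raises an error instead of reporting progress, which mirrors the point of the questions above: without exit criteria there is no tangible evidence of being done.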
Another question is did they have an Integrated Master Plan and Integrated Master Schedule for all the work as described in the Integrated Master Plan and Integrated Master Schedule Preparation and Use Guide? This paradigm has been shown to significantly increase the probability of success no matter the domain, context, development method, technology, or business process. It states in clear, concise, and unequivocal terms what DONE looks like at every point in the project in units of measure meaningful to the decision-makers.
The final - and killer question is - did the project team ruthlessly manage the changes to the capabilities? This, I suspect, is the root cause of the failure. Late changes to complex projects are the kiss of death.
As repeated often here...
Don't do stupid things on purpose
So Now What?
We have to wait to see what the Root Cause Analysis (RCA) shows for the failure of the project. But I'd conjecture the program management processes found in large DoD or NASA programs were not applied in any meaningful way. The site is not large compared to most of the programs we work ($400M is small), but the processes used to manage those programs can be scaled down with ease. The Principles are the same. The Practices are scalable and the Processes are scalable as well.