The NPR story about the Affordable Care Act website is one of those out-of-domain stories we see in the popular press.
First are the overgeneralized descriptions of Waterfall and Agile, such as: "Waterfall development favors listing a huge set of requirements for a system up front, letting developers go away for months (if not longer) and expecting a huge software product in the end."
The simplistic description of agile is stated as: "The agile method does the opposite, favoring work done in phases, delivering "minimum shippable" parts of a software system in weekly or biweekly cycles. This allows for iterating — or adjusting to hiccups discovered in the previous cycle, changing features or quashing bugs quickly and avoiding getting an end product that doesn't look a thing like what your users need."
Let's get some clarity here. The conjecture that agile would somehow have saved the ACA program is just that: pure conjecture.
But let's first look at some possible root causes (page 5 of the linked briefing):
- Evolving requirements - all requirements evolve on all large projects. It's a core law of large projects. Managing requirements through a process of increasing maturity of delivered capabilities is the core concept of the Integrated Master Plan / Integrated Master Schedule defined in DoD and NASA programs. Without a plan for delivering capabilities, no software development method is going to help. Without some measurable description of done, development processes simply have no target to work toward. The core success factor of agile is direct contact with the customer. This would have been highly unlikely in the case of the ACA site.
- Multiple definitions of success - this is a killer problem. Without knowing what done looks like in units of measure meaningful to the decision makers, nothing is going to help.
- Significant dependencies on external parties / contractors - this is the role of program management. Interface control is the mechanism for managing these dependencies. It may or may not have been in place.
- Parallel stacking of all phases - this alone can't be identified as a root cause. Parallel work processes are not uncommon, but they must be managed.
- Insufficient time and scope of end-to-end testing - insufficient time for testing is a critical failure mode. Agile or traditional, when testing isn't given sufficient time, it's gonna be ugly.
- Launch at full volume - a full-volume go-live is usually a bad idea; capabilities should be released incrementally. Shopping, signing up for an account, and actually buying something: that is the sequence of behaviour on Amazon, Barnes and Noble, Target, and nearly every other retail site, including health insurance sites.
So let's look at page 15's corrective actions and their sources:
- Lock down requirements - all requirements need managing, agile or traditional. All teams need to know what to build, what order to build it in, and what the outcomes of the work effort need to be. Open-ended requirements processes usually mean open-ended work and no definition of done.
- Implement new governance - this would seem to imply the existing governance needed to be changed.
- Determine demand management - demand management is a core success factor of any enterprise system. Without it, the project has no understanding of what the performance will be at go-live.
- Align on shared metrics for success - without alignment there are no measures of performance, measures of effectiveness, key performance parameters, or technical performance measures. There is nothing with which to assess progress to plan.
- Lock down funding - instability of funding is a major source of disruptive behaviour on nearly every project.
- Communicate pivotal plan to SBM states - no communication? Really?
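Assessing progress to plan with shared metrics comes down to comparing a measured value against a planned value at each milestone and asking whether the difference is inside an agreed tolerance. Here is a minimal sketch of that for a single technical performance measure. The TPM (page response time in seconds), the milestone names, the planned values, and the tolerance band are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class TpmPoint:
    milestone: str
    planned: float   # planned value of the TPM at this milestone
    measured: float  # value actually measured at this milestone


def assess_tpm(points, tolerance, lower_is_better=True):
    """Return (milestone, variance, status) for each point.

    variance is measured - planned. A point is 'on plan' when the
    measured value is no worse than the planned value by more than
    `tolerance`, where 'worse' means higher if lower_is_better,
    otherwise lower.
    """
    results = []
    for p in points:
        variance = p.measured - p.planned
        shortfall = variance if lower_is_better else -variance
        status = "on plan" if shortfall <= tolerance else "off plan"
        results.append((p.milestone, variance, status))
    return results


if __name__ == "__main__":
    # Hypothetical TPM: page response time (s), planned to improve over time.
    points = [
        TpmPoint("Design review", planned=4.0, measured=4.2),
        TpmPoint("Integration test", planned=3.0, measured=3.9),
        TpmPoint("Load test", planned=2.0, measured=5.5),
    ]
    for milestone, variance, status in assess_tpm(points, tolerance=0.5):
        print(f"{milestone}: variance {variance:+.1f} s, {status}")
```

The point of the design is that "done" and "on plan" are stated in units of measure agreed before the work starts, so the status at each milestone is a calculation rather than an opinion.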
So What Does It All Mean?
The five immutable principles, practices, and processes of project success were violated.
- Where's the risk management plan? Where's the risk retirement? Cost, schedule, and technical performance margin for the irreducible uncertainties. Management reserve or risk buy-down for the reducible risks.
- External testing processes - Red Team testing, plus verification and validation of released and ready-to-go capabilities.
- System architecture to assure all the external interfaces are properly defined.
- User community hands-on with everything.
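The distinction between margin for irreducible uncertainty and reserve for reducible risk can be made concrete with a small Monte Carlo sketch. Everything in it is hypothetical: the task names, the three-point duration estimates, the risk probabilities and impacts, and the 80% confidence level. Irreducible uncertainty is modeled as triangular task durations and covered by schedule margin. Reducible risks are modeled as discrete events and covered, in this deliberately simplified version, by an expected-value management reserve.

```python
import math
import random

# Hypothetical serial tasks: (name, optimistic, most_likely, pessimistic) in days.
TASKS = [
    ("Build enrollment service", 20, 30, 55),
    ("Integrate external data hub", 15, 25, 50),
    ("End-to-end testing", 10, 15, 30),
]

# Hypothetical reducible risks: (name, probability, impact in days).
RISKS = [
    ("Contractor interface slips", 0.3, 10),
    ("Load test reveals rework", 0.2, 15),
]


def simulate_totals(tasks, trials, seed=1):
    """Sample total duration of serial tasks; returns a sorted list."""
    rng = random.Random(seed)  # seeded so results are repeatable
    totals = []
    for _ in range(trials):
        # random.triangular takes (low, high, mode)
        totals.append(sum(rng.triangular(lo, hi, ml) for _, lo, ml, hi in tasks))
    return sorted(totals)


def percentile(sorted_values, q):
    """Nearest-rank percentile of an already sorted list, 0 < q <= 1."""
    rank = math.ceil(q * len(sorted_values))
    return sorted_values[max(0, rank - 1)]


def schedule_margin(tasks, confidence=0.8, trials=10_000, seed=1):
    """Days to add to the most-likely plan to reach the given confidence."""
    deterministic = sum(ml for _, _, ml, _ in tasks)
    totals = simulate_totals(tasks, trials, seed)
    return percentile(totals, confidence) - deterministic


def management_reserve(risks):
    """Expected-value reserve for discrete, reducible risks (simplified)."""
    return sum(p * impact for _, p, impact in risks)


if __name__ == "__main__":
    print(f"Schedule margin at 80%: {schedule_margin(TASKS):.1f} days")
    print(f"Management reserve: {management_reserve(RISKS):.1f} days")
```

Margin is sized from the spread of the uncertainty that cannot be removed; reserve is sized from risks that could instead be bought down by doing work early, which is why the two are computed separately.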