There is a popular fallacy in the Agile community that small batch sizes are Risk Management. What small batches do is provide, at a fast rate, the information that informs risk management. Small batch sizes answer the question...
How long are you willing to wait before you find out you are late?
But there is a larger principle at work here.
The sampling rate of a dynamic process under control, or of a signal from a process you want to model, defines the fidelity of the sampled signal. The minimum sufficient rate is set by the Nyquist rate. The Nyquist Sampling Theorem is a fundamental bridge between continuous-time signals (analog signals) and discrete-time signals (digital signals). It establishes a sufficient condition for a sample rate that permits a discrete sequence of samples to capture all the information from a continuous-time signal of finite bandwidth.
This analog signal represents the underlying dynamics of the software development process: the dynamics of cost, schedule, and technical performance resulting from the execution of the project. The digital signal is the sampled information from that analog signal needed to determine the Physical Percent Complete of the dynamic analog software development process.
To answer the how long question posed above, a sample rate of at least twice the rate at which the signal changes is needed; equivalently, a sampling interval no longer than half the period of that change. So if you want to know if you're late on a 2-week sprint, sampling at 1-week intervals is the minimum rate needed to control that signal. On our Software Intensive System of Systems, we sample the project every Thursday with an assessment of Physical Percent Complete.
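To make the sampling idea concrete, here's a small sketch in Python. The 2-week "signal" is an illustrative stand-in for the fastest meaningful change in project status, not data from an actual project:

```python
import numpy as np

# Illustrative stand-in: deviation from plan swings with a 2-week period,
# the fastest meaningful change in project status we want to detect.
period_weeks = 2.0
f_signal = 1.0 / period_weeks             # 0.5 cycles/week
nyquist_rate = 2.0 * f_signal             # minimum: 1 sample/week

def deviation(t_weeks):
    """Toy 'analog' signal: deviation from plan over time."""
    return np.cos(2 * np.pi * f_signal * t_weeks)

t_weekly = np.arange(0, 12, 1.0)          # 1-week samples (at the Nyquist rate)
t_4weekly = np.arange(0, 12, 4.0)         # 4-week samples (under-sampled)

print(f"minimum sample rate: {nyquist_rate:.1f} samples/week")
print("1-week samples:", np.round(deviation(t_weekly), 2))
print("4-week samples:", np.round(deviation(t_4weekly), 2))
# The weekly samples alternate +1/-1 and expose the swing; the 4-week
# samples all read +1.0 -- aliasing makes the project look steady when
# it is not.
```

Sample slower than the Nyquist rate and the late signal doesn't arrive distorted, it simply never arrives.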
Below is an example of the data held by and produced by Rally. The units of measure here are Hours, but they could be Story Points or any other ordinal measure. This approach is a closed-loop control system, where the estimated work (in your choice of units) is the starting point. As the work progresses, the TO DO value goes from the original estimate to the new estimate to complete. This feedback loop runs on a daily basis at the Sprint Stand-up, with a simple question the team can answer: What is your updated estimate to complete the work you've been working on, now that you've been working on it? This closes the loop with empirical data on a daily basis, from the physical assessment of the work performed.
As the work progresses, the Physical Percent Complete (P%C) progresses as well. When a Story's work is determined to be more than planned, the TO DO increases, showing a reduction in the P%C. This provides daily reporting of P%C, so the Team can take direct corrective or preventive actions to increase the Probability of Success of the Stories and Tasks in the Sprint.
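A minimal sketch of this daily loop, assuming the common convention P%C = work performed / (work performed + current TO DO); the hours and field names here are hypothetical, not Rally's actual data model:

```python
# Hypothetical daily stand-up updates for one task, in hours.
# 'done' is work performed to date; 'todo' is that day's re-estimate
# of the work remaining (the estimate to complete).
daily_updates = [
    {"day": "Mon", "done": 8.0,  "todo": 32.0},
    {"day": "Tue", "done": 12.0, "todo": 52.0},   # more work discovered
    {"day": "Wed", "done": 24.0, "todo": 30.0},
    {"day": "Thu", "done": 40.0, "todo": 12.0},
]

for u in daily_updates:
    eac = u["done"] + u["todo"]              # estimate at completion
    p_pct_c = 100.0 * u["done"] / eac        # physical percent complete
    print(f'{u["day"]}: P%C = {p_pct_c:5.1f}%  (EAC {eac:.0f} h)')
# Tuesday's output shows P%C *dropping* (20.0% -> 18.8%) even though more
# hours were worked, because the re-estimated TO DO grew faster.
```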
The Punch Line
The closed-loop control, shown below, of the planned work with the assessment of P%C does NOT reduce the risk by itself. The details of closed-loop control for software development can be found in the post Closed Loop Control.
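As a bare sketch of what that loop computes, the error signal is just planned P%C minus measured P%C at each sample; the plan values and the 5% action threshold below are illustrative assumptions:

```python
# Weekly samples: planned vs. measured Physical Percent Complete.
planned_ppc  = [10, 25, 40, 55, 70, 85, 100]   # baseline plan (hypothetical)
measured_ppc = [ 8, 20, 34, 50, 68, 84, 100]   # Thursday assessments (hypothetical)

for week, (plan, actual) in enumerate(zip(planned_ppc, measured_ppc), start=1):
    error = plan - actual                      # the control-loop error signal
    action = "take corrective action" if error > 5 else "on track"
    print(f"week {week}: plan {plan}%, actual {actual}%, error {error:+d}% -> {action}")
```

The error signal tells you that you are late; it does nothing about why. That takes the risk management described next.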
Since all risk comes from either Reducible (Epistemic) uncertainty or Irreducible (Aleatory) uncertainty, specific actions must be taken to manage each.
The risk created by Epistemic uncertainty represents a resolvable lack of knowledge, with elements expressed as the probabilistic uncertainty of a future value related to a loss in a future period of time. Awareness of this lack of knowledge provides the opportunity to reduce the uncertainty through direct corrective or preventive actions.
Epistemic uncertainty, and the risk it creates, is modeled by defining the probability that the risk will occur, the time frame in which that probability is active, the probability of an impact or consequence when the risk does occur, and finally, the probability of the residual risk once the handling of that risk has been applied.
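Here is one way to make that model concrete; the probabilities and impact below are illustrative assumptions, not calibrated values:

```python
# Illustrative event-based (epistemic) risk model; all values hypothetical.
p_occurrence = 0.30                 # probability the risk event occurs ...
active_window = "next quarter"      # ... within this time frame
p_impact = 0.60                     # probability of a consequence, given occurrence
impact_cost = 250_000               # consequence when realized, in dollars

exposure = p_occurrence * p_impact * impact_cost
print(f"exposure before handling ({active_window}): ${exposure:,.0f}")

p_residual = 0.10                   # probability remaining after handling is applied
print(f"residual exposure after handling: ${p_residual * p_impact * impact_cost:,.0f}")
```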
Epistemic uncertainty statements define and model these event-based risks:
- If‒Then ‒ if we miss our next milestone then the project will fail to achieve its business value during the next quarter.
- Condition‒Concern ‒ our subcontractor has not provided enough information for us to status the schedule, and our concern is the schedule is slipping and we do not know it.
- Condition‒Event‒Consequence ‒ our status shows there are some tasks behind schedule, so we could miss our milestone, and the project will fail to achieve its business value in the next quarter.
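These statement forms map naturally onto a risk-register record. A sketch, with hypothetical field names:

```python
from dataclasses import dataclass

@dataclass
class RiskStatement:
    """Condition-Event-Consequence form of an event-based risk."""
    condition: str      # what is true today
    event: str          # what might happen
    consequence: str    # what it costs us if it does

risk = RiskStatement(
    condition="status shows some tasks behind schedule",
    event="we miss our next milestone",
    consequence="business value not achieved in the next quarter",
)
print(f"IF {risk.event} THEN {risk.consequence} (given: {risk.condition})")
```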
For these types of risks, we need an explicit or an implicit risk handling plan. The word handling is used with special purpose. "We Handle risks" in a variety of ways. Mitigation is one of those ways. However, in order to mitigate the risk, we must introduce new effort (work) into the schedule. We are buying down the risk, or we are retiring the risk by spending money and/or consuming time to reduce the probability of the risk occurring. Or we could be spending money and consuming time to reduce the impact of the risk when it does occur. In both cases, we are taking action to address the risk.
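The buy-down arithmetic looks like this sketch, with hypothetical numbers: mitigation pays when it costs less than the reduction in expected loss it buys.

```python
# Hypothetical buy-down: spend money/time now to lower probability or impact.
p_before, p_after = 0.30, 0.10      # probability of occurrence, before/after handling
impact = 400_000                    # loss if the risk is realized, in dollars
mitigation_cost = 25_000            # the new work introduced into the schedule

buy_down = (p_before - p_after) * impact   # reduction in expected loss
print(f"expected loss: ${p_before * impact:,.0f} -> ${p_after * impact:,.0f}")
print(f"risk bought down by ${buy_down:,.0f} for ${mitigation_cost:,.0f}; "
      f"worth doing: {buy_down > mitigation_cost}")
```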
Aleatory uncertainty and the risk it creates come NOT from the lack of information, but from the naturally occurring processes of the system. For aleatory uncertainty, we cannot buy more information or take specific risk-reduction actions to reduce the uncertainty and the resulting risk. The objective of identifying and managing aleatory uncertainty is to be prepared to handle the impacts when the risk is realized.
Schedule Margin should be used to cover the naturally occurring variances in how long it takes to do the work. Cost Margin is held to cover the naturally occurring variances in the price of something we are consuming in our project. Technical Margin is intended to cover the naturally occurring variation of technical products.
Aleatory uncertainty and the resulting risk are modeled with a Probability Distribution Function (PDF) that describes the possible values the process can take and the probability of each value. The PDF of the possible durations for the work in the project can be determined. It turns out we can buy knowledge about aleatory uncertainty through Reference Class Forecasting and past-performance modeling. This new information allows us to update (adjust) the model, since our past performance on similar work provides information about our future performance. But the underlying process is still random, and the new information simply produces a new aleatory uncertainty PDF.
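A sketch of that updating, assuming a hypothetical reference class of ten similar past tasks:

```python
import numpy as np

# Hypothetical reference class: durations (days) of ten similar past tasks.
reference_class = np.array([8, 9, 10, 10, 11, 12, 12, 14, 16, 21])

# The empirical distribution of past durations is the updated aleatory PDF:
# the new information reshapes the curve, but the variability it describes
# is still random.
p50, p80 = np.percentile(reference_class, [50, 80])
print(f"median duration: {p50:.1f} days, 80th percentile: {p80:.1f} days")
```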
The first step in handling Irreducible Uncertainty is the creation of Margin: Schedule Margin, Cost Margin, and Technical Margin, to protect the program from the risk of irreducible uncertainty. Margin is defined as the allowance in the budget, projected schedule … to account for uncertainties and risks.
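A sketch of sizing that margin from the aleatory PDF, assuming (illustratively) a lognormal duration and a decision to protect the plan to the 80th percentile:

```python
import numpy as np

rng = np.random.default_rng(7)

# Illustrative aleatory model: duration is lognormal around a 10-day
# most-likely value. The spread is irreducible, so margin -- not a
# risk-reduction task -- protects the plan against it.
durations = rng.lognormal(mean=np.log(10.0), sigma=0.25, size=100_000)

p50 = np.percentile(durations, 50)   # the 'deterministic' plan duration
p80 = np.percentile(durations, 80)   # the confidence level we protect to
print(f"plan at P50: {p50:.1f} days, protect to P80: {p80:.1f} days")
print(f"schedule margin: {p80 - p50:.1f} days")
```

The percentile chosen is a business decision about how much protection to buy; the PDF is what the aleatory model provides.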
The small batch size provides samples of Physical Percent Complete at the intervals needed - according to the Nyquist Sampling Theorem - to drive the control loop and produce an error signal.
Small Batch Sizes are NOT risk management; they contribute to risk management.