The usual algorithms for computing variance and standard deviation work on the full data set. What if you have a time series and want the standard deviation for a moving window? You could redo the computation from scratch every time the window advances, but surely there’s a better way.

Let’s denote the data by \(x_0, x_1, \ldots\) and see how the statistics change when we slide a window of size N by one position, from \((x_0, \ldots, x_{N-1})\) to \((x_1, \ldots, x_N)\).

The mean is easy:

$$

\bar{x}_1 - \bar{x}_0 = \frac{\sum_{i=1}^N x_i - \sum_{i=0}^{N-1} x_i}{N} = \frac{x_N - x_0}{N}

$$
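The mean identity is easy to check numerically. Here is a small sketch with arbitrary sample data:

```python
# Sketch: verify the mean-update identity on sample data (values are illustrative).
x = [2.0, 4.0, 6.0, 8.0, 10.0]
N = 4

mean0 = sum(x[0:N]) / N      # mean of the first window (x_0 .. x_{N-1})
mean1 = sum(x[1:N+1]) / N    # mean after sliding by one (x_1 .. x_N)

# The difference equals (x_N - x_0) / N.
assert abs((mean1 - mean0) - (x[N] - x[0]) / N) < 1e-12
```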

The standard deviation is a little tougher. The variance, which is the standard deviation squared, is nicer for algebraic manipulation. Since the variance has an N-1 in the denominator, let’s look at what happens when computing \((N-1)s^2\).

$$

\begin{align}

&(N-1)s_1^2 - (N-1)s_0^2 \\

&= \left(\sum_{i=1}^N x_i^2 - N \bar{x}_1^2\right)-\left(\sum_{i=0}^{N-1} x_i^2 - N\bar{x}_0^2\right) \\

&= x_N^2 - x_0^2 - N (\bar{x}_1^2 - \bar{x}_0^2) \\

&= x_N^2 - x_0^2 - N (\bar{x}_1 - \bar{x}_0) (\bar{x}_1 + \bar{x}_0) \\

&= (x_N - x_0)(x_N + x_0) - (x_N - x_0) (\bar{x}_1 + \bar{x}_0) \\

&= (x_N - x_0)(x_N - \bar{x}_1 + x_0 - \bar{x}_0) \\

\end{align}

$$

The update rule turns out to be remarkably simple. (Note that the second-to-last step uses the mean identity \(N(\bar{x}_1 - \bar{x}_0) = x_N - x_0\) derived above.) Here’s a possible implementation of these moving window statistics in Python:

```python
from math import sqrt

class RollingStatistic(object):
    """Moving-window average, variance and standard deviation,
    updated in O(1) per step."""

    def __init__(self, window_size, average, variance):
        self.N = window_size
        self.average = average
        self.variance = variance
        self.stddev = sqrt(variance)

    def update(self, new, old):
        # `new` enters the window, `old` leaves it.
        oldavg = self.average
        newavg = oldavg + (new - old) / self.N
        self.average = newavg
        self.variance += (new - old) * (new - newavg + old - oldavg) / (self.N - 1)
        self.stddev = sqrt(self.variance)
```
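To sanity-check the update rule, we can slide a window over some data and compare the incrementally updated statistics against recomputing each window from scratch with the standard library’s `statistics` module. This sketch inlines the update rule; the sample data is arbitrary:

```python
# Sketch: cross-check the O(1) update rule against recomputing each
# window directly (sample data is illustrative).
from statistics import variance

data = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
N = 4

# Seed with the first window, computed the slow way.
avg = sum(data[:N]) / N
var = variance(data[:N])

for i in range(N, len(data)):
    new, old = data[i], data[i - N]
    oldavg = avg
    avg = oldavg + (new - old) / N
    var += (new - old) * (new - avg + old - oldavg) / (N - 1)

    # Compare with a from-scratch computation of the current window.
    window = data[i - N + 1 : i + 1]
    assert abs(avg - sum(window) / N) < 1e-9
    assert abs(var - variance(window)) < 1e-9
```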

#### Why not use the formula for variance directly?

Starting from this equivalent definition of variance, we see that the sum of squares appears directly in the formula:

$$

s^2 = \frac{\sum_{i=1}^N x_i^2 - N\bar{x}^2}{N-1}

$$

We could maintain a rolling sum of squares and a rolling mean separately. The problem with this approach is that when the variance is small compared to the mean, the subtraction suffers from catastrophic cancellation, the same problem that prompts us to use Welford’s method for one-pass variance computation.
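The cancellation is easy to demonstrate. In the sketch below (the offset and data values are arbitrary), shifting a small data set by a large constant leaves the true variance unchanged, but the sum-of-squares formula subtracts two huge, nearly equal numbers and loses essentially all the significant digits:

```python
# Sketch: catastrophic cancellation in the sum-of-squares variance formula
# (the offset and data values are illustrative).
from statistics import variance

offset = 1e9
data = [offset + v for v in (4.0, 7.0, 13.0, 16.0)]
N = len(data)

# Sum-of-squares formula: subtracts two numbers of magnitude ~1e18
# whose true difference is only ~90.
mean = sum(data) / N
naive = (sum(x * x for x in data) - N * mean * mean) / (N - 1)

# Numerically stable reference; shifting by a constant does not change
# the variance, which for (4, 7, 13, 16) is exactly 30.
exact = variance(data)

print(naive, exact)  # the naive result is far from 30
```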