Computing the mean of a list of values is relatively simple. The mean is the sum of the values divided by the number of values in the list. Since the statistical formula is so closely related to the actual loop, we'll provide the formula, followed by an overview of the code.
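In symbols, using the names that appear in the procedures in this section (x_i for the element at index i, n for the number of values, m for the mean), the formula can be written as follows. This is a sketch that assumes the values are indexed from 0 to n - 1.

```latex
m = \frac{1}{n} \sum_{i=0}^{n-1} x_i
```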
The definition of the Σ mathematical operator leads us to the following method for computing the mean:
Procedure 15.1. Computing Mean
initialize sum, s, to zero
for each index, i, in the range 0 up to, but not including, the number of values in the list, n:
add element x_i to s
return s divided by the number of elements, n
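Procedure 15.1 can be sketched in Python as follows. This is a minimal illustration, assuming Python 3 (where / is true division) and a non-empty list; the function name mean and the sample data are our own choices, not part of the procedure.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    n = len(values)
    s = 0                    # initialize the sum, s, to zero
    for i in range(n):       # i takes the index values 0, 1, ..., n - 1
        s += values[i]       # add element x_i to s
    return s / n             # the sum divided by the number of elements


# Hypothetical sample data, chosen only for illustration.
samples = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean(samples))         # 5.0
```

An empty list makes n zero, so this sketch raises ZeroDivisionError in that case; a caller may want to check for it first.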
The standard deviation can be computed in a few ways, but we'll use the formula shown below. This computes a deviation measurement as the square of the difference between each sample and the mean. The sum of these measurements is then divided by the number of degrees of freedom, n - 1, to get the variance; the square root of the variance is the standard deviation. Again, the formula summarizes the loop, so we'll show the formula followed by an overview of the code.
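In the notation of Procedure 15.2, with m for the mean, v for the variance, and n for the number of values, the formula can be written as follows. The symbol σ for the standard deviation is our own label here, and we assume the samples are indexed from 0 to n - 1.

```latex
v = \frac{1}{n - 1} \sum_{i=0}^{n-1} (x_i - m)^2,
\qquad
\sigma = \sqrt{v}
```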
The definition of the Σ mathematical operator leads us to the following method for computing the standard deviation:
Procedure 15.2. Computing Standard Deviation
compute the mean, m
initialize sum, s, to zero
for each value, x_i, in the list:
compute the difference from the mean, d, as x_i - m
add d² to s. This is typically done as d*d in Java, since there is no “squared” operator. In Python, this can be d**2.
compute the variance, v, as s divided by (n - 1). This n - 1 value reflects the statistical notion of “degrees of freedom”, which is beyond the scope of this book.
return the square root of the variance, v.
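Procedure 15.2 can be sketched in Python along the following lines. This is an illustration only, assuming Python 3 and a list of at least two values (otherwise n - 1 is zero); the function names and sample data are our own choices.

```python
import math


def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def standard_deviation(values):
    """Return the sample standard deviation, using n - 1 degrees of freedom."""
    n = len(values)
    m = mean(values)         # compute the mean, m
    s = 0                    # initialize the sum, s, to zero
    for x in values:
        d = x - m            # difference from the mean
        s += d * d           # add d squared to s (d**2 also works)
    v = s / (n - 1)          # variance, dividing by n - 1
    return math.sqrt(v)      # square root of the variance


# Hypothetical sample data, chosen only for illustration.
samples = [2, 4, 4, 4, 5, 5, 7, 9]
print(standard_deviation(samples))
```

For this sample data the squared differences sum to 32, so the variance is 32/7 and the result is about 2.14.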
For programmers new to Java, the java.lang.Math class contains the Math.sqrt method, which computes square roots.
For programmers new to Python, the math module contains the math.sqrt function, which computes square roots.
Programmers who have the Python numeric module installed will find that this module provides some alternatives to the designs shown in this section. Since this module is optional, we won't cover it in any depth here. However, it can simplify the methods for computing mean and standard deviation.