In most projects, most of the time we are taking samples and using inferential statistics to infer population parameters. Why? Because to take the whole population would be impractical, too costly or too time consuming. This of course means that there is a risk that our parameters do not represent the whole population. The question this raises is how big is the risk, or expressed another way, what real confidence do we have in the parameters we have calculated being representative of the whole population?

For example, we want to find the average height of men aged 40 in the United Kingdom, so we take a sample of 50 people at random and work out the mean and standard deviation. How likely is it that our calculated parameters are the same as the whole population? If they are not then how close are they likely to be?

The answers come from confidence intervals (CI’s). Confidence is the basis for Risk Based Decision Making.

Consider the example of a Belt asked by his Operation Director to improve a process. The Belt makes a change and the Director then asks for some data to confirm that the process performance has improved as he wants to role the same idea out across 300 other similar processes. The Belt takes 20 samples and works out a mean performance and standard deviation and it appears that there has been a slight improvement in performance. What would be the recommendation? What are the risks involved in making a decision at this point?

Understanding the level of risk and uncertainty involved when making conclusions & recommendations is a fundamental part of CI’s. CI’s are useful whenever you want to understand the range of values within which a population parameter is expected to fall, together with a level of risk of getting it wrong. Take the process improvement example above, if the Belt could calculate the likely range of values within which the true mean would fall, and state how confident he was in that result, then the decision making process is a lot more clear.

A definition of a CI is a range within which the true population parameter is expected to fall with repeated sampling (Mean, Median, Sigma, Proportion etc.). We quote a range (minimum and maximum) and the level of confidence we have in that range actually containing the true parameter value. The range of the CI depends on:

· Variability observed in the sample – the more variable the less confidence we have so the greater the range quoted

· The sample size – the more samples we take, the more confidence we have, so the smaller the range quoted

· The level of confidence required, the more certainty we need the greater the range quoted.

A confidence interval is the range of values in which we have confidence that it contains the true population parameter. So, a 95% CI suggests that approximately 95 out of 100 confidence intervals will contain the true population parameter. Bear in mind that we rarely know the true population parameter, which is why confidence is expressed in this way.

Most of the time, we calculate 95% confidence intervals, but any value of CI can be quoted. 99% or 90% are also quite common. The higher the confidence we want that our values quoted include the true parameter, the larger the range of values we need to quote.

## CALCULATING CONFIDENCE INTERVALS

It is relatively easy to calculate CI’s...