Perhaps the most common question people ask when they start learning about statistics is "What do the different statistical averages actually mean?" This is an important question because there is a wide variety of averages and related summary measures out there, and knowing what each one tells you is essential for analyzing data accurately and confidently.
This article will discuss the various statistical averages and related measures that you might encounter, what they mean, and how to interpret them.
Mean
The first and most basic statistical average is the mean. The mean is simply the arithmetic average of a set of numbers: you add up all the values and divide the total by how many values there are. For example, if you have the four numbers 3, 4, 5, and 6, then their mean is (3 + 4 + 5 + 6) / 4 = 18 / 4 = 4.5. The + signs carry out the addition and the / carries out the division, meaning that you take the sum of all the numbers (in this case, 18) and divide it by the number of values you started with (in this case, 4).
The mean does not have to equal any of the values in the set (4.5 is not one of the four numbers above), and on its own it tells you nothing about how spread out the data are; that is the job of the standard deviation, discussed below. It is also sensitive to outliers: a single very large or very small value can pull the mean a long way, which is one reason the median is often reported alongside it.
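To make this concrete, here is a minimal sketch of computing the mean in Python with the standard library's statistics module; the numbers are the hypothetical ones from the example above.

```python
from statistics import mean

values = [3, 4, 5, 6]             # the hypothetical numbers from the example above

print(sum(values) / len(values))  # 4.5, computed by hand: sum divided by count
print(mean(values))               # 4.5, using the statistics module
```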
Median
The median is the middle value in a set of numbers. To find it, sort the values in order and take the one that sits in the middle; if there is an even number of values, take the average of the two middle ones. For example, if you have the set of numbers {1, 2, 3, 4, 5, 6, 7, 8, 9}, then the median is 5, because four of the values lie below 5 and four lie above it. As you can see, this is quite different from the mean, which adds all the numbers together and divides by the count to get a single value.
Since the median is the middle value in a set, when you are given the median you know that at least half of the values are less than or equal to it and at least half are greater than or equal to it. Another handy property of the median is that, unlike the mean, it is robust to outliers: changing the largest or smallest value in the set, however dramatically, does not move the median at all.
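Here is a minimal sketch of computing the median with Python's statistics module, including the even-count case where the two middle values are averaged; the second list is made up to illustrate that case.

```python
from statistics import median

odd_count = [1, 2, 3, 4, 5, 6, 7, 8, 9]        # the example set above
even_count = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]   # hypothetical set with an even count

print(median(odd_count))   # 5   (the single middle value)
print(median(even_count))  # 5.5 (average of the two middle values, 5 and 6)
```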
Mode
The mode is the value that appears most frequently in a set. For example, if you have the set of values {a, b, b, c, b, d}, then the mode is b, because b appears three times, more often than any other value in the set.
As you can see, this can be a fiddly value to read off by eye, because it is not always obvious which value appears most frequently. One way of finding the mode is to sort the values and scan for the longest run of repeats. Another way is to build a frequency table: count how many times each distinct value occurs and pick the value with the highest count. Note that a set can have more than one mode (if two values tie for the highest count) or no meaningful mode at all (if every value appears exactly once).
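Here is a minimal sketch of the frequency-table approach in Python, assuming Python 3.8+ for statistics.multimode; the values are the hypothetical ones from the example above.

```python
from collections import Counter
from statistics import multimode  # Python 3.8+

values = ["a", "b", "b", "c", "b", "d"]   # hypothetical values from the example above

counts = Counter(values)       # frequency table: Counter({'b': 3, 'a': 1, 'c': 1, 'd': 1})
print(counts.most_common(1))   # [('b', 3)] -> the mode is 'b'
print(multimode(values))       # ['b']; would list every tied mode if there were several
```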
Standard Deviation
The standard deviation is a measurement of the variability (or dispersion) of the values around the mean. It is often reported together with the mean to describe how tightly the data cluster. The standard deviation is always zero or positive, and it is expressed in the same units as the data themselves, e.g., µM for micromolar concentrations or km for kilometres. So a standard deviation of 10 µM means that the values are typically scattered about 10 µM away from the mean, some closer and some further.
When you are given a standard deviation alongside a mean, you should not read it as an error in the mean itself; it describes the spread of the individual values. A small standard deviation means the values cluster tightly around the mean, while a large one means they are widely spread. For roughly bell-shaped (normal) data, about 68% of the values fall within one standard deviation of the mean and about 95% fall within two.
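Here is a minimal sketch of computing the standard deviation in Python, both directly from the definition (the square root of the average squared deviation from the mean) and with the statistics module; the numbers are made up for illustration.

```python
from statistics import mean, pstdev

values = [8, 9, 10, 11, 12]   # made-up data

mu = mean(values)             # 10
by_hand = (sum((x - mu) ** 2 for x in values) / len(values)) ** 0.5

print(mu, by_hand, pstdev(values))   # both population standard deviations are ~1.414
```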
Significance Level
The significance level is the probability of incorrectly rejecting the null hypothesis when it is in fact true, i.e., the false-positive (Type I error) rate you are willing to accept. It is expressed as a probability between zero and one, e.g., 0.05 for a 5% chance or 0.01 for a 1% chance. The significance level helps determine how "significant" an observation has to be before you treat it as more than chance: you compute a p-value from the data and reject the null hypothesis only if the p-value falls below the chosen significance level. Be careful with the interpretation, though. Rejecting the null hypothesis does not prove the alternative, and failing to reject it does not prove the null; the significance level only controls how often you will raise a false alarm when there is really nothing there. So, when you choose a significance level of 0.05, it means that if the null hypothesis were true, you would still (wrongly) declare significance about 5% of the time.
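As a concrete (and entirely made-up) illustration, here is a minimal sketch of the decision rule for a one-sample z-test in Python, assuming the population standard deviation is known; statistics.NormalDist requires Python 3.8+.

```python
from statistics import NormalDist

alpha = 0.05                                   # chosen significance level
sample_mean, hypothesized_mean = 10.4, 10.0    # made-up numbers
population_sd, n = 1.0, 25

z = (sample_mean - hypothesized_mean) / (population_sd / n ** 0.5)   # z = 2.0
p_value = 2 * (1 - NormalDist().cdf(abs(z)))                         # two-tailed p ≈ 0.046

print(p_value < alpha)   # True -> reject the null hypothesis at the 5% level
```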
Confidence Interval
The confidence interval is a range, computed from the data, that is likely to contain the true population value, with the two ends of the range known as the lower and upper confidence limits. It is typically reported as an estimate with a "±" margin of error. For example, suppose you have a sample with a mean of 3.5 and the margin of error at the 95% confidence level works out to 0.5. You would report the interval as 3.5 ± 0.5, i.e., from 3.0 to 4.0, meaning that the procedure used to build the interval captures the true mean in about 95% of repeated samples. In this case, the lower limit is 3.0 and the upper limit is 4.0.
The confidence interval also helps determine how large a sample you need before the data pin down the population value to a useful precision. The margin of error for a mean shrinks roughly in proportion to 1/√n, where n is the sample size, so quadrupling the sample size roughly halves the width of the interval. If, for example, a sample of 20 numbers gives you an interval of 3.5 ± 0.5 and you need the estimate to be accurate to within ±0.25, you should plan on sampling roughly 80 numbers rather than 20.
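Here is a minimal sketch of a 95% confidence interval for a mean in Python, using a normal (z-based) approximation so that only the standard library is needed; with a small sample like this you would normally use the t distribution instead, and the data are made up for illustration.

```python
from statistics import NormalDist, mean, stdev

sample = [3.1, 3.4, 3.6, 3.2, 3.8, 3.5, 3.3, 3.9, 3.7, 3.5]   # made-up data

n = len(sample)
m = mean(sample)
standard_error = stdev(sample) / n ** 0.5
z = NormalDist().inv_cdf(0.975)      # ≈ 1.96 for a 95% interval
margin = z * standard_error

print(f"{m:.2f} ± {margin:.2f}")     # roughly 3.50 ± 0.16
```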
T-test For Two Means
The t-test for two means is a test used to determine whether there is a difference between the means of two samples (the two-tailed version asks whether they differ in either direction). To perform this test, you compute the difference between the first sample mean and the second sample mean, and then divide this difference by the standard error of that difference (which is built from the standard deviations and sizes of the two samples) to get a t-score. This t-score is then compared to the critical t-value for your chosen significance level and degrees of freedom. If the absolute value of the t-score exceeds the critical value, you reject the null hypothesis and conclude that the two sample means are significantly different; otherwise you do not. Either way it is a binary decision at the chosen significance level, and "not significant" does not prove that the means are equal.
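Here is a minimal sketch of a two-sample t-test in Python, assuming SciPy is installed; Welch's version is used so the two groups are not assumed to have equal variances, and the data are made up for illustration.

```python
from scipy import stats

group_a = [5.1, 4.9, 5.3, 5.0, 5.2, 4.8]   # made-up samples
group_b = [5.6, 5.4, 5.8, 5.5, 5.7, 5.9]

# Welch's two-sample t-test (does not assume equal variances)
result = stats.ttest_ind(group_a, group_b, equal_var=False)
print(result.statistic, result.pvalue)   # reject equal means if pvalue < alpha (e.g. 0.05)
```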