The Danger of Relying on “Average”

By Gbolahan A. Salahudeen

When I first took a statistics course in school, I didn’t understand most of the cryptic language the lecturer was speaking. To make matters worse, the course was taught by lecturers who had no business being in a classroom. Their teaching was unintuitive and boring. Yet most students, including me, thought the problem was us.

Their inability to understand what was being taught was chalked up to some inherent deficiency. “I’m not a numbers person” is the usual platitude, however untrue, used to cover up that ugly inadequacy. The stuffiness and density of the class didn’t help, either. So there were many seemingly insurmountable, interlocking variables at play.

Despite my deficiency, I managed to parry the looming failure intimidating me with the “bombastic side eye” in the first semester of my 200L. Sadly, my luck ran out in the second semester and I failed woefully—the only time I failed a course throughout my four years in university.

I knew I had to take my destiny into my own hands if I was going to pass the re-sit. So I enlisted the help of a guy in the business administration department who was fairly good at statistics.

(As of that time, I had no idea that statistics would later become my favorite subject. My foray into data analytics has made me fall deeply in love with it.)

***The Danger of Relying on Average. This image was generated with AI***

The guy started explaining the topics one by one. Before long, I realized that statistics wasn’t as difficult as I had brainwashed myself into believing. While teaching me descriptive statistics, he explained the mechanics of the mean, median, and mode. The three are known as “measures of central tendency”: single values that describe the middle, or center, of a dataset.

Although the mean is commonly, and wrongly, treated as the only average, there are three types of average: mean, median, and mode. The mean of a dataset is obtained by summing the individual values and dividing the sum by the number of values. The median is the middle value of a dataset sorted in ascending order (when there is an even number of values, it is the mean of the two middle values). The mode is the value that occurs most frequently in a dataset.
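To make the three definitions concrete, here is a minimal sketch using Python’s standard-library `statistics` module. The dataset is made up purely for illustration and does not come from the essay.

```python
import statistics

# A small, made-up dataset used only to illustrate the three averages.
scores = [2, 3, 3, 5, 7, 10]

mean = sum(scores) / len(scores)    # sum of the values divided by how many there are
median = statistics.median(scores)  # even count, so this is the mean of the two middle values (3 and 5)
mode = statistics.mode(scores)      # the most frequent value

print(mean, median, mode)  # 5.0 4.0 3
```

The three numbers differ even on six small values, which is why it helps to ask which “average” a statistic refers to.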

During one of those tutorials in my tutor’s dimly lit room, something struck me and piqued my curiosity: the concept of the mean, or average if you will, never made sense to me. So I asked him whether the “average” portrays reality when it comes to income. Using a vivid and relatable example, I reasoned that adding Bill Gates to a group of people would drastically inflate their average income, even though nobody else in the group would be any richer.

He saw some merit in my argument and agreed with my perspective. Basking in the ecstasy of my statistical debate victory, I concluded that the average isn’t always the most accurate metric. Why? The mean is sensitive to outliers. That is, it is pulled around by extreme values, especially in asymmetrical (skewed) data.
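The Bill Gates intuition can be checked with a few lines of Python. This is a minimal sketch with hypothetical incomes; the one-billion figure is only a stand-in for an extreme earner, not anyone’s real income.

```python
import statistics

# Hypothetical annual incomes for five ordinary earners (illustrative numbers only).
group = [40_000, 45_000, 50_000, 55_000, 60_000]
# Add one hypothetical extreme earner; 1 billion is a stand-in, not a real figure.
with_outlier = group + [1_000_000_000]

mean_before, median_before = statistics.mean(group), statistics.median(group)
mean_after, median_after = statistics.mean(with_outlier), statistics.median(with_outlier)

print(f"before: mean={mean_before:,.0f}, median={median_before:,.0f}")
print(f"after:  mean={mean_after:,.0f}, median={median_after:,.0f}")
```

The mean grows by a factor of more than 3,000, while the median only moves from 50,000 to 52,500. That gap is the sensitivity to outliers described above.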

Say, for instance, we have data on five employees’ salaries: 150, 165, 180, 182, and 1,000. The average (mean) salary here is (150 + 165 + 180 + 182 + 1,000) / 5 = 1,677 / 5 = 335.4. That doesn’t reflect the reality of the employees in this example who earn less than 200. In fact, 80% of them earn less than 200, far below the average salary.

Yet the average salary is 335.4. In other words, the figure is grossly misleading and swallows the lower earners in this group because of a single person who earns more than five times what any of the others earns. Their reality is effectively erased from the headline figure.
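The arithmetic in this example is easy to reproduce. The sketch below uses only the five salaries given in the essay and Python’s standard `statistics` module.

```python
import statistics

salaries = [150, 165, 180, 182, 1_000]  # the five salaries from the example

average = statistics.mean(salaries)     # 1,677 / 5
middle = statistics.median(salaries)    # middle value of the sorted list
share_below_200 = sum(s < 200 for s in salaries) / len(salaries)
next_highest = sorted(salaries)[-2]     # highest salary apart from the 1,000 one
top_vs_others = max(salaries) / next_highest

print(f"mean = {average}")                         # mean = 335.4
print(f"median = {middle}")                        # median = 180
print(f"share below 200 = {share_below_200:.0%}")  # share below 200 = 80%
print(f"top earner vs next highest = {top_vs_others:.1f}x")
```

The median of 180 sits among the four lower salaries, whereas the mean sits far above all of them.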

If you are considering a career in an industry because its average salary makes your mouth water and your eyes pop with the kind of delight that comes from watching Lionel Messi embarrass other people’s fathers on the pitch, rein in your emotions and ask yourself whether you are focusing on the right average.

This matters especially for income data, which is rarely symmetrical. Income data is usually positively skewed, with a long right tail. That is to say, there is almost always a handful of people earning significantly more than the majority. Their extreme figures, like Bill Gates’s in my earlier example, stretch the distribution to the right and pull the mean upward, creating a false impression of what a typical person earns.

In the fictitious dataset I’m using in this essay, the single employee earning 1,000 skews the data and produces a specious average. That’s the danger of relying on the average as a metric: it is heavily influenced by extreme values and glosses over critical details.
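One rough way to detect the right skew described above is to compare the mean with the median and compute a skewness coefficient. The sketch below uses the Fisher-Pearson moment coefficient (population form, without small-sample correction). This is a standard formula, not one the essay prescribes. A positive value indicates a longer right tail.

```python
import statistics

def skewness(data):
    """Fisher-Pearson moment coefficient of skewness (population form, no bias correction)."""
    n = len(data)
    mu = sum(data) / n
    m2 = sum((x - mu) ** 2 for x in data) / n  # second central moment (variance)
    m3 = sum((x - mu) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5

salaries = [150, 165, 180, 182, 1_000]
print(round(skewness(salaries), 2))  # positive: the long right tail comes from the 1,000 salary
# The mean exceeds the median, the usual signature of positive skew.
print(statistics.mean(salaries) > statistics.median(salaries))
```

For a perfectly symmetric list such as `[1, 2, 3, 4, 5]`, the same function returns 0.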

So the next time you see something like “The average salary is 450,” pause and ask yourself, “What is missing from this data?” Dig deeper and look for the outliers; they have a lot to teach you about the data.
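A common rule of thumb for “asking for the outlier” is Tukey’s 1.5 × IQR rule: flag any value more than 1.5 interquartile ranges below the first quartile or above the third. This is a general convention, not something specified in the essay. With very small datasets the quartile method matters, and the sketch assumes the `"inclusive"` method of `statistics.quantiles`.

```python
import statistics

def iqr_outliers(data, k=1.5):
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]

salaries = [150, 165, 180, 182, 1_000]
print(iqr_outliers(salaries))  # [1000]
```

Here Q1 is 165 and Q3 is 182, so the upper fence is 207.5, and only the 1,000 salary falls outside it.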

Standard deviation, a measure of how widely data is spread around the mean, also helps. Roughly speaking, it tells you how far a typical value sits from the mean. In our hypothetical data, for instance, four of the five values (150, 165, 180, and 182) don’t even reach 200.

For a dataset with a mean of 335.4, that points to high variability: most individual values are far from the mean, and they are not consistent with one another. Here the sample standard deviation is roughly 372, which is larger than the mean itself. Compare that with a hypothetical standard deviation of 10.

A standard deviation of 10 would imply a tight cluster around the mean: most values sit close to it and to one another, which also suggests there are no extreme outliers.
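The effect of the single 1,000 salary on the spread can be shown by computing the standard deviation with and without it. This sketch uses `statistics.stdev`, which is the sample standard deviation (dividing by n − 1). The essay does not say which variant it has in mind, so this choice is an assumption.

```python
import statistics

salaries = [150, 165, 180, 182, 1_000]
without_top_earner = [s for s in salaries if s != 1_000]  # drop the single 1,000 salary

# statistics.stdev computes the sample standard deviation (dividing by n - 1).
sd_all = statistics.stdev(salaries)
sd_without = statistics.stdev(without_top_earner)

print(f"all five salaries:     stdev = {sd_all:.1f}")
print(f"without the 1,000 one: stdev = {sd_without:.1f}")
```

Removing one value shrinks the standard deviation from roughly 372 to roughly 15, which is close to the tight-cluster picture above.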

This doesn’t mean you should build a wall around your mind every time you come across a statistic that collapses all the details into an average. The average is very useful and can offer a quick snapshot of a dataset.

**The Danger of Relying on Average. This image was generated by AI**

But be careful: don’t take averages at face value, because they can be unreliable and specious. When data is roughly normally distributed, as heights within a population tend to be, the average can be reliable. So unless you know that the data is close to normally distributed, don’t take the average on trust. Ask more questions and look for what the data is not showing.
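The contrast between symmetric and skewed data can be simulated. This sketch draws hypothetical “heights” from a normal distribution and hypothetical “incomes” from a log-normal distribution. Both the distributions and their parameters are assumptions chosen for illustration, not real data.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so reruns give the same numbers

# Hypothetical, roughly symmetric data: heights in cm from a normal distribution.
heights = [rng.gauss(170, 7) for _ in range(10_000)]
# Hypothetical right-skewed data: incomes from a log-normal distribution.
incomes = [rng.lognormvariate(10, 1) for _ in range(10_000)]

for label, data in [("heights", heights), ("incomes", incomes)]:
    print(f"{label}: mean={statistics.mean(data):,.1f}, median={statistics.median(data):,.1f}")
```

For the heights, the mean and median land almost on top of each other. For the incomes, the mean sits well above the median, so the “average” overstates what a typical person earns.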

In scenarios where the data is skewed by extreme values, the median can be a better measure of central tendency. But be careful when relying on the median too; it doesn’t tell the whole story either.

For instance, the renowned evolutionary biologist Stephen Jay Gould was once diagnosed with abdominal mesothelioma, a rare cancer with a median survival of eight months after diagnosis. That meant half of the people previously diagnosed with it had died within eight months.

The remaining half lived longer than eight months, some of them much longer. Gould himself lived for another twenty years, long enough to publish an essay titled “The Median Isn’t the Message” before dying of a different type of cancer.

By focusing on the median, the statistic said little about the half of patients who lived beyond eight months, or about how long they went on to live. That is what averages can do: they oversimplify things and obscure important details that can guide crucial decisions.
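The point about what the median hides can be sketched with simulated data. The survival times below are hypothetical and drawn from a log-normal distribution with a median of eight months. That shape is an assumed stand-in for a right-skewed outcome, not Gould’s actual data or any real patient records.

```python
import math
import random
import statistics

rng = random.Random(7)  # fixed seed so reruns give the same numbers
# Hypothetical survival times in months, log-normal with a median of 8 months.
# This is an assumed shape for a right-skewed outcome, not real clinical data.
survival = [rng.lognormvariate(math.log(8), 1.0) for _ in range(10_000)]

median_months = statistics.median(survival)
share_beyond_8_months = sum(t > 8 for t in survival) / len(survival)
share_beyond_5_years = sum(t > 60 for t in survival) / len(survival)

print(f"median survival: {median_months:.1f} months")
print(f"share living beyond 8 months: {share_beyond_8_months:.0%}")
print(f"share living beyond 5 years: {share_beyond_5_years:.1%}")
print(f"longest survival: {max(survival) / 12:.1f} years")
```

The median lands near eight months, but it says nothing about the long right tail, where a small share of simulated patients live for years. Two distributions can share the same median and still differ enormously in that tail.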

So far, this essay has explored how the average can be misleading and how different types of average sketch different pictures of reality. None is universally better than the others, and how they are used can either inspire hope or make you give it up.

Next time you are tempted to take what “data” says at face value, remember this witty (if slightly racy) remark from Aaron Levenstein: “Statistics are like a bikini: what they reveal is interesting; what they conceal is important.” This is paramount because many decisions, from medical prognoses to advocacy and policy implementation, are based on statistical data analysis.

Yet the underlying mechanics of these figures are not always clearly understood, at least by everyday people. Some figures can be unapologetically deceptive, and some people intentionally twist and interpret data to advance their self-serving interests. When that happens, we risk implementing the wrong policies and wasting resources that were scarce in the first place.

Gbolahan A. Salahudeen is a data analytics professional and writer who transforms complex statistical concepts into accessible insights for everyday readers. Having overcome his initial struggles with statistics, he now champions data literacy through engaging storytelling.
