The Statistical Error Debate

A couple of weeks ago I was teaching a few friends how to run a chi-square test on whether administering a drug is associated with improvement in patients. The discussion drifted to the consequences of administering the drug versus choosing not to. This question has been debated by statisticians for a long time, and my friends ended up in the very same debate, in very lay terms.

Without much knowledge of statistics, we ended up talking about the risks of an error in medication (one you cannot get past by saying “oops”). That leads straight to the famous debate about the consequences of Type I and Type II errors, and which is more dangerous. Over the following weeks the debate continued in the classroom, with faculty giving their views on the subject. So what are these errors, and which one is more dangerous? Let us examine them further.

What Are Statistical Errors?

To understand statistical errors, we first need to understand what a null hypothesis is. A null hypothesis is a statement that there is no relationship between two measured phenomena, or no association among groups. The null hypothesis is generally assumed to be true until the evidence indicates otherwise. Rejecting it (which in most experiments is the desired outcome) means concluding that there is a relationship. Another important point is that the null hypothesis is never “accepted” when the data show no relationship. We say instead that we “fail to reject” it: the data do not give enough evidence to conclude that the null hypothesis is wrong, which is not the same as proving it true.
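To make the reject / fail-to-reject decision concrete, here is a minimal sketch of a chi-square test on a drug-versus-improvement table. The counts are made up for illustration, the 5% significance level is an assumed convention, and the function name chi_square_2x2 is hypothetical. It computes the plain Pearson statistic without a continuity correction.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 table (1 degree of freedom)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if treatment and outcome were unrelated (H0)
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi2 > x) equals erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts. Rows: drug, no drug. Columns: improved, not improved.
table = [[30, 20], [20, 30]]
alpha = 0.05  # assumed significance level

stat, p_value = chi_square_2x2(table)
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"chi-square = {stat:.2f}, p = {p_value:.4f}: {decision}")
```

With these invented counts the statistic is 4.0 and the p-value falls just under 0.05, so the null hypothesis of no association is rejected. A less lopsided table would leave us at “fail to reject”, which still would not prove that the drug does nothing.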

Now coming to errors, there are two types:

Type I error: We commit a Type I error if we reject the null hypothesis when it is true. This is a false positive, like a fire alarm that rings when there’s no fire.

Type II error: We commit a Type II error if we fail to reject the null hypothesis when it is false. This is a false negative, like an alarm that fails to sound when there is a fire.
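The two definitions can be checked with a small simulation. This is only a sketch under stated assumptions: it uses a two-sided z-test of "mean = 0" with known standard deviation 1, a 5% significance level (critical value 1.96), samples of 25, and a true mean of 0.5 for the case where the null is false. None of these choices come from the discussion above, and the function names are hypothetical.

```python
import math
import random

def rejects_null(sample, sigma=1.0, z_crit=1.96):
    """Two-sided z-test of H0: mean = 0, with known sigma, at roughly the 5% level."""
    n = len(sample)
    z = (sum(sample) / n) / (sigma / math.sqrt(n))
    return abs(z) > z_crit

def rejection_rate(true_mean, n=25, trials=4000, seed=0):
    """Fraction of simulated experiments in which H0 is rejected."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        if rejects_null(sample):
            rejections += 1
    return rejections / trials

# H0 is true here, so every rejection is a Type I error (false alarm).
type_1_rate = rejection_rate(true_mean=0.0)

# H0 is false here, so every failure to reject is a Type II error (missed fire).
type_2_rate = 1 - rejection_rate(true_mean=0.5)

print(f"Type I error rate:  {type_1_rate:.3f}")
print(f"Type II error rate: {type_2_rate:.3f}")
```

The Type I rate should land near the chosen 5% level, while the Type II rate depends on how far the truth is from the null and on the sample size. With these assumed settings it comes out at roughly 30%.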

While creating statistical models or performing hypothesis tests, we often come across a confusion matrix like the one below:

Confusion Matrix

The most desirable results for us are obviously the True Positives and True Negatives; however, we often have to deal with False Negatives (Type II) and False Positives (Type I).
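As a rough illustration, the four cells of a confusion matrix can be counted directly from a list of true outcomes and a list of predictions. The data below are invented, and confusion_counts is a hypothetical helper, not a library function.

```python
def confusion_counts(actual, predicted):
    """Count the four confusion-matrix cells from paired boolean outcomes."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for truth, guess in zip(actual, predicted):
        if truth and guess:
            counts["TP"] += 1   # correctly flagged
        elif not truth and guess:
            counts["FP"] += 1   # Type I error: alarm with no fire
        elif truth and not guess:
            counts["FN"] += 1   # Type II error: fire with no alarm
        else:
            counts["TN"] += 1   # correctly left alone
    return counts

# Hypothetical data: True means "condition present" / "test says present"
actual    = [True, True,  True, False, False, False, False, True]
predicted = [True, False, True, False, True,  False, False, True]

counts = confusion_counts(actual, predicted)
print(counts)  # {'TP': 3, 'FP': 1, 'FN': 1, 'TN': 3}
```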

In the case of the fire alarm, perhaps neither error is too dangerous, as you may detect the fire by other means. However, imagine such an error in a biopsy for cancer: you could no longer be sure whether to give chemotherapy.

So which is more dangerous? The correct answer is “It depends”. As MBAs, we are used to saying “It depends”, often just to take a diplomatic stance, but in this case it really does depend on the context.

The right thing to do is to understand the fallout of committing each error. There really is no rule of thumb for this. Let us look at some examples:

In the cancer example mentioned earlier, a Type I error would say that the patient may have cancer when he does not; further tests would reveal the truth, at the cost of some anxiety to the patient. A Type II error, however, would give the patient a false assurance of not having cancer while the disease is actually killing him by the day.

Now imagine a court trial. A Type I error occurs if a person is found guilty of a crime he did not commit, which is an injustice to the defendant. A Type II error means the court has found a person not guilty even though he committed the crime. This is a great outcome for the defendant, but it puts society at risk.

Similar risks can be identified in business, and models can be improved to minimize risk and eliminate the most critical errors.
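One way to picture “it depends” in a model is to attach a cost to each kind of error and pick the decision threshold with the lowest total cost. The sketch below assumes a model that outputs a score per case and flags a case when the score reaches a threshold. The scores, labels, candidate thresholds and cost figures are all invented for illustration.

```python
def total_cost(scores, labels, threshold, cost_fp, cost_fn):
    """Total cost of the errors made when flagging every score >= threshold."""
    cost = 0
    for score, is_positive in zip(scores, labels):
        flagged = score >= threshold
        if flagged and not is_positive:
            cost += cost_fp   # Type I error (false positive)
        elif not flagged and is_positive:
            cost += cost_fn   # Type II error (false negative)
    return cost

# Hypothetical model scores and the true outcome for each case
scores = [0.1, 0.3, 0.4, 0.5, 0.6, 0.8, 0.9]
labels = [False, False, True, False, True, True, True]
thresholds = [0.2, 0.35, 0.45, 0.55, 0.7]

def best_threshold(cost_fp, cost_fn):
    """Candidate threshold with the lowest total error cost."""
    return min(thresholds,
               key=lambda t: total_cost(scores, labels, t, cost_fp, cost_fn))

# A missed case is ten times worse than a false alarm (the biopsy situation)
print(best_threshold(cost_fp=1, cost_fn=10))   # 0.35

# A false alarm is ten times worse than a missed case (the courtroom situation)
print(best_threshold(cost_fp=10, cost_fn=1))   # 0.55
```

The same scores lead to different thresholds once the costs change: when Type II errors are expensive the threshold drops so that fewer cases are missed, and when Type I errors are expensive it rises so that fewer cases are wrongly flagged.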
