Let’s begin by dropping the ultimate takeaway of this blog:

“The odds ratio is a consistent measure between the population and the sample statistics (in case-control studies with a significant effect), whereas the risk ratio shows inconsistency.”

Now the question that comes to our mind is — Why is that?

I will prove this using a relatable example so that we can build an intuitive sense around it as well. But before jumping to the example, let’s see how Risk & Odds behave:
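To see the claim in numbers, here is a minimal Python sketch with an invented 2×2 population table and a case-control sample drawn from it (all counts are made up for illustration):

```python
def risk_ratio(a, b, c, d):
    """a = exposed cases, b = exposed non-cases, c = unexposed cases, d = unexposed non-cases."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)

# Hypothetical full population
a, b, c, d = 100, 9900, 50, 9950
print(f"Population RR = {risk_ratio(a, b, c, d):.3f}")   # 2.000
print(f"Population OR = {odds_ratio(a, b, c, d):.3f}")   # ~2.010

# Case-control sample: keep all 150 cases, draw 150 controls
# in proportion to exposure among the non-cases.
controls = 150
b_s = controls * b / (b + d)
d_s = controls * d / (b + d)
print(f"Sample RR = {risk_ratio(a, b_s, c, d_s):.3f}")   # inflated, ~1.43
print(f"Sample OR = {odds_ratio(a, b_s, c, d_s):.3f}")   # still ~2.010
```

Because a case-control design samples on the outcome, the case/control ratio in the sample is artificial, which distorts the risks; the odds ratio is invariant to that sampling and matches the population value.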

Appreciate the fact that there must be some loophole in the R-squared measure, because of which the adjusted R-squared measure was introduced. However, it sometimes gets misinterpreted: people apply the same intuition to it as to the ordinary R-squared measure, which is incorrect. Before we arrive at the ugly mathematical expression for adjusted R-squared, we need to go through various terms and their reasons for existing: SST (Sum of Squares, Total), SSR (Sum of Squares, Regression), SSE (Sum of Squares, Error) and finally DOF (Degrees of Freedom). …
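To keep those terms concrete before the derivation, here is a minimal sketch on synthetic data, with NumPy's least-squares solver standing in for a full regression treatment (the data and coefficients are invented):

```python
import numpy as np

# Synthetic data: n observations, k predictors (numbers are arbitrary).
rng = np.random.default_rng(0)
n, k = 50, 3
X = rng.normal(size=(n, k))
y = 2.0 + X @ np.array([1.5, 0.0, -0.7]) + rng.normal(size=n)

X1 = np.column_stack([np.ones(n), X])           # add an intercept column
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)   # ordinary least squares
y_hat = X1 @ beta

sst = np.sum((y - y.mean()) ** 2)   # SST: total sum of squares
sse = np.sum((y - y_hat) ** 2)      # SSE: error (residual) sum of squares
ssr = sst - sse                     # SSR: regression sum of squares

r2 = 1 - sse / sst
# Adjusted R-squared penalizes by degrees of freedom: n-k-1 for the
# residuals, n-1 for the total variation.
adj_r2 = 1 - (sse / (n - k - 1)) / (sst / (n - 1))
print(f"R^2 = {r2:.3f}, adjusted R^2 = {adj_r2:.3f}")
```

Note that SST = SSR + SSE by construction, and the adjusted value is always below the plain R-squared whenever at least one predictor is present.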

I am a firm believer that visual interpretation of concepts has longer-lasting retention than the conventional approach. I assure you that when you finish reading this blog, you will have established a crystal-clear understanding of Type-I & Type-II errors. Let’s begin with some preamble about hypothesis testing:

Hypothesis testing is the process of reconfirming or disproving an existing belief/default position/status quo/general statement about a population parameter, based on the principles of inferential statistics.

It has two components:

- Null Hypothesis (H0): the existing belief/default position/status quo/general statement about a population parameter
- Alternate Hypothesis (Ha): something contradictory to the null…
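To make the reject/fail-to-reject mechanics concrete, here is a minimal sketch of a two-sided one-sample z-test with made-up numbers; the z-test is my stand-in for illustration, since the excerpt does not fix a specific test:

```python
import math

def z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided z-test of H0: mu = mu0, assuming sigma is known."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # Two-sided p-value from the standard normal CDF (via erf).
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p, p < alpha        # True means: reject H0

# Hypothetical numbers: is the population mean still 100?
z, p, reject = z_test(sample_mean=103, mu0=100, sigma=15, n=100)
print(f"z = {z:.2f}, p = {p:.4f}, reject H0: {reject}")
# z = 2.00, p = 0.0455, reject H0: True
```

Rejecting H0 when it is actually true is a Type-I error (its probability is capped at alpha); failing to reject a false H0 is a Type-II error.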

Before directly jumping to the topic, let us see the symbols that are used for population statistics and sample statistics respectively:

What is ‘Coefficient of Variation’ and why do we need it as a statistical measure?

So far (through my previous blogs) we have been summarizing datasets with a 2-number summary: *central tendency and spread*. I want to prompt you to think: if we want to compare the volatility of two or more datasets, is it feasible to do that right now with the 2-number summary measures?

The answer is that we can’t compare them without bringing the spread measure onto a comparable scale (standardizing it). …

If someone gave you only the 1-number summary (central tendency) of the five datasets shown below, you would think they are all the same, since their means are equal. But when you plot each data point of each set and compare them visually, you realize there should exist a measure to detect this distinguishing pattern as well.
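A quick numerical sketch of why raw spread is not comparable across scales, and how dividing by the mean standardizes it (the two datasets are invented for illustration):

```python
import statistics

# Two hypothetical datasets measured on different scales.
heights_cm = [160, 165, 170, 175, 180]       # mean 170, stdev ~7.9
weights_kg = [60.0, 62.5, 65.0, 67.5, 70.0]  # mean 65,  stdev ~4.0

def cv(data):
    """Coefficient of variation: standard deviation scaled by the mean."""
    return statistics.stdev(data) / statistics.mean(data)

# The raw standard deviations (7.9 vs 4.0) suggest heights vary more,
# but relative to their scale the weights are actually more volatile.
print(f"CV of heights: {cv(heights_cm):.3f}")
print(f"CV of weights: {cv(weights_kg):.3f}")
```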

There is a crucial theorem that sits in the background of inferential statistics, without which inferential statistics has no relevance. It assists in hypothesis testing, regression analysis, and other analytical techniques.

Yes I am talking about **“THE CENTRAL LIMIT THEOREM”**

It states that no matter what the population distribution looks like, as the sample size (n) of the samples drawn (with replacement) from the population increases, the distribution of the sample mean or sum approaches a normal distribution.

Numerically: the means of random samples drawn from any population distribution with mean µ and standard deviation σ will have an approximately normal distribution, with a mean equal to µ and a standard deviation of σ/√n. …
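This is easy to verify by simulation; a small sketch using an exponential population, chosen arbitrarily as a clearly non-normal distribution with µ = σ = 1:

```python
import random
import statistics

# CLT sketch: means of samples from a skewed exponential(1) population
# should cluster around mu with spread sigma / sqrt(n).
random.seed(42)
mu = sigma = 1.0        # exponential(1) has mean 1 and standard deviation 1
n = 100                 # size of each sample
num_samples = 2000      # how many sample means we collect

sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

print(f"mean of sample means: {statistics.fmean(sample_means):.3f} (theory: {mu})")
print(f"std of sample means:  {statistics.stdev(sample_means):.3f} (theory: {sigma / n ** 0.5})")
```

Despite the heavily right-skewed population, both simulated values land very close to µ and σ/√n.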

To represent a dataset as a 1-number summary, we use a measure of central tendency. There exist three central tendency measures: mean, median & mode. Why was there a need for three measures when one (the mean) could have done the job? That is what this blog is all about; by its end you will be able to answer the notorious question: which one to choose, and when? Since each of them has its own pros and cons, these will be elaborated to establish conceptual clarity.
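As a quick preview of the trade-offs, here is a small sketch with invented salary data showing how each measure responds to an outlier:

```python
import statistics

# Hypothetical salaries (in thousands); 500 is a single extreme outlier.
salaries = [30, 32, 32, 35, 38, 40, 500]

print(f"mean   = {statistics.mean(salaries):.1f}")  # 101.0, dragged up by the outlier
print(f"median = {statistics.median(salaries)}")    # 35, robust to the outlier
print(f"mode   = {statistics.mode(salaries)}")      # 32, the most frequent value
```

One extreme value makes the mean wildly unrepresentative of a typical salary, while the median and mode barely move; that asymmetry of behavior is exactly why all three measures exist.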

Let’s begin with the visual representations to better interpret the…

Knowledge of the basics of statistics has always been, and will always be, of utmost importance in the data science domain. Also, if someone wants to delve into the booming fields of the future, like Machine Learning, Deep Learning and Artificial Intelligence, solid foundations in statistics are a must. Partial knowledge of statistics is not only harmful; its application is even worse, a disaster. Keep a watch on my upcoming/posted blogs, where slowly and steadily we will move ahead in this journey of learning by questioning the need for the plethora of statistical measures and justifying them with relatable examples.

Let’s begin this journey with statistics…

*If you were to ask Mother Nature: what is her favorite probability distribution?*

The answer will be ‘Normal’, and the reason behind it is the existence of chance/random causes that influence every known variable on earth. But what if a process is also under the influence of assignable/significant causes? That will surely distort the shape of the distribution, and that’s when we need a measure like skewness to capture it. Below is a visual of the normal distribution, also known as the bell curve: a symmetrical graph with all measures of central tendency in the middle.
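To make “capture it” concrete, here is one common moment-based definition of sample skewness (the average cubed z-score), sketched with invented data; the blog itself may well use a different formula, such as Pearson’s mode or median skewness:

```python
import statistics

def skewness(data):
    """Moment-based skewness: mean of cubed z-scores (population stdev)."""
    m = statistics.fmean(data)
    s = statistics.pstdev(data)
    return statistics.fmean(((x - m) / s) ** 3 for x in data)

symmetric = [1, 2, 3, 4, 5]          # mirror-image around its mean
right_skewed = [1, 1, 2, 2, 3, 10]   # long right tail

print(f"symmetric:    {skewness(symmetric):.3f}")    # 0 for a symmetric set
print(f"right-skewed: {skewness(right_skewed):.3f}") # positive: tail to the right
```

A perfectly symmetric dataset scores zero because every positive cubed deviation is cancelled by its mirror image; a long right tail leaves large positive cubes uncancelled, pushing the measure above zero.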

But what if we encounter an asymmetrical distribution? How do we detect the extent of the asymmetry? …