Welch’s t-Test: The Reliable Option to Compare 2 Population Means with Unequal Variances


Discover why Welch’s t-Test is the go-to method for accurate statistical comparison, even when variances differ.

Towards Data Science
Photo by Simon Maage on Unsplash

Part 1: Background

In the first semester of my postgraduate studies, I had the opportunity to take the course STAT7055: Introductory Statistics for Business and Finance. Throughout the course, I definitely felt a bit exhausted at times, but the amount of knowledge I gained about applying various statistical methods in different situations was truly priceless. During the eighth week of lectures, something really interesting caught my attention: the concept of hypothesis testing when comparing two populations. I found it fascinating to learn how the approach differs depending on whether the samples are independent or paired, what to do when we know or don’t know the population variances of the two populations, and how to conduct hypothesis testing for two proportions. However, one aspect wasn’t covered in the material, and it kept me wondering how to tackle this particular scenario: hypothesis testing for two population means when the variances are unequal, known as the Welch t-test.

To understand how the Welch t-test is applied, we will explore a dataset as an example case. Each stage of the process uses real-world data.

Part 2: The Dataset

The dataset I’m using contains real-world data from the World Agricultural Supply and Demand Estimates (WASDE), which are regularly updated. The WASDE dataset is put together by the World Agricultural Outlook Board (WAOB). It’s a monthly report that provides annual forecasts for various global regions and the US for wheat, rice, coarse grains, oilseeds, and cotton. The dataset also covers forecasts for sugar, meat, poultry, eggs, and milk in the US. It’s sourced from the Nasdaq website, and you can access it for free here: WASDE dataset. There are three datasets, but I only use the first one, the Supply and Demand Data. Column definitions can be seen here:

Figure 1: Column Definitions by NASDAQ

I’m going to use two different samples from specific regions, commodities, and items to simplify the testing process. We will use the R programming language for the end-to-end procedure.

Now let’s prepare the data:

library(dplyr)

# Read and preprocess the dataframe
wasde_data <- read.csv("wasde_data.csv") %>%
  select(-min_value, -max_value, -year, -period) %>%
  filter(item == "Production", commodity == "Wheat")

# Filter data for Argentina and Australia
wasde_argentina <- wasde_data %>%
  filter(region == "Argentina") %>%
  arrange(desc(report_month))

wasde_oz <- wasde_data %>%
  filter(region == "Australia") %>%
  arrange(desc(report_month))

I split the data into two samples from two different regions, namely Argentina and Australia, and the focus is wheat production.

Now we’re set. But wait..

Before delving further into the application of the Welch t-test, I can’t help but wonder why it’s necessary to check whether the two population variances are equal.

Part 3: Testing Equality of Variances

When conducting hypothesis testing to compare two population means without knowledge of the population variances, it’s crucial to check whether the variances are equal in order to select the appropriate statistical test. If the variances turn out to be equal, we opt for the pooled variance t-test; otherwise, we use Welch’s t-test. This step matters because using the wrong test can lead to incorrect conclusions through a higher risk of Type I and Type II errors. By checking for equality of variances, we make sure the hypothesis testing process rests on valid assumptions, which leads to more dependable conclusions.

So how do we test the two population variances?

We have to state two hypotheses, as below:

Figure 2: Null and alternative hypotheses for testing equality of variances, by author
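In symbols, the two hypotheses can be written as follows. The subscripts are my own labeling assumption, with 1 for Argentina and 2 for Australia, chosen to match the hypothesis strings used in the R code.

```latex
H_0:\ \sigma_1^2 = \sigma_2^2
\qquad\text{versus}\qquad
H_1:\ \sigma_1^2 \neq \sigma_2^2
```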

The rule of thumb is very simple:

  1. If the test statistic falls into the rejection region, we reject H0, the null hypothesis.
  2. Otherwise, we fail to reject H0.

We can set the hypotheses like this:

# Hypotheses: Variance Comparison
h0_variance <- "Population variance of Wheat production in Argentina equals that in Australia"
h1_variance <- "Population variance of Wheat production in Argentina differs from that in Australia"

Now we need to compute the test statistic. But how do we get it? We use the F-test.

An F-test is any statistical test used to compare the variances of two samples or the ratio of variances between multiple samples. The test statistic, the random variable F, follows an F-distribution when the null hypothesis is true and the customary assumptions about the error term hold.

Figure 3: Illustration of the probability density function (PDF) of the F distribution, by Wikipedia

We can compute the test statistic by dividing the two sample variances like this:

Figure 4: F-test formula, by author
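Written out, the statistic is simply the ratio of the two sample variances. Here s₁² and s₂² denote the sample variances of sample 1 and sample 2; the subscripts are my notation and may differ from the original figure.

```latex
F = \frac{s_1^2}{s_2^2}
```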

and the rejection region is:

Figure 5: Rejection region of the F-test, by author

where n is the sample size and alpha is the significance level. When the F value falls into either of these rejection regions, we reject the null hypothesis.
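For a two-tailed test, the rejection region can be written as below. The quantile notation is my assumption: F with subscript p; a, b means the p-quantile of the F distribution with a and b degrees of freedom, which is what qf(p, a, b) returns in R. The original figure may use upper-tail notation instead.

```latex
\text{Reject } H_0 \text{ if}\quad
F < F_{\alpha/2;\; n_1-1,\; n_2-1}
\quad\text{or}\quad
F > F_{1-\alpha/2;\; n_1-1,\; n_2-1}
```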

But..

the trick is: the labeling of sample 1 and sample 2 is actually arbitrary, so we can make sure to place the larger sample variance on top every time. This way, our F-statistic will always be at least 1, and we only need to look at the upper cut-off to reject H0 at significance level α.
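Here is a minimal sketch of that trick in R. It is an illustration under my own assumptions, not the code used in the rest of this article: the helper name f_test_larger_on_top and the sample numbers are hypothetical. Note that the degrees of freedom have to follow whichever sample ends up in the numerator.

```r
# Hypothetical helper: two-tailed F-test written in one-sided form,
# with the larger sample variance always placed in the numerator.
f_test_larger_on_top <- function(x, y, alpha = 0.05) {
  var_x <- var(x)
  var_y <- var(y)
  if (var_x >= var_y) {
    f_stat <- var_x / var_y
    df_num <- length(x) - 1
    df_den <- length(y) - 1
  } else {
    f_stat <- var_y / var_x
    df_num <- length(y) - 1
    df_den <- length(x) - 1
  }
  # Only the upper cut-off is needed, still at alpha / 2
  f_upper <- qf(1 - alpha / 2, df_num, df_den)
  list(f_stat = f_stat, f_upper = f_upper, reject_h0 = f_stat > f_upper)
}

# Made-up numbers purely for illustration
x <- c(10, 12, 9, 14, 11, 13)
y <- c(10, 30, 5, 42, 18, 25)
result <- f_test_larger_on_top(x, y)
```

The upper cut-off still uses α/2 rather than α, because the underlying test remains two-tailed.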

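For our two samples, we can compute the F statistic as follows. The decision rule further down checks both cut-offs, so the result is valid whichever variance ends up on top: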

# Calculate sample variances
sample_var_argentina <- var(wasde_argentina$value)
sample_var_oz <- var(wasde_oz$value)

# Calculate F calculated value
f_calculated <- sample_var_argentina / sample_var_oz

We’ll use a 5% significance level (0.05), so the decision rule is:

# Define significance level and degrees of freedom
alpha <- 0.05
alpha_half <- alpha / 2
n1 <- nrow(wasde_argentina)
n2 <- nrow(wasde_oz)
df1 <- n1 - 1
df2 <- n2 - 1

# Calculate critical F values
f_value_lower <- qf(alpha_half, df1, df2)
f_value_upper <- qf(1 - alpha_half, df1, df2)

# Variance comparison result
if (f_calculated > f_value_lower && f_calculated < f_value_upper) {
  cat("Fail to Reject H0: ", h0_variance, "\n")
  equal_variances <- TRUE
} else {
  cat("Reject H0: ", h1_variance, "\n")
  equal_variances <- FALSE
}

The result is that we reject the null hypothesis at the 5% significance level. In other words, based on this test we believe the variances of the two populations are not equal. Now we know why we should use the Welch t-test instead of the pooled variance t-test.
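As a cross-check, base R ships the same two-sided F-test as var.test() in the stats package. The sketch below uses made-up numbers rather than the WASDE values, so it only shows that the built-in statistic matches the manual ratio of sample variances.

```r
# Made-up production figures purely for illustration (not WASDE values)
argentina <- c(17.5, 18.0, 19.5, 20.0, 17.6, 18.4, 19.0, 20.5)
australia <- c(21.0, 30.5, 14.5, 33.0, 25.0, 31.9, 22.0, 36.0)

# Two-sided F-test of H0: the ratio of population variances equals 1
f_test <- var.test(argentina, australia, ratio = 1,
                   alternative = "two.sided", conf.level = 0.95)

# The reported statistic is var(argentina) / var(australia)
manual_f <- var(argentina) / var(australia)

# Same decision as the critical-value rule, expressed through the p-value
equal_variances_builtin <- f_test$p.value >= 0.05
```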

Part 4: The Main Course, Welch t-Test

The Welch t-test, also called Welch’s unequal variances t-test, is a statistical method used for comparing the means of two independent samples. Instead of assuming equal variances like the standard pooled variance t-test, the Welch t-test estimates each variance separately and adjusts the degrees of freedom accordingly, which makes it more robust. This adjustment leads to a more accurate evaluation of the difference between the two sample means when working with real-world data, where the equal variances assumption may not hold. It’s preferred for its adaptability and dependability, since conclusions drawn from the analysis remain valid even when that assumption is not met.

The test statistic formula is:

Figure 6: Test statistic formula of the Welch t-test, by author
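The test statistic can be written as below, testing a difference of zero between the population means. The symbols are my notation, chosen to match the R code later in this part: sample means in the numerator, sample variances and sample sizes under the square root.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}
```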

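where x̄1 and x̄2 are the sample means, s1² and s2² are the sample variances, and n1 and n2 are the sample sizes,

and the degrees of freedom can be defined like this: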

Figure 7: Degrees of freedom formula, by author
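This is the Welch–Satterthwaite approximation. Written in the same notation as the R code further down (s² for a sample variance, n for a sample size), it reads as follows. Note that ν is generally not an integer.

```latex
\nu = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^{2}}
           {\dfrac{\left(s_1^2 / n_1\right)^{2}}{n_1 - 1} + \dfrac{\left(s_2^2 / n_2\right)^{2}}{n_2 - 1}}
```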

The rejection region for the Welch t-test depends on the chosen significance level and on whether the test is one-tailed or two-tailed.

Two-tailed test: The null hypothesis is rejected if the absolute value of the test statistic |t| is greater than the critical value from the t-distribution with ν degrees of freedom at α/2.

One-tailed test: The null hypothesis is rejected if the test statistic t is greater than the critical value from the t-distribution with ν degrees of freedom at α for an upper-tailed test, or if t is less than the negative critical value for a lower-tailed test.

  • Upper-tailed test: t > tα,ν
  • Lower-tailed test: t < −tα,ν
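In R, these critical values come from qt(). The short sketch below uses a hypothetical value for ν just to show the calls; the actual degrees of freedom are computed from the data.

```r
alpha <- 0.05
nu <- 12.7  # hypothetical Welch degrees of freedom (need not be an integer)

# Two-tailed test: reject H0 if |t| exceeds this value
t_crit_two_tailed <- qt(1 - alpha / 2, df = nu)

# One-tailed test: reject H0 if t > t_crit_one_tailed (upper-tailed)
# or if t < -t_crit_one_tailed (lower-tailed)
t_crit_one_tailed <- qt(1 - alpha, df = nu)
```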

So let’s work through one example with a one-tailed Welch t-test.

Let’s state the hypotheses:

h0_mean <- "Population mean of Wheat production in Argentina equals that in Australia"
h1_mean <- "Population mean of Wheat production in Argentina is larger than that in Australia"

This is an upper-tailed test, so the rejection region is: t > tα,ν

Using the formulas given above and the same significance level (0.05):

# Calculate sample means
sample_mean_argentina <- mean(wasde_argentina$value)
sample_mean_oz <- mean(wasde_oz$value)

# Welch's t-test (unequal variances)
# Note: s1 and s2 hold the sample variances, not the standard deviations
s1 <- sample_var_argentina
s2 <- sample_var_oz
t_calculated <- (sample_mean_argentina - sample_mean_oz) / sqrt(s1/n1 + s2/n2)
df <- (s1/n1 + s2/n2)^2 / ((s1^2/(n1^2 * (n1-1))) + (s2^2/(n2^2 * (n2-1))))
t_value <- qt(1 - alpha, df)

# Mean comparison result
if (t_calculated > t_value) {
  cat("Reject H0: ", h1_mean, "\n")
} else {
  cat("Fail to Reject H0: ", h0_mean, "\n")
}

The result is that we fail to reject H0 at the 5% significance level. In other words, the data do not provide sufficient evidence that the population mean of wheat production in Argentina is greater than that in Australia.
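The manual calculation can be cross-checked against base R’s t.test(), which runs Welch’s test when var.equal = FALSE (its default). The sketch below uses made-up numbers rather than the WASDE values, so it only illustrates that the built-in statistic and degrees of freedom agree with the formulas given earlier.

```r
# Made-up production figures purely for illustration (not WASDE values)
argentina <- c(17.5, 18.0, 19.5, 20.0, 17.6, 18.4, 19.0, 20.5)
australia <- c(21.0, 30.5, 14.5, 33.0, 25.0, 31.9, 22.0, 36.0)

# Built-in Welch test: alternative = "greater" tests
# H1: mean(argentina) > mean(australia)
welch <- t.test(argentina, australia,
                alternative = "greater", var.equal = FALSE)

# Manual statistic and degrees of freedom
n1 <- length(argentina)
n2 <- length(australia)
v1 <- var(argentina)
v2 <- var(australia)
t_manual <- (mean(argentina) - mean(australia)) / sqrt(v1 / n1 + v2 / n2)
df_manual <- (v1 / n1 + v2 / n2)^2 /
  ((v1 / n1)^2 / (n1 - 1) + (v2 / n2)^2 / (n2 - 1))

# Decision at the 5% level, equivalent to comparing t with qt(0.95, df)
reject_h0 <- welch$p.value < 0.05
```

Comparing the p-value with α gives the same decision as comparing the test statistic with the critical value, so either form can be used.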

That’s how to conduct the Welch t-test. Now it’s your turn. Happy experimenting!

Part 5: Conclusion

When comparing two population means during hypothesis testing, it is really important to start by checking whether the variances are equal. This initial step is crucial because it helps in deciding which statistical test to use, which in turn supports accurate and dependable results. If it turns out that the variances are indeed equal, you can go ahead and apply the standard t-test with pooled variances. However, in cases where the variances are not equal, it is recommended to go with Welch’s t-test.

Welch’s t-test provides a powerful solution for comparing means when the assumption of equal variances doesn’t hold. By adjusting the degrees of freedom to account for the unequal variances, Welch’s t-test gives a more accurate and dependable evaluation of the statistical significance of the difference between two sample means. This adaptability makes it a preferred choice in many practical situations where sample sizes and variances can differ substantially.

In conclusion, checking for equality of variances and using Welch’s t-test when needed helps keep hypothesis testing accurate. This approach reduces the risk of Type I and Type II errors, leading to more reliable conclusions. By choosing the appropriate test based on the equality of variances, we can confidently interpret the findings and make well-informed decisions grounded in empirical evidence.
