Descriptive Statistics For Data Science Beginners 1. Mean 2. Mode 3. Median



admin

June 27, 2023


Descriptive statistics plays a crucial role in describing and summarizing the fundamental features and characteristics of a data set. Data science and machine learning are, in essence, modern extensions of statistics. Descriptive statistics provides the tools and techniques to investigate and interpret data, leading to useful insights and actionable outcomes.

Here are some key concepts that are typically covered in descriptive statistics for data science:

1. Measures of central tendency

Measures of central tendency are among the most important in statistics because they identify the middle or most frequently occurring data point, which by itself gives a quick idea of the whole data set. They are:

used to understand the data

used to draw conclusions from data

essential, because without them we lose useful insights

Central tendency can be measured using:

Mean
Mode
Median

The mean is obtained by dividing the sum of the observations by the total number of observations.

Mean for ungrouped data:

Mean = (x1 + x2 + ... + xn) / n, where x1 to xn are the observations and n is the total number of observations.

Mean for grouped data:

The sum of the products of each observation (xi) and its corresponding frequency (fi), divided by the sum of all frequencies: Mean = Σ(fi × xi) / Σfi.
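As a rough illustration of the grouped-data mean, here is a minimal Python sketch. The function name `grouped_mean` and the sample midpoints and frequencies are made up for this example; they are not from any particular dataset.

```python
def grouped_mean(values, frequencies):
    """Mean of grouped data: sum(f_i * x_i) / sum(f_i)."""
    if len(values) != len(frequencies):
        raise ValueError("values and frequencies must have the same length")
    total_frequency = sum(frequencies)
    if total_frequency <= 0:
        raise ValueError("total frequency must be positive")
    # Multiply each observation by its frequency, then add the products.
    weighted_sum = sum(x * f for x, f in zip(values, frequencies))
    return weighted_sum / total_frequency


# Hypothetical data: observations (x_i) and how often each occurs (f_i).
midpoints = [5, 15, 25]
counts = [2, 3, 5]
print(grouped_mean(midpoints, counts))  # (5*2 + 15*3 + 25*5) / 10 = 18.0
```

When the data is given as class intervals, the class midpoint is commonly used as xi.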

Types of mean:

I) Arithmetic mean

The arithmetic mean, often called the average, is a measure of central tendency that represents the typical value of a dataset. It’s calculated by summing all the values in the dataset and dividing the sum by the total number of values.
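The calculation can be sketched in a few lines of Python. The helper name `arithmetic_mean` is just an illustration; the standard library's `statistics.mean` is shown next to it for comparison.

```python
from statistics import mean


def arithmetic_mean(values):
    """Sum of all values divided by the number of values."""
    if not values:
        raise ValueError("cannot take the mean of an empty dataset")
    return sum(values) / len(values)


data = [3, 6, 8, 11, 14]
print(arithmetic_mean(data))  # 42 / 5 = 8.4
print(mean(data))             # standard-library equivalent
```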


II) Geometric mean

The geometric mean is a measure of central tendency that represents the typical value of a dataset, particularly for values that are related to exponential growth or ratios. It’s calculated by taking the nth root of the product of all the values in the dataset, where n is the total number of values. It is defined only for positive values.
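A minimal Python sketch of this calculation follows. It assumes all values are positive and uses logarithms instead of multiplying everything together, which gives the same nth root of the product but avoids overflow on long lists. The function name is illustrative.

```python
import math


def geometric_mean(values):
    """nth root of the product of n positive values, computed via logs."""
    if not values:
        raise ValueError("cannot take the geometric mean of an empty dataset")
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean requires positive values")
    # exp(mean(log(x))) equals (x1 * x2 * ... * xn) ** (1 / n)
    return math.exp(sum(math.log(v) for v in values) / len(values))


print(geometric_mean([2, 8]))  # square root of 16, approximately 4.0
```

Python 3.8 and later also ship `statistics.geometric_mean`, which can be used instead of a hand-written helper.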

III) Harmonic mean

It’s calculated by taking the reciprocal of every value in the dataset, finding the arithmetic mean of those reciprocals, and then taking the reciprocal of that mean. It is often used for averaging rates, such as speeds.
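The three steps translate almost directly into Python. This is a small sketch that assumes positive values; the function name and the speed figures are hypothetical.

```python
def harmonic_mean(values):
    """Reciprocal of the arithmetic mean of the reciprocals."""
    if not values:
        raise ValueError("cannot take the harmonic mean of an empty dataset")
    if any(v <= 0 for v in values):
        raise ValueError("this sketch assumes positive values")
    reciprocals = [1 / v for v in values]                   # step 1
    mean_of_reciprocals = sum(reciprocals) / len(values)    # step 2
    return 1 / mean_of_reciprocals                          # step 3


# Hypothetical example: equal distances driven at 40 and 60 km/h.
print(harmonic_mean([40, 60]))  # approximately 48, not the arithmetic 50
```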

IV) Trimmed mean

A trimmed mean removes a certain percentage of extreme values from both ends of the sorted data and calculates the mean of the remaining values, which reduces the influence of outliers.

p = percentage of observations removed from each end
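Here is one way the trimmed mean could be sketched in Python. It assumes p is given as a percentage between 0 and 50, and that the number of values dropped from each end is rounded down; other rounding conventions exist. The function name and sample data are made up.

```python
def trimmed_mean(values, p):
    """Mean after dropping p percent of the sorted values from each end."""
    if not values:
        raise ValueError("cannot take the mean of an empty dataset")
    if not 0 <= p < 50:
        raise ValueError("p must be a percentage in the range [0, 50)")
    ordered = sorted(values)
    # Number of observations to drop from each end (rounded down).
    k = int(len(ordered) * p / 100)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


data = [2, 4, 5, 6, 7, 8, 9, 10, 11, 100]  # 100 is an outlier
print(sum(data) / len(data))   # plain mean: 16.2
print(trimmed_mean(data, 10))  # drops 2 and 100, giving 7.5
```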

The mode is the most frequently occurring value in the data set.

I) Mode (ungrouped):

It is best to arrange the data values in either ascending or descending order so that we can easily find the repeated values and their frequencies. The observation with the highest frequency is the mode of the given data. Alternatively, we can build a frequency distribution table to get the mode.

II) Mode (grouped data)

For ungrouped data, the mode represents the value or values that occur most frequently in the dataset. Here’s how to find the mode for ungrouped data:

1. Arrange the data in ascending order to identify any repeated values.

2. Determine the value or values that occur with the highest frequency. These are the modes.

– If a single value appears most frequently, the dataset is unimodal, and that value is the mode.
– If multiple values share the same highest frequency, the dataset is multimodal, and all the values with the highest frequency are considered modes.
– If no value repeats, or all values have the same frequency, the dataset has no mode and can be considered amodal.

Keep in mind that a dataset can have one mode, multiple modes, or no mode at all.
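The rules above can be sketched in Python with a frequency table built by `collections.Counter`. The function name `modes` is illustrative, and returning an empty list for "no mode" is a choice made for this example, not a standard convention.

```python
from collections import Counter


def modes(values):
    """Return all modes in ascending order, or [] if there is no mode."""
    counts = Counter(values)  # frequency table: value -> frequency
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # no value repeats
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []  # every value has the same frequency
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([1, 2, 2, 3]))     # unimodal: [2]
print(modes([1, 1, 2, 2, 3]))  # multimodal: [1, 2]
print(modes([1, 2, 3]))        # no mode: []
```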

The median is a measure of central tendency that represents the middle value in a dataset when it’s arranged in ascending or descending order. Here’s how to find the median:

I) Median for ungrouped data

When calculating the median for a dataset, the approach differs depending on whether the number of observations (n) is even or odd:

1. Median for n = Odd:
– If the number of observations is odd, the median is the middle value of the dataset when it’s arranged in ascending or descending order.
– For instance, if you have the dataset [3, 6, 8, 11, 14], the middle value is 8, so the median is 8.

2. Median for n = Even:
– If the number of observations is even, the median is the average of the two middle values of the sorted dataset.
– For instance, if you have the dataset [2, 5, 7, 9], the two middle values are 5 and 7. The median is the average of those two values: (5 + 7) / 2 = 6.

In summary, when the number of observations is odd, the median is the middle value. When the number of observations is even, the median is the average of the two middle values. This distinction is needed to calculate the median correctly for any dataset size.
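The odd and even cases can be sketched in Python as follows, using the two example datasets from above. The function name `median_ungrouped` is illustrative; the standard library offers `statistics.median` for the same job.

```python
def median_ungrouped(values):
    """Middle value of the sorted data, or the average of the two middle values."""
    if not values:
        raise ValueError("cannot take the median of an empty dataset")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd n: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the pair


print(median_ungrouped([3, 6, 8, 11, 14]))  # 8
print(median_ungrouped([2, 5, 7, 9]))       # (5 + 7) / 2 = 6.0
```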

The median is commonly used as a measure of central tendency when the dataset has extreme values or is not normally distributed. It’s less affected by outliers than the mean and gives a better representation of the “typical” value in such cases.

II) Median for grouped data

Calculating the median for grouped data involves a slightly different approach compared with ungrouped data. Here’s how to find it:

1. Determine the cumulative frequencies: Start by calculating the cumulative frequency for each group, that is, the sum of the frequencies of all previous groups plus the frequency of the current group. The cumulative frequency is the total number of observations up to and including that group.

2. Identify the median group: Find the group that contains the median value. This is the first group whose cumulative frequency equals or exceeds half of the total number of observations (n/2).

3. Calculate the median: Once you’ve identified the median group, use the following formula to calculate the median:
– Median = L + ((n/2 - CF) / f) × w
– L: Lower boundary of the median group
– n: Total number of observations
– CF: Cumulative frequency up to the group before the median group
– f: Frequency of the median group
– w: Width of the median group (class interval)

The formula estimates the median from the lower boundary of the median group, the cumulative frequency up to the previous group, the frequency of the median group, the width of the group, and the total number of observations.
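The three steps and the formula can be put together in a short Python sketch. It assumes every class has the same width and that classes are listed in ascending order; the function name, parameter names, and sample classes are hypothetical.

```python
def grouped_median(lower_bounds, width, frequencies):
    """Median = L + ((n/2 - CF) / f) * w for equal-width classes."""
    n = sum(frequencies)
    if n <= 0:
        raise ValueError("total frequency must be positive")
    half = n / 2
    cumulative = 0  # CF: cumulative frequency before the current group
    for lower, f in zip(lower_bounds, frequencies):
        # The median group is the first one whose cumulative
        # frequency reaches or exceeds n/2.
        if cumulative + f >= half:
            return lower + ((half - cumulative) / f) * width
        cumulative += f
    raise ValueError("lower_bounds and frequencies do not match")


# Hypothetical classes 0-10, 10-20, 20-30, 30-40 with these frequencies:
lower_bounds = [0, 10, 20, 30]
frequencies = [5, 8, 12, 5]
# n = 30, n/2 = 15, median group is 20-30 (L = 20, CF = 13, f = 12, w = 10)
print(grouped_median(lower_bounds, 10, frequencies))  # 20 + (2 / 12) * 10
```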

Note: The formula is a linear interpolation within the median group, so it assumes the observations are spread evenly across that group. If the class widths are not all equal, use the width of the median group itself for w.

NOTE: The next blog post will continue the topic of descriptive statistics (measures of dispersion).