**STATISTICS**

**Statistics is concerned with scientific methods for collecting, organizing, summarizing, presenting and analysing data, as well as drawing valid conclusions and making reasonable decisions on the basis of such analysis.**

**Data:** Data is a collection of related facts, observations or figures. A collection of data is called a data set, and each observation is a data point.

**Variables :**

**Discrete and Continuous :**

A variable is a symbol, such as X, Y, H, x or y, that can assume any of a prescribed set of values, called the domain of the variable. If the variable can assume only one value, it is called a constant.

A variable that can assume any value between two given values is called a continuous variable; otherwise, it is called a discrete variable.

**Example:** The number N of children in a family, which can assume any of the values 0, 1, 2, 3, ... but cannot be 2.5 or 3.842, is a discrete variable.

The height H of an individual, which can be 67 inches, 65.5 inches or 66.234 inches, depending on the accuracy of measurement, is a continuous variable.

**Tabulation**: The arrangement of the raw data under various heads in the form of a table is called tabulation.

**Frequency:** The number of observations falling in a particular class is called the frequency of that class.

**Cumulative frequency**: The cumulative frequency of a particular class is the sum of the frequencies of this class and those prior to it.

**Frequency Distribution Table:** The table showing the class intervals along with the corresponding class frequencies is known as a “Frequency Distribution Table”.

**Length of the class**: The difference between the upper and lower boundaries of a class is called the length of the class.

**Mid value**: The average of the lower and upper limits is called the “mid value” of a class interval.
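As a rough illustration of the definitions above, the following Python sketch tabulates raw data into equal-width classes and reports each class's mid value, frequency and cumulative frequency. The function name `frequency_table`, the sample marks, and the convention that a class includes its lower limit but excludes its upper limit are all assumptions made for this example.

```python
from itertools import accumulate

def frequency_table(data, lower, width, n_classes):
    """Return rows of (lower limit, upper limit, mid value, frequency, cumulative frequency)."""
    freqs = [0] * n_classes
    for x in data:
        k = int((x - lower) // width)  # index of the class that contains x
        if 0 <= k < n_classes:         # values outside the table are ignored
            freqs[k] += 1
    cumulative = list(accumulate(freqs))  # running total of the frequencies
    rows = []
    for k, (f, cf) in enumerate(zip(freqs, cumulative)):
        lo = lower + k * width
        hi = lo + width
        rows.append((lo, hi, (lo + hi) / 2, f, cf))
    return rows

marks = [12, 25, 37, 8, 19, 22, 31, 5, 28, 14]  # made-up raw data
table = frequency_table(marks, lower=0, width=10, n_classes=4)
for row in table:
    print(row)
```

The last cumulative frequency equals the total number of observations, which is a quick check that no value was left out of the table.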

**Measures of Central Tendency:**

A measure of central tendency gives us an idea about the concentration of the values in the central part of the distribution. An average of a statistical series is the value of the variable that is representative of the entire distribution.

**Types of Central Tendency:**

1. Mathematical Averages

a) Arithmetic Mean

b) Geometric Mean

c) Harmonic Mean

2. Positional Averages

a) Median

b) Mode

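**Arithmetic mean:**

**i)** Individual Series: A.M. of ungrouped data = Sum of the items / Number of items:

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

**ii)** Discrete Series: If $X_1, X_2, \ldots, X_n$ are n distinct values with frequencies $f_1, f_2, f_3, \ldots, f_n$, then

$$\bar{X} = \frac{\sum_{i=1}^{n} f_i X_i}{\sum_{i=1}^{n} f_i}$$

**iii)** Continuous Series: If $X_1, X_2, X_3, \ldots, X_n$ are the mid values and $f_1, f_2, f_3, \ldots, f_n$ are the frequencies of grouped data, then

$$\bar{X} = \frac{\sum_{i=1}^{n} f_i X_i}{\sum_{i=1}^{n} f_i}$$

In the step deviation method,

$$\bar{X} = A + \frac{\sum f_i d_i}{N} \times C$$

where A = assumed mean, C = width of the class, $d_i = \frac{X_i - A}{C}$ and $N = \sum f_i$.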
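The three cases can be sketched in Python as follows. The sketch assumes the usual textbook forms: mean = Σfx / Σf for discrete and continuous series, and mean = A + C · Σfd / N with d = (X − A) / C for the step deviation method. The function names and sample figures are illustrative only.

```python
def mean_individual(values):
    """A.M. of ungrouped data: sum of the items / number of items."""
    return sum(values) / len(values)

def mean_discrete(values, freqs):
    """A.M. of a discrete series, or of grouped data when `values` are mid values."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)

def mean_step_deviation(mids, freqs, assumed_mean, width):
    """Step deviation method: A + C * (sum of f*d) / N, with d = (X - A) / C."""
    n = sum(freqs)
    d = [(x - assumed_mean) / width for x in mids]
    return assumed_mean + width * sum(f * di for f, di in zip(freqs, d)) / n

mids = [5, 15, 25, 35]   # made-up mid values of four classes of width 10
freqs = [2, 3, 3, 2]     # made-up class frequencies

direct = mean_discrete(mids, freqs)
step = mean_step_deviation(mids, freqs, assumed_mean=15, width=10)
print(direct, step)
```

Both routes give the same mean; the step deviation method only keeps the intermediate numbers small, which matters for hand calculation.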

**Important Results**:

(i) The algebraic sum of the deviations taken about the mean is zero.

(ii) For any two numbers a and b, their mean is (a+b)/2.

(iii) Every data set has only one mean.
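The first result can be checked in one line. Writing $\bar{X}$ for the mean of n observations $X_1, \ldots, X_n$ (notation assumed here), and using $\sum X_i = n\bar{X}$:

$$\sum_{i=1}^{n} (X_i - \bar{X}) = \sum_{i=1}^{n} X_i - n\bar{X} = n\bar{X} - n\bar{X} = 0$$

For example, the values 2, 4 and 9 have mean 5, and the deviations -3, -1 and 4 sum to zero.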

**Merits of Arithmetic Average :**

(i) It is easy to understand and easy to calculate.

(ii) It provides a good basis for comparison.

(iii) As every item is taken into the calculation, it is affected by every item.

**Demerits of Arithmetic Average :**

(i) It cannot be located graphically.

(ii) A single observation can bring a big change in the mean.

(iii) It is very difficult to find the actual mean.

(iv) We cannot calculate the mean for a data set with open-ended classes.

**Weighted Arithmetic Mean**:

The weighted mean is calculated by taking into account the relative importance of each of the values to the total value.

When the observations $X_1, X_2, X_3, \ldots, X_n$ are given the weights $W_1, W_2, W_3, \ldots, W_n$ respectively, the weighted arithmetic mean is given by

$$\bar{X}_w = \frac{\sum_{i=1}^{n} W_i X_i}{\sum_{i=1}^{n} W_i}$$

**Combined Mean:** If $\bar{X}_1$ and $\bar{X}_2$ are the arithmetic means of two series with m and n observations respectively, the combined mean is

$$\bar{X}_{12} = \frac{m\bar{X}_1 + n\bar{X}_2}{m + n}$$
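A minimal Python sketch of both ideas, assuming the standard formulas ΣWX / ΣW for the weighted mean and (m·X̄₁ + n·X̄₂) / (m + n) for the combined mean. The function names and figures are made up for illustration.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(W * X) / sum(W)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

def combined_mean(mean1, m, mean2, n):
    """Mean of two series pooled together, given each mean and its size."""
    return (m * mean1 + n * mean2) / (m + n)

scores = [80, 90, 70]   # made-up marks in three papers
weights = [2, 3, 5]     # made-up relative importance of each paper

wm = weighted_mean(scores, weights)                 # (160 + 270 + 350) / 10
cm = combined_mean(mean1=60, m=40, mean2=70, n=60)  # (2400 + 4200) / 100
print(wm, cm)
```

The combined mean is itself a weighted mean of the two series means, with the series sizes acting as weights.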

**Geometric Mean:**

The geometric mean is useful when we have quantities that change over a period of time.

**(i)** Ungrouped data (Individual Series):

$$\text{G.M.} = \sqrt[n]{X_1 X_2 X_3 \cdots X_n} = (X_1 X_2 X_3 \cdots X_n)^{1/n}$$

where $X_1, X_2, X_3, \ldots, X_n$ are n observations.

**(ii)** Discrete Series:

$$\text{G.M.} = \sqrt[N]{X_1^{f_1} X_2^{f_2} \cdots X_n^{f_n}}$$

where $f_1, f_2, f_3, \ldots, f_n$ are the frequencies of the observations $X_1, X_2, X_3, \ldots, X_n$ respectively, and $N = \sum f_i$ is the sum of the frequencies.

**Properties of the Geometric Mean.**

i) G.M. is used in calculating growth rates.

ii) If any observation is zero, the G.M. becomes zero.

iii) It is difficult to calculate the nth root.

iv) If a and b are two numbers, then their G.M. is $\sqrt{ab}$.

v) If any observation is negative, the G.M. may be imaginary.
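The nth root is easier to handle through logarithms, since the G.M. is the exponential of the mean of the logs. The sketch below assumes the standard definition, the N-th root of the product of the X values each raised to its frequency. The function name and the choice to reject negative observations are assumptions of this example.

```python
import math

def geometric_mean(values, freqs=None):
    """G.M. of positive observations, optionally with frequencies."""
    if freqs is None:
        freqs = [1] * len(values)      # individual series: every frequency is 1
    if any(x < 0 for x in values):
        raise ValueError("this sketch does not handle negative observations")
    if any(x == 0 for x in values):
        return 0.0                     # a single zero makes the product zero
    n = sum(freqs)
    log_sum = sum(f * math.log(x) for x, f in zip(values, freqs))
    return math.exp(log_sum / n)       # exp(mean of logs) = N-th root of product

gm_pair = geometric_mean([2, 8])               # sqrt(2 * 8)
gm_discrete = geometric_mean([2, 4], [2, 1])   # cube root of 2 * 2 * 4
print(gm_pair, gm_discrete)
```

Working in logs also avoids overflow when many large values are multiplied together.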

**Harmonic Mean:**

The harmonic mean of a given series is the reciprocal of the arithmetic average of the reciprocals of the values of its various observations.

i) Ungrouped Data (Individual Series): Let $X_1, X_2, X_3, \ldots, X_n$ be n observations; then

$$\text{H.M.} = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}}$$

ii) Discrete Series:

$$\text{H.M.} = \frac{N}{\sum_{i=1}^{n} \frac{f_i}{X_i}}$$

where $N = \sum f_i$ is the sum of the frequencies and $f_1, f_2, f_3, \ldots, f_n$ are the frequencies of the observations $X_1, X_2, X_3, \ldots, X_n$ respectively.

**Properties of H.M.**

(i) H.M. is useful in problems involving speed and distance, such as finding an average speed.

(ii) If a and b are two numbers, their H.M. is $\frac{2ab}{a+b}$.
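The speed-and-distance use can be illustrated with a short Python sketch. It assumes the standard formula N / Σ(f / X). The function name and the two speeds are made up for the example.

```python
def harmonic_mean(values, freqs=None):
    """H.M.: total frequency divided by the sum of f / X."""
    if freqs is None:
        freqs = [1] * len(values)   # individual series: every frequency is 1
    n = sum(freqs)
    return n / sum(f / x for x, f in zip(values, freqs))

# A trip covers the same distance out at 40 km/h and back at 60 km/h.
# The average speed over the whole trip is the harmonic mean of the two speeds.
avg_speed = harmonic_mean([40, 60])
print(avg_speed)
```

The result agrees with the two-number shortcut 2ab / (a + b) = 4800 / 100 = 48, which is lower than the arithmetic mean of 50 because more time is spent at the slower speed.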

**MEDIAN:**

The median, as the name suggests, is the middle value of a series arranged in order of magnitude.

For Ungrouped Data:

(i) If n is odd, the $\left(\frac{n+1}{2}\right)$th observation is the median, after arranging the observations in either ascending or descending order.

(ii) If n is even, the average of the middle two observations is the median, after arranging the observations in either ascending or descending order.

For Grouped Data:

$$\text{Median} = L + \frac{\frac{N}{2} - M}{F} \times C$$

where

L = lower limit of the median class

N = sum of the frequencies

M = the cumulative frequency before the median class

F = frequency of the median class

C = length of the class
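Both cases can be sketched in Python. The grouped version assumes the usual interpolation formula L + ((N/2 − M) / F) · C and takes the median class to be the first class whose cumulative frequency reaches N/2. The function names and the sample table are illustrative assumptions.

```python
def median_ungrouped(values):
    """Middle value (n odd) or average of the two middle values (n even)."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

def median_grouped(lower_limits, freqs, width):
    """Median = L + ((N/2 - M) / F) * C for equal-width classes."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("empty frequency table")
    cum_before = 0                       # M: cumulative frequency before the class
    for lower, f in zip(lower_limits, freqs):
        if cum_before + f >= n / 2:      # first class reaching N/2 is the median class
            return lower + (n / 2 - cum_before) / f * width
        cum_before += f

odd = median_ungrouped([7, 3, 9])
even = median_ungrouped([7, 3, 9, 1])
grouped = median_grouped([0, 10, 20, 30], [2, 3, 3, 2], width=10)
print(odd, even, grouped)
```

In the grouped example N/2 = 5, the median class is 10 to 20 with M = 2 and F = 3, so the median is 10 + (3/3) · 10 = 20.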

**Properties of Median**:

**(i)** The median is easy to understand and can be computed from any kind of data, even grouped data with open-ended classes, except when the median falls in the open-ended class.

**(ii)** The median can also be calculated for qualitative data.

**(iii)** The sum of the absolute deviations taken about the median is the least.

**(iv)** Finding the median can be time-consuming, as the data must be arranged in order before the median is calculated.

**(v)** It is difficult to compute the median for a data set with a large number of observations.

**MODE:**

Mode is defined as the value of the variable which occurs most frequently in the data set.

For Grouped Data:

$$\text{Mode} = l + \frac{F - F_1}{2F - F_1 - F_2} \times C$$

where l = lower limit of the modal class

F = frequency of the modal class; $F_1$ and $F_2$ are the frequencies of the classes before and after the modal class.

C = length of the class.
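A short Python sketch of both situations follows. The grouped version assumes the usual interpolation formula l + ((F − F₁) / (2F − F₁ − F₂)) · C and treats a missing neighbouring class as having frequency 0. The function names and sample figures are assumptions made for illustration.

```python
from collections import Counter

def modes_ungrouped(values):
    """All values sharing the highest frequency (there may be more than one)."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

def mode_grouped(lower_limits, freqs, width):
    """Mode = l + ((F - F1) / (2F - F1 - F2)) * C for equal-width classes."""
    k = max(range(len(freqs)), key=lambda i: freqs[i])  # index of the modal class
    f = freqs[k]
    f1 = freqs[k - 1] if k > 0 else 0                   # class before the modal class
    f2 = freqs[k + 1] if k < len(freqs) - 1 else 0      # class after the modal class
    # Note: the denominator is zero if all three frequencies are equal.
    return lower_limits[k] + (f - f1) / (2 * f - f1 - f2) * width

two_modes = modes_ungrouped([2, 3, 3, 5, 5, 7])
grouped_mode = mode_grouped([0, 10, 20, 30, 40], [3, 7, 12, 8, 2], width=10)
print(two_modes, grouped_mode)
```

Here the modal class is 20 to 30 with F = 12, F₁ = 7 and F₂ = 8, so the mode is 20 + (5/9) · 10, a little above the middle of the class because the following class is heavier than the preceding one.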

**Properties of Mode:**

**i)** Mode can be used as a central location for qualitative as well as quantitative data.

**ii)** It is not affected by extreme values.

**iii)** It can also be used for open-ended classes.

**iv)** It is difficult to find the mode when a data set contains no value that occurs more than once, or when all items have the same frequency.

**Relation between Mean, Median and Mode**.

· In the case of a symmetrical distribution, the mean, median and mode coincide,

i.e. Mean = Median = Mode

· If the distribution is moderately asymmetrical,

Mean - Median = (Mean - Mode)/3

Thus Mode = 3 Median - 2 Mean
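The step from the empirical relation to the formula for the mode is a short rearrangement, shown here with an illustrative pair of values (mean 20, median 21) that is not taken from any data set in these notes:

$$
\begin{aligned}
\text{Mean} - \text{Median} &= \tfrac{1}{3}(\text{Mean} - \text{Mode}) \\
3\,\text{Mean} - 3\,\text{Median} &= \text{Mean} - \text{Mode} \\
\text{Mode} &= 3\,\text{Median} - 2\,\text{Mean} \\
\text{Mode} &= 3(21) - 2(20) = 23
\end{aligned}
$$

As a check, Mean - Median = -1 and (Mean - Mode)/3 = (20 - 23)/3 = -1.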

**Measures of Dispersion:**

A measure of dispersion describes how scattered or spread out the observations in a data set are.

**Range :**

Range is defined as the difference between the value of the largest observation (L) and the value of the smallest observation (S) present in the distribution, i.e. Range = L - S.

$$\text{Co-efficient of Range} = \frac{L - S}{L + S}$$

**Properties of Range:**

**i)** Range is simple to understand and easy to calculate.

**ii)** Range is the quickest way to get a measure of dispersion, although it is not accurate.

**iii)** It is not based on all the observations in the data.

**iv)** It is influenced by extreme values.

**v)** Range cannot be computed for a frequency distribution with open-ended classes.
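A minimal Python sketch of the range and its coefficient, assuming the standard form (L − S) / (L + S) with L the largest and S the smallest observation. The function names and data are made up for the example.

```python
def value_range(values):
    """Range = largest observation - smallest observation."""
    return max(values) - min(values)

def coefficient_of_range(values):
    """(L - S) / (L + S), a unit-free version of the range."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)

data = [5, 8, 12, 20, 25]   # made-up observations
data_range = value_range(data)
coeff_range = coefficient_of_range(data)
print(data_range, coeff_range)
```

Replacing 25 by 250 in this list changes the range from 20 to 245 while the other four values stay the same, which shows how strongly a single extreme value influences it.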

**Inter-Quartile Range**:

Quartile Deviation: For the range we calculate L - S over all the terms. In this case we leave out the first 25% and the last 25% of the terms to avoid giving undue importance to extreme values.

This means that we get $Q_1$ and $Q_3$ if we leave out the first and last 25% of the terms.

A. Inter-Quartile Range = $Q_3 - Q_1$, and Semi Inter-Quartile Range (Quartile Deviation) = $\frac{Q_3 - Q_1}{2}$

B. Coefficient of Quartile Deviation:

$$= \frac{Q_3 - Q_1}{Q_3 + Q_1}$$
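The quartile-based measures can be sketched with Python's standard library. Textbooks locate quartiles in slightly different ways; this example assumes the default ("exclusive") method of `statistics.quantiles`, which places $Q_1$ and $Q_3$ at positions (n+1)/4 and 3(n+1)/4 in the ordered data. The function name and data are illustrative, and the formulas assumed are IQR = Q₃ − Q₁, Q.D. = (Q₃ − Q₁)/2 and coefficient = (Q₃ − Q₁)/(Q₃ + Q₁).

```python
from statistics import quantiles

def quartile_measures(values):
    """Return Q1, Q3, inter-quartile range, quartile deviation and its coefficient."""
    q1, _q2, q3 = quantiles(values, n=4)   # three cut points: Q1, median, Q3
    iqr = q3 - q1
    return {
        "Q1": q1,
        "Q3": q3,
        "IQR": iqr,
        "QD": iqr / 2,                     # semi inter-quartile range
        "coefficient": iqr / (q3 + q1),
    }

data = [4, 7, 9, 12, 15, 18, 20, 23, 27, 30, 35]   # 11 made-up observations
result = quartile_measures(data)
print(result)
```

With 11 observations, $Q_1$ is the 3rd ordered value (9) and $Q_3$ the 9th (27), so the extreme values 4 and 35 play no part in the result.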

**Mean Deviation**:

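$$\text{M.D.} = \frac{\sum f\,|D|}{N}$$

where

f = frequency of the corresponding interval

N = total of the frequencies

|D| = deviations from the median, mean or mode, ignoring ± signs

**Coefficient of Mean Deviations**:

$$\text{Coefficient of M.D.} = \frac{\text{M.D.}}{\text{Mean, Median or Mode}}$$

where the divisor is the average from which the deviations were taken.

A. Individual series:

$$\text{M.D.} = \frac{\sum |D|}{n}$$

B. Discrete Series:

$$\text{M.D.} = \frac{\sum f\,|D|}{N}$$

Note: |D| is the deviation of the variable from the mean ($\bar{X}$), median (M) or mode (Z), ignoring ± signs.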
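A small Python sketch of the mean deviation and its coefficient, assuming the standard forms M.D. = Σf|D| / N and coefficient = M.D. divided by the average the deviations were taken from. The caller supplies that average, so the same function covers mean, median and mode. The function names and data are made up for the example.

```python
from statistics import mean, median

def mean_deviation(values, center, freqs=None):
    """Average absolute deviation of the values from `center`."""
    if freqs is None:
        freqs = [1] * len(values)          # individual series
    n = sum(freqs)
    return sum(f * abs(x - center) for x, f in zip(values, freqs)) / n

def coefficient_of_md(md, center):
    """M.D. divided by the average from which deviations were taken."""
    return md / center

data = [2, 4, 6, 8, 10]                    # made-up individual series
md_mean = mean_deviation(data, mean(data))
md_median = mean_deviation(data, median(data))
coeff = coefficient_of_md(md_mean, mean(data))
print(md_mean, md_median, coeff)
```

The mean and median coincide at 6 for this symmetric data set, so both mean deviations are (4 + 2 + 0 + 2 + 4) / 5 = 2.4.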

**Standard Deviation**:

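$$\text{S.D.} = \sigma = \sqrt{\frac{\sum x^2}{N}}$$

where $x = X - \bar{X}$, so that $x^2 = (X - \bar{X})^2$, and N is the number of observations.

$$\text{Coefficient of standard deviation} = \frac{\sigma}{\bar{X}}$$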
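The calculation can be sketched in Python as below. It assumes the population form of the standard deviation, the square root of Σ(X − X̄)² / N, and a coefficient equal to the S.D. divided by the mean. The function name and data are illustrative assumptions.

```python
import math

def standard_deviation(values, freqs=None):
    """Return (S.D., mean) using the divide-by-N (population) formula."""
    if freqs is None:
        freqs = [1] * len(values)          # individual series
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    variance = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / n
    return math.sqrt(variance), mean

data = [2, 4, 4, 4, 5, 5, 7, 9]            # made-up individual series
sd, mean = standard_deviation(data)
coefficient = sd / mean

# The same data written as a discrete series gives the same answer.
sd_discrete, _ = standard_deviation([2, 4, 5, 7, 9], [1, 3, 2, 1, 1])
print(sd, coefficient, sd_discrete)
```

Here the mean is 5, the squared deviations add up to 32, and 32 / 8 = 4, so the standard deviation is 2.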

**For a symmetric distribution the relationship among Q.D., M.D. & S.D. is:**

Q.D. = 2/3 S.D.

M.D. = 4/5 S.D.

Q.D. = 5/6 M.D.

Equivalently:

M.D. = 6/5 Q.D.

S.D. = 3/2 Q.D.

S.D. = 5/4 M.D.

These relationships hold only approximately.

**Correlation:**

The measurement of the degree of relationship between two variables is called correlation.

Coefficient of Rank Correlation:

The relationship between two variables that cannot be measured directly can be found by the coefficient of rank correlation.

**Spearman’s Rank Correlation coefficient (r)**:

$$r = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$$

where $d_i$ is the difference between the ranks of the i-th pair of the two quantities and n is the number of observations.

The value of r lies between -1 and +1.

The value of r is equal to 1 for a perfect positive correlation.

The value of r is equal to -1 for a perfect negative correlation.

r is 0 for a complete absence of correlation.

If |r| < 0.2, the relationship is ‘negligible’.

If 0.2 < |r| < 0.4, the relationship is ‘slight’.

If 0.4 < |r| < 0.7, the relationship is ‘substantial’.

If 0.7 < |r| < 1, the relationship is ‘very high’.
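A minimal Python sketch of the rank correlation follows, assuming the standard formula r = 1 − 6Σd² / (n(n² − 1)) and no tied values. The function names, the marks, and the choice to place a boundary value such as 0.2 in the higher band (the scale above leaves boundaries unassigned) are all assumptions of this example.

```python
def ranks(values):
    """Rank 1 for the largest value; assumes no two values are equal."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def spearman_r(x, y):
    """r = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), d = difference in ranks."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

def describe(r):
    """Verbal label for |r| following the scale above."""
    size = abs(r)
    if size < 0.2:
        return "negligible"
    if size < 0.4:
        return "slight"
    if size < 0.7:
        return "substantial"
    return "very high"

maths = [85, 60, 73, 40, 90]     # made-up marks of five students
physics = [93, 75, 65, 50, 80]

r = spearman_r(maths, physics)
print(r, describe(r))
```

The ranks are 2, 4, 3, 5, 1 for maths and 1, 3, 4, 5, 2 for physics, so Σd² = 4 and r = 1 − 24/120 = 0.8.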