One stop blog for all competitive examinations

Mathematics for Placement Papers - STATISTICS - Concepts and Important Formulas

STATISTICS

Statistics is concerned with scientific methods for collecting, organizing, summarizing, presenting and analysing data, as well as drawing valid conclusions and making reasonable decisions on the basis of such analysis.

Data: Data is a collection of related facts, observations or figures. A collection of data is called a data set, and each observation in it is a data point.

Variables: Discrete and Continuous
A variable is a symbol, such as X, Y, H, x or y, that can assume any of a prescribed set of values, called the domain of the variable. If the variable can assume only one value, it is called a constant.
A variable that can assume any value between two given values is called a continuous variable; otherwise, it is called a discrete variable.
Example: The number N of children in a family, which can assume any of the values 0, 1, 2, 3, ... but cannot be 2.5 or 3.842, is a discrete variable.
The height H of an individual, which can be 67 inches, 65.5 inches or 66.234 inches, depending on the accuracy of measurement, is a continuous variable.

Tabulation: The arrangement of raw data under various heads in the form of a table is called tabulation.
Frequency: The number of observations falling in a particular class is called the frequency of that class.
Cumulative frequency: The cumulative frequency of a particular class is the sum of the frequency of that class and the frequencies of all classes before it.
Frequency Distribution Table: The table showing the class intervals along with the corresponding class frequencies is known as a "Frequency Distribution Table".
Length of the class: The difference between the upper and lower boundaries of a class is called the length of the class.
Mid value: The average of the lower and upper limits of a class interval is called its "mid value".
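The definitions above can be tied together in a short Python sketch that tabulates raw data. It assumes equal-width, half-open classes (lower ≤ x < upper); that boundary convention, the function name `frequency_table` and the sample marks are all hypothetical choices for illustration.

```python
def frequency_table(data, start, width, num_classes):
    """Return rows of (lower, upper, mid value, frequency, cumulative frequency).

    Assumes equal-width, half-open classes: lower <= x < upper.
    """
    rows = []
    cumulative = 0
    for k in range(num_classes):
        lower = start + k * width
        upper = lower + width                  # length of the class = width
        freq = sum(1 for x in data if lower <= x < upper)
        cumulative += freq                     # this class plus all classes before it
        mid = (lower + upper) / 2              # average of lower and upper limits
        rows.append((lower, upper, mid, freq, cumulative))
    return rows


marks = [12, 25, 37, 41, 18, 29, 33, 45, 22, 38]  # hypothetical raw data
for row in frequency_table(marks, start=10, width=10, num_classes=4):
    print(row)
```

For these marks the frequencies are 2, 3, 3, 2 and the cumulative frequencies 2, 5, 8, 10; the last cumulative frequency always equals the total number of observations.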

Measures of Central Tendency:
A measure of central tendency gives us an idea of how the values are concentrated in the central part of the distribution. An average of a statistical series is the value of the variable that is representative of the entire distribution.
Types of Central Tendency:
1. Mathematical Averages
          a) Arithmetic Mean
          b) Geometric Mean
          c) Harmonic Mean
2. Positional Averages
          a) Median
          b) Mode

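Arithmetic mean:
i) Individual Series: A.M. of ungrouped data = Sum of the items / Number of items, i.e. $\bar{X} = \frac{\sum X_i}{n}$
ii) Discrete Series: If $X_1, X_2, \ldots, X_n$ are n distinct values with frequencies $f_1, f_2, \ldots, f_n$, then $\bar{X} = \frac{\sum f_i X_i}{\sum f_i}$
iii) Continuous Series: If $X_1, X_2, \ldots, X_n$ are the mid values and $f_1, f_2, \ldots, f_n$ are the frequencies of grouped data, then $\bar{X} = \frac{\sum f_i X_i}{\sum f_i}$
In the step deviation method, $\bar{X} = A + \frac{\sum f_i d_i}{\sum f_i} \times C$, where A = assumed mean, C = width of the class and $d_i = \frac{X_i - A}{C}$.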
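The three arithmetic-mean formulas and the step deviation shortcut can be compared in a minimal Python sketch. The function names are hypothetical, and the mid values and frequencies are made-up sample data. Both grouped-data methods should give the same answer.

```python
def mean_individual(xs):
    # Sum of the items / number of items
    return sum(xs) / len(xs)


def mean_frequency(xs, fs):
    # Discrete or continuous series: sum(f * X) / sum(f).
    # For grouped data, xs holds the class mid values.
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)


def mean_step_deviation(mids, fs, assumed, width):
    # d = (X - A) / C, then A.M. = A + (sum(f * d) / sum(f)) * C
    ds = [(x - assumed) / width for x in mids]
    return assumed + sum(f * d for f, d in zip(fs, ds)) / sum(fs) * width


mids = [15, 25, 35, 45]   # hypothetical class mid values
freqs = [2, 3, 3, 2]      # hypothetical class frequencies
print(mean_frequency(mids, freqs))               # direct method
print(mean_step_deviation(mids, freqs, 35, 10))  # step deviation with A = 35, C = 10
```

The step deviation method only changes the arithmetic (smaller numbers to multiply), not the result: here both calls give 30.0.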
Important Results
(i) The algebraic sum of deviations taken about the mean is zero: $\sum (X_i - \bar{X}) = 0$.
(ii) For any two numbers a and b, their mean is (a+b)/2.
(iii) Every data set has exactly one arithmetic mean.

Merits of Arithmetic Average:
(i) It is easy to understand and easy to calculate.
(ii) It provides a good basis for comparison.
(iii) Every item is used in the calculation, so the mean is affected by every item.

Demerits of Arithmetic Average:
(i) It cannot be located graphically.
(ii) A single extreme observation can bring a big change in the mean.
(iii) It is very difficult to find the actual mean.
(iv) The mean cannot be calculated for a data set with open-ended classes.

Weighted Arithmetic Mean
The weighted mean takes into account the relative importance (weight) of each value.
When the observations $X_1, X_2, X_3, \ldots, X_n$ are given weights $W_1, W_2, W_3, \ldots, W_n$, the weighted arithmetic mean is
$\bar{X}_w = \frac{\sum W_i X_i}{\sum W_i}$
Combined Mean: If $\bar{X}_1$ and $\bar{X}_2$ are the arithmetic means of two series with m and n observations respectively, the combined mean is
$\bar{X}_{12} = \frac{m\bar{X}_1 + n\bar{X}_2}{m + n}$
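A minimal Python sketch of the weighted mean and the combined mean. The function names and all sample numbers are hypothetical.

```python
def weighted_mean(xs, ws):
    # sum(W * X) / sum(W)
    return sum(w * x for x, w in zip(xs, ws)) / sum(ws)


def combined_mean(mean1, m, mean2, n):
    # (m * mean1 + n * mean2) / (m + n)
    return (m * mean1 + n * mean2) / (m + n)


# Hypothetical marks 70, 80, 90 with weights 1, 2, 3
print(weighted_mean([70, 80, 90], [1, 2, 3]))  # 500 / 6, about 83.33
# Hypothetical sections: 30 students averaging 60 and 20 students averaging 70
print(combined_mean(60, 30, 70, 20))           # 3200 / 50 = 64.0
```

The combined mean is just a weighted mean in which each series mean is weighted by its number of observations. It generally differs from the simple average of the two means, which would be 65 here.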
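Geometric Mean:
It is useful for quantities that change over a period of time, such as growth rates.
(i) Ungrouped data (Individual Series):
G.M. $= \sqrt[n]{X_1 X_2 \cdots X_n} = \text{antilog}\left(\frac{\sum \log X_i}{n}\right)$, where $X_1, X_2, \ldots, X_n$ are n observations.
(ii) Discrete Series:
G.M. $= \sqrt[N]{X_1^{f_1} X_2^{f_2} \cdots X_n^{f_n}} = \text{antilog}\left(\frac{\sum f_i \log X_i}{N}\right)$, where $f_1, f_2, \ldots, f_n$ are the frequencies of the observations $X_1, X_2, \ldots, X_n$ and $N = \sum f_i$ is the total number of observations.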
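The log/antilog form of the G.M. is easy to compute. The following Python sketch uses natural logs with `math.exp` as the antilog, which gives the same result as base-10 logs. The function names and sample values are hypothetical. Zero and negative observations are handled separately because the logarithm is undefined for them.

```python
import math


def gm_individual(xs):
    """G.M. = antilog(sum(log X) / n), i.e. the n-th root of the product."""
    if any(x < 0 for x in xs):
        raise ValueError("negative observation: G.M. may be imaginary")
    if any(x == 0 for x in xs):
        return 0.0  # a single zero makes the product, and so the G.M., zero
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def gm_discrete(xs, fs):
    """G.M. = antilog(sum(f * log X) / N), with N = sum(f)."""
    if any(x < 0 for x in xs):
        raise ValueError("negative observation: G.M. may be imaginary")
    if any(x == 0 and f > 0 for x, f in zip(xs, fs)):
        return 0.0
    N = sum(fs)
    return math.exp(sum(f * math.log(x) for x, f in zip(xs, fs) if f > 0) / N)


print(gm_individual([2, 8]))        # about 4.0, the square root of 2 * 8
print(gm_discrete([1, 8], [2, 1]))  # about 2.0, the cube root of 1 * 1 * 8
```

Working with logs avoids forming a huge product before taking the nth root, which is the same reason the antilog formula is preferred in hand calculation.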

Properties of the Geometric Mean
i) G.M. is used in calculating growth rates.
ii) If any observation is zero, the G.M. becomes zero.
iii) Calculating the nth root is difficult by hand.
iv) If a and b are two numbers, their G.M. is $\sqrt{ab}$.
v) If any observation is negative, the G.M. may be imaginary, so it is used only for positive values.

Harmonic Mean:
The harmonic mean of a series is the reciprocal of the arithmetic average of the reciprocals of its observations.
i) Ungrouped Data (Individual Series): Let $X_1, X_2, \ldots, X_n$ be n observations; then H.M. $= \frac{n}{\sum \frac{1}{X_i}}$
ii) Discrete Series:
H.M. $= \frac{N}{\sum \frac{f_i}{X_i}}$
where $N = \sum f_i$ is the total number of observations and $f_1, f_2, \ldots, f_n$ are the frequencies of the observations $X_1, X_2, \ldots, X_n$ respectively.

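Properties of H.M.
(i) H.M. is useful for averaging rates such as speeds, e.g. the average speed over equal distances travelled at different speeds.
(ii) If a and b are two numbers, their H.M. is $\frac{2ab}{a + b}$.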
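A minimal Python sketch of both H.M. formulas, followed by the classic average-speed case. The function names and the trip figures are hypothetical.

```python
def hm_individual(xs):
    # H.M. = n / sum(1 / X)
    return len(xs) / sum(1 / x for x in xs)


def hm_discrete(xs, fs):
    # H.M. = N / sum(f / X), with N = sum(f)
    return sum(fs) / sum(f / x for x, f in zip(xs, fs))


# Hypothetical trip: equal distances covered at 60 km/h and at 40 km/h
print(hm_individual([60, 40]))  # about 48.0, matching 2ab / (a + b)
```

The arithmetic mean of the two speeds (50 km/h) would overstate the true average speed, because more time is spent travelling at the slower speed.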

MEDIAN:
The median, as the name suggests, is the middle value of a series arranged in order of magnitude.
For Ungrouped Data:
(i) If n is odd, the $\left(\frac{n+1}{2}\right)$th observation is the median, after arranging the observations in either ascending or descending order.
(ii) If n is even, the average of the two middle observations, i.e. the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2}+1\right)$th observations, is the median, after arranging the observations in either ascending or descending order.

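For Grouped Data:
Median $= L + \frac{\frac{N}{2} - M}{F} \times C$
The median class is the first class whose cumulative frequency reaches N/2.
Where L = lower limit of the median class
          N = sum of the frequencies
          M = cumulative frequency of the class before the median class
          F = frequency of the median class
          C = length of the class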
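Both median rules can be sketched in Python. This is a minimal illustration, assuming the grouped data comes as a list of contiguous (lower, upper) class limits with matching frequencies. The function names and sample data are hypothetical.

```python
def median_ungrouped(xs):
    s = sorted(xs)  # arrange in ascending order first
    n = len(s)
    if n % 2 == 1:
        return s[(n + 1) // 2 - 1]          # the ((n + 1) / 2)th observation (1-based)
    return (s[n // 2 - 1] + s[n // 2]) / 2  # average of the two middle observations


def median_grouped(classes, freqs):
    """classes: list of (lower, upper) limits; freqs: matching frequencies."""
    N = sum(freqs)
    cum = 0                                 # M: cumulative frequency before the class
    for (lower, upper), f in zip(classes, freqs):
        if cum + f >= N / 2:                # first class whose c.f. reaches N/2
            return lower + (N / 2 - cum) / f * (upper - lower)
        cum += f
    raise ValueError("empty distribution")


print(median_ungrouped([7, 1, 5]))     # 5
print(median_ungrouped([4, 1, 3, 2]))  # 2.5
classes = [(10, 20), (20, 30), (30, 40), (40, 50)]
print(median_grouped(classes, [2, 3, 3, 2]))  # L=20, M=2, F=3, C=10: 20 + (5 - 2)/3 * 10
```

In the grouped example N/2 = 5, and the class 20-30 is the first whose cumulative frequency (2 + 3 = 5) reaches it, so the result is 30.0.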

Properties of Median:
(i) The median is easy to understand and can be computed from any kind of data, even grouped data with open-ended classes, except when the median falls in an open-ended class.
(ii) The median can also be calculated for qualitative data, provided the categories can be ranked.
(iii) The sum of absolute deviations taken about the median is the least (smaller than about any other value).
(iv) Computing the median can be time-consuming, because the data must first be arranged in order.
(v) It is difficult to compute the median for a data set with a large number of observations.
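Property (iii) can be checked numerically with a brute-force grid search. This is a demonstration, not a proof. The data set and the helper name `sum_abs_dev` are hypothetical; the median of the data is 5 and its mean is 8.

```python
def sum_abs_dev(xs, a):
    # Sum of |X - a| over all observations
    return sum(abs(x - a) for x in xs)


data = [2, 3, 5, 9, 21]       # hypothetical data: median 5, mean 8
print(sum_abs_dev(data, 5))   # about the median: 3 + 2 + 0 + 4 + 16 = 25
print(sum_abs_dev(data, 8))   # about the mean:   6 + 5 + 3 + 1 + 13 = 28

# Try every point 0.0, 0.1, ..., 25.0 and keep the one with the smallest sum
candidates = [c / 10 for c in range(0, 251)]
best = min(candidates, key=lambda a: sum_abs_dev(data, a))
print(best)                   # 5.0, the median
```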

MODE:
The mode is the value of the variable that occurs most frequently in the data set.
Grouped Data: Mode $= l + \frac{F - F_1}{2F - F_1 - F_2} \times C$
Where l = lower limit of the modal class
F = frequency of the modal class; $F_1$ and $F_2$ are the frequencies of the classes before and after the modal class.
C = length of the class.

Properties of Mode:
i) The mode can be used as a measure of central location for qualitative as well as quantitative data.
ii) It is not affected by extreme values.
iii) It can also be used for open-ended classes.
iv) It is difficult to find the mode when no value in the data set occurs more than once, or when all items have the same frequency.
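A minimal Python sketch of the mode for ungrouped and grouped data. The function names and sample data are hypothetical. Several choices here are also assumptions: a tie for the largest class frequency is broken by taking the first such class; a missing neighbouring class at either end of the table is treated as having frequency 0; and for ungrouped data an empty list is returned when every value has the same frequency, mirroring property (iv).

```python
from collections import Counter


def modes_ungrouped(xs):
    # All values sharing the highest frequency; [] when every value ties
    # (which includes the case where no value repeats).
    counts = Counter(xs)
    top = max(counts.values())
    winners = [x for x, c in counts.items() if c == top]
    return [] if len(winners) == len(counts) else winners


def mode_grouped(classes, freqs):
    k = max(range(len(freqs)), key=lambda i: freqs[i])  # modal class index
    lower, upper = classes[k]
    F = freqs[k]
    F1 = freqs[k - 1] if k > 0 else 0                    # class before (0 if none)
    F2 = freqs[k + 1] if k + 1 < len(freqs) else 0       # class after (0 if none)
    return lower + (F - F1) / (2 * F - F1 - F2) * (upper - lower)


print(modes_ungrouped([1, 2, 2, 3]))  # [2]
print(modes_ungrouped([1, 2, 3]))     # [] (no value repeats)
classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
print(mode_grouped(classes, [5, 8, 12, 6]))  # l=20, F=12, F1=8, F2=6: 20 + 4/10 * 10
```

The grouped formula divides by zero when F = F1 = F2, which is the grouped version of the "all frequencies equal" difficulty noted in property (iv).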

Relation between Mean, Median and Mode:
·         For a symmetrical distribution, mean, median and mode coincide,
i.e. Mean = Median = Mode
·         If the distribution is moderately asymmetrical,
Mean - Median = (Mean - Mode)/3
Thus Mode = 3 Median - 2 Mean
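The empirical relation is simple arithmetic; a tiny Python sketch with hypothetical figures (mean 50, median 48) shows how it estimates the mode. It is only meant for moderately asymmetrical distributions, and `empirical_mode` is a hypothetical name.

```python
def empirical_mode(mean, median):
    # Mode = 3 * Median - 2 * Mean (moderately asymmetrical distributions only)
    return 3 * median - 2 * mean


print(empirical_mode(mean=50, median=48))  # 3*48 - 2*50 = 44
```

As a check, Mean - Median = 2 and (Mean - Mode)/3 = 6/3 = 2, so the estimate satisfies the stated relation.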

Measures of Dispersion:
A measure of dispersion describes how scattered or spread out the observations in a data set are.
Range:
Range is defined as the difference between the value of the largest observation (L) and the value of the smallest observation (S) in the distribution: Range = L - S
Coefficient of Range $= \frac{L - S}{L + S}$
Properties of Range:
i) Range is simple to understand and easy to calculate.
ii) Range is the quickest way to get a measure of dispersion, although it is a crude one.
iii) It is not based on all the observations in the data.
iv) It is influenced by extreme values.
v) Range cannot be computed for a frequency distribution with open-ended classes.
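A minimal Python sketch of the range and the coefficient of range, using hypothetical data. The function name is hypothetical.

```python
def range_and_coefficient(xs):
    largest, smallest = max(xs), min(xs)
    rng = largest - smallest                              # L - S
    coeff = (largest - smallest) / (largest + smallest)   # (L - S) / (L + S)
    return rng, coeff


print(range_and_coefficient([12, 25, 37, 41, 18]))  # (29, 29/53, about 0.547)
```

Only the two extreme values enter the calculation, which is exactly why the range ignores the other observations and is sensitive to outliers (properties iii and iv).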

Inter-Quartile Range
          Quartile Deviation: The range uses L - S, the two extreme terms. Here, instead, we leave out the first 25% and the last 25% of the terms to avoid giving undue importance to extreme values.
          The middle 50% of the terms that remain lie between the first quartile Q1 and the third quartile Q3.
     A.    Inter-Quartile Range = Q3 - Q1 and Semi-inter-quartile range (Quartile Deviation) $= \frac{Q_3 - Q_1}{2}$
     B.    Coefficient of Quartile Deviation $= \frac{Q_3 - Q_1}{Q_3 + Q_1}$
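Quartiles can be located in several slightly different ways. This Python sketch assumes the common (n+1)/4 and 3(n+1)/4 position rule with linear interpolation, which is what Python's `statistics.quantiles` computes with `method="exclusive"`. Other conventions may give slightly different quartiles. The function name and data are hypothetical.

```python
import statistics


def quartile_measures(xs):
    # method="exclusive" places Q1 and Q3 at the (n+1)/4 and 3(n+1)/4 positions
    q1, _, q3 = statistics.quantiles(xs, n=4, method="exclusive")
    iqr = q3 - q1                   # inter-quartile range
    qd = iqr / 2                    # semi-inter-quartile range (quartile deviation)
    coeff = (q3 - q1) / (q3 + q1)   # coefficient of quartile deviation
    return q1, q3, iqr, qd, coeff


print(quartile_measures([1, 2, 3, 4, 5, 6, 7, 8, 9]))  # Q1 = 2.5, Q3 = 7.5, QD = 2.5
```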

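Mean Deviation:
M.D. $= \frac{\sum f|D|}{N}$
where f = frequency of the corresponding interval
N = total frequency ($\sum f$)
|D| = deviation from the median, mean or mode, ignoring ± signs

Coefficient of Mean Deviation:
Coefficient of M.D. $= \frac{\text{M.D.}}{\text{the average (mean, median or mode) about which M.D. is calculated}}$
A.    Individual series:
M.D. $= \frac{\sum |D|}{n}$
B.    Discrete Series:
M.D. $= \frac{\sum f|D|}{N}$
Note: D is the deviation of the variable from $\bar{X}$, M or Z (mean, median or mode), ignoring ± signs.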
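A minimal Python sketch of the mean deviation and its coefficient, computed about either the mean or the median. Deviation about the mode is left out to keep the sketch short. The function name, the `about` parameter and the sample data are hypothetical. Integer frequencies are assumed so that the data can be expanded to find the median.

```python
import statistics


def mean_deviation(xs, fs=None, about="mean"):
    if fs is None:
        fs = [1] * len(xs)            # individual series: every frequency is 1
    N = sum(fs)
    if about == "mean":
        centre = sum(f * x for x, f in zip(xs, fs)) / N
    elif about == "median":
        # assumes integer frequencies, so the series can be written out in full
        expanded = [x for x, f in zip(xs, fs) for _ in range(f)]
        centre = statistics.median(expanded)
    else:
        raise ValueError("about must be 'mean' or 'median'")
    md = sum(f * abs(x - centre) for x, f in zip(xs, fs)) / N  # sum(f|D|) / N
    return md, md / centre            # (M.D., coefficient of M.D.)


print(mean_deviation([2, 3, 5, 9, 21], about="median"))  # (5.0, 1.0)
print(mean_deviation([1, 2, 3], [1, 2, 1]))              # discrete series about mean 2.0
```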

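Standard Deviation:
S.D. $(\sigma) = \sqrt{\frac{\sum x^2}{n}}$, where $x = X - \bar{X}$, so that $\sum x^2 = \sum (X - \bar{X})^2$

Coefficient of standard deviation $= \frac{\sigma}{\bar{X}}$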
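A minimal Python sketch of the S.D. formula above (dividing by n) and its coefficient. The result is cross-checked against `statistics.pstdev`, which also divides by n. The function name and data are hypothetical.

```python
import math
import statistics


def standard_deviation(xs):
    mean = sum(xs) / len(xs)
    sum_sq = sum((x - mean) ** 2 for x in xs)  # sum of x^2, where x = X - mean
    sd = math.sqrt(sum_sq / len(xs))           # divide by n, as in the formula
    return sd, sd / mean                       # (S.D., coefficient of S.D.)


data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data with mean 5
sd, coeff = standard_deviation(data)
print(sd, coeff)                  # 2.0 0.4
print(statistics.pstdev(data))    # standard-library cross-check
```

Some texts divide by n - 1 instead (the sample S.D., `statistics.stdev`); the formula here divides by n.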

For a normal (symmetric, bell-shaped) distribution, the relationships among Q.D., M.D. and S.D. are approximately:
Q.D. = 2/3 S.D.
M.D. = 4/5 S.D.
Q.D. = 5/6 M.D.
M.D. = 6/5 Q.D.
S.D. = 3/2 Q.D.
S.D. = 5/4 M.D.
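These ratios are rounded values for a normal distribution. A short Python check can compare them with the exact figures. It uses `statistics.NormalDist` for the quartile and the known result that the mean deviation of a normal distribution is σ√(2/π). σ = 1 is an arbitrary choice, since the ratios do not depend on it.

```python
import math
from statistics import NormalDist

sigma = 1.0
qd = NormalDist(0, sigma).inv_cdf(0.75)  # Q3 - mean, which equals (Q3 - Q1) / 2
md = sigma * math.sqrt(2 / math.pi)      # mean deviation of a normal distribution
print(qd / sigma)  # about 0.674, close to 2/3
print(md / sigma)  # about 0.798, close to 4/5
print(qd / md)     # about 0.845, close to 5/6
```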

Correlation:
The measurement of the degree of relationship between two variables is called correlation.
Coefficient of Rank Correlation:
The relationship between two variables that cannot be measured directly, but whose observations can be ranked, can be found using the coefficient of rank correlation.
Spearman's Rank Correlation coefficient $r = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$
where $d_i$ is the difference between the ranks of the ith observation in the two series and n is the number of observations.
The value of r lies between -1 and +1.
The value of r is equal to +1 for a perfect positive correlation.
The value of r is equal to -1 for a perfect negative correlation.
r is 0 for a complete absence of correlation.
If |r| < 0.2, the relationship is ‘negligible’.
If 0.2 < |r| < 0.4, the relationship is ‘slight’.
If 0.4 < |r| < 0.7, the relationship is ‘substantial’.
If 0.7 < |r| < 1, the relationship is ‘very high’.
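A minimal Python sketch of Spearman's formula. It assumes there are no tied values or ranks; the simple formula needs distinct ranks, and ties require a correction not covered here. The function names, the judges' ranks and the scores are hypothetical.

```python
def ranks_descending(values):
    """Rank 1 goes to the largest value. Assumes no tied values."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]


def spearman_from_ranks(rx, ry):
    """r = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), valid when there are no ties."""
    n = len(rx)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical ranks given to five contestants by two judges
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]
print(spearman_from_ranks(judge_a, judge_b))  # 1 - 6*4/(5*24) = 0.8, 'very high'

# Raw scores can be converted to ranks first
print(ranks_descending([85, 60, 72]))         # [1, 3, 2]
```

Either ranking direction works, as long as both series are ranked the same way.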