# QM 2 ( Reading 7 )

version from 2017-02-21 23:24

## Section

Statistics: Data and the methods used to analyze data.
Descriptive statistics: How large volumes of data are converted into useful, readily understood information by summarizing their important characteristics.
Inferential statistics: Methods used to make forecasts, estimates, or draw conclusions about a larger set of data based on a smaller, representative set.
Name the measurement scales: Nominal, ordinal, interval, and ratio.
Nominal scale: Categorizes the data but does not rank it.
Ordinal scale: Sorts data into ranked categories (for example, worst performing to best performing), so you can infer that a stock in group 2 performed better than a stock in group 1, but not by how much.
Interval scale: Ranks the data and keeps the differences between values equal, so values can be added and subtracted (temperature, for example). Zero does not mean absence of the quantity being measured, so ratios are not meaningful (60 degrees is not 6 times as hot as 10 degrees).
Ratio scale: Strongest scale. It has all the characteristics of the interval scale plus a true zero as its origin, so zero means absence of the quantity being measured and ratios are meaningful.
Frequency distribution: Tabular presentation of data grouped into a relatively small number of intervals (classes) that are mutually exclusive and together include all the data. Data measured on any of the four scales can be summarized this way.
Modal interval: The interval with the highest frequency.
Relative frequency: Proportion of total observations that lie in a particular interval. It is calculated by dividing the interval's absolute frequency by the total number of observations.
Cumulative absolute frequency: Number of observations that are less than the upper bound of the interval (the sum of the absolute frequencies of this interval and all intervals below it, not just this interval).
Cumulative relative frequency: Cumulative absolute frequency divided by the total number of observations, or equivalently the sum of the relative frequencies of this interval and all intervals below it.
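
The frequency measures above can be sketched in Python. This is only an illustration under stated assumptions: the function name `frequency_table`, the equal-width intervals, the rule that each interval includes its lower bound but not its upper bound, and the sample returns are all invented for the example.

```python
from itertools import accumulate

def frequency_table(data, lower, width, n_intervals):
    """Return one row per interval:
    (low, high, absolute, relative, cumulative_absolute, cumulative_relative).

    Assumes every observation lies between `lower` and
    `lower + width * n_intervals`. Intervals are [low, high); the last
    interval also includes its upper bound.
    """
    counts = [0] * n_intervals
    for x in data:
        idx = int((x - lower) // width)
        idx = min(idx, n_intervals - 1)  # the maximum goes in the last interval
        counts[idx] += 1

    n = len(data)
    cum_abs = list(accumulate(counts))  # running total of absolute frequencies
    rows = []
    for i, count in enumerate(counts):
        low = lower + i * width
        rows.append((low, low + width, count, count / n, cum_abs[i], cum_abs[i] / n))
    return rows

# Hypothetical annual returns in percent.
returns = [-8, -3, 1, 2, 4, 5, 7, 9, 12, 15]
table = frequency_table(returns, lower=-10, width=10, n_intervals=3)
for row in table:
    print(row)

# Modal interval: the row with the highest absolute frequency.
modal = max(table, key=lambda row: row[2])
print("modal interval:", modal[0], "to", modal[1])
```

The cumulative relative frequency of the last interval is always 1, which is a quick sanity check on any table built this way.
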
Frequency histogram: Used to graphically represent the data contained in a frequency distribution.
Frequency polygon: Graphically illustrates the data in a frequency distribution. The midpoint of each interval represents the interval on the x-axis, and the y-axis shows the frequency.
Arithmetic mean properties: All observations are used in the calculation, all interval and ratio data sets have an arithmetic mean, the sum of deviations from the mean is zero, and a data set has only one arithmetic mean.
Median property: Helpful with skewed data sets, but it depends solely on the position of the middle observation in the sorted data and does not reflect the size of any other observation.
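Calculating the median: Sort the data first. Even n: the average of the observations in positions $n/2$ and $(n+2)/2$. Odd n: the observation in position $(n+1)/2$.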
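
A minimal Python sketch of the median position rule, assuming 1-based positions in the sorted data. The function name `median_by_position` is made up for this example.

```python
def median_by_position(values):
    """Median using positions n/2 and (n+2)/2 (even n) or (n+1)/2 (odd n)."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: single middle observation at position (n + 1) / 2.
        return ordered[(n + 1) // 2 - 1]
    # Even n: average the observations at positions n/2 and (n + 2)/2.
    lower_mid = ordered[n // 2 - 1]
    upper_mid = ordered[(n + 2) // 2 - 1]
    return (lower_mid + upper_mid) / 2

print(median_by_position([5, 1, 3]))     # odd n
print(median_by_position([4, 1, 3, 2]))  # even n
```

The `- 1` in each index converts the 1-based position from the formula to Python's 0-based list index.
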
Mode types: Unimodal: one mode. Bimodal: two modes. A data set has no mode at all when every value occurs just once.
Weighted mean: Assigns a different weight to each observation: $\bar{X}_w = \sum_{i=1}^{n} w_i X_i$, where the weights sum to 1.
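
As a small illustration of the weighted mean, the Python sketch below computes a portfolio return from asset returns and portfolio weights. The function name, the check that weights sum to 1, and the numbers are assumptions for the example.

```python
def weighted_mean(values, weights):
    """Sum of w_i * X_i, assuming the weights sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * x for w, x in zip(weights, values))

# Hypothetical asset returns (percent) and portfolio weights.
asset_returns = [10, 5, -2]
portfolio_weights = [0.5, 0.3, 0.2]
print(weighted_mean(asset_returns, portfolio_weights))
```
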
Geometric mean: Frequently used to average rates of change over time, or to calculate the growth rate of a variable over a period.
Geometric mean formula: $G = \sqrt[n]{X_1 X_2 X_3 \cdots X_n}$, or for returns stated as decimals, $R_G = \sqrt[n]{(1+R_1)(1+R_2)\cdots(1+R_n)} - 1$. Remember that you do not take absolute values: a negative return is added to 1 just like the rest (a return of $-0.10$ becomes $0.90$).
Relationship between geometric and arithmetic means: G is always less than or equal to the arithmetic mean. G equals the arithmetic mean only when all observations are identical, and the difference between G and A increases as the dispersion in observed values increases.
Harmonic mean: Special kind of weighted mean where the weight of an observation is inversely proportional to its magnitude. It is mostly used to determine the average cost of shares purchased over time.
Harmonic mean formula: $\bar{X}_H = N \big/ \sum_{i=1}^{N} (1/X_i)$
Relationship between harmonic mean and geometric mean: Unless all the data are equal, H is always less than G, which is always less than A.
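
The ordering of the three means can be checked numerically. The Python sketch below is an illustration only: the function names and the sample prices and returns are assumptions, and `math.prod` requires Python 3.8 or later.

```python
import math

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def geometric_mean(xs):
    """n-th root of the product; assumes all values are positive."""
    return math.prod(xs) ** (1 / len(xs))

def harmonic_mean(xs):
    """N divided by the sum of reciprocals; assumes all values are positive."""
    return len(xs) / sum(1 / x for x in xs)

def geometric_mean_return(returns):
    """Compound 1 + R for each period, take the n-th root, subtract 1."""
    growth = math.prod(1 + r for r in returns)
    return growth ** (1 / len(returns)) - 1

prices = [2, 4, 8]  # hypothetical purchase prices per share
h, g, a = harmonic_mean(prices), geometric_mean(prices), arithmetic_mean(prices)
print(h, g, a)
print(h < g < a)  # the ordering holds because the prices are not all equal

# A +50% return followed by a -50% return: the negative return is added
# to 1 like any other, so the growth factors are 1.5 and 0.5.
print(geometric_mean_return([0.5, -0.5]))
```

The last line returns a negative number even though the arithmetic mean of the two returns is zero, which is why the geometric mean is preferred for multi-period growth.
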
Quantile: A value at or below which a stated proportion of the observations in a data set lie (quartile, quintile, decile, percentile).
Quantile formula: $L_y = (n+1)\,y/100$, where $y$ is the percentage point at which we are dividing the distribution and $L_y$ is the location of the percentile $P_y$ in the data sorted in ascending order. If you get $L_y = 2.25$, for example, take the 2nd value from the left and add 0.25 of the difference between the 2nd and 3rd values.
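
The location formula and the interpolation step can be sketched as follows. The function name `percentile_by_location`, the clamping to the smallest or largest observation when the location falls outside the data, and the sample data are assumptions for this example.

```python
def percentile_by_location(values, y):
    """Percentile P_y using L_y = (n + 1) * y / 100 with linear interpolation."""
    ordered = sorted(values)
    n = len(ordered)
    location = (n + 1) * y / 100  # 1-based position in the sorted data
    whole = int(location)         # e.g. 2 when the location is 2.25
    fraction = location - whole   # e.g. 0.25 when the location is 2.25

    # Assumption: clamp to the ends when the location falls outside 1..n.
    if whole < 1:
        return ordered[0]
    if whole >= n:
        return ordered[-1]

    below = ordered[whole - 1]    # value at position `whole`
    above = ordered[whole]        # value at position `whole + 1`
    return below + fraction * (above - below)

data = [10, 20, 30, 40, 50, 60, 70, 80]  # hypothetical observations
print(percentile_by_location(data, 25))  # location 2.25 -> 22.5
```

With 8 observations the first quartile sits at location 2.25, so the result is the 2nd value (20) plus 0.25 of the gap to the 3rd value (30).
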
Dispersion: Variability or spread of a random variable around its central tendency (risk around the mean, which is the expected return).
Range: Maximum value - minimum value.
Mean absolute deviation (MAD): The average of the absolute values of the deviations of observations in a data set from its mean.
MAD formula: $\mathrm{MAD} = \sum_{i=1}^{n} |X_i - \bar{X}| \big/ n$
Population variance formula: $\sigma^2 = \sum_{i=1}^{N} (X_i - \mu)^2 \big/ N$
Sample variance formula: $s^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2 \big/ (n-1)$
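
The dispersion formulas above differ mainly in what is summed and what the sum is divided by. A minimal Python sketch, with function names and data chosen for illustration:

```python
def mean(xs):
    return sum(xs) / len(xs)

def value_range(xs):
    return max(xs) - min(xs)

def mean_absolute_deviation(xs):
    m = mean(xs)
    return sum(abs(x - m) for x in xs) / len(xs)

def population_variance(xs):
    """Divide by N: the data are treated as the whole population."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def sample_variance(xs):
    """Divide by n - 1: the data are treated as a sample."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

data = [2, 4, 6, 8]  # hypothetical observations, mean = 5
print(value_range(data))
print(mean_absolute_deviation(data))
print(population_variance(data))
print(sample_variance(data))
```

For the same data the sample variance is always larger than the population variance, because the same sum of squared deviations is divided by a smaller number.
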
Semivariance: Average of the squared deviations of observations below the mean.
Semideviation: Positive square root of the semivariance.
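
A hedged Python sketch of semivariance and semideviation. The divisor is an assumption: here the squared deviations are averaged over the number of observations below the mean, and other references divide by a different count, so check the convention your curriculum uses. Function names and data are made up for the example.

```python
import math

def semivariance(xs):
    """Average squared deviation of the observations below the mean.

    Assumption: divides by the number of observations below the mean.
    """
    m = sum(xs) / len(xs)
    downside = [(x - m) ** 2 for x in xs if x < m]
    if not downside:
        return 0.0  # no observation lies below the mean
    return sum(downside) / len(downside)

def semideviation(xs):
    return math.sqrt(semivariance(xs))

data = [2, 4, 6, 8]  # hypothetical observations, mean = 5
print(semivariance(data))
print(semideviation(data))
```
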
Chebyshev's inequality: Gives the minimum proportion of observations in a data set that lie within k standard deviations of the mean, whatever the shape of the distribution.
Chebyshev's formula: Proportion of observations within k standard deviations of the mean $\geq 1 - 1/k^2$, where $k > 1$ is the number of standard deviations.
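
The bound is easy to compute and to compare against real data. In the Python sketch below the function names and the data set (which includes one deliberate outlier) are assumptions for illustration, and the population standard deviation is used.

```python
import math

def chebyshev_min_proportion(k):
    """Minimum proportion of observations within k standard deviations."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2

def proportion_within(xs, k):
    """Actual proportion of observations within k population std devs."""
    n = len(xs)
    m = sum(xs) / n
    std = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
    return sum(1 for x in xs if abs(x - m) <= k * std) / n

print(chebyshev_min_proportion(2))  # 0.75
print(chebyshev_min_proportion(3))

data = [1, 2, 3, 4, 100]  # hypothetical data with an outlier
print(proportion_within(data, 1.5), ">=", chebyshev_min_proportion(1.5))
```

The actual proportion can be well above the bound. Chebyshev's inequality only guarantees a floor that holds for any distribution.
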
When is it not beneficial to use the standard deviation to compare different populations: When the data sets being compared have significantly different means, and when the data sets have different units of measurement.
Coefficient of variation: Ratio of the standard deviation of the data set to its mean (risk per unit of return).
Coefficient of variation formula: $CV = s / \bar{X}$
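Sharpe ratio: Ratio of the excess return over the risk-free rate to the standard deviation of returns.
Sharpe ratio formula: Sharpe ratio $= (r_p - r_f) / s_p$, where $r_p$ is the mean portfolio return, $r_f$ is the risk-free rate, and $s_p$ is the standard deviation of portfolio returns.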
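
Both ratios scale risk by return and can be sketched with Python's standard `statistics` module. The function names, the sample returns, and the risk-free rate are assumptions for illustration. `statistics.stdev` is the sample standard deviation (divisor n - 1).

```python
import statistics

def coefficient_of_variation(returns):
    """Sample standard deviation divided by the mean (risk per unit of return)."""
    return statistics.stdev(returns) / statistics.mean(returns)

def sharpe_ratio(returns, risk_free_rate):
    """Mean excess return over the risk-free rate per unit of standard deviation."""
    excess = statistics.mean(returns) - risk_free_rate
    return excess / statistics.stdev(returns)

portfolio_returns = [2, 4, 6, 8]  # hypothetical annual returns in percent
risk_free = 1                     # hypothetical risk-free rate in percent

print(coefficient_of_variation(portfolio_returns))
print(sharpe_ratio(portfolio_returns, risk_free))
```

Note the parentheses in the Sharpe ratio: the risk-free rate is subtracted from the mean return first, and only then is the result divided by the standard deviation.
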
Issues with the Sharpe ratio: 1. It is hard to interpret when negative: increasing risk moves a negative Sharpe ratio closer to zero, so it appears to improve. 2. Standard deviation is an appropriate risk measure mainly for symmetric (normal) return distributions, and most investments have asymmetric return distributions.
Positively skewed: Long tail on the right side of the mean, with mean > median > mode. Outliers on the right side pull the mean up and increase the skew.
Negatively skewed: Long tail on the left side of the mean, with mean < median < mode.
Sample skewness formula: $S_K = \dfrac{n}{(n-1)(n-2)} \cdot \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^3}{s^3}$
Sample skewness for large samples: $S_K \approx \dfrac{1}{n} \cdot \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^3}{s^3}$
Properties of sample skewness: A positively skewed sample has positive $S_K$, a negatively skewed sample has negative $S_K$, a normal distribution has $S_K = 0$, and any $|S_K|$ greater than 0.5 is considered significantly skewed.
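
The sample skewness formula and its large-sample approximation can be sketched as below, using the sample standard deviation (divisor n - 1) for $s$. The function names and data sets are assumptions for illustration.

```python
import math

def _mean_and_sample_std(xs):
    n = len(xs)
    m = sum(xs) / n
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
    return m, s

def sample_skewness(xs):
    """[n / ((n-1)(n-2))] * sum of cubed deviations / s^3; needs n >= 3."""
    n = len(xs)
    m, s = _mean_and_sample_std(xs)
    cubed = sum((x - m) ** 3 for x in xs)
    return n / ((n - 1) * (n - 2)) * cubed / s ** 3

def sample_skewness_large_n(xs):
    """(1/n) * sum of cubed deviations / s^3, the large-sample approximation."""
    n = len(xs)
    m, s = _mean_and_sample_std(xs)
    return sum((x - m) ** 3 for x in xs) / n / s ** 3

symmetric = [1, 2, 3, 4, 5]      # hypothetical symmetric data
right_tail = [1, 2, 3, 4, 20]    # hypothetical data with a large right outlier
left_tail = [-20, -4, -3, -2, -1]

print(sample_skewness(symmetric))
print(sample_skewness(right_tail))
print(sample_skewness(left_tail))
```

The cubed deviations keep their sign, so a long right tail produces a positive result and a long left tail a negative one.
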
Kurtosis: Measures the extent to which a distribution is more or less peaked than a normal distribution. A normal distribution has a kurtosis of 3. It is usually reported as excess kurtosis ($K_E$), which is kurtosis minus 3 (the kurtosis of a normal distribution).
Types of kurtosis: Leptokurtic: more peaked with fatter tails than a normal distribution, $K_E > 0$. Platykurtic: less peaked with thinner tails than a normal distribution, $K_E < 0$. Mesokurtic: identical to a normal distribution, $K_E = 0$.
Sample kurtosis formula: $K_E = \dfrac{n(n+1)}{(n-1)(n-2)(n-3)} \cdot \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^4}{s^4} - \dfrac{3(n-1)^2}{(n-2)(n-3)}$
Sample kurtosis formula for large n: $K_E \approx \dfrac{1}{n} \cdot \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^4}{s^4} - 3$
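
A minimal Python sketch of the sample excess kurtosis formula, using the sample variance (divisor n - 1) for $s^2$. The function name and data sets are assumptions for illustration.

```python
def sample_excess_kurtosis(xs):
    """Excess kurtosis with the small-sample correction terms; needs n >= 4."""
    n = len(xs)
    m = sum(xs) / n
    s2 = sum((x - m) ** 2 for x in xs) / (n - 1)  # sample variance
    fourth = sum((x - m) ** 4 for x in xs)        # sum of 4th-power deviations
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth / s2 ** 2 - correction

evenly_spread = [1, 2, 3, 4, 5]  # hypothetical data with thin tails
with_outlier = [1, 2, 3, 4, 20]  # hypothetical data with a fat tail

print(sample_excess_kurtosis(evenly_spread))  # negative: platykurtic
print(sample_excess_kurtosis(with_outlier))   # positive: leptokurtic
```

Raising deviations to the fourth power makes extreme observations dominate the sum, which is why a single outlier is enough to turn the excess kurtosis positive here.
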