# Stats 2 - Topic 2

zchilz's version from 2017-01-24 10:53

## Section 1

INTEGER a value rounded to the nearest whole number.
VARIABILITY the tendency of data to vary.
STATISTICAL INFERENCE the process of moving from information about samples to statements about the population.
POPULATION PARAMETERS the statements we make about the population.
INTERQUARTILE RANGE a way to get around the problem of extreme values.
HISTOGRAM displays the distribution of a data set as a graph.
POISSON DISTRIBUTION a common distribution when the data are counts of rare or unusual events.
NORMAL DISTRIBUTION the bell curve.
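A quick Python sketch of why the interquartile range gets around the problem of extremes, using the standard library and made-up illustrative numbers:

```python
import statistics

data = [2, 4, 4, 5, 6, 7, 9, 50]  # 50 is an extreme value

q1, q2, q3 = statistics.quantiles(data, n=4)  # quartiles
iqr = q3 - q1

mean = statistics.mean(data)  # dragged upward by the extreme value
# the median (q2) and IQR are barely affected by the 50
print(mean, q2, iqr)
```

Swapping the 50 for a 10 changes the mean a lot but leaves the median and IQR much the same, which is the point of using them.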

## Section 2

STANDARD DEVIATION a good measure of variation: it should increase as there is more variation in the data, tend not to change as the sample size changes, and extreme data values should have only a moderate influence on the statistic.
DEVIATION positive if the data value is greater than the mean, negative if smaller.
To overcome the problem of + and - deviations cancelling each other out, use either the mean absolute deviation or the root mean square deviation.
SUM OF SQUARES add all the squared deviations together.
VARIANCE OF POPULATION once all squared deviations are added, divide by how many there are to give the population variance.
ROOT MEAN SQUARE the square-root stage: take the square root of the population variance.
MEAN ABSOLUTE DEVIATION drop the negative signs (from deviations that have them) and calculate the mean of these by dividing by how many there are.
ROOT MEAN SQUARE DEVIATION square each deviation (because the square of a negative is positive), take the mean of these squared deviations, then take the square root of the result.
DEGREES OF FREEDOM = (sample size) - (number of parameters estimated from the data), i.e. n - 1 when one mean is estimated.
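The steps above can be sketched in Python with illustrative numbers, showing why raw deviations cancel and how the two workarounds differ:

```python
import math

data = [4, 7, 6, 3, 5]
n = len(data)
mean = sum(data) / n                       # 5.0

deviations = [x - mean for x in data]      # positive above the mean, negative below
                                           # they sum to zero, hence the problem

sum_sq = sum(d ** 2 for d in deviations)   # sum of squares
pop_var = sum_sq / n                       # population variance
rms_dev = math.sqrt(pop_var)               # root mean square deviation
mad = sum(abs(d) for d in deviations) / n  # mean absolute deviation

sample_var = sum_sq / (n - 1)              # divide by degrees of freedom instead
sample_sd = math.sqrt(sample_var)          # the usual sample standard deviation
```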

## Section 3

NORMAL DISTRIBUTION the peaked, bell-shaped distribution of data.
MEAN of the normal distribution defines its central tendency: the peak.
TAILS of the distribution are the parts of the curve a long way from the mean.
Z-SCORE (z-calculation) a unit of variation: the number of standard deviations a value lies from the mean.
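A minimal z-score sketch in Python, with made-up mean and standard deviation for illustration; `NormalDist` from the standard library gives the proportion of a normal curve below that point:

```python
import statistics

mean, sd = 100, 15   # illustrative population mean and SD
x = 130

z = (x - mean) / sd  # 2.0: two standard deviations above the mean

# proportion of the normal curve below x, via the standard normal CDF
p_below = statistics.NormalDist().cdf(z)
```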

## Section 4

CENTRAL TENDENCY mean, median, mode. Note this doesn't capture variation, which is often more meaningful in statistical analysis.
RANGE max - min. A very crude estimate of variation and very sensitive to extreme values, so limited in use.
INTERQUARTILE RANGE better than the range but awkward to work with.
STANDARD DEVIATION a better measure of variation; once calculated it can be used in a number of other statistical techniques.
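All four measures side by side in Python (illustrative data), showing that central tendency alone says nothing about spread:

```python
import statistics

data = [1, 2, 2, 3, 4, 7, 9]

mean = statistics.mean(data)      # central tendency only
median = statistics.median(data)
mode = statistics.mode(data)

rng = max(data) - min(data)       # crude, driven entirely by the two extremes
sd = statistics.stdev(data)       # sample SD, reusable in later techniques
```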

## Section 5

SAMPLING reliability and unreliability are key.
MEASUREMENT ERROR values for measurements not recorded correctly.
ROUNDING ERROR when approximations are made.
ERROR the difference between true values and estimated values.
RANDOM determined by chance, without any order, purpose, or dependence on other things.
BIAS systematic errors exaggerate each other's inaccuracies, whereas measurements with only random errors tend to cancel out.
SAMPLING ERROR when the sample doesn't represent the wider population; note that no sample will ever perfectly represent the wider population.
EFFECTS THAT CAN OCCUR IN METHODOLOGY observer effect, history effect, testing effect, instrumentation effect, selection effects, mortality effect, participant effect, macho effect.
NOMENCLATURE "error" is the difference between the true value and the estimated value.
BENEFICENT SUBJECT BIAS participants aware of the research may respond in a way that supports it.
MALEFICENT SUBJECT BIAS participants aware of the research may attempt to respond in a way that undermines it.
RRP replication, precision, randomisation. Try to avoid pseudoreplication.

## Section 6

CENTRAL LIMIT THEOREM the distribution of sample means is nearly always normal, no matter what the distribution of the underlying values is, as long as the sample size is not too small.
RELIABILITY a measure of how reliably the sample mean estimates the population mean.
UNRELIABILITY SD/sqrt(N), or sqrt(variance/N): the "standard error of the mean". One of the most important measures in statistical methods.
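The standard error of the mean is a one-liner in Python; the sample values here are illustrative:

```python
import math
import statistics

sample = [12.1, 11.8, 12.5, 12.0, 11.6, 12.3]
n = len(sample)

sd = statistics.stdev(sample)  # sample standard deviation
sem = sd / math.sqrt(n)        # standard error of the mean: SD / sqrt(N)
# equivalently: math.sqrt(statistics.variance(sample) / n)
```

Because of the sqrt(N) in the denominator, quadrupling the sample size only halves the standard error.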

## Section 7

CONFIDENCE INTERVALS calculate a range within which we are confident that the true mean lies.
AS THE RANGE GETS WIDER we have more confidence: the 99% CI is wider than the 95% CI.
FACTORS THAT MAKE A CONFIDENCE INTERVAL WIDE small sample size, a larger Student's t critical value, a larger standard error.
STUDENT'S T DISTRIBUTION due to W. S. Gosset. Similar to the normal distribution, with one difference: the width of the distribution varies and is controlled by the degrees of freedom.
CRITICAL VALUE (ttab) the full width of the CI = 2 x ttab x SE (standard error); the interval itself is the mean ± ttab x SE.
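A sketch of the CI calculation in Python with illustrative data. Note the critical value `t_tab` is hard-coded here (2.262 is the tabulated two-tailed 95% value for df = 9); the standard library has no t distribution, so in practice it comes from a t table or a stats package:

```python
import math
import statistics

sample = [5.1, 4.8, 5.4, 5.0, 4.7, 5.3, 4.9, 5.2, 5.0, 4.6]
n = len(sample)

mean = statistics.mean(sample)
se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean

t_tab = 2.262  # critical t, 95% confidence, df = n - 1 = 9 (from tables)

lower, upper = mean - t_tab * se, mean + t_tab * se
width = 2 * t_tab * se  # the full CI width quoted above
```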

## Section 8

CHARTS display data visually: summary statistics (boxplots), individual data (scatterplots).
TABLES show data in numerical form, in rows and columns.
GRAPHS a type of chart where data are plotted against axes.
KEY FEATURES OF CHARTS error bars, axes, labels, a legend, reduction in chart junk; links and annotation can add value. "Maximise the ratio of information shown to ink used."
KEY FEATURES OF TABLES should include minimum, maximum, mean, sample size. No vertical lines; only three horizontal lines, with thick lines above and below defining the table; title, caption, sensible decimal places.

## Section 9

HYPOTHESIS TESTING AND THE ONE-SAMPLE T TEST
OCCAM'S RAZOR if two explanations account for the facts equally well, the simpler is to be preferred.
NULL HYPOTHESIS (H0) there is no pattern to explain; we retain it, and reject the alternative, when the result is not beyond the critical level.
TYPE 1 ERROR rejecting the null hypothesis (accepting the more complex explanation) when the null is actually true: a FALSE POSITIVE.
TYPE 2 ERROR accepting the null hypothesis when the alternative is actually true: a FALSE NEGATIVE.
ALTERNATIVE HYPOTHESIS (H1) we reject the null in its favour when the result is beyond the critical level.
CRITICAL LEVEL OF PROBABILITY known as the ALPHA VALUE: conventionally alpha = 0.05 (5%), or 0.01 (1%) for a stricter test; it sets the minimum evidence needed to call a difference significant.
P VALUE measures the probability that the difference we find occurred by chance alone, assuming no real difference.
SMALL P VALUE shows significance and leads us to reject the null; a p-value below 0.05 indicates significance at the conventional level.
IMPACT OF SAMPLE SIZE the standard error increases if the sample size is small, and fewer degrees of freedom mean the t distribution is slightly fatter.
P-VALUES CANNOT BE REPORTED ALONE also report the size of the difference (strength of the pattern), the sample size, and the variability of the data.
TWO-SAMPLE T-TESTS help to calculate the probability that there is no real difference between the two means.
PARAMETRIC TESTS tests based on estimates of population parameters.
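A one-sample t test sketched in Python with illustrative data. As with the CI example, the critical value is hard-coded from a t table (2.365 is the two-tailed 5% value for df = 7), since the standard library has no t distribution:

```python
import math
import statistics

sample = [5.3, 5.6, 5.1, 5.8, 5.5, 5.4, 5.7, 5.2]
mu0 = 5.0  # null-hypothesis population mean

n = len(sample)
se = statistics.stdev(sample) / math.sqrt(n)      # standard error
t = (statistics.mean(sample) - mu0) / se          # one-sample t statistic

t_crit = 2.365  # two-tailed critical value, alpha = 0.05, df = 7 (from tables)
reject_null = abs(t) > t_crit                     # p < 0.05 if True
```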

## Section 10

F TESTS compare two variances to see whether one is significantly bigger than the other. R. A. Fisher created the F-distribution.
T TEST COMPARES means.
F TEST COMPARES variances. F = variance 1 divided by variance 2 (in Excel, FDIST(...) * 2 for a two-tailed test).
LEVENE'S TEST compares variances, but is more robust as it does not require normality of the data being tested.
T TEST AND F TEST ASSUMPTIONS normality (individuals in each population are normally distributed), random sampling, independence.
2-SAMPLE T-TEST ASSUMPTION homogeneity of variance.
IF AN ASSUMPTION IS VIOLATED you should use a NON-PARAMETRIC TEST (random sampling is still required).
NON-PARAMETRIC ALTERNATIVES Mann-Whitney U test: ranks the data and works with the ranks. Two-sample Kolmogorov-Smirnov test: compares the shapes of the distributions of the two samples; not good at comparing averages.
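The F statistic itself is just a variance ratio; a Python sketch with illustrative groups (by convention the larger variance goes on top, so F >= 1, which is then compared against an F-table critical value):

```python
import statistics

group_a = [3.1, 2.9, 3.4, 3.0, 3.2]  # low spread
group_b = [2.8, 3.6, 2.5, 3.9, 2.7]  # high spread

var_a = statistics.variance(group_a)
var_b = statistics.variance(group_b)

# larger variance over smaller, so F is always >= 1
f = max(var_a, var_b) / min(var_a, var_b)
```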