Stats 2 - Topic 2

zchilz's version from 2017-01-24 10:53

Section 1

Question Answer
integer rounded to the nearest whole number
VARIABILITY tendency to vary
STATISTICAL INFERENCE the process of moving from information about samples to statements about populations
Population parameters the statements we make about the population
interquartile range a way to get around the problem of extreme values
histogram displays the distribution of a data set in a graph.
Poisson distribution a common histogram shape when the data count rare or unusual events
normal distribution bell curve

Section 2

Question Answer
STANDARD DEVIATION a good measure of variation: it should increase as there is more variation in the data, tend not to change as the sample size changes, and let extreme data values have only a moderate influence on the statistic.
DEVIATION positive if the data value is greater than the mean (negative if smaller).
To overcome the issue of + and - deviations cancelling each other out, use either the mean absolute deviation or the root mean square deviation.
sum of squares add all the squared deviations together.
variance of population once all squared deviations are added, dividing by how many there are gives the variance of the population.
root mean square (population) the square-root stage: taking the square root of the variance gives the standard deviation.
mean absolute deviation drop the negative signs (from deviations that have them) and calculate the mean of these by dividing by how many there are.
root mean square deviation square each deviation (because the square of a negative is positive), then take the mean of these squared deviations and take the square root of the result.
DEGREES OF FREEDOM N-1: (sample size) - (number of parameters estimated from the data)
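The steps above can be sketched in Python with stdlib only; the data values are made up for illustration:

```python
import math

# Hypothetical sample of measurements (illustrative values only).
data = [4.0, 7.0, 6.0, 3.0, 5.0]
n = len(data)
mean = sum(data) / n

# Deviations: positive if the value is greater than the mean, negative if smaller.
deviations = [x - mean for x in data]

# Raw deviations cancel out, which is why we square them or drop the signs.
assert abs(sum(deviations)) < 1e-9

# Sum of squares: all the squared deviations added together.
sum_of_squares = sum(d ** 2 for d in deviations)

# Population variance: sum of squares divided by N.
pop_variance = sum_of_squares / n

# Root mean square deviation: the square-root stage (population SD).
rms_deviation = math.sqrt(pop_variance)

# Mean absolute deviation: drop the signs, then take the mean.
mad = sum(abs(d) for d in deviations) / n

# Sample SD divides by the degrees of freedom, N - 1.
sample_sd = math.sqrt(sum_of_squares / (n - 1))

print(mean, sum_of_squares, pop_variance, rms_deviation, mad, sample_sd)
```

Note how the only difference between the population and sample versions is the divisor: N versus the degrees of freedom N - 1.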

Section 3

Question Answer
NORMAL DISTRIBUTION the peaked distribution seen in data
the mean of a normal distribution defines its central tendency - the peak
tails of distribution the parts of the curve a long way from the mean
Z-CALCULATIONS express a value in units of variation: z = (value - mean) / SD (usually quoted to a whole number of standard deviations)
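A minimal sketch of the z-calculation, using made-up numbers:

```python
# A z-score expresses how many standard deviations a value lies from the mean.
def z_score(value, mean, sd):
    return (value - mean) / sd

# Hypothetical example: a value of 180 in a population with mean 170 and SD 5
# lies 2 standard deviations above the mean.
print(z_score(180, 170, 5))  # -> 2.0
```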

Section 4

Question Answer
central tendency MEAN, MEDIAN, MODE - note this doesn't give variation, which is more meaningful in statistical analysis
range max - min; a very crude estimate of variation and very sensitive to extreme values, so limited in use
interquartile range better than the range but awkward to work with
Standard deviation a better measure of variation, and once calculated it can be used in a number of other statistical techniques
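These measures can be compared side by side with Python's `statistics` module; the data set is hypothetical, with one extreme value included to show the range's sensitivity:

```python
import statistics

# Hypothetical data set; the single extreme value 30 distorts the range.
data = [2, 4, 4, 5, 6, 7, 30]

mean = statistics.mean(data)          # pulled upward by the extreme value
median = statistics.median(data)      # robust to the extreme value
mode = statistics.mode(data)          # most frequent value

value_range = max(data) - min(data)   # dominated by the single extreme value

# Quartiles via statistics.quantiles (n=4 gives Q1, Q2, Q3).
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1                         # interquartile range ignores the extremes

sd = statistics.stdev(data)           # sample standard deviation

print(mean, median, mode, value_range, iqr, sd)
```

Running this shows the range (28) blown up by one outlier, while the IQR stays small.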

Section 5

Question Answer
SAMPLING unreliability and reliability are key
measurement error values for measurements not recorded correctly
rounding error when approximations are made
error the difference between true values and estimated values.
Random determined by chance, without any order, purpose or dependence on other things.
BIAS systematic errors that exaggerate each other's inaccuracies, rather than measurements with only random errors
sampling error when the sample doesn't represent the wider population - but no sample will ever perfectly represent the wider population
EFFECTS THAT CAN OCCUR IN METHODOLOGY observer effect, history effect, testing effect, instrumentation effect, selection effects, mortality effect, participant effect, macho effect.
Nomenclature "error" is the difference between the true value and the estimated value.
beneficent subject bias participants aware of the research may respond in a way that supports it
maleficent subject bias participants aware of the research may attempt to respond in a way that undermines it.
RRP Replication, Randomisation, Precision - try to avoid PSEUDOREPLICATION

Section 6

Question Answer
CENTRAL LIMIT THEOREM the distribution of sample means is nearly always normal, no matter what the distribution of the underlying values is, as long as the sample size is not too small.
RELIABILITY a measure of how reliably the sample mean estimates the population mean.
UNRELIABILITY SD/sqrt(N), or sqrt(variance/N): the "standard error of the mean". The most important measure in statistical methods.
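A short sketch of the standard error of the mean, checking that the two equivalent forms above agree (sample values are hypothetical):

```python
import math

# Hypothetical sample.
data = [12.0, 15.0, 11.0, 14.0, 13.0]
n = len(data)
mean = sum(data) / n

# Sample variance and SD, dividing by the degrees of freedom N - 1.
variance = sum((x - mean) ** 2 for x in data) / (n - 1)
sd = math.sqrt(variance)

# Standard error of the mean: SD / sqrt(N), equivalently sqrt(variance / N).
se = sd / math.sqrt(n)
assert abs(se - math.sqrt(variance / n)) < 1e-12  # the two forms agree

print(se)
```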

Section 7

Question Answer
CONFIDENCE INTERVALS calculate a range within which we are confident that the true mean lies
AS THE RANGE GETS WIDER... more confidence... the 99% CI is wider than the 95% CI.
factors that make a confidence interval wide sample size, Student's t, standard error
STUDENT'S T DISTRIBUTION W.S. Gosset. Similar to the normal distribution, but with one difference - the width of the distribution varies and is controlled by the degrees of freedom.
(ttab) = CRITICAL VALUE the CI spans mean +/- ttab x SE (standard error), so its total width is 2 x ttab x SE.
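A sketch of the CI calculation with made-up data; the critical value is a tabulated Student's t value (about 2.262 for 9 degrees of freedom at 95%, looked up rather than computed, since the stdlib has no t distribution):

```python
import math

# Hypothetical sample of n = 10 measurements, so DF = 9.
data = [5.1, 4.8, 5.6, 5.0, 4.9, 5.3, 5.2, 4.7, 5.4, 5.0]
n = len(data)
df = n - 1
mean = sum(data) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in data) / df)
se = sd / math.sqrt(n)  # standard error of the mean

# Tabulated two-tailed 95% critical value for Student's t with 9 DF.
t_tab = 2.262

lower = mean - t_tab * se
upper = mean + t_tab * se
width = upper - lower  # equals 2 x ttab x SE

print(lower, upper, width)
```

A 99% interval would use a larger tabulated t value, giving a wider range, as the notes say.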

Section 8

Question Answer
CHARTS display data visually: summary statistics (boxplots), individual data (scatterplots)
TABLES show data in numerical form, in rows and columns
GRAPHS a type of chart where data are plotted against axes
KEY FEATURES OF CHARTS error bars, axes, labels, a legend, reduction in chart junk; links and annotation can add value. "Maximise the ratio of information shown to ink used."
KEY FEATURES OF TABLES should include minimum, maximum, population mean, sample size. No vertical lines; only 3 horizontal lines, thick above and below to define the table; title, caption, sensible decimal places.

Section 9

Question Answer
OCCAM'S RAZOR if two explanations account for the facts equally well, the simpler is to be preferred.
NULL HYPOTHESIS (Ho) no pattern to explain; we reject the alternative when the result is not within the confidence level
TYPE 1 ERROR accepting the alternative (more complex) explanation when the null hypothesis is actually true (a FALSE POSITIVE)
TYPE 2 ERROR accepting the null hypothesis when the alternative is actually true (a false negative)
ALTERNATIVE HYPOTHESIS (H1) we reject the null if the result is within the confidence level
P VALUE measures the probability that the difference we found occurred by chance alone, assuming there is no real difference.
SMALL P VALUE shows significance and leads us to reject the null; a p-value below 0.05 is conventionally taken to indicate significance
IMPACT OF SAMPLE SIZE the standard error increases if the sample size is small, and fewer DF mean the t distribution is slightly fatter.
P-VALUES CANNOT BE REPORTED ALONE DUE TO... effect size (strength of the pattern), sample size, variability of the data
TWO SAMPLE T-TESTS help to calculate the probability that there is no real difference between the two means.
PARAMETRIC TESTS rely on estimates of population parameters

Section 10

Question Answer
f tests compare two variances to see whether one is significantly bigger than the other. R.A. Fisher created the F-distribution.
F TEST COMPARES... variances. F = variance 1 divided by variance 2 (in Excel, FDIST(DATA)*2 for a two-tailed test)
LEVENE'S TEST also compares variances, but is more robust as it does not require normality of the data being tested.
T test and F test assumptions: normality (individuals in each population are normally distributed), random sampling, independence
If an assumption is violated, you should use a NON-PARAMETRIC TEST
Non-parametric alternatives Mann-Whitney U test -> ranks the data and calculates the test from the ranks. Two-sample Kolmogorov-Smirnov test -> compares the shapes of the distributions of the 2 samples; not good at comparing averages
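The two test statistics from this section can be sketched by hand on two hypothetical samples; the p-values would normally come from tables or statistical software, so only the statistics themselves are computed here:

```python
import math

# Two hypothetical samples (illustrative values only).
a = [5.0, 6.0, 7.0, 6.0, 6.0]
b = [8.0, 9.0, 7.0, 9.0, 7.0]

def sample_variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
var_a, var_b = sample_variance(a), sample_variance(b)

# F test: ratio of the two variances (larger over smaller).
f_ratio = max(var_a, var_b) / min(var_a, var_b)

# Two-sample t statistic with a pooled variance estimate.
na, nb = len(a), len(b)
pooled_var = ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)
t_stat = (mean_a - mean_b) / math.sqrt(pooled_var * (1 / na + 1 / nb))

print(f_ratio, t_stat)
```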

Section 11

Question Answer
ANOVA Analysis of variance
SST total sum of squares
Variance SST/DF
STATISTICAL MODELS produce fitted values, also known as PREDICTED VALUES, for each sampling unit
RESIDUAL observed value - fitted value.
Coefficient of determination SSM/SST
ANOVA USED to compare more than two means simultaneously; use it instead of multiple t-tests to avoid inflating Type 1 and 2 errors
Null hypothesis in ANOVA all means are the same
model complexity is measured by DF
ANOVA assumptions constant variance, normal distribution of residuals
if assumptions are violated: without random sampling there is likely to be bias; if observations are not independent, we get overconfidence in the results because the sample size appears bigger than it really is
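The ANOVA quantities above (SST, SSM, residuals, SSM/SST) can be computed by hand; the three groups below are hypothetical:

```python
# One-way ANOVA on three hypothetical groups: partition the total sum of
# squares (SST) into the model part (between groups, SSM) and the residual
# part (within groups), then form the F statistic from the mean squares.
groups = [
    [4.0, 5.0, 6.0],
    [6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0],
]

all_values = [x for g in groups for x in g]
grand_mean = sum(all_values) / len(all_values)

# SST: squared deviations of every observation from the grand mean.
sst = sum((x - grand_mean) ** 2 for x in all_values)

# Fitted value for each sampling unit is its group mean; SSM is the sum of
# squared deviations of the fitted values from the grand mean.
group_means = [sum(g) / len(g) for g in groups]
ssm = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))

# Residual = observed value - fitted value; SSE is their sum of squares.
sse = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

# Degrees of freedom: model complexity vs residual.
df_model = len(groups) - 1
df_resid = len(all_values) - len(groups)
f_stat = (ssm / df_model) / (sse / df_resid)

r_squared = ssm / sst  # coefficient of determination, SSM / SST

print(sst, ssm, sse, f_stat, r_squared)
```

Note that SSM + SSE adds back up to SST, which is the partition the ANOVA table is built on.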