# Statistics

jmnies's version from 2016-12-06 05:39

## Regression

| Front | Back |
|---|---|
| An approach for modeling a relationship between an independent variable (X) and a dependent variable (Y) | Linear regression |
| When a correlation exists between two variables, regression predicts what? | Unknown data (the expected Y for a given X) |
| One continuous DV, one continuous or categorical IV | Simple (bivariate) linear regression |
| One continuous DV, more than one continuous or categorical IV | Multiple (multivariable) linear regression |
| More than one continuous DV, more than one continuous or categorical IV | Multivariate linear regression |
| ŷ (y hat, the expected Y value) = | a + b(X) |
| What is the intercept (regression constant)? | a |
| The point where a regression line intercepts the Y-axis | a |
| What is the slope (regression coefficient)? | b |
| The amount of change in Y (DV) for each one-unit change in X (IV) | Slope, regression coefficient, b |
| Y is the | DV |
| X is the | IV |
| ŷ = 0.015 + 2.915(X): what is the regression constant, what is the slope, what is the IV? | 0.015; 2.915; X |
| Percentage of variation in the DV (e.g., five times sit to stand) that is explained by the IV(s) | R² |
| What is R²? | Percentage of variation in the DV (e.g., five times sit to stand) that is explained by the IV(s) |
| When to use adjusted R² (AR²) | Small sample size (relative to the number of IVs) |
| Can AR² be negative? | Yes |
| R² and AR² show | How well the data fit a regression line |
| Total variance in the DV is | Regression variation + residual variation (error variance) |
| Variance in the DV related to changes in the IV | Regression variation |
| Variance in the DV not related to changes in the IV | Residual variation (error variance) |
| Used for prediction in linear regression; relationships are expressed in terms of the original data | Unstandardized coefficient |
| Original data are converted into z-scores to standardize the coefficient; interpreted like Pearson r | Standardized (beta) coefficient |
| The beta coefficient is equal to | Unstandardized coefficient × (SD of IV / SD of DV) |
| Regression that uses a continuous DV | Linear |
| Regression that uses a categorical DV | Logistic |
| One categorical (binary, dichotomous) DV, one continuous or categorical IV | Simple logistic regression |
| Multiple (multivariable) logistic regression | One categorical (binary, dichotomous) DV, more than one continuous or categorical IV |
| One categorical DV with more than two levels, more than one continuous or categorical IV | Multinomial (polychotomous) logistic regression |
| More than one categorical (binary, dichotomous) DV, more than one continuous or categorical IV | Multivariate logistic regression |

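The regression formulas on these cards can be checked numerically. The sketch below is a minimal Python illustration. The function names and data are hypothetical and do not come from the deck. It does four things:

- fits ŷ = a + b(X) by ordinary least squares;
- computes R² as 1 - residual SS / total SS;
- computes adjusted R² with the common formula 1 - (1 - R²)(n - 1)/(n - k - 1) for k IVs;
- computes the beta coefficient as b × (SD of IV / SD of DV).

```python
from statistics import mean, stdev

def simple_linear_regression(x, y):
    """Least-squares intercept a and slope b for y-hat = a + b*x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx        # slope (regression coefficient)
    a = my - b * mx      # intercept (regression constant)
    return a, b

def r_squared(x, y, a, b):
    """Proportion of DV variance explained: 1 - residual SS / total SS."""
    my = mean(y)
    ss_total = sum((yi - my) ** 2 for yi in y)
    ss_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return 1 - ss_resid / ss_total

def adjusted_r_squared(r2, n, k):
    """Penalizes R^2 for k IVs; can go negative when n is small."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def beta_coefficient(b, x, y):
    """Standardized coefficient: b * (SD of IV / SD of DV)."""
    return b * stdev(x) / stdev(y)

# Hypothetical data: X = IV, Y = DV
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = simple_linear_regression(x, y)
r2 = r_squared(x, y, a, b)
print(f"y-hat = {a:.3f} + {b:.3f}(X)")
print(f"R2 = {r2:.3f}, adjusted R2 = {adjusted_r_squared(r2, len(x), 1):.3f}")
print(f"beta = {beta_coefficient(b, x, y):.3f}")
```

In simple regression the beta coefficient equals Pearson r, so its square matches R². With a small n relative to the number of IVs, adjusted R² can fall below zero. For example, R² = .10 with n = 5 and k = 3 gives -2.6.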
## correlation

| Front | Back |
|---|---|
| Three assumptions of correlation | Homoscedasticity, linearity, normality |
| Homoscedasticity | There is an equal variance or scatter (“scedasticity”) of data points along the regression line |
| Violation of homoscedasticity can | Underestimate the strength of a correlation |
| Linearity | There is a straight-line relationship between the IVs and the DV: a constant amount of change between the two variables |
| Normality | Data points are normally distributed |
| Violating normality can | Distort a correlation coefficient |
| The correlation coefficient provides what two pieces of information? | The direction (positive or negative) and the strength of the relationship |
| (True/false) A correlation of -.90 has the same degree of strength as +.90 | True |
| Most commonly used to quantify the association between two interval or ratio variables | Pearson's r |
| Is Pearson's r parametric or nonparametric? | Parametric |
| What are the numerator and denominator in Pearson's r? | Numerator: covariance of X and Y; denominator: product of the SDs of X and Y (the square root of the product of their variances) |
| What is the df for Pearson's r? | n - 2 |
| If r obs (in absolute value) is greater than r critical, what do we do? | Reject the null |
| Rank-order correlation | Spearman's rho |
| Is Spearman's rho parametric or nonparametric? | Nonparametric |

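A minimal sketch of the two correlation coefficients on these cards:

- Pearson's r: the covariance divided by the product of the SDs, reported with df = n - 2.
- Spearman's rho: Pearson r computed on ranks, where tied values receive their average rank.

The function names and data are illustrative. Comparing r obs with r critical still needs a critical-value table, which is not reproduced here.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson r = covariance(X, Y) / (SD_X * SD_Y); the n - 1 terms cancel."""
    n = len(x)
    mx, my = mean(x), mean(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    sd_x = sqrt(sum((xi - mx) ** 2 for xi in x) / (n - 1))
    sd_y = sqrt(sum((yi - my) ** 2 for yi in y) / (n - 1))
    return cov / (sd_x * sd_y)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman's rho: Pearson r applied to the ranks of X and Y."""
    return pearson_r(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5, 6]
y = [1, 4, 9, 16, 25, 36]  # monotonic but not linear
print(f"Pearson r = {pearson_r(x, y):.3f}, df = {len(x) - 2}")
print(f"Spearman rho = {spearman_rho(x, y):.3f}")
```

Because y rises monotonically with x, rho is 1 while r stays below 1. This is one reason a rank-based coefficient is used when linearity is doubtful.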
## chi squared

| Front | Back |
|---|---|
| Chi-square is used to see if there is a relationship between what type of variables? | Categorical |
| Can chi-square values be negative? | No |
| Determines how well observed frequencies match expected frequencies, using only one variable | Goodness-of-fit test |
| Goodness-of-fit test df | k - 1, where k is the number of levels of the variable |
| Examines the relationship between two categorical variables that have two or more levels | Test of independence |

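The goodness-of-fit statistic sums (O - E)² / E across the k levels, so it can never be negative. Below is a sketch with hypothetical die-roll counts; the function name is illustrative.

```python
def chi_square_gof(observed, expected):
    """Goodness-of-fit statistic: sum of (O - E)^2 / E, with df = k - 1."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same number of levels")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # k levels minus 1
    return stat, df

# Hypothetical: 60 rolls of a die; a fair die expects 10 per face
observed = [8, 12, 9, 11, 6, 14]
expected = [10] * 6
stat, df = chi_square_gof(observed, expected)
print(f"chi-square = {stat:.2f}, df = {df}")
```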
## Validity and reliability in surveys

| Front | Back |
|---|---|
| Validity | How well a survey measures what it sets out to measure |
| Correctness, accuracy, and appropriateness of a measurement | Validity |
| Criterion-related validity | The criterion is a second test or other assessment procedure used to examine the relationship between two variables |
| How is criterion-related validity quantified? | Correlation coefficient |
| Predictive validity (a type of criterion-related validity) | Measures the relationship between two variables (e.g., anxiety in high school vs. anxiety in college) when the criterion is measured at a later time |
| How well a test measures the construct (e.g., depression) that it was designed to measure | Construct validity |
| Consistency of measurement | Reliability |
| Correlation between items | Reliability (internal consistency) |
| Dividing a test into two halves, usually odd items vs. even items | Split-half procedure |
| Spearman-Brown formula | 2r / (1 + r) |

## Summarizing data (variability)

| Front | Back |
|---|---|
| A measure of the dispersion or spread of scores | Variability |
| Variability | A measure of the dispersion or spread of scores |
| Examples of measures of variability | Range, variance, standard deviation, standard error |
| Average of squared deviations about the mean | Variance |
| Measure of the variation of scores about the mean | SD |
| Amount of variation or dispersion from the average | SD |
| Variation of the sampling distribution of a statistic, most commonly of the mean | Standard error |

## Summarizing data (data distribution & central tendency)

| Front | Back |
|---|---|
| A single score that summarizes the center of a distribution | Central tendency |
| Mean, median, and mode | Measures of central tendency |
| Symmetric distribution | Mean = median |
| Symmetric unimodal distribution | Mean = median = mode |
| Mean to the left of the median, long tail on the left | Skewed left (negative skew) |
| Mean is less than the median | Left skewed |
| Mean is greater than the median | Right skewed |
| Measure of the peakedness of a distribution of scores | Kurtosis |

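The mean-versus-median cards amount to a rule of thumb for the direction of skew. Below is a sketch using the standard library; the scores and the function name are hypothetical. The rule is a heuristic rather than a guarantee, so treat it as a quick check.

```python
from statistics import mean, median, multimode

def skew_direction(scores):
    """Rule of thumb: compare the mean with the median."""
    m, md = mean(scores), median(scores)
    if m > md:
        return "right (positive) skew"
    if m < md:
        return "left (negative) skew"
    return "symmetric (mean = median)"

scores = [1, 2, 2, 3, 3, 3, 4, 4, 15]  # hypothetical; one high outlier
print(mean(scores), median(scores), multimode(scores))
print(skew_direction(scores))
```

The single high score pulls the mean above the median and mode, which is the pattern the "right skewed" card describes.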
## Introduction (types of variables)

| Front | Back |
|---|---|
| Descriptive statistics | Used to describe and summarize data |
| Mean, median, and mode are what type of statistic? | Descriptive |
| Inferential statistics | Used to reach conclusions that extend beyond the immediate data alone |
| Examples of inferential statistics | t test, ANOVA, and regression analysis |
| Summarize sample results | Descriptive statistics |
| Generalize to populations | Inferential statistics |
| Two or more categories, but no order | Nominal (categorical) |
| Two or more categories that can be ordered or ranked | Ordinal (categorical) |
| Categorical variables | Nominal, ordinal |
| Continuous variables | Interval, ratio |
| For which continuous variable is zero meaningful (a true zero)? | Ratio |
| Proportion of people with a disease at a particular point in time | Prevalence |

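Point prevalence, as defined on the last card, is a simple proportion. Below is a sketch with hypothetical counts; the function name is illustrative.

```python
def prevalence(existing_cases, population_at_time_point):
    """Point prevalence: people with the disease / total population at that time."""
    return existing_cases / population_at_time_point

# Hypothetical: 150 people have the disease in a town of 12,000 on one date
print(f"{prevalence(150, 12_000):.2%}")  # 1.25%
```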
## Post Hoc Test in ANOVA

| Front | Back |
|---|---|
| When is a post hoc test needed? | When the ANOVA is significant and k > 2 (k = number of groups) |
| No post hoc test for | A significant interaction |
| Scheffé test | Most conservative, so it is least likely to yield a significant result |
| Tukey’s Honestly Significant Difference (HSD) test | Not too conservative and not too liberal |
| Fisher’s Least Significant Difference (LSD) test | Most liberal, so it is most likely to produce a significant result |
| Has a larger critical value than Tukey’s HSD test | Scheffé |
| Has a larger critical value than Fisher’s LSD test | Tukey’s HSD |

## Hypothesis Testing (Effect Size)

| Front | Back |
|---|---|
| A quantitative study design used to systematically assess previous studies to derive conclusions | Meta-analysis |
| Three criteria for including or excluding studies | Internal validity, sample size, study design |
| How large an experiment's effect is (as opposed to whether it is statistically significant) | Effect size |
| Cohen’s d is a measure of what? | Effect size |

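Cohen's d expresses the difference between two means in SD units. This sketch uses the pooled SD of both groups as the denominator; that is one common version, and other denominators exist. The function name and data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group1, group2):
    """Difference in means divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / sqrt(pooled_var)

treatment = [5, 6, 7, 8, 9]  # hypothetical scores
control = [3, 4, 5, 6, 7]
print(f"d = {cohens_d(treatment, control):.2f}")
```

Unlike a p value, d does not grow with sample size. It describes how large the difference is, not whether it is statistically significant.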
## Probability

| Front | Back |
|---|---|
| Probability | A measure of the likelihood that an event will occur |
| Probability sampling types | Stratified and systematic sampling |
| An equal chance of being selected as a participant | Probability sampling |
| Researchers divide the entire population into different subgroups based on age, gender, etc. | Stratified (probability) |
| Used to examine relationships between subgroups | Stratified (probability) |
| Researchers pick an interval number (select every kth member) | Systematic (probability) |
| Nonprobability sampling types | Purposive, convenience, and snowball |
| Serves a very specific study need | Purposive |
| Subjects are selected because of their convenient accessibility and proximity | Convenience |
| Participants are asked to nominate another person with the same trait; used for rare cases | Snowball |
| P < .05 means | A difference or relationship would be expected less than 5 times in 100 as a result of chance alone |
| Related to the normal distribution | t distribution (continuous) |
| A family of right-skewed continuous distributions | Chi-square and F distributions |
| Discrete distribution types | Bernoulli, binomial, and Poisson distributions |
| Bernoulli distribution | Probability distribution for two outcomes (e.g., success vs. failure) in a single trial |
| Binomial distribution | Probability distribution for two outcomes (e.g., success vs. failure) in a series of trials |
| Has an upper limit (e.g., tossing a coin two times and getting two heads) | Binomial distribution |
| A family of right-skewed discrete distributions | Poisson distribution |
| Similar to the binomial distribution, but has no upper limit | Poisson distribution |
| E.g., the number of accidents at an intersection in a time period | Poisson distribution |
| Mean and SD of the standard normal distribution | Mean = 0, SD = 1 |
| z score in the standard normal distribution | A value on the x-axis of the standard normal distribution; it specifies how many standard deviations a value lies from the mean |
| The sampling distribution of the mean approaches a normal distribution as the sample size increases | Central limit theorem |
| With a small probability of success (p = .20), the binomial distribution is skewed | Right |
| With p = .50, the binomial distribution is | Symmetric |
| With a large probability of success (p = .80), the binomial distribution is skewed | Left |

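A sketch of the discrete distributions and the z score from these cards, using only Python's `math` module. `binomial_skewness` uses the closed-form skewness (1 - 2p) / √(np(1 - p)). That value is positive (right skew) for p < .5, zero at p = .5, and negative (left skew) for p > .5, matching the last three cards. The Bernoulli case is `binomial_pmf(k, 1, p)`. The function names and numbers are illustrative.

```python
from math import comb, exp, factorial, sqrt

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials); n = 1 is the Bernoulli case."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_skewness(n, p):
    """Skewness of Binomial(n, p): > 0 is right skew, < 0 is left skew."""
    return (1 - 2 * p) / sqrt(n * p * (1 - p))

def poisson_pmf(k, lam):
    """P(exactly k events in an interval with mean count lam); no upper limit on k."""
    return lam**k * exp(-lam) / factorial(k)

def z_score(x, mu, sigma):
    """How many standard deviations x lies from the mean mu."""
    return (x - mu) / sigma

# Two heads in two fair coin tosses: the count is capped at n = 2
print(binomial_pmf(2, 2, 0.5))  # 0.25

for p in (0.20, 0.50, 0.80):
    print(f"p = {p:.2f}, n = 10: skewness = {binomial_skewness(10, p):+.3f}")

# Hypothetical: an intersection averages 2 accidents per month
print(f"P(0 accidents) = {poisson_pmf(0, 2):.4f}")

print(z_score(130, 100, 15))  # 2.0
```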
## ANOVA

| Front | Back |
|---|---|
| Measure of a characteristic of a population | Parameter |
| Parametric tests use what type of variables? | Ratio or interval variables |
| t test, Pearson correlation, analysis of variance (ANOVA) | Parametric |
| Nonparametric tests use what type of variables? | Ordinal or nominal variables |
| Chi-square | Nonparametric |
| ANOVA is used to analyze differences between group | Means |
| ANOVA IV | Categorical |
| ANOVA DV | Continuous |
| Effect size for one-way ANOVA | The proportion of variance in the DV (e.g., weight loss) that can be accounted for by the IV (e.g., weight loss program) |
| How can you measure effect size? | η² (eta squared) and R² (R squared) |
| Two categorical IVs (main effects), one continuous DV | Two-way ANOVA |

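A sketch of a one-way ANOVA that returns F and η² = SS between / SS total. η² is the proportion of DV variance accounted for by the IV. The function name is illustrative, and the group data are hypothetical weight-loss scores for three programs.

```python
from statistics import mean

def one_way_anova(*groups):
    """Return (F, df_between, df_within, eta_squared) for k independent groups."""
    scores = [x for g in groups for x in g]
    grand_mean = mean(scores)
    group_means = [mean(g) for g in groups]
    # Between-groups SS: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-groups SS: spread of scores around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_stat, df_between, df_within, eta_squared

# Hypothetical weight loss (kg) under three programs
program_a = [2, 3, 4]
program_b = [4, 5, 6]
program_c = [7, 8, 9]
f_stat, df_b, df_w, eta2 = one_way_anova(program_a, program_b, program_c)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}, eta squared = {eta2:.3f}")
```

A significant F with k = 3 groups is the situation in which the post hoc cards apply.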
## T test

| Front | Back |
|---|---|
| A t test is used when the population variance is | Unknown |
| Population variance | Average squared distance from the population mean |
| If your df falls between two df values in the table | Round down to the next lower df |
| In a one-sample t test, if t obs (in absolute value, for a two-tailed test) is greater than t crit, then | Reject the null |
| Compares two independent sample means | Independent t test |
| df for an independent t test | df = (n1 + n2) - 2 |
| Two types of dependent t test | Repeated-measures and matched pairs |
| Same participants are tested repeatedly on the same variable | Repeated-measures t test |
| Matched pairs t test (paired t test) | Two groups (different participants) are matched on one or more characteristics (e.g., GPA and age) |

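A sketch of the t statistics these cards describe. The population variance is unknown, so each statistic uses the sample SD. The dependent (repeated-measures or matched-pairs) version reduces to a one-sample t test on the difference scores. Comparing t obs with t crit still requires a t table at the returned df. The function names and data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev, variance

def one_sample_t(sample, mu0):
    """t = (sample mean - hypothesized mean) / (s / sqrt(n)), df = n - 1."""
    n = len(sample)
    t = (mean(sample) - mu0) / (stdev(sample) / sqrt(n))
    return t, n - 1

def independent_t(group1, group2):
    """Pooled-variance independent t test; df = (n1 + n2) - 2."""
    n1, n2 = len(group1), len(group2)
    pooled = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    se = sqrt(pooled * (1 / n1 + 1 / n2))
    return (mean(group1) - mean(group2)) / se, n1 + n2 - 2

def paired_t(before, after):
    """Dependent t test: a one-sample t test on the difference scores, df = pairs - 1."""
    diffs = [a - b for a, b in zip(after, before)]
    return one_sample_t(diffs, 0)

print(one_sample_t([5, 6, 7, 8, 9], 5))
print(independent_t([5, 6, 7, 8, 9], [3, 4, 5, 6, 7]))
print(paired_t([10, 12, 9, 11], [12, 13, 11, 12]))
```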
## Type I & II Error, Power, Confidence Interval & Sample Size

| Front | Back |
|---|---|
| The probability of rejecting H0 when H0 is true | Type I error |
| The probability of a Type I error is | α (the significance level of a test) |
| If α = .05 | There is a 5% probability of rejecting H0 when H0 is actually true (a Type I error) |
| The probability of failing to reject H0 when H1 is true | Type II error |
| The probability of a Type II error is | β |
| Which error is conventionally considered more serious? | Type I |
| Statistical power | Probability of rejecting H0 when H0 is false: detecting a real difference or relationship |
| Factors that increase power | Larger effect size and larger sample size |
| Factors that decrease power | Larger SD and larger standard error |
| A statistical procedure in which a sample statistic is used to estimate the value of an unknown population parameter | Estimation |
| Point estimation | The use of a sample statistic to estimate a population parameter (e.g., a population mean) |
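
A sketch of how effect size and sample size drive power. For simplicity it assumes a two-sided one-sample z test with a known SD, which is not one of the tests on these cards. `effect_size` is the standardized difference (true mean - H0 mean) / SD, so a larger SD shrinks it and lowers power. β is 1 - power. The function name and inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def power_two_sided_z(effect_size, n, alpha=0.05):
    """Power of a two-sided one-sample z test for a standardized effect size."""
    z = NormalDist()  # standard normal: mean 0, SD 1
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n)  # true mean's distance from H0, in SE units
    # Probability the test statistic lands in either rejection region under H1
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)

for d in (0.2, 0.5):
    for n in (20, 80):
        p = power_two_sided_z(d, n)
        print(f"d = {d}, n = {n}: power = {p:.3f}, beta = {1 - p:.3f}")
```

With an effect size of zero the "power" collapses to α, which is the Type I error rate from the cards above.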