# Statistics

jmnies's version from 2016-12-06 05:39

## Regression

Question | Answer |
---|---|
An approach for modeling a relationship between an independent variable (X) and a dependent variable (Y) | linear regression |
When a correlation exists between two variables, regression predicts what? | unknown data |
one continuous DV, one continuous or categorical IV | Simple (bivariate) linear regression |
one continuous DV, more than one continuous or categorical IV | Multiple (multivariable) linear regression |
more than one continuous DV, more than one continuous or categorical IV | Multivariate linear regression |
ŷ (y hat, expected Y value) = | a + b(X) |
what is the intercept (regression constant)? | a |
the point where a regression line intercepts the Y-axis | a |
what is the slope (regression coefficient)? | b |
the amount of change in Y (DV) as X (IV) changes | slope, regression coefficient, b |
Y is | DV |
X is | IV |
ŷ = 0.015 + 2.915(X): what are the regression constant, the slope, and the IV? | constant = 0.015, slope = 2.915, IV = X |
percentage of variation in the DV (e.g., five times sit to stand) that is explained by an IV | R² |
What is R²? | percentage of variation in the DV that is explained by an IV |
when to use adjusted R² (AR²) | small sample size |
Can adjusted R² be negative? | yes |
R² and adjusted R² show | how well data fit a regression line |
Total variance in DV is | regression variation + residual variation (error variance) |
variance in DV related to changes in IV | regression variation |
variance in DV not related to changes in IV | residual variation (error variance) |
used for prediction in linear regression; relationships are expressed in terms of the original data | unstandardized coefficient |
original data are converted into z-scores to standardize the coefficient; interpreted like Pearson's r | standardized (beta) coefficient |
beta coefficient is equal to | (SD of IV / SD of DV) × unstandardized coefficient |
regression that uses a continuous DV | linear |
regression that uses a categorical DV | logistic |
one categorical (binary, dichotomous) DV, one continuous or categorical IV | Simple logistic regression |
one categorical (binary, dichotomous) DV, more than one continuous or categorical IV | Multiple (multivariable) logistic regression |
one categorical DV with more than two levels, more than one continuous or categorical IV | Multinomial (polychotomous) logistic regression |
more than one categorical (binary, dichotomous) DV, more than one continuous or categorical IV | Multivariate logistic regression |
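
The ŷ = a + b(X) and R² cards above can be worked through numerically. A minimal sketch in Python, using the textbook least-squares formulas; the data points are made up for illustration:

```python
def simple_linear_regression(xs, ys):
    """Fit y-hat = a + b*X by least squares: b = cov(X, Y) / var(X)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    b = cov / var          # slope (regression coefficient)
    a = my - b * mx        # intercept (regression constant)
    return a, b

def r_squared(xs, ys, a, b):
    """Proportion of variation in the DV explained by the IV."""
    my = sum(ys) / len(ys)
    ss_total = sum((y - my) ** 2 for y in ys)                       # total variation
    ss_error = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # residual (error) variation
    return 1 - ss_error / ss_total

xs = [1, 2, 3, 4, 5]           # IV (X)
ys = [2, 4, 5, 4, 5]           # DV (Y)
a, b = simple_linear_regression(xs, ys)
r2 = r_squared(xs, ys, a, b)
```

Note how R² falls out of the "total variance = regression variation + error variance" card: it is the regression variation as a share of the total.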

## Correlation

Question | Answer |
---|---|
Three assumptions of correlation | Homoscedasticity, Linearity, Normality |
Homoscedasticity | there is an equal variance or scatter (“scedasticity”) of data points dispersed along the regression line |
violation of homoscedasticity can | underestimate the strength of a correlation |
Linearity | there is a straight-line relationship between the IVs and the DV; a constant amount of change between two variables |
if linearity is violated | misleading conclusions |
Normality | data points are normally distributed |
violating normality can | distort a correlation coefficient |
Correlation coefficient provides what two pieces of information? | direction (positive or negative) of the relationship, strength of the relationship |
(true/false) a correlation of -.90 has the same degree of strength as +.90 | true |
Most commonly used to quantify the association between two interval or ratio variables | Pearson's r |
is Pearson's r parametric or nonparametric? | parametric |
what are the denominator/numerator in Pearson's r? | variance/covariance |
what is df in Pearson's r? | n - 2 |
if r observed is greater than r critical, what do we do? | reject the null |
rank-order correlation | Spearman |
is Spearman parametric or nonparametric? | nonparametric |
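
The Pearson's r cards (covariance in the numerator, variance in the denominator, df = n - 2) can be sketched directly; the sample data are invented:

```python
import math

def pearson_r(xs, ys):
    """r = covariance / (SD of X * SD of Y): covariance on top,
    the spread (variance, via its square roots) underneath."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
df = len(xs) - 2   # df for Pearson's r is n - 2
```

For simple linear regression on the same data, squaring this r gives the R² from the regression section.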

## Chi squared

Question | Answer |
---|---|
Chi square is needed to see if there is a relationship between what variables? | categorical |
can you have negative values in chi square? | no |
determines how well observed frequencies match expected frequencies using only 1 variable | Goodness of fit test |
Goodness of fit test df | k (# of levels of a variable) - 1 |
examines the relationship between two categorical variables that have two or more levels | Test of independence |
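
The goodness-of-fit cards (observed vs. expected frequencies, df = k - 1, never negative) as a sketch; the frequencies are invented:

```python
def chi_square_goodness_of_fit(observed, expected):
    """Chi-square statistic: sum of (O - E)^2 / E over the k levels.
    Every term is squared, so the statistic can never be negative."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [50, 30, 20]        # observed frequencies for k = 3 levels
expected = [40, 40, 20]        # expected frequencies
chi2 = chi_square_goodness_of_fit(observed, expected)
df = len(observed) - 1         # df = k - 1
```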

## Validity and reliability in surveys

Question | Answer |
---|---|
Validity | how well a survey measures what it sets out to measure |
correctness, accuracy, and appropriateness of a measurement | Validity |
Criterion-related validity | the criterion is a second test or other assessment procedure used to examine the relationship between two variables |
How is criterion-related validity quantified? | correlation coefficient |
predictive validity (under criterion-related validity) | measuring the relationship between two variables (e.g., anxiety in high school vs. anxiety in college) at different times |
how well a test measures a construct (e.g., depression) that it was designed to measure | construct validity |
consistency of measurement | Reliability |
correlation between items | Reliability |
usually odd items vs. even items | Split-half procedure |
Spearman-Brown formula = | 2r / (1 + r) |
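
The Spearman-Brown step-up is 2r / (1 + r): for example, a split-half correlation of .60 predicts a full-length reliability of .75.

```python
def spearman_brown(r):
    """Predicted full-test reliability from the half-test correlation r:
    2r / (1 + r)."""
    return 2 * r / (1 + r)

full_test_reliability = spearman_brown(0.60)
```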

## Summarizing data (variability)

Question | Answer |
---|---|
A measure of the dispersion or spread of scores | Variability |
Variability | A measure of the dispersion or spread of scores |
What are examples of variability? | range, variance, standard deviation, standard error |
Average of squared deviations about a mean | variance |
Measure of variation of scores about the mean | SD |
Amount of variation or dispersion from the average | SD |
variation of the sampling distribution of a statistic, most commonly of the mean | Standard error |
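
The variance, SD, and standard-error cards chain together; a sketch with made-up scores, using the population (divide-by-n) formulas:

```python
import math

def variance(xs):
    """Average of squared deviations about the mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def standard_deviation(xs):
    """Variation of scores about the mean: square root of the variance."""
    return math.sqrt(variance(xs))

def standard_error(xs):
    """Variation of the sampling distribution of the mean: SD / sqrt(n)."""
    return standard_deviation(xs) / math.sqrt(len(xs))

scores = [2, 4, 4, 4, 5, 5, 7, 9]
var = variance(scores)
sd = standard_deviation(scores)
se = standard_error(scores)
```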

## Summarizing data (data distribution & central tendency)

Question | Answer |
---|---|
A single score that summarizes the center of a distribution | Central tendency |
mean, median, and mode | Central tendency |
Symmetric distribution | mean = median |
Symmetric unimodal distribution | mean = median = mode |
mean to the left of the median, long tail on the left | Skewed left (negative skew) |
mean is less than median | left skewed |
mean is greater than median | right skewed |
measure of the peakedness of a distribution of scores | Kurtosis |
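
The mean-vs-median skew rules above can be coded as a quick check; the sample lists are invented:

```python
def skew_direction(xs):
    """Classify skew by comparing mean and median: mean < median ->
    left (negative) skew, mean > median -> right (positive) skew."""
    n = len(xs)
    mean = sum(xs) / n
    s = sorted(xs)
    median = s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2
    if mean < median:
        return "left"
    if mean > median:
        return "right"
    return "symmetric"
```

A single large value drags the mean toward that tail while barely moving the median, which is why the rule works.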

## Introduction (types of variables)

Question | Answer |
---|---|
descriptive statistics | used to describe and summarize data |
mean, median, mode are what type of stat? | descriptive |
inferential statistics | used to reach conclusions that extend beyond the immediate data alone |
examples of inferential stats | t test, ANOVA, and regression analysis |
summarize sample results | descriptive statistics |
generalize to populations | inferential statistics |
two or more categories, but do not have an order | nominal categorical |
have two or more categories and can be ordered or ranked | ordinal categorical |
categorical variables | nominal, ordinal |
continuous variables | interval, ratio |
for which continuous variable type is 0 meaningful? | ratio |
proportion of people with a disease at a particular point in time | prevalence |

## Post Hoc Test in ANOVA

Question | Answer |
---|---|
when is a post hoc test needed? | when there is a significant ANOVA and k > 2 |
No post hoc for | a significant interaction |
Scheffe test | most conservative, so it is least likely to yield a significant result |
Tukey’s Honestly Significant Difference (HSD) test | not too conservative and not too liberal |
Fisher’s Least Significant Difference (LSD) test | most liberal, so it is more likely to produce a significant result |
Has a larger critical value than Tukey’s HSD test | Scheffe |
Has a larger critical value than Fisher’s LSD test | Tukey’s HSD |

## Hypothesis Testing (Effect Size)

Question | Answer |
---|---|
A quantitative study design used to systematically assess previous studies to derive conclusions | meta-analysis |
What are three ways you can include or exclude studies? | internal validity, sample size, study design |
the magnitude of an effect, regardless of whether it is statistically significant | effect size |
Cohen’s d is related to what? | effect size |

## Probability

Question | Answer |
---|---|
Probability | a measure of the likeliness that an event will occur |
Probability sampling types | stratified and systematic sampling |
an equal chance of being selected as a participant | probability sampling |
researchers divide the entire population into different subgroups based on age, gender, etc. | stratified (probability) |
used to examine relationships between subgroups | stratified (probability) |
researchers pick an interval number | systematic (probability) |
Nonprobability sampling types | purposive, convenience, and snowball |
to serve a very specific study need | purposive |
subjects are selected because of their convenient accessibility and proximity | convenience |
asking participants to nominate another person with the same trait; used in rare cases | snowball |
P < .05 means | that a difference or a relationship would be expected less than 5 times in 100 as a result of chance |
related to the normal distribution | t distribution (continuous) |
a family of right-skewed distributions (continuous) | chi-square and F distributions |
Discrete distribution types | Bernoulli, binomial, and Poisson distributions |
Bernoulli distribution | probability distribution for two outcomes (e.g., success vs. fail) in a single trial |
binomial distribution | probability distribution for two outcomes (e.g., success vs. fail) in a series of trials |
has an upper limit (e.g., tossing a coin two times and having two heads) | binomial distribution |
a family of right-skewed distributions (discrete) | Poisson distribution |
similar to the binomial distribution, but does not have an upper limit | Poisson distribution |
e.g., the number of accidents at an intersection in a time period | Poisson distribution |
what are the mean and SD in a standard normal distribution? | mean is 0 and SD is 1 |
z score in a standard normal distribution | a value on the x-axis of a standard normal distribution; the numerical value specifies the distance, in standard deviations, of a value from the mean |
sampling distributions tend to be close to a normal distribution as sample size increases | Central limit theorem |
With a small probability of success (p = .20), the binomial distribution is skewed | right |
With an equal probability of success and failure (p = .50), the binomial distribution is | symmetric |
With a large probability of success (p = .80), the binomial distribution is skewed | left |
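
The binomial-skew cards can be checked by computing the pmf, P(X = k) = C(n, k) p^k (1 - p)^(n - k); the choice of n = 4 trials here is arbitrary:

```python
from math import comb

def binomial_pmf(n, p):
    """P(X = k) for k = 0..n: a discrete distribution with upper limit n."""
    return [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

symmetric = binomial_pmf(4, 0.5)   # p = .50: mirror-image probabilities
right_skew = binomial_pmf(4, 0.2)  # p = .20: mass piled at low k, tail to the right
left_skew = binomial_pmf(4, 0.8)   # p = .80: mass piled at high k, tail to the left
```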

## ANOVA

Question | Answer |
---|---|
measure of a characteristic of a population | Parameter |
Parametric tests use what type of variables? | ratio or interval variables |
t test, Pearson correlation, analysis of variance (ANOVA) | parametric |
Nonparametric tests use what type of variables? | ordinal or nominal variables |
chi square | nonparametric |
ANOVA is used to analyze differences between group | means |
ANOVA IV | categorical |
ANOVA DV | continuous |
Effect size for one-way ANOVA | proportion of variance: how much the independent variable (IV) has affected the dependent variable (DV), i.e., how much variance in the DV (e.g., weight loss) can be accounted for by the IV (e.g., weight loss program) |
How can you measure effect size? | η² (eta squared) and R² (r squared) |
Two or more categorical IVs (main effects), one continuous DV | Two-way ANOVA |
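
Eta squared from the cards above (the proportion of DV variance accounted for by the IV) is SS_between / SS_total; the two groups here are invented:

```python
def eta_squared(groups):
    """Eta squared = SS_between / SS_total for a one-way ANOVA layout:
    a categorical IV (group membership) and a continuous DV (the scores)."""
    scores = [x for g in groups for x in g]
    grand_mean = sum(scores) / len(scores)
    ss_total = sum((x - grand_mean) ** 2 for x in scores)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    return ss_between / ss_total

groups = [[1, 2, 3], [4, 5, 6]]
eta2 = eta_squared(groups)
```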

## T test

Question | Answer |
---|---|
A t test is used when the population variance is | unknown |
population variance | average squared distance from the population mean |
if your df is between two tabled dfs | round down to the next lowest df |
in a one-sample t test, if t observed is greater than t critical, then | reject the null |
Compares two independent sample means | Independent t test |
df for independent t test | df = (n1 + n2) - 2 |
two dependent t tests? | repeated-measures and matched-pairs |
same participants are tested repeatedly on the same variable | Repeated-measures t test |
Matched-pairs t test (paired t test) | two groups (different participants) are matched on one or more characteristics (e.g., matched on GPA and age) |
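
The independent t test with df = (n1 + n2) - 2 can be sketched with the pooled-variance formula; the two samples are made up:

```python
import math

def independent_t(xs, ys):
    """Pooled-variance t test comparing two independent sample means."""
    n1, n2 = len(xs), len(ys)
    m1, m2 = sum(xs) / n1, sum(ys) / n2
    ss1 = sum((x - m1) ** 2 for x in xs)
    ss2 = sum((y - m2) ** 2 for y in ys)
    df = (n1 + n2) - 2
    pooled_var = (ss1 + ss2) / df                   # pooled variance estimate
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))  # SE of the mean difference
    return (m1 - m2) / se, df

t_obs, df = independent_t([1, 2, 3], [4, 5, 6])
```

As with the one-sample test on the cards: if |t observed| exceeds t critical at this df, reject the null.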

## Type I & II Error, Power, Confidence Interval & Sample Size

Question | Answer |
---|---|
The probability of rejecting H0 when H0 is true | Type I error |
The probability of a type I error is | α (significance level of a test) |
if α = .05 and we reject H0, there is | a 5% probability that we commit a type I error |
The probability of accepting H0 when H1 is true | Type II error |
The probability of a type II error is | β |
Which error is more serious? | type I |
Statistical power | probability of rejecting H0 when H0 is false, i.e., detecting a real difference or relationship |
factors that increase power | effect size and sample size |
factors that decrease power | SD and standard error |
A statistical procedure in which a sample statistic is used to estimate the value of an unknown population parameter | estimation |
Point estimation | the use of a sample statistic to estimate a population parameter (e.g., a population mean) |
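
Point estimation ties back to the standard error: the sample mean is the point estimate of the population mean, and mean ± z × SE gives an interval estimate around it. A sketch with invented scores; the z = 1.96 multiplier (an approximate 95% interval) and the n - 1 sample-SD denominator are assumptions, not from the cards:

```python
import math

def point_estimate_with_ci(xs, z=1.96):
    """Sample mean as a point estimate of the population mean, plus an
    approximate 95% interval (z = 1.96 is an assumed multiplier)."""
    n = len(xs)
    mean = sum(xs) / n                                          # point estimate
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))  # sample SD
    se = sd / math.sqrt(n)                                      # standard error
    return mean, (mean - z * se, mean + z * se)

mean, ci = point_estimate_with_ci([2, 4, 4, 4, 5, 5, 7, 9])
```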
