# STAT1 Basics

## BASIC CONCEPTS AND SAMPLING

Question | Answer |
---|---|
Statistics | the art and science of collecting, analyzing, presenting, and interpreting data |
Data | the facts and figures collected, analyzed, and summarized for presentation and interpretation |
Data Set | all the data collected in a particular study |
Elements | the entities on which data are collected |
Variable | a characteristic of interest for the elements |
Observation | the set of measurements obtained for a particular element |
Nominal Scale | the scale of measurement for a variable when the data are labels or names used to identify an attribute of an element. It can be numeric or non-numeric. |
Ordinal Scale | The scale of measurement for a variable if the data exhibit the properties of nominal data and the order or rank of the data is meaningful. |
Interval Scale | The scale of measurement for a variable if the data demonstrate the properties of ordinal data and the interval between values is expressed in terms of a fixed unit of measure. Always numeric. |
Ratio Scale | The scale of measurement for a variable if the data demonstrate all the properties of interval data and the ratio of two values is meaningful. Always numeric. |
Categorical Data | labels or names used to identify an attribute. Either nominal or ordinal scale of measurement. |
Quantitative Data | numeric values that indicate how much or how many of something. Either interval or ratio scale of measurement. |
Cross-Sectional Data | data collected at the same (or approximately the same) point in time |
Time Series Data | data collected over several time periods |
Descriptive Statistics | tabular, graphical, and numerical summaries of data |
Statistical Inference | the process of using data obtained from a sample to make estimates or test hypotheses about the characteristics of a population |
Population | the set of all elements of interest in a particular study |
Sample | a subset of the population |
Census | a survey conducted on the entire population to collect data |
Sample Survey | a survey conducted on a sample to collect data |
Primary Data | data collected by the investigator conducting the research; original information gathered through field research |
Secondary Data | data collected by another person or source and reused for the purpose of research |
Simple Random Sampling | the basic method of random sampling: a sample of size n is selected from a finite population so that each possible sample of size n has the same probability of being selected |
Systematic Random Sampling | method in which we randomly select one of the first k elements and then select every kth element thereafter |
Stratified Sampling | method in which the population is first divided into strata and a simple random sample is then taken from each stratum |
Cluster Sampling | method in which the population is first divided into clusters and then a simple random sample of the clusters is taken |
Multi-Stage Sampling | a more complex form of cluster sampling in which the sample is selected in stages (e.g., clusters first, then elements within the selected clusters) |
Convenience Sampling (Accidental) | members are chosen based on relative ease of access, such as friends, classmates, or family |
Snowball Sampling | the first respondent refers a friend, who then refers another, and so on |
Judgmental Sampling | the researcher chooses the sample judged appropriate for the study |
Deviant Case | cases that differ from the dominant pattern |
Case Study | limited to one group |
Ad Hoc Quotas | a quota is established (e.g., a fixed share of respondents from a group), and the researcher may choose any respondents as long as the quota is met |
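
The four probability sampling methods above can be sketched with Python's standard `random` module. This is a minimal illustration under stated assumptions: the 20-element population, the odd/even strata, the four clusters, and the helper names (`simple_random_sample`, `systematic_sample`, `stratified_sample`, `cluster_sample`) are all hypothetical, and the systematic version takes the interval as k = N // n.

```python
import random

# Hypothetical population of 20 elements, identified by the numbers 1..20.
population = list(range(1, 21))


def simple_random_sample(pop, n, rng):
    """Every possible sample of size n is equally likely (no replacement)."""
    return rng.sample(pop, n)


def systematic_sample(pop, n, rng):
    """Randomly pick one of the first k elements, then take every kth element."""
    k = len(pop) // n          # sampling interval (assumption: k = N // n)
    start = rng.randrange(k)   # random index among the first k elements
    return pop[start::k][:n]


def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample from each stratum and combine them."""
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample


def cluster_sample(clusters, n_clusters, rng):
    """Draw a simple random sample of clusters and keep every element in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [element for name in chosen for element in clusters[name]]


rng = random.Random(42)  # fixed seed so the run is reproducible

srs = simple_random_sample(population, 5, rng)
sys_s = systematic_sample(population, 5, rng)

strata = {
    "odd": [x for x in population if x % 2 == 1],
    "even": [x for x in population if x % 2 == 0],
}
strat = stratified_sample(strata, 2, rng)

# Four clusters of five consecutive elements each: A = 1..5, B = 6..10, ...
clusters = {name: population[i:i + 5] for name, i in zip("ABCD", range(0, 20, 5))}
clus = cluster_sample(clusters, 2, rng)

print("simple random:", srs)
print("systematic:   ", sys_s)
print("stratified:   ", strat)
print("cluster:      ", clus)
```

The key contrast is visible in the last two helpers: stratified sampling samples *within every* group, while cluster sampling samples *which* groups to keep and then takes them whole.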

## SUMMARIZING QUALITATIVE DATA

Question | Answer |
---|---|
Frequency Distribution | tabular summary of data showing the frequency (or number) of items in each of several nonoverlapping classes |
Relative Frequency | fraction or proportion of the total number of data items belonging to the class |
Percent Frequency | relative frequency multiplied by 100 |
Bar Graph | graphical device for depicting qualitative data that have been summarized in a frequency, relative frequency, or percent frequency distribution |
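
A frequency, relative frequency, and percent frequency distribution can be built with `collections.Counter`. A minimal sketch, assuming a hypothetical sample of 10 blood types:

```python
from collections import Counter

# Hypothetical categorical data: blood types of 10 people.
blood_types = ["O", "A", "O", "B", "A", "O", "AB", "O", "A", "B"]

freq = Counter(blood_types)  # frequency distribution: count in each class
n = len(blood_types)

print(f"{'Class':<6}{'Freq':>6}{'Rel':>7}{'Pct':>8}")
for category, count in freq.most_common():
    relative = count / n        # relative frequency = frequency / n
    percent = relative * 100    # percent frequency = relative frequency x 100
    print(f"{category:<6}{count:>6}{relative:>7.2f}{percent:>7.0f}%")
```

The relative frequencies always sum to 1 (and the percent frequencies to 100), which is a quick check that every item landed in exactly one class.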

## SUMMARIZING QUANTITATIVE DATA

Question | Answer |
---|---|
Histogram | graphical presentation of a frequency distribution of quantitative data: a bar graph with no natural separation between rectangles of adjacent classes |
Cumulative Frequency Distribution | shows the number of items with values less than or equal to the upper limit of each class |
Cumulative Relative Frequency Distribution | shows the proportion of items with values less than or equal to the upper limit of each class |
Cumulative Percent Frequency Distribution | shows the percentage of items with values less than or equal to the upper limit of each class |
Ogive | graph of a cumulative distribution |
Exploratory Data Analysis | consists of simple arithmetic and easy-to-draw pictures that can be used to summarize data quickly |
Stem-and-Leaf Display | shows both the rank order and shape of the distribution of the data |
Crosstabulation | tabular method for summarizing the data for two variables simultaneously |
Scatter Diagram | graphical presentation of the relationship between two quantitative variables |
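
The three cumulative distributions can be computed by counting items per class and taking running totals. A minimal sketch, assuming hypothetical data and hypothetical class limits 10-14, 15-19, 20-24, 25-29:

```python
from itertools import accumulate

# Hypothetical quantitative data and nonoverlapping classes (lower, upper limit).
data = [12, 15, 20, 22, 14, 17, 26, 11, 19, 23]
classes = [(10, 14), (15, 19), (20, 24), (25, 29)]
n = len(data)

# Frequency distribution: number of items in each class.
freq = [sum(1 for x in data if lower <= x <= upper) for lower, upper in classes]

# Running totals give the number of items <= each class's upper limit.
cum_freq = list(accumulate(freq))
cum_rel = [c / n for c in cum_freq]      # cumulative relative frequency
cum_pct = [100 * r for r in cum_rel]     # cumulative percent frequency

for (lower, upper), f, c, r, p in zip(classes, freq, cum_freq, cum_rel, cum_pct):
    print(f"{lower}-{upper}: freq {f}, cum {c}, cum rel {r:.2f}, cum pct {p:.0f}%")
```

Plotting `cum_freq` (or `cum_pct`) against the upper class limits gives the points of an ogive; the last cumulative value always equals n (or 100%).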

## MEASURES OF LOCATION

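Question | Answer |
---|---|
Mean | the average of all the data values in a data set |
Median | the value in the middle when the data items are arranged in ascending order (for an even number of items, the average of the two middle values) |
Mode | the value that occurs with greatest frequency |
Percentile | provides information about how the data are spread over the interval from the smallest value to the largest value; the pth percentile is a value such that at least p percent of the items are less than or equal to it and at least (100 - p) percent are greater than or equal to it |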
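
The measures of location can be checked with Python's `statistics` module. Textbooks and software disagree on how to compute percentiles, so the `percentile` helper below is a hypothetical sketch that assumes the location L = (p/100)(n + 1) with linear interpolation; the eight data values are also hypothetical.

```python
import statistics

# Hypothetical sample of eight values.
data = [46, 54, 42, 46, 32, 58, 50, 46]


def percentile(values, p):
    """pth percentile using the location L = (p / 100)(n + 1) and linear
    interpolation between neighbouring sorted values (one common convention)."""
    xs = sorted(values)
    n = len(xs)
    loc = (p / 100) * (n + 1)      # 1-based position in the sorted data
    if loc <= 1:
        return xs[0]
    if loc >= n:
        return xs[-1]
    i = int(loc)                   # lower neighbour's 1-based position
    frac = loc - i                 # how far to move toward the next value
    return xs[i - 1] + frac * (xs[i] - xs[i - 1])


mean = statistics.mean(data)       # sum of values / number of values
median = statistics.median(data)   # even n: average of the two middle values
mode = statistics.mode(data)       # most frequent value
q1, q3 = percentile(data, 25), percentile(data, 75)

print(f"mean={mean}, median={median}, mode={mode}, Q1={q1}, Q3={q3}")
```

With this (n + 1) convention the quartiles agree with `statistics.quantiles(data, n=4)`, whose default `method="exclusive"` uses the same positions; spreadsheets and other texts may use a different rule and report slightly different values.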

## MEASURES OF VARIABILITY

Question | Answer |
---|---|
Range | the difference between the largest and smallest data values; the simplest measure of variability |
Interquartile Range | the difference between the third quartile and the first quartile; the range for the middle 50% of the data |
Variance | a measure of variability that utilizes all the data, based on the squared differences between each data value and the mean (averaged over n for a population; divided by n - 1 for a sample) |
Standard Deviation | the positive square root of the variance |
Coefficient of Variation | indicates how large the standard deviation is in relation to the mean: (standard deviation / mean) × 100% |
Chebyshev's Theorem | At least $(1 - 1/k^2)$ of the data values in any data set will be within $k$ standard deviations of the mean, where $k$ is any value greater than 1. |
Empirical Rule | For data with a bell-shaped distribution, approximately 68% of the data values will be within one standard deviation of the mean, approximately 95% within two, and almost all within three. |
Outlier | an unusually small or unusually large value in a data set |
Five-Number Summary | smallest value, first quartile, median, third quartile, largest value |
Box Plot | a graphical summary based on the five-number summary; a box is drawn with its ends located at the first and third quartiles |
Covariance | a measure of the linear association between two variables |
Weighted Mean | the mean computed by giving each data value a weight that reflects its importance |
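
Most of the variability measures above can be computed in a few lines. This is a minimal sketch with hypothetical data: `x` and the paired `y` are made up, and the scores and 5:3:2 weights for the weighted mean are illustrative only. The quartiles come from `statistics.quantiles` with its default (n + 1) "exclusive" method; other conventions can shift Q1 and Q3 slightly.

```python
import math
import statistics

# Hypothetical paired sample; y is used only for the covariance.
x = [2, 4, 4, 4, 5, 5, 7, 9]
y = [1, 3, 2, 5, 4, 6, 7, 8]
n = len(x)
mean_x = statistics.mean(x)

data_range = max(x) - min(x)  # largest value minus smallest value

# Quartiles via statistics.quantiles (default "exclusive" method, (n + 1) positions).
q1, median, q3 = statistics.quantiles(x, n=4)
iqr = q3 - q1  # range of the middle 50% of the data
five_number_summary = (min(x), q1, median, q3, max(x))

s2 = statistics.variance(x)  # sample variance: sum of squared deviations / (n - 1)
s = math.sqrt(s2)            # standard deviation: positive square root of variance
cv = s / mean_x * 100        # coefficient of variation, in percent

# Chebyshev's theorem with k = 2: at least 1 - 1/k^2 = 75% must lie within 2s.
k = 2
chebyshev_bound = 1 - 1 / k**2
share_within = sum(1 for v in x if abs(v - mean_x) <= k * s) / n

# Sample covariance: sum of products of paired deviations / (n - 1).
mean_y = statistics.mean(y)
cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

# Weighted mean: hypothetical scores weighted 5:3:2.
scores = [80, 90, 70]
weights = [5, 3, 2]
weighted_mean = sum(w * v for w, v in zip(weights, scores)) / sum(weights)

print("range:", data_range, "IQR:", iqr, "five-number summary:", five_number_summary)
print(f"s^2={s2:.3f}, s={s:.3f}, CV={cv:.1f}%")
print(f"within 2s: {share_within:.0%} (Chebyshev guarantees at least {chebyshev_bound:.0%})")
print(f"cov(x, y)={cov_xy:.3f}, weighted mean={weighted_mean}")
```

Chebyshev's bound is only a floor that holds for any data set; a particular sample, like this one, can have a larger share of values within k standard deviations.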

## PROBABILITY

Question | Answer |
---|---|
Probability | a numerical measure of the likelihood that an event will occur |
Experiment | a process that generates well-defined outcomes |
Sample Space | the set of all experimental outcomes |
Sample Point or Experimental Outcome | an element of the sample space |
Event | a collection of sample points; a subset of the sample space |
Tree Diagram | a graphical representation that helps in visualizing a multiple-step experiment |
Counting Rule for Combinations | counts the number of experimental outcomes when the experiment involves selecting n objects from a (usually larger) set of N objects, where order does not matter: $\binom{N}{n} = \frac{N!}{n!(N-n)!}$ |
Classical Method | assigning probabilities based on the assumption of equally likely outcomes |
Relative Frequency Method | assigning probabilities based on experimentation or historical data |
Subjective Method | assigning probabilities based on judgment |
Mutually Exclusive Events | events that have no sample points in common |
Conditional Probability | the probability of an event given that another event has occurred: $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$ |
Multiplication Law | provides a way to compute the probability of the intersection of two events: $P(A \cap B) = P(B)\,P(A \mid B)$ |
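
The probability terms above can be made concrete by enumerating a small sample space. A minimal sketch, assuming a hypothetical experiment (rolling two fair dice) and hypothetical events `A`, `B`, and `C`; `math.comb` requires Python 3.8 or later, and `Fraction` keeps the probabilities exact.

```python
from fractions import Fraction
from itertools import product
from math import comb

# Counting rule for combinations: C(N, n) = N! / (n! (N - n)!)
ways = comb(5, 2)  # ways to select n = 2 objects from N = 5, order ignored

# Experiment: roll two fair dice; the sample space has 36 sample points.
sample_space = list(product(range(1, 7), repeat=2))


def prob(event):
    """Classical method: equally likely outcomes, so P(E) = |E| / |S|."""
    return Fraction(len(event), len(sample_space))


A = {pt for pt in sample_space if sum(pt) == 8}   # event: the total is 8
B = {pt for pt in sample_space if pt[0] == 3}     # event: first die shows 3
C = {pt for pt in sample_space if pt[0] == 1}     # event: first die shows 1

mutually_exclusive = not (A & C)  # a total of 8 is impossible when the first die is 1

p_a_and_b = prob(A & B)                 # intersection: only (3, 5)
p_a_given_b = p_a_and_b / prob(B)       # conditional probability P(A | B)
mult_law_holds = prob(B) * p_a_given_b == p_a_and_b  # multiplication law

print(ways, prob(A), p_a_given_b, mutually_exclusive, mult_law_holds)
```

Events are modelled as sets of sample points, so intersection is `&` and "no sample points in common" is simply an empty intersection.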

