Statistics has two aspects: algorithms and inference.

Statistical inference is a system of mathematical logic that guides and corrects (or justifies) statistical algorithms. The three classical inference methodologies are frequentist, Bayesian, and Fisherian.

Principles of statistical inference: the sufficiency principle, the conditionality principle, and the likelihood principle.

## Core Concepts

The fundamental construct in probability is the random variable; the fundamental construct in statistics is the random sample.

### Model

In statistics, a model is a probability distribution of one or more variables. Examples include univariate models and regression models.

Parametric and nonparametric methods do not differ in essence, and neither is superior to the other: both are collections of models, and in the frequentist setting both take a random sample as the sole input for estimation. Parametric methods are algorithms that select a unique probability model from a subspace of probability models indexed by a model parameter. Nonparametric methods are algorithms that select a unique probability model from another subspace of probability models, one that has no such index. Generally, nonparametric methods are non-mechanistic methods that are statistical in essence.
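As a minimal sketch of this contrast, the snippet below takes the same random sample and selects one model from each kind of subspace: a normal distribution indexed by $(\mu, \sigma)$, and the empirical distribution, which has no finite index. The data values and the function names `fit_normal` and `empirical_cdf` are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist, fmean, stdev


def fit_normal(sample):
    """Parametric: pick one member of the normal family, indexed by (mu, sigma)."""
    return NormalDist(mu=fmean(sample), sigma=stdev(sample))


def empirical_cdf(sample):
    """Nonparametric: the empirical distribution of the sample, with no parameter index."""
    ordered = sorted(sample)
    n = len(ordered)

    def cdf(x):
        # Fraction of observations less than or equal to x.
        return sum(1 for v in ordered if v <= x) / n

    return cdf


# Hypothetical sample; both methods take it as their only input.
sample = [4.1, 5.0, 5.2, 5.9, 6.3, 6.8, 7.4, 8.0]
parametric_model = fit_normal(sample)
nonparametric_cdf = empirical_cdf(sample)

x = 6.0
print(parametric_model.cdf(x))  # P(X <= 6.0) under the fitted normal model
print(nonparametric_cdf(x))     # P(X <= 6.0) under the empirical distribution
```

Both calls return an estimate of the same quantity, $P(X \le 6)$. The methods differ only in which subspace of models the estimate is drawn from.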

### Sample

A random sample is drawn by a sampling process from a hypothetical population. Traditional statistics assumes "large n, small p" ($n$ is the number of observations, $p$ the number of parameters measured), whereas in modern statistics the problem is typically "small n, large p".

## Estimation

Point estimation: methods of finding and evaluating estimators; uniformly minimum-variance unbiased (UMVU) estimators.

Interval estimation: confidence intervals, tolerance intervals.
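To make the idea of a confidence interval concrete, here is a hedged sketch of a large-sample interval for a population mean. It uses the normal approximation; for small samples a Student's t quantile would usually replace the normal one. The simulated data and the function name `mean_confidence_interval` are assumptions made for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, stdev


def mean_confidence_interval(sample, level=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(sample)
    center = fmean(sample)
    standard_error = stdev(sample) / sqrt(n)
    # Two-sided critical value, about 1.96 when level is 0.95.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return center - z * standard_error, center + z * standard_error


# Hypothetical data: 200 simulated draws with a fixed seed for reproducibility.
rng = random.Random(0)
sample = [rng.gauss(10.0, 2.0) for _ in range(200)]

low, high = mean_confidence_interval(sample)
print(low, high)
```

Raising `level` widens the interval, and increasing the sample size narrows it at the rate $1/\sqrt{n}$.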

Regression: least squares, lasso, ridge.
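The three estimators can be compared in the simplest possible setting: a single predictor with no intercept, where each has a closed form. This is only an illustrative sketch. The penalty scaling (a factor of 0.5 on the squared-error and ridge terms), the data, and the names `soft_threshold` and `fit_single_predictor` are assumptions, and other texts scale the penalties differently.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold, returning zero inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_single_predictor(x, y, penalty):
    """Slope estimates for y ~ beta * x (no intercept) under three criteria."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return {
        # Minimizes 0.5 * sum((y - beta * x)^2).
        "least_squares": sxy / sxx,
        # Adds the penalty 0.5 * penalty * beta^2, which shrinks proportionally.
        "ridge": sxy / (sxx + penalty),
        # Adds the penalty penalty * |beta|, which can shrink exactly to zero.
        "lasso": soft_threshold(sxy, penalty) / sxx,
    }


# Hypothetical data with a true slope near 2.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-3.9, -2.1, 0.2, 1.8, 4.0]

for penalty in (0.0, 5.0, 25.0):
    print(penalty, fit_single_predictor(x, y, penalty))
```

With a zero penalty all three estimates coincide. As the penalty grows, ridge shrinks the slope smoothly while lasso eventually sets it exactly to zero.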

## Hypothesis Testing

Likelihood Ratio Test (LRT), Uniformly Most Powerful (UMP) Test

False discovery rate (FDR)
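One widely used way to control the FDR is the Benjamini-Hochberg step-up procedure. These notes do not name a specific procedure, so it is shown here as an illustrative choice. The p-values and the function name `benjamini_hochberg` are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans, True where the hypothesis is rejected at FDR level q."""
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= k * q / m.
    cutoff_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank * q / m:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for index in order[:cutoff_rank]:
        rejected[index] = True
    return rejected


# Hypothetical p-values from eight tests.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
print(benjamini_hochberg(p_values, q=0.05))
print(benjamini_hochberg(p_values, q=0.10))
```

Because the procedure is step-up, a hypothesis can be rejected even when its own p-value exceeds its own threshold, provided a larger p-value further down the ordering meets its threshold.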

## Miscellaneous Topics

Asymptotic Analysis:

Statistical learning is the attempt to explain techniques of learning from data in a statistical framework.

prediction, explanation

Before Fisher, statisticians didn’t really understand estimation. The same can be said now about prediction. {CASI2017}

## Reference

Notes on Intuitive Biostatistics {Motulsky1995}

Table 1: Statistical Techniques

| Purpose | Continuous Data | Count or Ranked Data | Arrival Time | Binary Data |
| --- | --- | --- | --- | --- |
| (Examples) | Height | Number of headaches in a week; Self-report score | Life expectancy of a patient; Minutes until REM sleep begins | Recurrence of infection |
| Describe one sample | Frequency distribution; Sample mean; Quantiles; Sample standard deviation | Frequency distribution; Quantiles | Kaplan-Meier survival curve; Median survival curve; Five-year survival percentage | Proportion |
| Distributional test | Normality tests; Outlier tests | N/A | N/A | N/A |
| Infer about one population | One-sample t test | Wilcoxon’s signed-rank test | Confidence bands around survival curve; CI of median survival | CI of proportion; Binomial test to compare observed distribution with a theoretical (expected) distribution |
| Compare two unpaired groups | Unpaired t test | Mann-Whitney test | Log-rank test; Gehan-Breslow test; CI of ratio of median survival times; CI of hazard ratio | Fisher’s exact test |
| Compare two paired groups | Paired t test | Wilcoxon’s matched-pairs test | Conditional proportional hazards regression | McNemar’s test |
| Compare three or more unpaired groups | One-way ANOVA followed by multiple comparison tests | Kruskal-Wallis test; Dunn’s posttest | Log-rank test; Gehan-Breslow test | Chi-squared test (for trend) |
| Compare three or more paired groups | Repeated-measures ANOVA followed by multiple comparison tests | Friedman’s test; Dunn’s posttest | Conditional proportional hazards regression | Cochran’s Q |
| Quantify association between two variables | Pearson’s correlation | Spearman’s correlation | N/A | N/A |
| Predict one variable from one or several others | Linear/nonlinear regression | N/A | Cox’s proportional hazards regression | Logistic regression |