More Statistics

Key Points


References


Key Concepts


Parametric statistics

https://en.wikipedia.org/wiki/Parametric_statistics

Parametric statistics is a branch of statistics that assumes sample data come from a population that can be adequately modelled by a probability distribution with a fixed set of parameters.[1] Conversely, a non-parametric model differs precisely in that the parameter set (or feature set in machine learning) is not fixed and can increase, or even decrease, if new relevant information is collected.[2]
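A minimal sketch of the "fixed set of parameters" idea, assuming a normal distribution is an adequate model. The sample values and the helper name fit_normal are hypothetical and only for illustration. However many observations are collected, the fitted model is still described by the same two numbers, the mean and the standard deviation.

```python
from statistics import NormalDist


def fit_normal(sample):
    """Fit a normal model; the result is fully described by mean and stdev."""
    return NormalDist.from_samples(sample)


# Hypothetical measurements, not taken from any real data set.
small_sample = [4.8, 5.1, 5.0, 4.9, 5.2]
large_sample = small_sample * 20  # 100 observations

model_small = fit_normal(small_sample)
model_large = fit_normal(large_sample)

# Both models have exactly two parameters, whatever the sample size.
print(model_small.mean, model_small.stdev)
print(model_large.mean, model_large.stdev)
```

More data changes the estimated values of the two parameters, but not how many parameters there are.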

Most well-known statistical methods are parametric.[3] Regarding nonparametric (and semiparametric) models, Sir David Cox has said, "These typically involve fewer assumptions of structure and distributional form but usually contain strong assumptions about independencies".[4]


Non-parametric statistics

https://en.wikipedia.org/wiki/Category:Nonparametric_statistics

Nonparametric statistics is the branch of statistics concerned with nonparametric statistical models and nonparametric statistical tests. Unlike parametric statistics, nonparametric methods do not rely on estimating a fixed set of population parameters.

Nonparametric models differ from parametric models in that the model structure is not specified a priori but is instead determined from data. The term nonparametric is not meant to imply that such models completely lack parameters but that the number and nature of the parameters are flexible and not fixed in advance. Nonparametric models are therefore also called distribution-free.
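One way to picture a model whose structure is determined from the data is the empirical cumulative distribution function. The sketch below is illustrative only: the helper name make_ecdf and the sample values are assumptions. The "model" is simply the sorted sample, so its size grows with the number of observations instead of being fixed in advance.

```python
from bisect import bisect_right


def make_ecdf(sample):
    """Return a function x -> fraction of the sample that is <= x."""
    ordered = sorted(sample)
    n = len(ordered)

    def ecdf(x):
        # bisect_right counts how many sorted values are <= x.
        return bisect_right(ordered, x) / n

    return ecdf


# Hypothetical observations.
ecdf = make_ecdf([3, 1, 2, 2])
print(ecdf(0), ecdf(2), ecdf(3))
```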

Nonparametric (or distribution-free) inferential statistical methods are mathematical procedures for statistical hypothesis testing which, unlike parametric statistics, make no assumptions about the frequency distributions of the variables being assessed.
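As an illustration of a distribution-free test, here is a sketch of the two-sided sign test for paired differences. The sign test is not mentioned in these notes and is chosen here only as a simple example. The function name and the data are hypothetical. The test uses only the signs of the differences, so it needs no assumption about the shape of their distribution beyond independence.

```python
from math import comb


def sign_test_p_value(differences):
    """Two-sided exact sign test of the null hypothesis 'median difference is 0'."""
    nonzero = [d for d in differences if d != 0]  # ties carry no sign information
    n = len(nonzero)
    positives = sum(1 for d in nonzero if d > 0)
    tail = min(positives, n - positives)
    # Under the null, the number of positive signs is Binomial(n, 0.5).
    one_sided = sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, 2 * one_sided)


# Hypothetical paired differences (for example after minus before).
p_value = sign_test_p_value([1.2, 0.4, -0.3, 2.1, 0.8, 1.5, 0.9, 0.2])
print(p_value)
```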


Pearson's chi square test (goodness of fit)

https://en.wikipedia.org/wiki/Chi-squared_test

A chi-squared test, also written as χ² test, is any statistical hypothesis test where the sampling distribution of the test statistic is a chi-squared distribution when the null hypothesis is true. Without other qualification, 'chi-squared test' is often used as short for Pearson's chi-squared test. The chi-squared test is used to determine whether there is a significant difference between the expected frequencies and the observed frequencies in one or more categories.

In the standard applications of this test, the observations are classified into mutually exclusive classes, and there is some theory, or null hypothesis, which gives the probability that any observation falls into the corresponding class. The purpose of the test is to evaluate how likely the observations that are made would be, assuming the null hypothesis is true.
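For reference, the Pearson test statistic for this setup can be written as follows. The symbols k, N, O_i, p_i and E_i are notation assumed here, not taken from the notes.

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}, \qquad E_i = N \, p_i
```

Here k is the number of classes, N the total number of observations, O_i the observed count in class i, p_i the probability of class i under the null hypothesis, and E_i the corresponding expected count.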

Chi-squared tests are often constructed from a sum of squared errors, or through the sample variance. Test statistics that follow a chi-squared distribution arise from an assumption of independent normally distributed data, which is valid in many cases due to the central limit theorem. A chi-squared test can be used to attempt rejection of the null hypothesis that the data are independent.

https://www.khanacademy.org/math/statistics-probability/inference-categorical-data-chi-square-tests/chi-square-goodness-of-fit-tests/v/pearson-s-chi-square-test-goodness-of-fit

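do the observations from this experiment fit the model we created?

testing - the null hypothesis assumes the model is correct, i.e. any difference between the observed and expected frequencies is due to chance

set a significance level ( alpha ) of X% as the threshold for rejecting the null hypothesis ( 5% )

degrees of freedom = n - 1, where n is the number of categories, not the number of observations ( dof = 5 for 6 categories )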
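The steps in these notes can be sketched in a few lines of Python. The counts and percentages below are hypothetical illustration data for six categories, and the function name is an assumption. The critical value 11.070 is the standard chi-squared table entry for 5 degrees of freedom at alpha = 5%.

```python
def chi_square_statistic(observed, expected):
    """Pearson's statistic: sum over categories of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical model: expected share of observations per category, in percent.
expected_percent = [10, 10, 15, 20, 30, 15]
# Hypothetical observed counts for the same six categories.
observed = [30, 14, 34, 45, 57, 20]

total = sum(observed)
expected = [p * total / 100 for p in expected_percent]

statistic = chi_square_statistic(observed, expected)
dof = len(observed) - 1  # number of categories minus one

# Chi-squared critical value for dof = 5 at alpha = 0.05 (from a standard table).
CRITICAL_VALUE = 11.070
reject_null = statistic > CRITICAL_VALUE

print(f"chi-square = {statistic:.2f}, dof = {dof}, reject null: {reject_null}")
```

With these made-up counts the statistic is about 11.44, slightly above the critical value, so the null hypothesis that the observations fit the model would be rejected at the 5% level.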





Potential Value Opportunities



Potential Challenges



Candidate Solutions



Step-by-step guide for Example



 



Recommended Next Steps