PetscProbComputeKSStatistic

Compute the Kolmogorov-Smirnov statistic for the empirical distribution for an input vector, compared to an analytic CDF.

Synopsis

#include "petscdt.h" 
PetscErrorCode PetscProbComputeKSStatistic(Vec v, PetscProbFunc cdf, PetscReal *alpha)

Collective

Input Parameters

  • v - The data vector; its block size is the dimension of each sample

  • cdf - The analytic CDF

Output Parameter

  • alpha - The KS statistic

Notes

The Kolmogorov-Smirnov statistic for a given cumulative distribution function \(F(x)\) is

\[ D_n = \sup_x \left| F_n(x) - F(x) \right| \]

where \(\sup_x\) is the supremum over all \(x\), and the empirical distribution function \(F_n(x)\) for \(n\) samples is given by

\[ F_n(x) = \frac{\#\{\text{samples} \leq x\}}{n} \]
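The following plain C sketch illustrates this definition and is not the PETSc implementation. The hypothetical helper EmpiricalCDF counts the samples at or below \(x\) and divides by \(n\):

```c
#include <stddef.h>

/* Empirical distribution function F_n(x): the fraction of the n samples
   that are <= x. Linear scan, so the samples need not be sorted.
   Assumes n > 0. */
static double EmpiricalCDF(const double samples[], size_t n, double x)
{
  size_t count = 0;
  for (size_t i = 0; i < n; ++i)
    if (samples[i] <= x) ++count;
  return (double)count / (double)n;
}
```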

Because \(F_n(x)\) is discrete, its graph is a "stairstep" that rises by \(1/n\) at each sample, so there are \(n\) stairs (fewer if samples repeat). Intuitively, the statistic is the largest absolute difference between the two distribution functions over all \(x\) values.

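The Kolmogorov-Smirnov goodness-of-fit test is constructed using the Kolmogorov distribution: if the samples are drawn from a continuous \(F\), then \(\sqrt{n} D_n\) converges in distribution to the Kolmogorov distribution \(K\) as \(n \to \infty\). The test rejects the null hypothesis at level \(\alpha\) if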

\[ \sqrt{n} D_{n} > K_{\alpha}, \]

where \(K_\alpha\) is found from

\[ \operatorname{Pr}(K \leq K_{\alpha}) = 1 - \alpha. \]

In other words, a small \(\alpha\) indicates high confidence that the data did not come from the input distribution, in which case we say the test rejects the null hypothesis.

See Also

PetscProbComputeKSStatisticWeighted(), PetscProbComputeKSStatisticMagnitude(), PetscProbFunc

Level

advanced

Location

src/dm/dt/interface/dtprob.c

