PetscProbComputeKSStatisticWeighted#

Compute the Kolmogorov-Smirnov statistic of the weighted empirical distribution of an input vector, compared against an analytic CDF.

Synopsis#

#include "petscdt.h" 
PetscErrorCode PetscProbComputeKSStatisticWeighted(Vec v, Vec w, PetscProbFunc cdf, PetscReal *alpha)

Collective

Input Parameters#

v - The data vector, blocksize is the sample dimension
w - The vector of weights for each sample, instead of the default 1/n
cdf - The analytic CDF

Output Parameter#

alpha - The KS statistic

Notes#

The Kolmogorov-Smirnov statistic for a given cumulative distribution function $F(x)$ is

\[ D_n = \sup_x \left| F_n(x) - F(x) \right| \]

where $\sup_x$ is the supremum of the set of distances, and the empirical distribution function $F_n(x)$ is given by

\[ F_n(x) = \frac{\#\,\text{of samples} \le x}{n} \]

The empirical distribution function $F_n(x)$ is discrete, and thus has a "stairstep" cumulative distribution with $n$ stairs. Intuitively, the statistic takes the largest absolute difference between the two distribution functions across all $x$ values.
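The definition above can be illustrated with a minimal sketch in plain C for one-dimensional samples. This is not the PETSc implementation: the `Sample` type and the helpers `ks_statistic_weighted()` and `uniform_cdf()` are hypothetical names introduced only for this example, and the sketch assumes the weights are normalized by their sum, so that equal weights recover the default $1/n$. Because $F_n$ jumps at each sample, the supremum is found by checking the gap just before and just after every step.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

/* Hypothetical type for this sketch: one scalar sample and its weight */
typedef struct {
  double x; /* sample value */
  double w; /* sample weight */
} Sample;

/* Order samples by value for qsort() */
static int compare_samples(const void *a, const void *b)
{
  const double xa = ((const Sample *)a)->x, xb = ((const Sample *)b)->x;
  return (xa > xb) - (xa < xb);
}

/* Hypothetical analytic CDF: uniform distribution on [0, 1] */
static double uniform_cdf(double x)
{
  if (x <= 0.0) return 0.0;
  if (x >= 1.0) return 1.0;
  return x;
}

/* Hypothetical helper, not a PETSc routine: returns sup_x |F_n(x) - F(x)|.
   Sorts the samples in place. Assumes weights are normalized by their sum. */
static double ks_statistic_weighted(Sample *s, size_t n, double (*cdf)(double))
{
  double wsum = 0.0, Fn = 0.0, D = 0.0;
  size_t i;

  assert(n > 0);
  qsort(s, n, sizeof(Sample), compare_samples);
  for (i = 0; i < n; ++i) wsum += s[i].w;
  assert(wsum > 0.0);
  for (i = 0; i < n; ++i) {
    const double F = cdf(s[i].x);
    const double below = fabs(Fn - F); /* gap just before the step at x_i */
    double above;

    Fn += s[i].w / wsum;               /* the stair rises by the normalized weight */
    above = fabs(Fn - F);              /* gap just after the step at x_i */
    if (below > D) D = below;
    if (above > D) D = above;
  }
  return D;
}
```

For three equally weighted samples 0.25, 0.5, 0.75 compared against the uniform CDF, the largest gap occurs at the first and last steps and this sketch returns 0.25.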

The goodness-of-fit test, or Kolmogorov-Smirnov test, is constructed using the Kolmogorov distribution. It rejects the null hypothesis at level $\alpha$ if

\[ \sqrt{n} D_{n} > K_{\alpha}, \]

where $K_\alpha$ is found from

\[ \operatorname{Pr}(K \leq K_{\alpha}) = 1 - \alpha. \]

This means that rejecting the null hypothesis at a small level $\alpha$ gives high confidence that the data did not come from the input distribution.

Level#

advanced

Location#

src/dm/dt/interface/dtprob.c
