且构网


Weighted Gaussian kernel density estimation in `python`

Updated: 2023-02-27 11:10:09

Neither sklearn.neighbors.KernelDensity nor statsmodels.nonparametric seems to support weighted samples. I modified scipy.stats.gaussian_kde to allow for heterogeneous sampling weights and thought the results might be useful to others. An example is shown below.

An IPython notebook is available here: http://nbviewer.ipython.org/gist/tillahoffmann/f844bce2ec264c1c8cb5

Implementation details

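The weighted arithmetic mean of the samples $x_i$ with weights $w_i$ is

$$\mu = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}.$$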
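As a minimal sketch of the weighted mean, the NumPy function below assumes the samples are stored column-wise in an array of shape `(d, n)`, the layout `scipy.stats.gaussian_kde` uses for its dataset. The name `weighted_mean` is hypothetical and is not taken from the modified scipy code.

```python
import numpy as np


def weighted_mean(x, w):
    """Weighted arithmetic mean of samples.

    x: array of shape (d, n), one column per sample (1-D input is treated as d = 1).
    w: array of shape (n,), non-negative weights.
    Returns an array of shape (d,).
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))
    w = np.asarray(w, dtype=float)
    # sum_i w_i x_i / sum_i w_i, evaluated for every dimension at once
    return x @ w / w.sum()
```

With the same `(d, n)` layout, `np.average(x, axis=1, weights=w)` gives the same result.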

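The unbiased data covariance matrix is then given by

$$\Sigma = \frac{\sum_{i=1}^{n} w_i}{\left(\sum_{i=1}^{n} w_i\right)^2 - \sum_{i=1}^{n} w_i^2} \sum_{i=1}^{n} w_i \left(x_i - \mu\right)\left(x_i - \mu\right)^\mathsf{T}.$$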
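The unbiased weighted covariance can be sketched in NumPy as follows. The helper name `weighted_covariance` is hypothetical, and it assumes the same `(d, n)` column layout. For non-negative weights it should agree with `np.cov(x, aweights=w)`, whose default `ddof=1` divides by the same factor $\sum_i w_i - \sum_i w_i^2 / \sum_i w_i$.

```python
import numpy as np


def weighted_covariance(x, w):
    """Unbiased weighted covariance (reliability weights).

    x: array of shape (d, n), one column per sample.
    w: array of shape (n,), non-negative weights.
    Sigma = V1 / (V1**2 - V2) * sum_i w_i (x_i - mu)(x_i - mu)^T,
    where V1 = sum_i w_i and V2 = sum_i w_i**2.
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))
    w = np.asarray(w, dtype=float)
    v1, v2 = w.sum(), np.sum(w ** 2)
    mu = x @ w / v1                 # weighted mean, shape (d,)
    dx = x - mu[:, None]            # centered samples, shape (d, n)
    scatter = (dx * w) @ dx.T       # sum_i w_i dx_i dx_i^T, shape (d, d)
    return scatter * v1 / (v1 ** 2 - v2)
```

With all weights equal, the prefactor reduces to $1/(n-1)$ and the result matches the ordinary sample covariance.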

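The bandwidth can be chosen with Scott's or Silverman's rule, as in scipy. However, the number of samples used to compute the bandwidth is Kish's approximation for the effective sample size,

$$n_\mathrm{eff} = \frac{\left(\sum_{i=1}^{n} w_i\right)^2}{\sum_{i=1}^{n} w_i^2},$$

which reduces to $n$ when all weights are equal.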
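To tie the implementation details together, here is a minimal sketch of evaluating a weighted Gaussian KDE whose bandwidth uses Kish's effective sample size. It is not the code from the linked notebook, and the function names (`kish_neff`, `bandwidth_factor`, `weighted_gaussian_kde`) are hypothetical. It assumes, mirroring `scipy.stats.gaussian_kde`, that the kernel covariance is the weighted data covariance scaled by the squared bandwidth factor, and that the density is the weight-normalized sum of Gaussian kernels centered on the samples.

```python
import numpy as np


def kish_neff(w):
    """Kish's effective sample size: (sum w)^2 / sum w^2."""
    w = np.asarray(w, dtype=float)
    return w.sum() ** 2 / np.sum(w ** 2)


def bandwidth_factor(neff, d, rule="scott"):
    """Scott's or Silverman's rule, with n replaced by the effective sample size."""
    if rule == "scott":
        return neff ** (-1.0 / (d + 4))
    if rule == "silverman":
        return (neff * (d + 2) / 4.0) ** (-1.0 / (d + 4))
    raise ValueError(f"unknown bandwidth rule: {rule!r}")


def weighted_gaussian_kde(x, w, points, rule="scott"):
    """Evaluate a weighted Gaussian KDE at `points`.

    x: (d, n) samples as columns (1-D input is treated as d = 1).
    w: (n,) non-negative weights.
    points: (d, m) evaluation points; returns an array of shape (m,).
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))
    points = np.atleast_2d(np.asarray(points, dtype=float))
    w = np.asarray(w, dtype=float)
    d = x.shape[0]

    # Unbiased weighted data covariance (np.cov defaults to ddof=1).
    data_cov = np.atleast_2d(np.cov(x, aweights=w))
    # Kernel covariance = data covariance scaled by the squared bandwidth factor.
    factor = bandwidth_factor(kish_neff(w), d, rule)
    kernel_cov = data_cov * factor ** 2
    inv_cov = np.linalg.inv(kernel_cov)
    norm = np.sqrt(np.linalg.det(2.0 * np.pi * kernel_cov))

    # Squared Mahalanobis distance between every point and every sample: (m, n).
    diff = points[:, :, None] - x[:, None, :]
    mahal = np.einsum("imn,ij,jmn->mn", diff, inv_cov, diff)

    # Weighted sum of the Gaussian kernels, normalized by the total weight.
    return (np.exp(-0.5 * mahal) @ w) / (w.sum() * norm)
```

Because both $n_\mathrm{eff}$ and the final normalization are invariant to rescaling the weights, multiplying every weight by the same constant leaves the estimate unchanged.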