asaplib.kde package


asaplib.kde.density_estimation module

Classes and methods for performing kernel density estimation.

class asaplib.kde.density_estimation.KDE_scipy(bw_method=None)

Bases: asaplib.kde.density_estimation.Kernel_Density_Base

Kernel Density Estimation with Scipy


Given an array of data, computes the local density of every point using kernel density estimation.

Parameters

X : array, shape (n_sample, n_feature)

Returns

Log of densities for every point

Return type

array, shape (n_sample,)


X : dataset, array_like

Data points to estimate from. For univariate data this is a 1-D array; otherwise it is a 2-D array with shape (# of data points, # of dimensions).

Note that scipy.stats.gaussian_kde expects the data with shape (# of dimensions, # of data points), which is why the input X is transposed.
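As a sketch of the transposition described above, assuming only the public scipy.stats.gaussian_kde API, the log-density of every sample can be computed by handing SciPy X.T. The helper name scipy_log_density is hypothetical and is not asaplib's method; it only illustrates the shape handling.

```python
import numpy as np
from scipy.stats import gaussian_kde


def scipy_log_density(X, bw_method=None):
    """Hypothetical helper: log-density of every sample in X via gaussian_kde."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        # Univariate data: gaussian_kde accepts a 1-D array as-is.
        kde = gaussian_kde(X, bw_method=bw_method)
        return kde.logpdf(X)
    # X has shape (n_sample, n_feature); gaussian_kde wants (n_feature, n_sample).
    kde = gaussian_kde(X.T, bw_method=bw_method)
    # Evaluate at the same points, again in (n_feature, n_sample) layout.
    return kde.logpdf(X.T)


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
log_rho = scipy_log_density(X)
print(log_rho.shape)  # (200,)
```

The returned array has one entry per sample, matching the (n_sample,) return shape documented for this class.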

class asaplib.kde.density_estimation.KDE_sklearn(bandwidth=1.0, algorithm='auto', kernel='gaussian', metric='euclidean')

Bases: asaplib.kde.density_estimation.Kernel_Density_Base

Kernel Density Estimation with Sklearn


Given an array of data, computes the local density of every point using kernel density estimation.

Parameters

X : array, shape (n_sample, n_feature)

Returns

Log of densities for every point

Return type

array, shape (n_sample,)


X : dataset, array_like

Data points to estimate from. For univariate data this is a 1-D array; otherwise it is a 2-D array with shape (# of data points, # of dimensions).
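A sketch of the scikit-learn route, assuming the constructor arguments are forwarded to sklearn.neighbors.KernelDensity; the helper sklearn_log_density is hypothetical. KernelDensity.score_samples returns log-densities, which matches the return value described for this class.

```python
import numpy as np
from sklearn.neighbors import KernelDensity


def sklearn_log_density(X, bandwidth=1.0, algorithm="auto", kernel="gaussian",
                        metric="euclidean"):
    """Hypothetical helper: fit KernelDensity on X, return log-density per sample."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        # scikit-learn expects 2-D input, so univariate data becomes (n_sample, 1).
        X = X.reshape(-1, 1)
    kde = KernelDensity(bandwidth=bandwidth, algorithm=algorithm,
                        kernel=kernel, metric=metric)
    kde.fit(X)
    # score_samples returns the log of the estimated density at each sample.
    return kde.score_samples(X)


rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
log_rho = sklearn_log_density(X, bandwidth=0.3)
print(log_rho.shape)  # (300,)
```

Unlike SciPy, scikit-learn takes data as (n_sample, n_feature) directly, so no transpose is needed here.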

class asaplib.kde.density_estimation.Kernel_Density_Base

Bases: object

Base class for performing kernel density estimation


Given an array of data, computes the local density of every point using kernel density estimation.

Parameters

X : array, shape (n_sample, n_feature)

Returns

Log of densities for every point

Return type

array, shape (n_sample,)


Fit kernel model to X
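This listing does not show the base class's method names, so the following only illustrates the kind of interface described: a hypothetical base class with fit and evaluate_density methods (both names are assumptions) and a toy NumPy Gaussian KDE subclass that returns log-densities of shape (n_sample,). It is not asaplib's implementation.

```python
import numpy as np


class KernelDensityBaseSketch:
    """Hypothetical stand-in for Kernel_Density_Base (method names are assumptions)."""

    def fit(self, X):
        """Fit the kernel model to X."""
        raise NotImplementedError

    def evaluate_density(self, X):
        """Return log-densities of X, shape (n_sample,)."""
        raise NotImplementedError


def _as_2d(X):
    X = np.asarray(X, dtype=float)
    # Treat 1-D input as univariate data of shape (n_sample, 1).
    return X.reshape(-1, 1) if X.ndim == 1 else X


class NumpyGaussianKDE(KernelDensityBaseSketch):
    """Toy isotropic Gaussian KDE; uses O(n_query * n_train) memory."""

    def __init__(self, bandwidth=1.0):
        self.bandwidth = float(bandwidth)

    def fit(self, X):
        self.X_ = _as_2d(X)
        return self

    def evaluate_density(self, X):
        X = _as_2d(X)
        n, d = self.X_.shape
        h = self.bandwidth
        # Squared Euclidean distances, shape (n_query, n_train).
        sq = ((X[:, None, :] - self.X_[None, :, :]) ** 2).sum(axis=-1)
        log_k = -0.5 * sq / h**2
        # Log-sum-exp over training points for numerical stability.
        top = log_k.max(axis=1, keepdims=True)
        log_sum = top[:, 0] + np.log(np.exp(log_k - top).sum(axis=1))
        # Normalize: 1/n times the Gaussian constant (2*pi)^(d/2) * h^d.
        log_norm = np.log(n) + 0.5 * d * np.log(2 * np.pi) + d * np.log(h)
        return log_sum - log_norm


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
log_rho = NumpyGaussianKDE(bandwidth=0.5).fit(X).evaluate_density(X)
print(log_rho.shape)  # (100,)
```

Keeping fitting and evaluation in separate methods lets the SciPy, scikit-learn and internal back ends share one calling convention.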


asaplib.kde.density_estimation_internal module

I adapted this code from work by Alexandre Day (Copyright 2017 Alexandre Day).

class asaplib.kde.density_estimation_internal.KDE_internal(nh_size='auto', bandwidth=None, test_ratio_size=0.1, xtol=0.01, atol=5e-06, rtol=5e-05, extreme_dist=False, nn_dist=None, kernel='gaussian')

Bases: asaplib.kde.density_estimation.Kernel_Density_Base

Kernel density estimation (KDE) for accurate local density estimation. This is achieved by maximum-likelihood estimation of the generative kernel density model, regularized using cross-validation.

Parameters

  • bandwidth (float, optional) – bandwidth for the kernel density estimation. If not specified, it is determined automatically using maximum likelihood on a test set.

  • nh_size (int, optional (default = 'auto')) – number of points in a typical neighborhood… only relevant for evaluating a crude estimate of the bandwidth. 'auto' means that nh_size is scaled with the number of samples: nh_size = 100 is used for 10000 samples. The minimum neighborhood size is 4.

  • test_ratio_size (float, optional (default = 0.1)) – ratio size of the test set used when determining the bandwidth by maximum likelihood estimation. To obtain smooth density estimates (and prevent overfitting), a large test_ratio_size (closer to 1.0) is recommended over a small one.

  • atol (float, optional (default = 0.000005)) – kernel density estimate precision parameter; determines the precision used for the KDE. Smaller values lead to slower execution but better precision.

  • rtol (float, optional (default = 0.00005)) – kernel density estimate precision parameter; determines the precision used for the KDE. Smaller values lead to slower execution but better precision.

  • xtol (float, optional (default = 0.01)) – precision parameter for optimizing the bandwidth using maximum likelihood on a test set.

  • kernel (str, optional (default = 'gaussian')) – type of kernel to use for density estimates. Other options are 'epanechnikov', 'linear' and 'tophat'.
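The 'auto' rule for nh_size is specified here only by one reference point (100 neighbors for 10000 samples) and a floor of 4. Assuming linear scaling between those, which is a guess rather than asaplib's documented formula, the rule could look like this (auto_nh_size is a hypothetical name):

```python
def auto_nh_size(n_samples, reference_nh=100, reference_n=10000, minimum=4):
    """Hypothetical 'auto' rule: nh_size grows linearly with n_samples
    (100 neighbors per 10000 samples) and never drops below `minimum`."""
    return max(minimum, int(round(reference_nh * n_samples / reference_n)))


for n in (100, 1000, 10000, 50000):
    print(n, auto_nh_size(n))
```

Under this assumption, 100, 1000, 10000 and 50000 samples give nh_size values of 4, 10, 100 and 500.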

bandwidth_estimate(X_train, X_test)

Gives a rough estimate of the optimal bandwidth, based on the notion of an effective neighborhood.


Returns

bandwidth estimate, minimum possible value

Return type

tuple, shape (2,)
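The heuristic behind bandwidth_estimate is described only as relying on "some effective neighborhood". One plausible sketch, entirely an assumption: use the median distance from each test point to its nh_size-th nearest training point as the estimate, and the smallest nonzero nearest-neighbor distance as the minimum possible value. rough_bandwidth_estimate is a hypothetical helper, not asaplib's code.

```python
import numpy as np


def rough_bandwidth_estimate(X_train, X_test, nh_size=4):
    """Hypothetical heuristic returning (bandwidth estimate, minimum value)."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    # Treat 1-D input as univariate data of shape (n_sample, 1).
    X_train = X_train.reshape(-1, 1) if X_train.ndim == 1 else X_train
    X_test = X_test.reshape(-1, 1) if X_test.ndim == 1 else X_test
    # Euclidean distances from every test point to every training point.
    dist = np.sqrt(((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1))
    dist.sort(axis=1)
    # Column k holds the distance to the nh_size-th nearest training point.
    k = max(1, min(nh_size, X_train.shape[0])) - 1
    h_est = float(np.median(dist[:, k]))
    # Smallest nonzero nearest-neighbour distance, used as a lower bound.
    nearest = dist[:, 0]
    nonzero = nearest[nearest > 0]
    h_min = float(nonzero.min()) if nonzero.size else h_est
    return h_est, h_min


print(rough_bandwidth_estimate(np.arange(10.0), np.array([0.5]), nh_size=4))  # (2.5, 0.5)
```

Returning a lower bound alongside the estimate gives a bandwidth optimizer a safe interval to search in.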


Given an array of data, computes the local density of every point using kernel density estimation.

Parameters

X : array, shape (n_sample, n_feature)

Returns

Log of densities for every point (array, shape (n_sample,)), as returned by kde.score_samples(X).


Performs maximum likelihood estimation on a test set of the density model fitted on a training set


Fit kernel model to X

log_likelihood_test_set(bandwidth, X_test)

Fits the KDE model on the training set for a given bandwidth and evaluates the negative log-likelihood of the test set.
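To illustrate fitting on a training set and scoring a held-out test set, here is a sketch built on scikit-learn's KernelDensity and SciPy's bounded scalar minimizer. The helper names, the random split controlled by test_ratio_size, the search bounds, and the use of minimize_scalar with xatol=xtol are all assumptions, not asaplib's actual code.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from sklearn.neighbors import KernelDensity


def neg_log_likelihood_test_set(bandwidth, X_train, X_test, kernel="gaussian",
                                atol=5e-6, rtol=5e-5):
    """Fit a KDE on X_train and return the negative log-likelihood of X_test."""
    kde = KernelDensity(bandwidth=bandwidth, kernel=kernel, atol=atol, rtol=rtol)
    kde.fit(X_train)
    # KernelDensity.score returns the total log-likelihood of X_test.
    return -kde.score(X_test)


def ml_bandwidth(X, test_ratio_size=0.1, xtol=0.01, bounds=(0.01, 5.0), seed=0):
    """Choose the bandwidth that maximizes the held-out log-likelihood."""
    X = np.asarray(X, dtype=float)
    X = X.reshape(-1, 1) if X.ndim == 1 else X
    # Random train/test split; keep at least one point on each side.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = min(len(X) - 1, max(1, int(round(test_ratio_size * len(X)))))
    X_test, X_train = X[idx[:n_test]], X[idx[n_test:]]
    # Bounded 1-D minimization of the negative log-likelihood over the bandwidth.
    res = minimize_scalar(
        neg_log_likelihood_test_set,
        bounds=bounds,
        method="bounded",
        args=(X_train, X_test),
        options={"xatol": xtol},
    )
    return float(res.x)


rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 1))
print(ml_bandwidth(data, test_ratio_size=0.1, xtol=0.01))
```

Scoring on held-out points penalizes very small bandwidths, which would otherwise maximize the likelihood of the training data itself.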


Rounds a float to its first significant digit.
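A minimal standard-library sketch of rounding to the first significant digit (the function name is hypothetical):

```python
import math


def round_to_first_significant_digit(x):
    """Round x to its first significant digit, e.g. 0.0347 -> 0.03."""
    if x == 0:
        return 0.0
    # Decimal exponent of the leading digit, e.g. -2 for 0.0347, 3 for 1234.
    exponent = math.floor(math.log10(abs(x)))
    # round() with ndigits = -exponent keeps exactly one significant digit.
    # Note: round() uses round-half-to-even, so an exact tie such as 25.0 gives 20.0.
    return round(float(x), -exponent)


print(round_to_first_significant_digit(0.0347))  # 0.03
print(round_to_first_significant_digit(1234))    # 1000.0
```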

Module contents