asaplib.kde package

Submodules

asaplib.kde.density_estimation module

Classes and methods for performing kernel density estimation.

class asaplib.kde.density_estimation.KDE_scipy(bw_method=None)

Bases: asaplib.kde.density_estimation.Kernel_Density_Base

Kernel density estimation with SciPy, wrapping scipy.stats.gaussian_kde: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gaussian_kde.html

evaluate_density(X)

Given an array of data, computes the local density of every point using kernel density estimation.

Parameters

X (array, shape (n_sample, n_feature)) – input data.

Returns

Log of densities for every point

Return type

array, shape(n_sample)

fit(X)

Parameters

X (array_like) – dataset: the data points to estimate from. For univariate data this is a 1-D array; otherwise it is a 2-D array with shape (# of data, # of dimension).

Note that scipy.stats.gaussian_kde takes its data with shape (# of dimension, # of data), which is why the input X is transposed.
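The transpose described above can be seen by calling SciPy directly. This is a minimal sketch, not the KDE_scipy wrapper itself; using gaussian_kde.logpdf to obtain the log densities is an assumption about how the wrapper produces its output.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Toy data in the (n_sample, n_feature) layout used throughout asaplib.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))

# gaussian_kde expects (n_feature, n_sample), hence the transpose.
# bw_method=None selects SciPy's default bandwidth rule (Scott's rule).
kde = gaussian_kde(X.T, bw_method=None)

# logpdf also takes points as (n_feature, n_points); the result has one
# log density per sample, i.e. shape (n_sample,).
log_density = kde.logpdf(X.T)
print(log_density.shape)  # (500,)
```

Forgetting the transpose on 2-D data would make SciPy treat the 500 samples as 500 dimensions of 2 points, which is why the wrapper handles it internally.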

class asaplib.kde.density_estimation.KDE_sklearn(bandwidth=1.0, algorithm='auto', kernel='gaussian', metric='euclidean')

Bases: asaplib.kde.density_estimation.Kernel_Density_Base

Kernel density estimation with scikit-learn, wrapping sklearn.neighbors.KernelDensity. See https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KernelDensity.html#sklearn.neighbors.KernelDensity and https://scikit-learn.org/stable/modules/density.html#kernel-density-estimation

evaluate_density(X)

Given an array of data, computes the local density of every point using kernel density estimation.

Parameters

X (array, shape (n_sample, n_feature)) – input data.

Returns

Log of densities for every point

Return type

array, shape(n_sample)

fit(X)

Parameters

X (array_like) – dataset: the data points to estimate from. For univariate data this is a 1-D array; otherwise it is a 2-D array with shape (# of data, # of dimension).
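For comparison with the SciPy wrapper, the sketch below calls scikit-learn's KernelDensity directly with the same defaults as the KDE_sklearn constructor. It is an illustration of the underlying library, not the wrapper's source; that KDE_sklearn returns score_samples output as its log densities is an assumption.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

# scikit-learn uses the (n_sample, n_feature) layout directly: no transpose.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))

# Same defaults as the KDE_sklearn signature.
kde = KernelDensity(bandwidth=1.0, algorithm="auto", kernel="gaussian", metric="euclidean")
kde.fit(X)

# score_samples returns the log density of each row, shape (n_sample,).
log_density = kde.score_samples(X)
print(log_density.shape)  # (500,)
```

Unlike gaussian_kde, KernelDensity does not choose the bandwidth automatically, so the fixed default of 1.0 should usually be tuned to the scale of the data.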

class asaplib.kde.density_estimation.Kernel_Density_Base

Bases: object

Base class for performing kernel density estimation

evaluate_density(X)

Given an array of data, computes the local density of every point using kernel density estimation.

Parameters

X (array, shape (n_sample, n_feature)) – input data.

Returns

Log of densities for every point

Return type

array, shape(n_sample)

fit(X)

Fit the kernel model to X.

fit_evaluate_density(X)

get_acronym()
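A hypothetical sketch of how a subclass might plug into this base-class interface. The class names KernelDensityBaseSketch and ScipyKDESketch are illustrative, not asaplib code. Two behaviours are assumptions because the reference does not document them: that fit_evaluate_density fits on X and then returns the log densities of X, and that get_acronym returns a short string label.

```python
from abc import ABC, abstractmethod

import numpy as np
from scipy.stats import gaussian_kde


class KernelDensityBaseSketch(ABC):
    """Hypothetical stand-in for Kernel_Density_Base (not the asaplib source)."""

    @abstractmethod
    def fit(self, X):
        """Fit the kernel model to X."""

    @abstractmethod
    def evaluate_density(self, X):
        """Return the log density of every row of X."""

    def fit_evaluate_density(self, X):
        # Assumed behaviour: fit on X, then evaluate the log densities of X.
        self.fit(X)
        return self.evaluate_density(X)

    @abstractmethod
    def get_acronym(self):
        """Assumed: a short label identifying the estimator."""


class ScipyKDESketch(KernelDensityBaseSketch):
    def __init__(self, bw_method=None):
        self.bw_method = bw_method
        self.kde = None

    def fit(self, X):
        # gaussian_kde wants (n_feature, n_sample), so transpose the input.
        self.kde = gaussian_kde(np.asarray(X).T, bw_method=self.bw_method)
        return self

    def evaluate_density(self, X):
        return self.kde.logpdf(np.asarray(X).T)

    def get_acronym(self):
        return "kde_scipy"  # hypothetical label


rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
log_density = ScipyKDESketch().fit_evaluate_density(X)
print(log_density.shape)  # (300,)
```

Keeping fit_evaluate_density in the base class means each backend only has to implement fit, evaluate_density and get_acronym.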

asaplib.kde.density_estimation_internal module

I adapted the code from https://github.com/alexandreday/fast_density_clustering.git (Copyright 2017 Alexandre Day).

class asaplib.kde.density_estimation_internal.KDE_internal(nh_size='auto', bandwidth=None, test_ratio_size=0.1, xtol=0.01, atol=5e-06, rtol=5e-05, extreme_dist=False, nn_dist=None, kernel='gaussian')

Bases: asaplib.kde.density_estimation.Kernel_Density_Base

Kernel density estimation (KDE) for accurate local density estimation. The bandwidth is chosen by maximum-likelihood estimation: the kernel density model is fitted on a training set and its likelihood is maximized on a held-out test set, a form of cross-validation that regularizes the model.

Parameters
  • bandwidth (float, optional) – bandwidth for the kernel density estimation. If not specified, it is determined automatically by maximum likelihood on a test set.

  • nh_size (int, optional (default = 'auto')) – number of points in a typical neighborhood; only relevant for computing a crude estimate of the bandwidth. ‘auto’ means that nh_size is scaled with the number of samples, e.g. nh_size = 100 for 10000 samples. The minimum neighborhood size is 4.

  • test_ratio_size (float, optional (default = 0.1)) – ratio of the test-set size used when determining the bandwidth by maximum likelihood estimation. To obtain smooth density estimates (i.e. to prevent overfitting), it is recommended to use a large test_ratio_size (closer to 1.0) rather than a small one.

  • atol (float, optional (default = 0.000005)) – absolute precision parameter of the kernel density estimate. Smaller values lead to slower execution but better precision.

  • rtol (float, optional (default = 0.00005)) – relative precision parameter of the kernel density estimate. Smaller values lead to slower execution but better precision.

  • xtol (float, optional (default = 0.01)) – precision parameter for optimizing the bandwidth using maximum likelihood on a test set.

  • kernel (str, optional (default='gaussian')) – type of kernel to use for density estimates. Other options are ‘epanechnikov’, ‘linear’ and ‘tophat’.
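The nh_size = 'auto' rule is described above only through two facts: 100 neighbors for 10000 samples and a minimum of 4. A logarithmic rule reproduces both; the sketch below assumes that form, and the helper auto_nh_size is hypothetical. KDE_internal may scale nh_size differently.

```python
import math


def auto_nh_size(n_sample):
    """Hypothetical 'auto' rule: grows like log10(n_sample), never below 4."""
    return max(int(25 * math.log10(n_sample)), 4)


print(auto_nh_size(10_000))  # 100, matching the value quoted above
print(auto_nh_size(1))  # 4, the minimum neighborhood size
```

A slowly growing neighborhood keeps the crude bandwidth estimate cheap for large datasets while still averaging over enough points to be stable.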

bandwidth_estimate(X_train, X_test)

Gives a rough estimate of the optimal bandwidth, based on the notion of an effective neighborhood.

Returns

bandwidth estimate, minimum possible value

Return type

tuple, shape(2)
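One hypothetical way to turn an "effective neighborhood" into the (estimate, minimum) pair returned by bandwidth_estimate is sketched below. The specific choices, the median distance from each test point to its nh_size-th nearest training point as the estimate and the smallest nearest-neighbor distance as the minimum, are assumptions for illustration; rough_bandwidth is not part of asaplib.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def rough_bandwidth(X_train, X_test, nh_size=10):
    """Hypothetical neighborhood-based bandwidth guess, returning (estimate, minimum)."""
    nn = NearestNeighbors(n_neighbors=nh_size).fit(X_train)
    # Distances from each test point to its nh_size nearest training points,
    # sorted in ascending order along axis 1.
    dist, _ = nn.kneighbors(X_test)
    # Estimate: typical radius that encloses nh_size training points.
    h_est = float(np.median(dist[:, -1]))
    # Minimum: the closest any test point gets to a training point.
    h_min = float(np.min(dist[:, 0]))
    return h_est, h_min


rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
h_est, h_min = rough_bandwidth(X[:900], X[900:], nh_size=10)
print(h_est, h_min)
```

Because every row of dist is sorted, the minimum can never exceed the estimate, so the pair is usable as a search interval for a subsequent likelihood optimization.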

evaluate_density(X)

Given an array of data, computes the local density of every point using kernel density estimation.

Parameters

X (array, shape (n_sample, n_feature)) – input data.

Returns

Log of densities for every point, computed as kde.score_samples(X)

Return type

array, shape(n_sample)

find_optimal_bandwidth(X)

Finds the bandwidth that maximizes the likelihood of a test set under the density model fitted on a training set.

fit(X)

Fit the kernel model to X.

log_likelihood_test_set(bandwidth, X_test)

Fits the KDE model on the training set with the given bandwidth and evaluates the negative log-likelihood of the test set.
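The interplay between find_optimal_bandwidth and log_likelihood_test_set can be sketched with scikit-learn and SciPy: split the data with test_ratio_size, then run a bounded one-dimensional search over the bandwidth that minimizes the test-set negative log-likelihood. The helper names, the search interval h_bounds, the random split and the choice of scipy.optimize.minimize_scalar are all assumptions; KDE_internal may split, bound and optimize differently.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KernelDensity


def neg_log_likelihood_test_set(bandwidth, X_train, X_test, kernel="gaussian"):
    """Fit a KDE on X_train and return the negative total log-likelihood of X_test."""
    kde = KernelDensity(bandwidth=bandwidth, kernel=kernel).fit(X_train)
    return -kde.score(X_test)  # score() is the summed log-likelihood


def optimal_bandwidth(X, test_ratio_size=0.1, h_bounds=(0.01, 2.0), xtol=0.01, seed=0):
    """Hypothetical helper: bandwidth maximizing held-out likelihood."""
    X_train, X_test = train_test_split(X, test_size=test_ratio_size, random_state=seed)
    res = minimize_scalar(
        neg_log_likelihood_test_set,
        bounds=h_bounds,
        args=(X_train, X_test),
        method="bounded",
        options={"xatol": xtol},  # plays the role of the xtol parameter
    )
    return res.x


rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 1))
h = optimal_bandwidth(X, test_ratio_size=0.1)
print(h)
```

A Gaussian kernel is used here because it gives every test point a finite log density; with a compact kernel such as 'tophat', a test point outside every training kernel yields an infinite negative log-likelihood.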

asaplib.kde.density_estimation_internal.round_float(x)

Rounds a float to its first significant digit.
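A minimal re-implementation of "round to the first significant digit" for illustration. round_float_sketch is hypothetical and rounds to the nearest value; the actual round_float may truncate or treat edge cases differently.

```python
import math


def round_float_sketch(x):
    """Round x to its first significant digit (hypothetical re-implementation)."""
    if x == 0:
        return 0.0
    # Exponent of the leading digit: 10**exponent <= |x| < 10**(exponent + 1).
    exponent = math.floor(math.log10(abs(x)))
    # A negative ndigits in round() rounds to the left of the decimal point.
    return float(round(x, -exponent))


print(round_float_sketch(0.0347))  # 0.03
print(round_float_sketch(1234.5))  # 1000.0
```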

Module contents