asaplib.kde package¶

Submodules¶

asaplib.kde.density_estimation module¶

Classes and methods for performing kernel density estimation.

class asaplib.kde.density_estimation.KDE_scipy(bw_method=None)[source]¶

Bases: asaplib.kde.density_estimation.Kernel_Density_Base

Kernel Density Estimation with Scipy https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gaussian_kde.html

evaluate_density(X)[source]¶

Given an array of data, computes the local density of every point using kernel density estimation

Data X : array, shape(n_sample,n_feature)

Returns: Log of densities for every point
Return type: array, shape(n_sample)

fit(X)[source]¶

X: dataset, array_like

Datapoints to estimate from. In case of univariate data this is a 1-D array, otherwise a 2-D array with shape (# of data, # of dimension)

Note that scipy.stats.gaussian_kde takes X with shape (# of dimension, # of data), which is why the input X is transposed.
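The shape convention described above can be illustrated directly with SciPy. The following is a minimal sketch, not the asaplib implementation: the random data and the use of `logpdf` to obtain log densities are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))   # (n_sample, n_feature), the layout documented above

# SciPy expects (n_feature, n_sample), hence the transpose.
kde = gaussian_kde(X.T, bw_method=None)

# Log of the estimated density at every input point, one value per sample.
log_density = kde.logpdf(X.T)
print(log_density.shape)
```

Passing `X` without the transpose would make SciPy treat the 200 samples as 200 dimensions, which is the mistake the transpose guards against.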

class asaplib.kde.density_estimation.KDE_sklearn(bandwidth=1.0, algorithm='auto', kernel='gaussian', metric='euclidean')[source]¶

Bases: asaplib.kde.density_estimation.Kernel_Density_Base

Kernel Density Estimation with Sklearn https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KernelDensity.html#sklearn.neighbors.KernelDensity https://scikit-learn.org/stable/modules/density.html#kernel-density-estimation

evaluate_density(X)[source]¶

Given an array of data, computes the local density of every point using kernel density estimation

Data X : array, shape(n_sample,n_feature)

Returns: Log of densities for every point
Return type: array, shape(n_sample)

fit(X)[source]¶

X: dataset, array_like

Datapoints to estimate from. In case of univariate data this is a 1-D array, otherwise a 2-D array with shape (# of data, # of dimension)
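For comparison, the underlying scikit-learn estimator can be used on its own with the same default arguments as the constructor above. This is a sketch on synthetic data, assuming that the log densities correspond to `KernelDensity.score_samples`; the wrapper's internals are not shown in this page.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))   # (n_sample, n_feature); no transpose is needed here

kde = KernelDensity(bandwidth=1.0, algorithm='auto',
                    kernel='gaussian', metric='euclidean')
kde.fit(X)

# score_samples returns the log of the density at each point.
log_density = kde.score_samples(X)
print(log_density.shape)
```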

class asaplib.kde.density_estimation.Kernel_Density_Base[source]¶

Bases: object

Base class for performing kernel density estimation

evaluate_density(X)[source]¶

Given an array of data, computes the local density of every point using kernel density estimation

Data X : array, shape(n_sample,n_feature)

Returns: Log of densities for every point
Return type: array, shape(n_sample)

fit(X)[source]¶: Fit kernel model to X

fit_evaluate_density(X)[source]¶

get_acronym()[source]¶

asaplib.kde.density_estimation_internal module¶

class asaplib.kde.density_estimation_internal.KDE_internal(nh_size='auto', bandwidth=None, test_ratio_size=0.1, xtol=0.01, atol=5e-06, rtol=5e-05, extreme_dist=False, nn_dist=None, kernel='gaussian')[source]¶

Bases: asaplib.kde.density_estimation.Kernel_Density_Base

Kernel density estimation (KDE) for accurate local density estimation. This is achieved by using maximum-likelihood estimation of the generative kernel density model which is regularized using cross-validation.

Parameters

bandwidth (float, optional) – Bandwidth for the kernel density estimation. If not specified, it is determined automatically using maximum likelihood on a test set.
nh_size (int, optional (default = 'auto')) – Number of points in a typical neighborhood; only relevant for evaluating a crude estimate of the bandwidth. 'auto' means that nh_size is scaled with the number of samples: nh_size = 100 is used for 10000 samples. The minimum neighborhood size is 4.
test_ratio_size (float, optional (default = 0.1)) – Fraction of the data held out as the test set when determining the bandwidth by maximum likelihood estimation. To obtain smooth density estimates (and prevent overfitting), a large test_ratio_size (closer to 1.0) is recommended over a small one.
atol (float, optional (default = 0.000005)) – Precision parameter for the kernel density estimate. Smaller values lead to slower execution but better precision.
rtol (float, optional (default = 0.00005)) – Precision parameter for the kernel density estimate. Smaller values lead to slower execution but better precision.
xtol (float, optional (default = 0.01)) – Precision parameter for optimizing the bandwidth using maximum likelihood on a test set.
kernel (str, optional (default = 'gaussian')) – Type of kernel to use for density estimates. Other options are 'epanechnikov', 'linear' and 'tophat'.

bandwidth_estimate(X_train, X_test)[source]¶

Gives a rough estimate of the optimal bandwidth (based on the notion of some effective neighborhood).

Returns: bandwidth estimate, minimum possible value
Return type: tuple, shape(2)
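One way to read "effective neighborhood" is in terms of nearest-neighbor distances. The sketch below is a guess at the idea, not the asaplib algorithm: the function name `rough_bandwidth`, its single-array signature, and the use of medians are all assumptions.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def rough_bandwidth(X, nh_size=20):
    """Hypothetical helper: crude bandwidth guess from neighbor distances."""
    # Ask for nh_size + 1 neighbors because each point is its own nearest neighbor.
    nn = NearestNeighbors(n_neighbors=nh_size + 1).fit(X)
    dist, _ = nn.kneighbors(X)
    estimate = np.median(dist[:, -1])  # typical radius enclosing nh_size neighbors
    minimum = np.median(dist[:, 1])    # typical distance to the closest other point
    return estimate, minimum

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
estimate, minimum = rough_bandwidth(X)
print(estimate, minimum)
```

A pair like this can serve as a starting point and a lower bound for a subsequent bandwidth search.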

evaluate_density(X)[source]¶

Given an array of data, computes the local density of every point using kernel density estimation

Data X : array, shape(n_sample,n_feature)

Returns: Log of densities for every point, computed as kde.score_samples(X)
Return type: array, shape(n_sample)

find_optimal_bandwidth(X)[source]¶: Performs maximum likelihood estimation on a test set of the density model fitted on a training set

fit(X)[source]¶: Fit kernel model to X

log_likelihood_test_set(bandwidth, X_test)[source]¶: Fits the KDE model on the training set for a given bandwidth and evaluates the negative log-likelihood of the test set
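The bandwidth selection described by these methods can be sketched with scikit-learn and SciPy. This is an illustration under stated assumptions rather than the package's code: the helper names `neg_log_likelihood` and `select_bandwidth`, the search bounds, and the choice of `minimize_scalar` as optimizer are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KernelDensity

def neg_log_likelihood(bandwidth, X_train, X_test, kernel="gaussian"):
    """Fit a KDE on X_train and return the negative log-likelihood of X_test."""
    kde = KernelDensity(bandwidth=bandwidth, kernel=kernel).fit(X_train)
    return -kde.score(X_test)  # score() is the total log-likelihood of X_test

def select_bandwidth(X, test_ratio_size=0.1, xtol=0.01, bounds=(0.01, 5.0), seed=0):
    """Hypothetical helper: pick the bandwidth that best explains held-out data."""
    X_train, X_test = train_test_split(
        X, test_size=test_ratio_size, random_state=seed)
    result = minimize_scalar(
        neg_log_likelihood,
        bounds=bounds,            # assumed search interval
        args=(X_train, X_test),
        method="bounded",
        options={"xatol": xtol},  # tolerance on the bandwidth itself
    )
    return result.x

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
h = select_bandwidth(X)
print(h)
```

Scoring on held-out points is what regularizes the fit: a bandwidth that is too small gives a high likelihood on the training points but a poor one on the test set.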

asaplib.kde.density_estimation_internal.round_float(x)[source]¶: Rounds a float to its first significant digit
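Rounding to the first significant digit can be done with a base-10 logarithm. The function below is a hypothetical stand-in written for illustration, assuming ordinary round-to-nearest behaviour; it is not taken from the asaplib source.

```python
import math

def round_to_first_significant_digit(x):
    """Hypothetical stand-in for round_float: keep only the leading digit."""
    if x == 0:
        return 0.0
    # Position of the leading digit, e.g. -2 for 0.0234 and 3 for 1234.0.
    exponent = math.floor(math.log10(abs(x)))
    return round(x, -exponent)

print(round_to_first_significant_digit(0.0234))
print(round_to_first_significant_digit(1234.0))
```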
