asaplib.kde package
Submodules
asaplib.kde.density_estimation module
Classes and methods for performing kernel density estimation.
-
class asaplib.kde.density_estimation.KDE_scipy(bw_method=None)
Bases: asaplib.kde.density_estimation.Kernel_Density_Base
Kernel density estimation with SciPy. See https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gaussian_kde.html
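For orientation, the following minimal sketch calls scipy.stats.gaussian_kde directly rather than going through KDE_scipy. It assumes that bw_method is passed on unchanged to SciPy, which this page does not state. The synthetic data and variable names are illustrative only.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))  # illustrative data, shape (n_sample, n_feature)

# gaussian_kde expects shape (n_feature, n_sample), hence the transpose.
# bw_method=None makes SciPy fall back to Scott's rule.
kde = gaussian_kde(X.T, bw_method=None)

log_density = kde.logpdf(X.T)  # log density at every sample
density = kde(X.T)             # density at every sample
```

With bw_method=None the bandwidth factor is n**(-1/(d+4)) for n samples in d dimensions. A scalar or a callable can be passed instead to override it.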
-
class asaplib.kde.density_estimation.KDE_sklearn(bandwidth=1.0, algorithm='auto', kernel='gaussian', metric='euclidean')
Bases: asaplib.kde.density_estimation.Kernel_Density_Base
Kernel density estimation with scikit-learn. See https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KernelDensity.html#sklearn.neighbors.KernelDensity and https://scikit-learn.org/stable/modules/density.html#kernel-density-estimation
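The constructor arguments have the same names as those of sklearn.neighbors.KernelDensity. The sketch below uses the scikit-learn estimator directly with the same default values. That KDE_sklearn forwards them one to one is an assumption, and the data are synthetic.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))  # illustrative data, shape (n_sample, n_feature)

# Same names and defaults as in the KDE_sklearn signature.
kde = KernelDensity(bandwidth=1.0, algorithm="auto",
                    kernel="gaussian", metric="euclidean")
kde.fit(X)

# score_samples returns the log density at each query point.
log_density = kde.score_samples(X)
```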
-
class asaplib.kde.density_estimation.Kernel_Density_Base
Bases: object
Base class for performing kernel density estimation.
asaplib.kde.density_estimation_internal module
Adapted from https://github.com/alexandreday/fast_density_clustering.git (Copyright 2017 Alexandre Day).
-
class asaplib.kde.density_estimation_internal.KDE_internal(nh_size='auto', bandwidth=None, test_ratio_size=0.1, xtol=0.01, atol=5e-06, rtol=5e-05, extreme_dist=False, nn_dist=None, kernel='gaussian')
Bases: asaplib.kde.density_estimation.Kernel_Density_Base
Kernel density estimation (KDE) for accurate local density estimation. This is achieved by maximum-likelihood estimation of the generative kernel density model, regularized using cross-validation.
- Parameters
bandwidth (float, optional) – Bandwidth for the kernel density estimation. If not specified, it is determined automatically by maximum likelihood on a test set.
nh_size (int, optional (default = 'auto')) – Number of points in a typical neighborhood; only relevant for computing a crude estimate of the bandwidth. 'auto' means that nh_size is scaled with the number of samples: nh_size = 100 is used for 10000 samples. The minimum neighborhood size is 4.
test_ratio_size (float, optional (default = 0.1)) – Fraction of the data held out as the test set when determining the bandwidth by maximum likelihood estimation. To obtain smooth density estimates (and prevent overfitting), a large test_ratio_size (closer to 1.0) is recommended over a small one.
atol (float, optional (default = 0.000005)) – Precision parameter of the kernel density estimate. Smaller values lead to slower execution but better precision.
rtol (float, optional (default = 0.00005)) – Precision parameter of the kernel density estimate. Smaller values lead to slower execution but better precision.
xtol (float, optional (default = 0.01)) – Precision parameter for optimizing the bandwidth using maximum likelihood on a test set.
kernel (str, optional (default = 'gaussian')) – Type of kernel to use for density estimates. Other options are 'epanechnikov', 'linear' and 'tophat'.
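The bandwidth selection described by these parameters can be sketched as follows: hold out a fraction test_ratio_size of the data, then choose the bandwidth that maximizes the log-likelihood of the held-out points. This is a minimal illustration built on scikit-learn and SciPy, not the library's own code. The search bounds, the optimizer and the helper name neg_log_likelihood are assumptions.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))  # illustrative data

# Hold out 10% of the points, mirroring test_ratio_size=0.1.
X_train, X_test = train_test_split(X, test_size=0.1, random_state=0)

def neg_log_likelihood(h):
    """Negative total log-likelihood of the test set for bandwidth h (hypothetical helper)."""
    kde = KernelDensity(bandwidth=h, kernel="gaussian", atol=5e-6, rtol=5e-5)
    kde.fit(X_train)
    return -kde.score(X_test)

# Bounded 1-D search; xatol plays the role of xtol here.
# The bounds (0.05, 2.0) are arbitrary choices for this example.
result = minimize_scalar(neg_log_likelihood, bounds=(0.05, 2.0),
                         method="bounded", options={"xatol": 0.01})
best_bandwidth = result.x
```

Scoring on held-out points is what regularizes the fit: a very small bandwidth reproduces the training points closely but assigns low likelihood to unseen ones.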
-
bandwidth_estimate(X_train, X_test)
Gives a rough estimate of the optimal bandwidth, based on the notion of an effective neighborhood.
- Returns
bandwidth estimate, minimum possible value
- Return type
tuple, shape(2)
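The heuristic itself is not spelled out here. One plausible reading, offered purely as an assumption, is to derive both numbers from nearest-neighbor distances between the test and training points. The function rough_bandwidth and its use of medians are hypothetical and may differ from what the library does.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def rough_bandwidth(X_train, X_test, nh_size=20):
    """Hypothetical heuristic: bandwidth scales from neighbor distances."""
    nn = NearestNeighbors(n_neighbors=nh_size).fit(X_train)
    # dist[i, k] is the distance from test point i to its (k+1)-th nearest training point.
    dist, _ = nn.kneighbors(X_test)
    h_est = float(np.median(dist[:, -1]))  # typical radius of an nh_size neighborhood
    h_min = float(np.median(dist[:, 0]))   # typical nearest-neighbor distance
    return h_est, h_min

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))  # illustrative data
h_est, h_min = rough_bandwidth(X[:900], X[900:])
```

A pair of this kind could serve as a starting point and lower bound for a subsequent maximum-likelihood search over the bandwidth.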
-
evaluate_density(X)
Given an array of data, computes the local density of every point using kernel density estimation.
- Parameters
X (array, shape (n_sample, n_feature)) – Input data.
- Returns
Log of the densities for every point, as returned by kde.score_samples(X) (array, shape (n_sample,)).
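Because the returned values are log densities, they must be exponentiated before being read as densities. The sketch below checks this on a one-dimensional example using scikit-learn's KernelDensity directly: the exponentiated scores should integrate to about 1. The data, bandwidth and grid are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))  # illustrative 1-D data, shape (n_sample, n_feature)

kde = KernelDensity(bandwidth=0.3, kernel="gaussian").fit(X)

# Evaluate the log density on a grid that comfortably covers the data.
grid = np.linspace(-6.0, 6.0, 2001).reshape(-1, 1)
log_density = kde.score_samples(grid)  # shape (n_grid,)
density = np.exp(log_density)

# Trapezoidal rule written out by hand: a valid density integrates to about 1.
dx = np.diff(grid[:, 0])
integral = float(np.sum(0.5 * (density[1:] + density[:-1]) * dx))
```

Working in log space avoids underflow for points far from the data, where the density itself would round to zero.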