
The conceptual definition

Kernel Density Estimation (KDE) is a non-parametric technique for estimating the probability density function (PDF) of a continuous variable. KDE is widely used in fields such as statistical analysis, machine learning, and data visualization. The main idea behind KDE is to represent the underlying probability distribution of a dataset by summing the influence of the individual data points.

Key properties

Probability Density Function

Remember that we talked about this in an earlier chapter?

Limitations of Parametric Density Estimation Approaches

The Need for a Flexible and Data-Driven Approach like KDE

In simple words, KDE is a data-driven way to approximate the PDF of data.
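
For a quick feel of this, here is a minimal sketch using scipy.stats.gaussian_kde; the bimodal sample data is made up for illustration:

```python
import numpy as np
from scipy.stats import gaussian_kde

# Illustrative sample: 500 draws from a bimodal mixture (made-up data)
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2, 0.5, 250), rng.normal(2, 1.0, 250)])

kde = gaussian_kde(data)          # bandwidth chosen automatically (Scott's rule)
grid = np.linspace(-5, 6, 200)
density = kde(grid)               # estimated PDF evaluated on the grid

# The estimate behaves like a PDF: it (approximately) integrates to 1
print(density.sum() * (grid[1] - grid[0]))
```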

Kernel Functions and their properties

A kernel function is a mathematical function used in KDE to assign weights to data points based on their distances from a location of interest.

Kernel functions are characterized by several properties: they are non-negative everywhere, they integrate to one (so that the resulting estimate is a valid density), and they are typically symmetric about zero.

How KDE works

Approach one (from point to space):

  1. For every data point, draw a kernel, i.e., the red dashed lines in the left figure.

  2. Stack the kernels where they overlap, i.e., the blue solid line.

Approach two (from space to point):

  1. For every space unit (x-axis), draw a kernel.

  2. Identify the data points that fall within the kernel's range.

  3. Measure the distance from each data point to the space unit.

  4. Convert each distance to a weight using the kernel function.

  5. Sum the weights over all points identified in step 2 (a minimal code sketch follows this list).
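
A minimal Python sketch of approach two, assuming a Gaussian kernel (the function names and data here are illustrative, not from the text):

```python
import numpy as np

def gaussian_kernel(distance, h):
    # Step 4: convert a distance into a weight (Gaussian weighting)
    return np.exp(-0.5 * (distance / h) ** 2) / (h * np.sqrt(2 * np.pi))

def kde_at(x, data, h, kernel):
    # Step 2: with a Gaussian kernel every point is "in range", so keep all
    distances = x - data                 # step 3: distances to the space unit
    weights = kernel(distances, h)       # step 4: distance -> weight
    return weights.sum() / len(data)     # step 5: sum (normalized by n)

data = np.array([1.0, 2.0, 2.5, 4.0])   # illustrative data points
grid = np.linspace(0, 5, 101)            # step 1: the space units (x-axis)
density = np.array([kde_at(x, data, h=0.5, kernel=gaussian_kernel) for x in grid])
```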

A histogram and KDE plot. Image source: Wikipedia

How KDE works: by summing the density values.

Common Kernel Functions

Gaussian (normal) kernel

The Gaussian kernel follows a bell-shaped curve and is widely used due to its smoothness and differentiability. Its formula is given by:

K(x) = \frac{1}{h\sqrt{2\pi}} e^{-0.5(x/h)^2}
Gaussian Kernel, bandwidth (h) set to 1.
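
A direct NumPy sketch of this formula:

```python
import numpy as np

def gaussian_kernel(x, h=1.0):
    """Gaussian kernel: exp(-0.5 * (x/h)^2) / (h * sqrt(2*pi))."""
    return np.exp(-0.5 * (np.asarray(x) / h) ** 2) / (h * np.sqrt(2 * np.pi))
```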

Epanechnikov (parabolic) kernel

The Epanechnikov kernel is optimal in the sense that it minimizes the asymptotic mean integrated squared error (MISE), and it has compact support, meaning it assigns zero weight to points outside a specific range. Its formula is:

K(x) = \begin{cases} \frac{3}{4h}\left(1 - (x/h)^2\right) & \text{if } -h \leq x \leq h \\ 0 & \text{otherwise} \end{cases}
Epanechnikov Kernel, bandwidth (h) set to 1.
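
A direct NumPy sketch of this formula:

```python
import numpy as np

def epanechnikov_kernel(x, h=1.0):
    """Epanechnikov kernel: 3/(4h) * (1 - (x/h)^2) for |x| <= h, else 0."""
    u = np.asarray(x) / h
    return np.where(np.abs(u) <= 1, 3 / (4 * h) * (1 - u ** 2), 0.0)
```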

Quartic (biweight) kernel

The Quartic (biweight) kernel is another commonly used kernel in KDE. It is defined as follows:

K(x) = \begin{cases} \frac{15}{16h}\left(1 - (x/h)^2\right)^2 & \text{if } -h \leq x \leq h \\ 0 & \text{otherwise} \end{cases}
Quartic Kernel, bandwidth (h) set to 1.
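
A direct NumPy sketch of this formula:

```python
import numpy as np

def quartic_kernel(x, h=1.0):
    """Quartic (biweight) kernel: 15/(16h) * (1 - (x/h)^2)^2 for |x| <= h, else 0."""
    u = np.asarray(x) / h
    return np.where(np.abs(u) <= 1, 15 / (16 * h) * (1 - u ** 2) ** 2, 0.0)
```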

Triangular kernel

The Triangular kernel function is a simple and computationally efficient kernel that assigns weights to points based on a triangular shape. It is defined as:

K(x) = \begin{cases} \frac{1 - |x/h|}{h} & \text{if } -h \leq x \leq h \\ 0 & \text{otherwise} \end{cases}
Triangular Kernel, bandwidth (h) set to 1.
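
A direct NumPy sketch of this formula:

```python
import numpy as np

def triangular_kernel(x, h=1.0):
    """Triangular kernel: (1 - |x/h|) / h for |x| <= h, else 0."""
    u = np.asarray(x) / h
    return np.where(np.abs(u) <= 1, (1 - np.abs(u)) / h, 0.0)
```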

Uniform (Tophat) kernel

The uniform kernel assigns equal weights to points within a specific range and zero weight outside that range. Its formula is:

K(x) = \begin{cases} \frac{1}{2h} & \text{if } -h \leq x \leq h \\ 0 & \text{otherwise} \end{cases}
Uniform Kernel, bandwidth (h) set to 1.
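
A direct NumPy sketch of this formula:

```python
import numpy as np

def uniform_kernel(x, h=1.0):
    """Uniform (tophat) kernel: 1/(2h) for |x| <= h, else 0."""
    u = np.asarray(x) / h
    return np.where(np.abs(u) <= 1, 1 / (2 * h), 0.0)
```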

The effect of Bandwidth

Effect of bandwidth (h) on Kernel Function (weighting).

Effect of bandwidth (h) on 1-D KDE results.
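
One way to reproduce this effect is to vary the bandwidth factor of scipy.stats.gaussian_kde; a minimal sketch with made-up data:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(-2, 0.5, 200), rng.normal(2, 1.0, 200)])
grid = np.linspace(-5, 6, 200)

# A scalar bw_method sets the bandwidth factor directly:
# a small factor gives a spiky estimate, a large one over-smooths.
for factor in (0.05, 0.2, 1.0):
    kde = gaussian_kde(data, bw_method=factor)
    print(factor, kde(grid).max())
```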

Bandwidth Selection

Rule of Thumb

h \approx 1.06 \times \sigma \times n^{-1/5},

where σ is the standard deviation of the data, and n is the number of data points.
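
A minimal NumPy sketch of this rule (the data array is illustrative):

```python
import numpy as np

data = np.array([1.2, 1.9, 2.3, 2.8, 3.1, 4.0, 4.4, 5.2])  # illustrative data
sigma = data.std(ddof=1)          # sample standard deviation
n = len(data)
h = 1.06 * sigma * n ** (-1 / 5)  # rule-of-thumb bandwidth
print(h)
```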

Cross-Validation

5-fold cross-validation approach.
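
One common way to implement this is a grid search over the bandwidth of scikit-learn's KernelDensity, which scores each held-out fold by its log-likelihood; a minimal sketch with made-up data:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(42)
data = rng.normal(0, 1, 300).reshape(-1, 1)   # sklearn expects 2-D input

# 5-fold CV: the default score is the log-likelihood of each held-out fold
search = GridSearchCV(
    KernelDensity(kernel="gaussian"),
    {"bandwidth": np.linspace(0.1, 1.0, 10)},
    cv=5,
)
search.fit(data)
print(search.best_params_["bandwidth"])
```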

Plug-in Methods

Visual Inspection

Remarks

Advantages of KDE

Disadvantages/ Limitations of KDE

KDE: Visualizing Data Distribution

Line fill plot.

Contour lines.

Contour lines on Facet Grid with a single KDE line on the diagonal.

Contour lines on Facet Grid with the histogram of a single vector on the diagonal.
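
Plots like these can be produced with seaborn; a minimal sketch with made-up data (the column names are illustrative):

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
df = pd.DataFrame({"x": rng.normal(0, 1, 300), "y": rng.normal(0, 1, 300)})

sns.kdeplot(data=df, x="x", fill=True)          # line fill plot
plt.figure()
sns.kdeplot(data=df, x="x", y="y")              # contour lines from a 2-D KDE
sns.pairplot(df, kind="kde", diag_kind="kde")   # contours, KDE lines on the diagonal
sns.pairplot(df, kind="kde", diag_kind="hist")  # contours, histograms on the diagonal
plt.show()
```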

Advanced Topics

Kernel Density Estimation (KDE) has several advanced topics that can be explored for a deeper understanding and more effective application. Here are some of these topics:

  1. Adaptive Bandwidth Selection: This involves using different bandwidths for different regions of the data space, based on local density or other data characteristics. It can improve KDE performance for datasets with varying density and complex structures.

  2. Multivariate KDE: Extending KDE to multivariate data involves selecting appropriate kernel functions and bandwidths in multiple dimensions. Handling the curse of dimensionality and the increased computational complexity are important considerations.

  3. Kernel Choice: Different kernel functions can yield different KDE results, and understanding their properties, such as order, continuity, and bias, can guide kernel selection for specific applications.

  4. Density Derivatives: Estimating density derivatives (e.g., gradient, Hessian) using KDE can provide additional insights into the data, such as local maxima and saddle points. This is particularly useful in optimization and feature detection tasks.

  5. Kernel Smoothing in Regression: KDE can be used for nonparametric regression by locally smoothing the data using kernel functions. Nadaraya-Watson and Local Linear Regression are examples of such methods (a minimal Nadaraya-Watson sketch follows this list).

  6. Advanced KDE Algorithms: Techniques like the Fast Fourier Transform (FFT) or k-d trees can speed up KDE computations for large datasets, while advanced KDE methods like Variable Kernel Density Estimation (VKDE) or Bayesian KDE can handle complex data structures and uncertainty in the density estimates.
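
To illustrate item 5, a minimal Nadaraya-Watson sketch with a Gaussian kernel; the data here is made up:

```python
import numpy as np

def nadaraya_watson(x0, x, y, h):
    """Nadaraya-Watson estimate: kernel-weighted average of y around x0."""
    weights = np.exp(-0.5 * ((x0 - x) / h) ** 2)   # Gaussian kernel weights
    return np.sum(weights * y) / np.sum(weights)

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0, 10, 100))
y = np.sin(x) + rng.normal(0, 0.2, 100)            # noisy signal (made up)
y_hat = np.array([nadaraya_watson(x0, x, y, h=0.5) for x0 in x])
```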

Summary