The Kullback-Leibler divergence used in the t-SNE method is minimized using a gradient descent algorithm based on the Barnes-Hut approximation (van der Maaten, 2014). The algorithm uses the following notation:
$X = \{x_1, x_2, \ldots, x_n\}$, the original high-dimensional data with $n$ data points
$p$ = the perplexity parameter
$T$ = the number of iterations
$\eta$ = the learning rate
$t_{\text{inflate}}$ = the number of iterations after which the momentum value changes
$\alpha(t)$ = the momentum at iteration $t$, where $\alpha(t) = 0.5$ for $t \le t_{\text{inflate}}$ and $\alpha(t) = 0.8$ otherwise
$Y^{(t)} = \{y_1, y_2, \ldots, y_n\}$, the low-dimensional mapping solution at iteration $t$
The steps of the gradient descent algorithm are defined as follows:
1. Compute $p_{j|i}$, the pairwise similarities in the high-dimensional space:
$$p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}{\sum_{k \ne i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)}$$
Each bandwidth $\sigma_i$ is chosen by a numerical search so that the conditional distribution attains the specified perplexity $p$ (see the first sketch after this list).
2. Symmetrize the conditional similarities to obtain the joint probabilities:
$$p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}$$
3. Set the initial solution $Y^{(0)} = \{y_1, y_2, \ldots, y_n\}$ by sampling each $y_i$ from a normal distribution with mean 0 and standard deviation $10^{-4}$.
4. For $t = 1$ to $T$ (this loop is sketched in the second example after the list):
– Compute $q_{ij}$, the pairwise similarities in the low-dimensional mapping:
$$q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}{\sum_{k \ne l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}$$
– Compute the cost function:
$$C = \mathrm{KL}(P \parallel Q) = \sum_{i \ne j} p_{ij} \log \frac{p_{ij}}{q_{ij}}$$
– Compute the gradient, using the Barnes-Hut approximation to evaluate the pairwise sum:
$$\frac{\partial C}{\partial y_i} = 4 \sum_{j \ne i} \left(p_{ij} - q_{ij}\right)\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}\left(y_i - y_j\right)$$
– Update the solution:
$$Y^{(t)} = Y^{(t-1)} + \eta \frac{\partial C}{\partial Y} + \alpha(t)\left(Y^{(t-1)} - Y^{(t-2)}\right)$$
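As referenced in step 1, each $\sigma_i$ is found numerically so that the conditional distribution reaches the specified perplexity. The following is a minimal Python sketch of steps 1 and 2 using a binary search on $\sigma_i$; the function names, bracket bounds, and tolerances are illustrative choices, not taken from any particular implementation.

```python
import numpy as np

def conditional_probs(sq_dists_i, sigma):
    """p_{j|i} for one point i, given its squared distances to the other points."""
    logits = -sq_dists_i / (2.0 * sigma**2)
    logits -= logits.max()                    # shift for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

def high_dim_similarities(X, perplexity, tol=1e-5, max_iter=50):
    """Steps 1-2: calibrate sigma_i to the perplexity, then symmetrize."""
    n = X.shape[0]
    sq_dists = np.square(X[:, None, :] - X[None, :, :]).sum(-1)  # ||x_i - x_j||^2
    target_entropy = np.log(perplexity)       # perplexity = exp(Shannon entropy)
    P_cond = np.zeros((n, n))
    for i in range(n):
        mask = np.arange(n) != i
        lo, hi, sigma = 1e-10, np.inf, 1.0    # binary-search bracket for sigma_i
        for _ in range(max_iter):
            probs = conditional_probs(sq_dists[i, mask], sigma)
            entropy = -np.sum(probs * np.log(probs + 1e-12))
            if abs(entropy - target_entropy) < tol:
                break
            if entropy > target_entropy:      # distribution too flat: shrink sigma
                hi = sigma
                sigma = np.sqrt(lo * hi)
            else:                             # too peaked: grow sigma
                lo = sigma
                sigma = np.sqrt(lo * hi) if np.isfinite(hi) else sigma * 2.0
        P_cond[i, mask] = probs
    return (P_cond + P_cond.T) / (2.0 * n)    # p_ij = (p_{j|i} + p_{i|j}) / (2n)
```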
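The loop in step 4 can be sketched as follows. For brevity this version evaluates the gradient exactly in $O(n^2)$ rather than with the Barnes-Hut tree approximation the actual algorithm uses, and it writes the update as an explicit descent step; `conv_criterion` is an assumed stand-in for the convergence value specified in the launch window.

```python
def tsne_gradient_descent(P, n_iter=1000, eta=200.0, t_inflate=250,
                          conv_criterion=1e-7, seed=0):
    """Steps 3-4: gradient descent with momentum on a 2-D map Y."""
    rng = np.random.default_rng(seed)
    n = P.shape[0]
    Y = rng.normal(0.0, 1e-4, size=(n, 2))    # step 3: initial solution Y(0)
    Y_prev = Y.copy()                          # Y(t-2), for the momentum term
    for t in range(1, n_iter + 1):
        # Pairwise Student-t similarities q_ij in the low-dimensional map.
        sq_d = np.square(Y[:, None, :] - Y[None, :, :]).sum(-1)
        inv = 1.0 / (1.0 + sq_d)
        np.fill_diagonal(inv, 0.0)
        Q = inv / inv.sum()
        # Cost KL(P || Q); the clips avoid log(0). Useful to monitor convergence.
        cost = np.sum(P * np.log(np.maximum(P, 1e-12) / np.maximum(Q, 1e-12)))
        # Exact gradient: dC/dy_i = 4 * sum_j (p_ij - q_ij) * inv_ij * (y_i - y_j).
        W = 4.0 * (P - Q) * inv
        grad = W.sum(axis=1)[:, None] * Y - W @ Y
        # First stopping condition: largest per-point gradient norm below criterion.
        if np.linalg.norm(grad, axis=1).max() < conv_criterion:
            break
        # Momentum update; alpha(t) switches from 0.5 to 0.8 after t_inflate.
        alpha = 0.5 if t <= t_inflate else 0.8
        Y, Y_prev = Y - eta * grad + alpha * (Y - Y_prev), Y
    return Y
```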
The algorithm stops when one of the following conditions is met:
• The maximum gradient norm over all $i$, $\max_i \lVert \partial C / \partial y_i \rVert$, falls below the convergence criterion specified in the launch window.
• The maximum number of iterations, T, is reached.
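For illustration, the two sketches above combine as follows on hypothetical toy data; the parameter values shown are arbitrary examples, not defaults from any implementation.

```python
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))                 # toy high-dimensional data
P = high_dim_similarities(X, perplexity=30.0)  # steps 1-2
Y = tsne_gradient_descent(P, n_iter=1000, eta=200.0,
                          t_inflate=250, conv_criterion=1e-7)
print(Y.shape)                                 # (200, 2): the low-dimensional map
```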