The t-SNE method maps points from a high-dimensional space, {x1, x2,..., xn}, to points in a low-dimensional space, {y1, y2,..., yn} by minimizing the difference between the high-dimensional similarities of {xi, xj} and the low-dimensional similarities of {yi, yj}. The pairwise similarities are represented as probability distributions. In the high-dimensional space, conditional probabilities, pj|i, are calculated using the Gaussian distribution. The Multivariate Embedding platform provides two methods to calculate the conditional probabilities.
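If the Sparse option is selected in the launch window, pj|i are calculated using a sparse approximation. For each of the n inputs, a set of nearest neighbors is found using a vantage-point (VP) tree. The conditional probabilities are then calculated only over those sets of nearest neighbors:
$$p_{j|i} = \begin{cases} \dfrac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}{\sum_{k \in N_i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)} & \text{if } j \in N_i \\ 0 & \text{otherwise} \end{cases}$$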
In this equation, Ni is the set of the floor(3p) nearest neighbors of xi, where p is the perplexity parameter defined in the launch window. The variance of the Gaussian distribution, σi², is also based on the perplexity parameter. See van der Maaten and Hinton (2008) and van der Maaten (2014).
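The sparse calculation can be illustrated with a minimal Python sketch. It is not the platform's implementation: the function names are hypothetical, a brute-force sort stands in for the VP tree (it returns the same neighbor set, only more slowly), and sigma is passed in directly rather than being derived from the perplexity.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def sparse_conditional_row(X, i, perplexity, sigma):
    """Return {j: p_{j|i}} over the floor(3 * perplexity) nearest neighbors of X[i].

    Assumption: sigma is supplied by the caller; the perplexity is used here
    only to set the neighbor count.
    """
    k = min(math.floor(3 * perplexity), len(X) - 1)
    # Brute-force neighbor search, standing in for a VP tree.
    others = sorted((j for j in range(len(X)) if j != i),
                    key=lambda j: sq_dist(X[i], X[j]))
    neighbors = others[:k]
    # Gaussian weights, normalized over the neighbor set only.
    weights = {j: math.exp(-sq_dist(X[i], X[j]) / (2.0 * sigma ** 2))
               for j in neighbors}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}

X = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0), (6.0, 5.0)]
row = sparse_conditional_row(X, i=0, perplexity=1.0, sigma=1.0)
```

Any point outside the neighbor set is simply absent from the returned dictionary, which corresponds to pj|i = 0.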
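If the Sparse option is not selected in the launch window, pj|i are calculated for all points:
$$p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)}$$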
In this calculation, the variance of the Gaussian distribution, σi², is also based on the perplexity parameter.
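How σi is derived from the perplexity is not spelled out here. The sketch below follows the approach described by van der Maaten and Hinton (2008), where σi is tuned so that the perplexity of the conditional distribution, 2 raised to its Shannon entropy in bits, matches the target. The function names, the search bounds, and the iteration count are illustrative assumptions.

```python
import math

def conditional_row(sq_dists, sigma):
    """p_{j|i} over all j != i, given squared distances from x_i to the others."""
    d_min = min(sq_dists)
    # Subtracting d_min cancels in the ratio and avoids an all-zero underflow.
    weights = [math.exp(-(d - d_min) / (2.0 * sigma ** 2)) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]

def perplexity_of(probs):
    """Perplexity 2**H, where H is the Shannon entropy in bits."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0.0)
    return 2.0 ** entropy

def calibrate_sigma(sq_dists, target, lo=1e-3, hi=1e3, iters=100):
    """Bisect on sigma (geometric midpoint) until the row perplexity hits target."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if perplexity_of(conditional_row(sq_dists, mid)) > target:
            hi = mid  # distribution too flat: decrease sigma
        else:
            lo = mid  # distribution too peaked: increase sigma
    return math.sqrt(lo * hi)

sq_dists = [1.0, 4.0, 9.0, 16.0]  # squared distances from x_i to four other points
sigma = calibrate_sigma(sq_dists, target=2.0)
row = conditional_row(sq_dists, sigma)
```

The bisection relies on the perplexity increasing monotonically with sigma, so a larger perplexity setting yields a wider Gaussian and spreads probability over more neighbors.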
The conditional probabilities are not symmetric in general, because pj|i and pi|j are normalized with respect to different points. The t-SNE method therefore defines the joint probabilities, pij, in the high-dimensional space by symmetrizing the conditional probabilities:
$$p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}$$
where pij = pji for all i and j. Since it is the pairwise similarities that are of interest, it is also assumed that pii = 0.
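As a small illustration of the symmetrization, the sketch below averages each pair of conditional probabilities and divides by n so that the joint probabilities sum to 1, following van der Maaten and Hinton (2008). The function name and the example matrix are made up for this illustration.

```python
def symmetrize(P_cond):
    """Joint probabilities p_ij = (p_{j|i} + p_{i|j}) / (2n).

    P_cond[i][j] holds p_{j|i}; each row sums to 1 and the diagonal is 0.
    """
    n = len(P_cond)
    return [[(P_cond[i][j] + P_cond[j][i]) / (2.0 * n) if i != j else 0.0
             for j in range(n)]
            for i in range(n)]

# Illustrative conditional probabilities for three points (rows sum to 1).
P_cond = [
    [0.0, 0.7, 0.3],
    [0.4, 0.0, 0.6],
    [0.5, 0.5, 0.0],
]
P = symmetrize(P_cond)
```

Here p0|1 = 0.4 and p1|0 = 0.7 differ, but the resulting joint probabilities P[0][1] and P[1][0] are equal.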
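The joint probabilities in the low-dimensional mapping, qij, are calculated using the Student’s t distribution with one degree of freedom:
$$q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}$$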
These probabilities have the same properties as the pij’s, meaning that qij = qji for all i and j and qii = 0.
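A minimal sketch of the low-dimensional similarities, assuming the standard t-SNE form in which the Student's t kernel with one degree of freedom is (1 + squared distance)^-1, normalized over all pairs of distinct points. The function name and sample coordinates are illustrative.

```python
def low_dim_affinities(Y):
    """Joint probabilities q_ij from a Student's t kernel with one degree of freedom."""
    n = len(Y)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:  # the diagonal stays 0, so q_ii = 0
                d2 = sum((a - b) ** 2 for a, b in zip(Y[i], Y[j]))
                kernel[i][j] = 1.0 / (1.0 + d2)
    # Normalize over all ordered pairs (k, l) with k != l.
    total = sum(map(sum, kernel))
    return [[v / total for v in row] for row in kernel]

Y = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0)]
Q = low_dim_affinities(Y)
```

Because the kernel decays polynomially rather than exponentially, distant pairs keep more probability mass than they would under a Gaussian.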
The t-SNE method minimizes the difference between the pairwise similarities in the high-dimensional space and the pairwise similarities in the low-dimensional space by minimizing a single Kullback-Leibler divergence between the joint probability distribution P and the joint probability distribution Q. The Kullback-Leibler divergence between P and Q is calculated as follows:
$$KL(P \,\|\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}$$
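The objective can be evaluated with a few lines of Python. This is a sketch that assumes P and Q are given as square matrices of joint probabilities; the function name and the example matrices are illustrative, and the optimization step that moves the low-dimensional points is not shown.

```python
import math

def kl_divergence(P, Q):
    """Kullback-Leibler divergence: sum over i, j of p_ij * log(p_ij / q_ij).

    Terms with p_ij = 0 contribute nothing, which covers the zero diagonal.
    Assumes q_ij > 0 wherever p_ij > 0, as holds for the Student's t kernel.
    """
    total = 0.0
    for p_row, q_row in zip(P, Q):
        for p, q in zip(p_row, q_row):
            if p > 0.0:
                total += p * math.log(p / q)
    return total

sixth = 1.0 / 6.0
P = [[0.0, sixth, sixth], [sixth, 0.0, sixth], [sixth, sixth, 0.0]]
Q = [[0.0, 0.3, 0.1], [0.3, 0.0, 0.1], [0.1, 0.1, 0.0]]

cost_same = kl_divergence(P, P)  # identical distributions
cost_diff = kl_divergence(P, Q)  # mismatched distributions
```

The divergence is zero when the two distributions coincide and positive otherwise, so a lower value indicates that the low-dimensional similarities track the high-dimensional ones more closely.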