# Review — UNI-SNE: Visualizing Similarity Data with a Mixture of Maps

## Introducing A Background Map to Solve the Crowding Problem

In this story, Visualizing Similarity Data with a Mixture of Maps (UNI-SNE), by University of Toronto, is briefly reviewed, since UNI-SNE is mentioned in the t-SNE paper. Prof. Hinton is one of the authors. In this paper:

• Aspect Maps are introduced so that similarity data can be visualized with a mixture of maps.
• A Background Map is used to solve the crowding problem.

This is a 2007 AISTATS paper with over 100 citations. (Sik-Ho Tsang @ Medium)

# Outline

1. Brief Review of SNE & Symmetric SNE
2. Aspect Maps
3. UNI-SNE: A Background Map

# 1. Brief Review of SNE & Symmetric SNE

## 1.1. SNE

• To visualize high-dimensional data, we need to map the data to a low-dimensional space such as 2D or 3D.
• In addition, the structure of the high-dimensional data should be preserved after the mapping for the visualization to be meaningful.
• A spherical Gaussian distribution centered at x_i defines a probability density at each of the other points.
• When these densities are normalized, we get a probability distribution, P_i, over all of the other points that represents their similarity to i.
• A circular Gaussian distribution centered at y_i defines a probability density at each of the other points.
• When these densities are normalized, we get a probability distribution over all of the other points that is our low-dimensional model, Q_i, of the high-dimensional P_i.
• For each object, i, we can associate a cost with a set of low-dimensional y locations by using the Kullback-Leibler divergence to measure how well the distribution Q_i models the distribution P_i:

  $$C = \sum_i KL(P_i \| Q_i) = \sum_i \sum_{j \neq i} p_{j|i} \log \frac{p_{j|i}}{q_{j|i}}$$

• The above cost C can be differentiated and minimized by gradient descent.
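
The steps above can be sketched in a few lines of NumPy. This is only a minimal illustration, not the authors' code: the helper names `conditional_probs` and `sne_cost`, the toy data, and the single fixed `sigma` shared by all points are assumptions of this sketch.

```python
import numpy as np

def conditional_probs(points, sigma=1.0):
    """Row i holds the normalized Gaussian densities of all other points j."""
    # Squared Euclidean distance between every pair of points.
    diff = points[:, None, :] - points[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # Unnormalized Gaussian densities; a point is never its own neighbor.
    dens = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(dens, 0.0)
    # Normalize each row so that it becomes a distribution over j != i.
    return dens / dens.sum(axis=1, keepdims=True)

def sne_cost(P, Q, eps=1e-12):
    """Sum over i of KL(P_i || Q_i); eps only guards against log(0)."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / (Q[mask] + eps))))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 10))  # toy high-dimensional data (made up)
Y = rng.normal(size=(20, 2))   # random 2D layout, not yet optimized

P = conditional_probs(X, sigma=2.0)
# sigma = 1/sqrt(2) makes the low-dimensional density exp(-||y_i - y_j||^2).
Q = conditional_probs(Y, sigma=1.0 / np.sqrt(2.0))
cost = sne_cost(P, Q)
print("SNE cost of the random layout:", cost)
```

Gradient descent on the y locations would then move `Y` so that this cost goes down.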

## 1.2. Symmetric SNE

• An alternative is to define a single joint distribution over all non-identical ordered pairs:

  $$C_{sym} = KL(P \| Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}$$

• This leads to simpler derivatives and is easier to optimize.
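
To make the "simpler derivatives" point concrete, here is a hedged NumPy sketch of the symmetric cost and its gradient. The function names, the toy data, the fixed `sigma`, and the way `P` is symmetrized from conditionals, (p_{j|i} + p_{i|j}) / (2n), are assumptions of this sketch rather than details taken from the paper.

```python
import numpy as np

def sq_dists(Z):
    """Squared Euclidean distance between every pair of rows of Z."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sum(diff ** 2, axis=-1)

def joint_p(X, sigma=2.0):
    """Joint P over ordered pairs, built by symmetrizing conditionals (assumed)."""
    n = X.shape[0]
    dens = np.exp(-sq_dists(X) / (2.0 * sigma ** 2))
    np.fill_diagonal(dens, 0.0)
    cond = dens / dens.sum(axis=1, keepdims=True)  # p_{j|i} in row i
    return (cond + cond.T) / (2.0 * n)

def joint_q(Y):
    """Joint Q: a single normalizer over all ordered pairs i != j."""
    dens = np.exp(-sq_dists(Y))
    np.fill_diagonal(dens, 0.0)
    return dens / dens.sum()

def symmetric_cost(P, Q):
    off = ~np.eye(P.shape[0], dtype=bool)  # skip the i == j entries
    return float(np.sum(P[off] * np.log(P[off] / Q[off])))

def symmetric_grad(P, Q, Y):
    """dC/dy_i = 4 * sum_j (p_ij - q_ij) * (y_i - y_j)."""
    W = P - Q
    return 4.0 * (W.sum(axis=1, keepdims=True) * Y - W @ Y)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 10))       # toy data (made up)
Y = 0.1 * rng.normal(size=(20, 2))  # small random initial layout

P = joint_p(X)
Q = joint_q(Y)
cost_before = symmetric_cost(P, Q)
grad = symmetric_grad(P, Q, Y)
Y_new = Y - 0.5 * grad              # one plain gradient-descent step
cost_after = symmetric_cost(P, joint_q(Y_new))
print(cost_before, cost_after)
```

The gradient is a single weighted sum of differences y_i - y_j, which is what makes the symmetric version convenient to optimize.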

# 2. Aspect Maps

• Different senses of a word occur in different maps.
• e.g.: river and loan can both be close to bank without being at all close to each other.
• Each object, i, has a mixing proportion π_i^m in each map, m, and the mixing proportions of an object are constrained to add to 1 over the maps.
• (Symmetric SNE is not used here.)
• (The paper has a long passage on minimizing the cost using the above aspect maps version of q_{j|i}. Please read the paper if interested.)
• Figure: 2 of the 50 aspect maps for the word association data. Left: Each map models a different sense of can. Right: Each map models a different sense of field.
• The above figure shows 2 of the 50 aspect maps, for the “can” and “field” examples.
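
The bank / river / loan example can be reproduced with a small sketch. It assumes that q_{j|i} is proportional to the sum over maps m of π_i^m π_j^m exp(-d_ij^m), where d_ij^m is the squared distance between i and j in map m, normalized over j. The function name `aspect_q`, the coordinates, and the mixing proportions below are all made up for illustration.

```python
import numpy as np

def aspect_q(Y_maps, pi):
    """Y_maps: (M, n, 2) locations per map; pi: (M, n) mixing proportions.

    Returns an (n, n) matrix with q_{j|i} in row i.
    """
    diff = Y_maps[:, :, None, :] - Y_maps[:, None, :, :]  # (M, n, n, 2)
    sq_dist = np.sum(diff ** 2, axis=-1)                   # (M, n, n)
    # pi[m, i] * pi[m, j] * exp(-d_ij^m), summed over the maps m.
    num = np.einsum("mi,mj,mij->ij", pi, pi, np.exp(-sq_dist))
    np.fill_diagonal(num, 0.0)
    return num / num.sum(axis=1, keepdims=True)

# Objects: 0 = bank, 1 = river, 2 = loan.
Y_maps = np.array([
    [[0.0, 0.0], [1.0, 0.0], [4.0, 0.0]],  # map 0: river sits next to bank
    [[0.0, 0.0], [4.0, 0.0], [1.0, 0.0]],  # map 1: loan sits next to bank
])
pi = np.array([
    [0.5, 0.9, 0.1],  # proportions in map 0
    [0.5, 0.1, 0.9],  # proportions in map 1
])                    # each column (object) adds to 1

q = aspect_q(Y_maps, pi)
print("q(river | bank):", q[0, 1])
print("q(loan  | bank):", q[0, 2])
print("q(loan  | river):", q[1, 2])
```

In a single 2D map, placing river and loan both at distance 1 from bank would force them to be within distance 2 of each other. With two maps, both stay close to bank while q(loan | river) remains tiny.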

# 3. UNI-SNE: A Background Map

• One of the aspect maps would keep all of the objects very close together, while the other aspect map would create widely separated clusters of objects.
• The objects in the middle will be crushed together too closely, causing the crowding problem.
• A background map in which all of the objects are very close together gives all of the q_{j|i} a small positive contribution.
• Here, for UNI-SNE, symmetric SNE is used, so the background map adds a small uniform term to every q_ij.
• Principal components analysis (PCA) is first applied to all 60,000 MNIST training images to reduce each 28×28 pixel image to a 30-dimensional vector.
• Then, Symmetric SNE is applied to 5000 of these 30-dimensional vectors with an equal number from each class.
• The above figure shows that the 10 digit classes are not well separated.
• The above figure shows that Symmetric SNE is also unable to separate the clusters 4, 7, 9 and 3, 5, 8, and it does not cleanly separate the clusters for 0, 1, 2, and 6 from the rest of the data. (The numbers are shown below for reference.)
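
The effect of the background map on q_ij can be sketched as follows. The sketch assumes the mixture form q_ij = (1 - λ) exp(-d_ij²) / Σ_{k<l} exp(-d_kl²) + 2λ / (N(N-1)), i.e. a uniform distribution over the N(N-1)/2 unordered pairs mixed in with weight λ. The function name `uni_sne_q`, the value `lam=0.2`, and the toy layout are arbitrary choices for illustration, not settings from the paper.

```python
import numpy as np

def uni_sne_q(Y, lam=0.2):
    """q over unordered pairs i < j: symmetric SNE term plus uniform background."""
    n = Y.shape[0]
    diff = Y[:, None, :] - Y[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    iu = np.triu_indices(n, k=1)      # every unordered pair exactly once
    dens = np.exp(-sq_dist[iu])
    q_map = dens / dens.sum()         # ordinary symmetric SNE term
    q_uniform = 2.0 / (n * (n - 1))   # uniform over n(n-1)/2 pairs
    return (1.0 - lam) * q_map + lam * q_uniform

rng = np.random.default_rng(0)
# Two tight, widely separated clusters of 5 points each (toy layout).
Y = np.vstack([
    rng.normal(scale=0.1, size=(5, 2)),
    rng.normal(scale=0.1, size=(5, 2)) + 10.0,
])

lam = 0.2
q_plain = uni_sne_q(Y, lam=0.0)       # no background map
q_uni = uni_sne_q(Y, lam=lam)         # with background map
floor = lam * 2.0 / (Y.shape[0] * (Y.shape[0] - 1))
print("smallest q without background:", q_plain.min())
print("smallest q with background:   ", q_uni.min(), "floor:", floor)
```

Without the background term, q for pairs in different clusters is vanishingly small. With it, no q_ij can fall below the floor, so widely separated clusters are not penalized as heavily and there is more room to spread the map out.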