Review — UNI-SNE: Visualizing Similarity Data with a Mixture of Maps

Introducing A Background Map to Solve the Crowding Problem

4 min read · Feb 6, 2021

This story briefly reviews Visualizing Similarity Data with a Mixture of Maps (UNI-SNE), by University of Toronto, since UNI-SNE is mentioned in the t-SNE paper. This is a paper by Prof. Hinton. In this paper:

Aspect Maps are introduced for Data with a Mixture of Maps.
A Background Map is used to solve the crowding problem.

This is a paper in 2007 AISTATS with over 100 citations. (Sik-Ho Tsang @ Medium)

Outline

1. Brief Review of SNE & Symmetric SNE
2. Aspect Maps
3. UNI-SNE: A Background Map

1. Brief Review of SNE & Symmetric SNE

1.1. SNE

To visualize high-dimensional data, we need to map it to a low-dimensional space such as 2D or 3D.
In addition, the structure of the high-dimensional data should be preserved after the mapping, so that the visualization is meaningful.

A spherical Gaussian distribution centered at xi defines a probability density at each of the other points.
When these densities are normalized, we get a probability distribution, Pi, over all of the other points that represents their similarity to i.

In the low-dimensional map, a circular Gaussian distribution centered at yi defines a probability density at each of the other points.
When these densities are normalized, we get a probability distribution over all of the other points. This is our low-dimensional model, Qi, of the high-dimensional Pi.
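For reference, the two normalized distributions can be written out in the usual SNE notation. This is a reconstruction from the standard definitions, so the symbols are assumptions rather than quotes from the paper: d_ij is the dissimilarity between objects i and j (the Euclidean distance when the objects are vectors), and σ_i is the width of the Gaussian centered at x_i.

```latex
p_{j|i} = \frac{\exp\left(-d_{ij}^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-d_{ik}^2 / 2\sigma_i^2\right)},
\qquad
q_{j|i} = \frac{\exp\left(-\lVert y_i - y_j \rVert^2\right)}{\sum_{k \neq i} \exp\left(-\lVert y_i - y_k \rVert^2\right)}
```

The low-dimensional Gaussian uses a fixed variance, so q has no free width parameter.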

For each object, i, we can associate a cost with a set of low-dimensional y locations by using the Kullback-Leibler divergence to measure how well the distribution Qi models the distribution Pi:
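Summed over all objects, and with p_{j|i} and q_{j|i} denoting the entries of P_i and Q_i, this cost can be written as follows. It is a reconstruction from the standard definition of the KL divergence, not a copy of the paper's typesetting.

```latex
C = \sum_i \mathrm{KL}(P_i \,\|\, Q_i) = \sum_i \sum_{j \neq i} p_{j|i} \log \frac{p_{j|i}}{q_{j|i}}
```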

The above cost C can be differentiated and minimized by gradient descent.
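A minimal sketch of this minimization in plain Python is given below. It is only an illustration under stated assumptions: a single shared variance replaces the per-point σ_i that SNE normally tunes, the gradient is the standard SNE one (with q built from exp(-||y_i - y_j||²)), and all function names (`conditional_probs`, `sne_cost`, `sne_gradient`, `descend`) are hypothetical, not from the paper.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def conditional_probs(points, two_sigma_sq=1.0):
    """Row i is the distribution over j != i from a Gaussian centered at point i.

    Assumption: one shared variance for all points (SNE tunes sigma_i per point).
    """
    n = len(points)
    rows = []
    for i in range(n):
        weights = [
            0.0 if j == i else math.exp(-sq_dist(points[i], points[j]) / two_sigma_sq)
            for j in range(n)
        ]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows


def sne_cost(P, Q):
    """Sum over i of KL(P_i || Q_i); P[i][j] stands for p_{j|i}."""
    n = len(P)
    return sum(
        P[i][j] * math.log(P[i][j] / Q[i][j])
        for i in range(n)
        for j in range(n)
        if i != j and P[i][j] > 0.0
    )


def sne_gradient(P, Q, Y):
    """dC/dy_i = 2 * sum_j (p_j|i - q_j|i + p_i|j - q_i|j) * (y_i - y_j)."""
    n, dim = len(Y), len(Y[0])
    grad = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            coeff = 2.0 * (P[i][j] - Q[i][j] + P[j][i] - Q[j][i])
            for d in range(dim):
                grad[i][d] += coeff * (Y[i][d] - Y[j][d])
    return grad


def descend(X, Y, steps=200, learning_rate=0.05):
    """Plain gradient descent on the map coordinates; returns the updated Y."""
    P = conditional_probs(X)
    Y = [list(y) for y in Y]
    for _ in range(steps):
        Q = conditional_probs(Y)  # two_sigma_sq=1 gives exp(-||y_i - y_j||^2)
        grad = sne_gradient(P, Q, Y)
        Y = [[y - learning_rate * g for y, g in zip(yi, gi)]
             for yi, gi in zip(Y, grad)]
    return Y
```

Calling `descend(X, Y0)` with high-dimensional points `X` and a random initial map `Y0` returns updated 2D coordinates. Practical implementations typically add momentum and noise to the updates, which this sketch leaves out.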

1.2. Symmetric SNE

An alternative is to define a single joint distribution over all non-identical ordered pairs:
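One way to write this joint distribution, together with its low-dimensional counterpart and the resulting cost, is sketched below. The normalizer here sums over ordered pairs k ≠ l to match the wording above, and σ is a single shared width; both are assumptions. If the paper indexes unordered pairs k < l instead, only the normalizing constant changes.

```latex
p_{ij} = \frac{\exp\left(-d_{ij}^2 / 2\sigma^2\right)}{\sum_{k \neq l} \exp\left(-d_{kl}^2 / 2\sigma^2\right)},
\qquad
q_{ij} = \frac{\exp\left(-\lVert y_i - y_j \rVert^2\right)}{\sum_{k \neq l} \exp\left(-\lVert y_k - y_l \rVert^2\right)},
\qquad
C = \mathrm{KL}(P \,\|\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
```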

This leads to simpler derivatives and makes the cost easier to optimize.
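To see why the derivatives get simpler, compare the two gradients with respect to a map point y_i. These are the standard forms, as also given in the t-SNE paper, with p_{j|i}, q_{j|i} the conditional probabilities of SNE and p_ij, q_ij the joint probabilities of Symmetric SNE. The constant 4 assumes the joint distributions are normalized over ordered pairs.

```latex
\text{SNE:} \quad
\frac{\partial C}{\partial y_i} = 2 \sum_j \left(p_{j|i} - q_{j|i} + p_{i|j} - q_{i|j}\right)\left(y_i - y_j\right)
\qquad
\text{Symmetric SNE:} \quad
\frac{\partial C}{\partial y_i} = 4 \sum_j \left(p_{ij} - q_{ij}\right)\left(y_i - y_j\right)
```

The symmetric version needs only one pairwise term per neighbor instead of two.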

2. Aspect Maps

Different senses of a word occur in different maps.
For example, river and loan can both be close to bank without being at all close to each other.
Each object, i, has a mixing proportion π_i^m in each map, m, and the mixing proportions of an object are constrained to add to 1.
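With these mixing proportions, the low-dimensional probability q_{j|i} becomes a mixture over the maps. The form below is a reconstruction and should be checked against the paper: y_i^m denotes the location of object i in map m, and excluding h = i from the normalizer z_i is an assumption.

```latex
q_{j|i} = \frac{\sum_m \pi_i^m \pi_j^m \exp\left(-d_{ij}^m\right)}{z_i},
\qquad
d_{ij}^m = \lVert y_i^m - y_j^m \rVert^2,
\qquad
z_i = \sum_{h \neq i} \sum_m \pi_i^m \pi_h^m \exp\left(-d_{ih}^m\right),
\qquad
\sum_m \pi_i^m = 1
```

Two objects contribute to each other's probability only through maps in which both have a non-negligible mixing proportion.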

(Symmetric SNE is not used here.)
(There is a long passage in the paper on minimizing the cost using the above aspect-maps version of qj|i. Please read the paper if interested.)
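The river / bank / loan example can be made concrete with a small sketch. It assumes the mixture form of qj|i in which each map contributes π_i^m · π_j^m · exp(-||y_i^m - y_j^m||²). The coordinates, the mixing proportions and the name `aspect_map_q` are all hypothetical toy choices, not values from the paper.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def aspect_map_q(maps, mixing):
    """Return q with q[i][j] = q_{j|i} for a mixture of maps.

    maps[m][i]   : coordinates of object i in map m
    mixing[i][m] : mixing proportion of object i in map m (each row sums to 1)
    """
    n = len(mixing)
    q = []
    for i in range(n):
        weights = []
        for j in range(n):
            if j == i:
                weights.append(0.0)
                continue
            weights.append(sum(
                mixing[i][m] * mixing[j][m] * math.exp(-sq_dist(coords[i], coords[j]))
                for m, coords in enumerate(maps)
            ))
        # Assumes object i shares at least one map with some other object.
        z_i = sum(weights)
        q.append([w / z_i for w in weights])
    return q


# Hypothetical toy data: three words, two 2D maps.
BANK, RIVER, LOAN = 0, 1, 2
maps = [
    [(0.0, 0.0), (0.5, 0.0), (3.0, 3.0)],   # map 0: bank sits next to river
    [(0.0, 0.0), (-3.0, 2.0), (0.5, 0.0)],  # map 1: bank sits next to loan
]
mixing = [
    [0.5, 0.5],  # bank is split across both senses
    [1.0, 0.0],  # river lives only in map 0
    [0.0, 1.0],  # loan lives only in map 1
]

q = aspect_map_q(maps, mixing)
print(q[BANK][RIVER], q[BANK][LOAN], q[RIVER][LOAN])  # prints: 0.5 0.5 0.0
```

Bank is equally similar to river and to loan, while river and loan get zero similarity to each other because they never share a map. A single map could not place bank close to both without also pulling river and loan together.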

**2 of the 50 aspect maps for the word association data**. **Left**: Each map models a different sense of can. **Right**: Each map models a different sense of field.

The above figure shows 2 of the 50 aspect maps, for the “can” and “field” examples.

3. UNI-SNE: A Background Map

When high-dimensional data is mapped to 2D, the objects in the middle of the map are crushed together too closely. This is the crowding problem.
One way out is to use two aspect maps: one keeps all of the objects very close together, while the other creates widely separated clusters of objects.
The first of these is the background map. Because all of the objects in it are very close together, it gives every qj|i a small positive contribution.
Here, for UNI-SNE, symmetric SNE is used, and qij is:
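A form consistent with this description is sketched below. The symbol λ for the share of probability mass given to the background map and N for the number of objects are assumptions, and the uniform term is written for ordered pairs. With unordered pairs it would be 2λ / (N(N−1)).

```latex
q_{ij} = (1 - \lambda)\,\frac{\exp\left(-\lVert y_i - y_j \rVert^2\right)}{\sum_{k \neq l} \exp\left(-\lVert y_k - y_l \rVert^2\right)} + \frac{\lambda}{N(N-1)}
```

However far apart y_i and y_j are placed, q_ij cannot fall below the uniform term.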

Principal components analysis (PCA) is applied on all 60,000 MNIST training images first to reduce each 28×28 pixel image to a 30-dimensional vector.
Then, Symmetric SNE is applied to 5,000 of these 30-dimensional vectors, with an equal number from each class.
The above figure shows that the 10 digit classes are not well separated.
Symmetric SNE is unable to separate the clusters 4, 7, 9 and 3, 5, 8, and it does not cleanly separate the clusters for 0, 1, 2, and 6 from the rest of the data. (The numbers are shown below as reference.)

Using UNI-SNE, with 0.2 of the total probability mass uniformly distributed between all pairs, the 10 digit classes are much better separated than with Symmetric SNE.
Of course, t-SNE was proposed later on, and it is better and more popular than UNI-SNE.
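To make the effect of the 0.2 background mass concrete, here is a minimal sketch of the UNI-SNE qij. It assumes the joint distribution is normalized over ordered pairs, so each pair receives a uniform share of 1 / (N(N−1)). The function name `uni_sne_q` and the toy coordinates are hypothetical.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def uni_sne_q(Y, background_mass=0.2):
    """Joint q[i][j] over ordered pairs i != j.

    A Gaussian map carrying (1 - background_mass) of the probability is mixed
    with a uniform background carrying background_mass.
    """
    n = len(Y)
    weights = [
        [0.0 if i == j else math.exp(-sq_dist(Y[i], Y[j])) for j in range(n)]
        for i in range(n)
    ]
    total = sum(map(sum, weights))
    uniform = 1.0 / (n * (n - 1))  # equal share for each ordered pair
    return [
        [
            0.0 if i == j
            else (1.0 - background_mass) * weights[i][j] / total
            + background_mass * uniform
            for j in range(n)
        ]
        for i in range(n)
    ]


# Hypothetical toy map: two tight clusters placed far apart.
Y = [(0.0, 0.0), (0.1, 0.0), (10.0, 0.0), (10.1, 0.0)]
plain = uni_sne_q(Y, background_mass=0.0)   # ordinary Symmetric SNE q
mixed = uni_sne_q(Y, background_mass=0.2)   # UNI-SNE q
print(plain[0][2], mixed[0][2])
```

For the cross-cluster pair, the plain Symmetric SNE value is vanishingly small, while the UNI-SNE value stays at or above 0.2 / 12. That floor is what allows the map to push clusters far apart without paying an unbounded cost.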

Reference

[2007 AISTATS] [UNI-SNE]
Visualizing Similarity Data with a Mixture of Maps

Data Visualization

2002 [SNE] 2007 [UNI-SNE] 2008 [t-SNE]