Review — UNI-SNE: Visualizing Similarity Data with a Mixture of Maps
In this story, Visualizing Similarity Data with a Mixture of Maps (UNI-SNE), by the University of Toronto, is briefly reviewed, since UNI-SNE is mentioned in the t-SNE paper. This paper is co-authored by Prof. Hinton. In this paper:
- Aspect maps are introduced so that similarity data can be visualized with a mixture of maps.
- A background map is used to alleviate the crowding problem.
This is a paper in 2007 AISTATS with over 100 citations. (Sik-Ho Tsang @ Medium)
1. Brief Review of SNE & Symmetric SNE
- To visualize high-dimensional data, we need to map the data points to a low-dimensional space, such as a 2D or 3D space.
- In addition, the structure of the high-dimensional data should be preserved after the mapping, so that the visualization is meaningful.
1.1. SNE
- A spherical Gaussian distribution centered at the high-dimensional point xi defines a probability density at each of the other points.
- When these densities are normalized, we get a probability distribution, Pi, over all of the other points, whose entries pj|i represent their similarity to i.
- Similarly, a circular Gaussian distribution centered at the low-dimensional map point yi defines a probability density at each of the other map points.
- When these densities are normalized, we get a probability distribution, Qi, with entries qj|i, which is our low-dimensional model of the high-dimensional Pi.
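- For each object, i, we can associate a cost with a set of low-dimensional y locations by using the Kullback-Leibler divergence to measure how well the distribution Qi models the distribution Pi. With pj|i and qj|i denoting the entries of Pi and Qi, summing over all objects gives:
$$C = \sum_i \mathrm{KL}(P_i \,\|\, Q_i) = \sum_i \sum_{j \neq i} p_{j|i} \log \frac{p_{j|i}}{q_{j|i}}$$
- This cost C can be differentiated with respect to the map points yi and minimized by gradient descent.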
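The SNE steps above can be sketched in dependency-free Python. This is a minimal illustration, not the paper's code: it assumes a single fixed σ for every data point (SNE normally tunes each σi separately), plain gradient descent without momentum, and the standard SNE gradient ∂C/∂yi = 2 Σj (pj|i − qj|i + pi|j − qi|j)(yi − yj). All function names (`conditional_probs`, `sne_cost`, `sne_gradient`, `fit_sne`) are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def conditional_probs(points, scales):
    """Row i holds p_{j|i}: exp(-||a_i - a_j||^2 / scales[i]), normalized over j != i.

    For the data X use scales[i] = 2 * sigma_i**2; for the map Y use scales[i] = 1.
    """
    n = len(points)
    rows = []
    for i in range(n):
        w = [0.0 if j == i else math.exp(-sq_dist(points[i], points[j]) / scales[i])
             for j in range(n)]
        z = sum(w)
        rows.append([wj / z for wj in w])
    return rows


def sne_cost(P, Q):
    """C = sum_i KL(P_i || Q_i); terms with p_{j|i} = 0 contribute nothing."""
    n = len(P)
    return sum(P[i][j] * math.log(P[i][j] / Q[i][j])
               for i in range(n) for j in range(n)
               if j != i and P[i][j] > 0.0)


def sne_gradient(P, Q, Y):
    """dC/dy_i = 2 * sum_j (p_{j|i} - q_{j|i} + p_{i|j} - q_{i|j}) * (y_i - y_j)."""
    n, d = len(Y), len(Y[0])
    grad = [[0.0] * d for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if j == i:
                continue
            c = 2.0 * (P[i][j] - Q[i][j] + P[j][i] - Q[j][i])
            for k in range(d):
                grad[i][k] += c * (Y[i][k] - Y[j][k])
    return grad


def fit_sne(X, sigma=1.0, dim=2, lr=0.05, steps=300, seed=0):
    """Plain gradient descent on the map points (fixed sigma, no momentum)."""
    rng = random.Random(seed)
    n = len(X)
    P = conditional_probs(X, [2.0 * sigma ** 2] * n)
    # Start from a tiny random map, as is usual for SNE.
    Y = [[rng.gauss(0.0, 1e-2) for _ in range(dim)] for _ in range(n)]
    for _ in range(steps):
        G = sne_gradient(P, conditional_probs(Y, [1.0] * n), Y)
        Y = [[y - lr * g for y, g in zip(yi, gi)] for yi, gi in zip(Y, G)]
    return Y
```

Here `P[i][j]` stores pj|i (row i, column j), which is why the gradient reads both `P[i][j]` and `P[j][i]`; `fit_sne(X)` returns one 2D map point per row of `X`.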
1.2. Symmetric SNE
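- An alternative is to define a single joint distribution, P, over all non-identical ordered pairs of objects, and a single joint distribution, Q, over the corresponding pairs of map points, and to minimize one divergence between them:
$$q_{ij} = \frac{\exp(-\|y_i - y_j\|^2)}{\sum_{k \neq l} \exp(-\|y_k - y_l\|^2)}, \qquad C = \mathrm{KL}(P \,\|\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}$$
- This leads to simpler derivatives, which makes the cost easier to optimize.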
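A minimal sketch of symmetric SNE, to show why the derivatives are simpler: with symmetric joint P and Q, the gradient collapses to ∂C/∂yi = 4 Σj (pij − qij)(yi − yj). It assumes P is built from the data with one shared Gaussian width, in the same form as Q; this is a hypothetical choice for illustration (t-SNE, for instance, later symmetrizes the conditionals as pij = (pj|i + pi|j)/2n instead). The function names are hypothetical.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def joint_probs(points, scale):
    """p_ij proportional to exp(-||a_i - a_j||^2 / scale), normalized over all ordered pairs i != j."""
    n = len(points)
    w = [[0.0 if i == j else math.exp(-sq_dist(points[i], points[j]) / scale)
          for j in range(n)] for i in range(n)]
    z = sum(map(sum, w))
    return [[wij / z for wij in row] for row in w]


def kl_cost(P, Q):
    """C = KL(P || Q) = sum_{i != j} p_ij log(p_ij / q_ij)."""
    n = len(P)
    return sum(P[i][j] * math.log(P[i][j] / Q[i][j])
               for i in range(n) for j in range(n)
               if i != j and P[i][j] > 0.0)


def symmetric_sne_gradient(P, Q, Y):
    """dC/dy_i = 4 * sum_j (p_ij - q_ij) * (y_i - y_j); assumes P and Q are symmetric."""
    n, d = len(Y), len(Y[0])
    grad = [[0.0] * d for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if j != i:
                c = 4.0 * (P[i][j] - Q[i][j])
                for k in range(d):
                    grad[i][k] += c * (Y[i][k] - Y[j][k])
    return grad
```

`P = joint_probs(X, 2 * sigma**2)` and `Q = joint_probs(Y, 1.0)` give the two joint distributions. Compared with the conditional SNE gradient, each pair now contributes a single pij − qij term instead of four.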
2. Aspect Maps
- The idea is that different senses of a word can occur in different maps.
- For example, “river” and “loan” can both be close to “bank” without being at all close to each other.
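- Each object, i, has a mixing proportion πmi in each map, m, and the mixing proportions are constrained to add to 1. With yi^m denoting the location of object i in map m, the aspect-map version of qj|i is:
$$q_{j|i} = \frac{\sum_m \pi_i^m \pi_j^m \exp(-\|y_i^m - y_j^m\|^2)}{\sum_{k \neq i} \sum_m \pi_i^m \pi_k^m \exp(-\|y_i^m - y_k^m\|^2)}$$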
- (Symmetric SNE is not used here; the conditional qj|i is used instead.)
- (The paper devotes a long passage to minimizing the cost with the aspect-map version of qj|i. Please read the paper if interested.)
- The above figure shows 2 of the 50 aspect maps, illustrating the words “can” and “field”.
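The aspect-map qj|i can be sketched as below, assuming the location of object i in map m is stored as `maps[m][i]` and its mixing proportions as `pi[i][m]`. The softmax used to keep the proportions positive and summing to 1 is an assumed parameterization for illustration, and all names (including the toy `BANK_MAPS` / `BANK_PI` data) are hypothetical. The toy data mirrors the “bank” example: “river” and “loan” each share one map with “bank”, but share no map with each other.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def softmax(logits):
    """Turn unconstrained weights into mixing proportions that are positive and add to 1."""
    top = max(logits)
    e = [math.exp(v - top) for v in logits]
    s = sum(e)
    return [v / s for v in e]


def aspect_map_q(maps, pi):
    """q_{j|i} for a mixture of aspect maps.

    maps[m][i] is the location of object i in map m; pi[i][m] is object i's
    mixing proportion in map m. Row i of the result is normalized over j != i.
    """
    n, num_maps = len(pi), len(maps)
    Q = []
    for i in range(n):
        # A pair only gets weight from maps in which BOTH objects have mass.
        w = [0.0 if j == i else
             sum(pi[i][m] * pi[j][m] * math.exp(-sq_dist(maps[m][i], maps[m][j]))
                 for m in range(num_maps))
             for j in range(n)]
        z = sum(w)
        Q.append([wj / z for wj in w])
    return Q


# Toy "bank" example (hypothetical data): 0 = bank, 1 = river, 2 = loan.
BANK_MAPS = [
    [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]],  # map 0: bank next to river
    [[0.0, 0.0], [5.0, 5.0], [0.1, 0.0]],  # map 1: bank next to loan
]
BANK_PI = [softmax([0.0, 0.0]), [1.0, 0.0], [0.0, 1.0]]
```

With this data, `aspect_map_q(BANK_MAPS, BANK_PI)` gives qriver|bank = qloan|bank = 0.5, while qloan|river is exactly 0 because “river” and “loan” never appear in the same map.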
3. UNI-SNE: A Background Map
- With two aspect maps, one map would keep all of the objects very close together, while the other map would create widely separated clusters of objects.
- In a single map, by contrast, the objects in the middle are crushed together too closely; this is the crowding problem.
- A background map in which all of the objects are very close together gives every qj|i a small positive contribution, so that no qj|i can fall below this floor.
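- Here, for UNI-SNE, symmetric SNE is used, and a fraction ρ of the probability mass is spread uniformly over all n(n − 1) ordered pairs of the n objects, so qij is:
$$q_{ij} = (1-\rho)\,\frac{\exp(-\|y_i - y_j\|^2)}{\sum_{k \neq l} \exp(-\|y_k - y_l\|^2)} + \frac{\rho}{n(n-1)}$$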
- For the experiments, principal component analysis (PCA) is first applied to all 60,000 MNIST training images to reduce each 28×28-pixel image to a 30-dimensional vector.
- Then, symmetric SNE is applied to 5,000 of these 30-dimensional vectors, with an equal number (500) from each of the 10 classes.
- The above figure shows that the 10 digit classes are not well separated.
- The above figure shows that symmetric SNE is also unable to separate the clusters for 4, 7, 9 and for 3, 5, 8, and it does not cleanly separate the clusters for 0, 1, 2, and 6 from the rest of the data. (The digit classes are shown below for reference.)
- Using UNI-SNE, with 0.2 of the total probability mass distributed uniformly over all pairs, the 10 digit classes are much better separated than with symmetric SNE.
- Later on, t-SNE was proposed, which works better and is more popular than UNI-SNE.
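A minimal sketch of the UNI-SNE model in pure Python, assuming the mixture form qij = (1 − ρ)·sij + ρ/(n(n − 1)) over ordered pairs, where sij is the plain symmetric SNE qij and the default ρ = 0.2 matches the 0.2 of the probability mass used for MNIST. (Counting unordered pairs instead doubles the floor to 2ρ/(n(n − 1)).) The gradient was derived here with the chain rule rather than taken from the paper; it reduces to the symmetric SNE gradient 4 Σj (pij − qij)(yi − yj) when ρ = 0. Function names are hypothetical.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def uni_sne_q(Y, rho=0.2):
    """Return (Q, S): S is the symmetric-SNE q_ij, Q mixes in a uniform background.

    q_ij = (1 - rho) * s_ij + rho / (n * (n - 1)) over ordered pairs i != j,
    so no q_ij can fall below the floor rho / (n * (n - 1)).
    """
    n = len(Y)
    w = [[0.0 if i == j else math.exp(-sq_dist(Y[i], Y[j])) for j in range(n)]
         for i in range(n)]
    z = sum(map(sum, w))
    S = [[wij / z for wij in row] for row in w]
    floor = rho / (n * (n - 1))
    Q = [[0.0 if i == j else (1.0 - rho) * S[i][j] + floor for j in range(n)]
         for i in range(n)]
    return Q, S


def kl_cost(P, Q):
    """C = KL(P || Q) = sum_{i != j} p_ij log(p_ij / q_ij)."""
    n = len(P)
    return sum(P[i][j] * math.log(P[i][j] / Q[i][j])
               for i in range(n) for j in range(n)
               if i != j and P[i][j] > 0.0)


def uni_sne_gradient(P, Y, rho=0.2):
    """dC/dy_i = 4 (1 - rho) sum_j (r_ij - R s_ij)(y_i - y_j),
    with r_ij = p_ij s_ij / q_ij and R = sum_{k != l} r_kl (P must be symmetric)."""
    Q, S = uni_sne_q(Y, rho)
    n, d = len(Y), len(Y[0])
    r = [[P[i][j] * S[i][j] / Q[i][j] if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    R = sum(map(sum, r))
    grad = [[0.0] * d for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if j != i:
                c = 4.0 * (1.0 - rho) * (r[i][j] - R * S[i][j])
                for k in range(d):
                    grad[i][k] += c * (Y[i][k] - Y[j][k])
    return grad
```

Because every qij stays at or above ρ/(n(n − 1)), two map points can be pushed far apart while their qij settles at that floor instead of shrinking toward zero: `uni_sne_q([[0, 0], [100, 0], [0, 0.1]], 0.2)[0][0][1]` equals 0.2/6. Any symmetric joint P can be passed to the gradient; the ρ = 0 output `uni_sne_q(X, 0.0)[1]` is one hypothetical choice for quick experiments.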