iPCA: Interactive Principal Component Analysis

iPCA (interactive Principal Component Analysis) is a system with which users can interactively analyze data. It is built on Principal Component Analysis (PCA), a mathematical technique widely used in many fields for factor and trend analysis, dimension reduction, and similar tasks. However, PCA is often considered a "black box" operation whose results are difficult to interpret and sometimes counter-intuitive to the user. To help the user better understand and use PCA, the system visualizes the results of principal component analysis using multiple coordinated views and a rich set of user interactions. Our design philosophy is to support analysis of multivariate datasets through extensive interaction with the PCA output.

The interface consists of four views (A to D) and two control panels (E and F).

Projection View(A): Two principal components (by default, the first and second most dominant eigenvectors) are used to project data points onto a two-dimensional coordinate system.
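The projection described for the Projection View can be sketched in a few lines. This is a minimal illustration, not iPCA's actual implementation: the source does not say which eigen-solver iPCA uses, so the sketch swaps in power iteration with deflation, and the names `covariance`, `dominant_eigenpair`, and `project` are hypothetical. The `pc_x` and `pc_y` parameters are 0-based, so the default (0, 1) corresponds to PC1 and PC2.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def covariance(rows):
    """Return (mean-centered rows, sample covariance matrix)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return centered, cov

def dominant_eigenpair(matrix, iterations=500):
    """Power iteration: approximate the largest eigenvalue and its unit eigenvector."""
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    vec = [rng.random() + 0.1 for _ in matrix]
    norm = math.sqrt(dot(vec, vec))
    vec = [x / norm for x in vec]
    for _ in range(iterations):
        nxt = [dot(row, vec) for row in matrix]
        norm = math.sqrt(dot(nxt, nxt))
        if norm == 0.0:  # matrix has no remaining variance
            break
        vec = [x / norm for x in nxt]
    value = dot(vec, [dot(row, vec) for row in matrix])
    return value, vec

def project(rows, pc_x=0, pc_y=1):
    """Project rows onto two principal components (0 = PC1, 1 = PC2, ...)."""
    centered, cov = covariance(rows)
    d = len(cov)
    axes = []
    for _ in range(max(pc_x, pc_y) + 1):
        value, vec = dominant_eigenpair(cov)
        axes.append(vec)
        # Deflation: subtract the component just found so the next pass
        # converges to the next most dominant eigenvector.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(d)]
               for i in range(d)]
    return [(dot(c, axes[pc_x]), dot(c, axes[pc_y])) for c in centered]
```

Calling `project(rows, pc_x=0, pc_y=2)` would give the PC1 versus PC3 layout. The sign of each eigenvector is arbitrary, so a plot produced this way may appear mirrored relative to another PCA tool.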

Eigenvector View(B): In the Eigenvector View, data points are shown in the eigenspace. The calculated eigenvectors and their eigenvalues are displayed in a vertically projected parallel coordinates visualization, with eigenvectors ranked from top to bottom by dominance. The distances between eigenvectors in the parallel coordinate view vary based on their eigenvalues, separating the eigenvectors based on their mathematical weights.

Data View(C): The Data View is located below the Projection View, and shows a parallel coordinates visualization of all data points in the original data dimensions. In this view, an auto-scaling function is applied to increase the readability of the data.
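The source does not specify which auto-scaling function the Data View applies. As an assumption for illustration only, the sketch below uses per-dimension min-max scaling, a common choice for parallel coordinates because it lets every axis span the same vertical range. The function name `min_max_scale` is hypothetical.

```python
def min_max_scale(rows):
    """Scale each dimension (column) independently into [0, 1].

    Assumption: min-max scaling stands in for iPCA's unspecified
    auto-scaling function. A constant column is mapped to 0.5.
    """
    d = len(rows[0])
    lows = [min(r[j] for r in rows) for j in range(d)]
    highs = [max(r[j] for r in rows) for j in range(d)]
    return [
        [(r[j] - lows[j]) / (highs[j] - lows[j]) if highs[j] > lows[j] else 0.5
         for j in range(d)]
        for r in rows
    ]
```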

Correlation View(D): Pearson-correlation coefficients and relationships between variables are represented in the Correlation View as a matrix of scatter plots and values. Since correlations between dimensions are symmetric, repetition is avoided by separating the matrix into three components: the diagonal, the bottom triangle, and the top triangle. The diagonal displays the name of the dimension as a text string. The bottom triangle shows the coefficient value between two dimensions with a color indicating positive (red), neutral (white), and negative (blue) correlations. The top triangle contains cells of scatter plots in which all data items are projected onto the two intersecting dimensions. The colors of the data items are the same as the colors used in the other three views so that clusters are easily identified.
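The value shown in each bottom-triangle cell, and its color, can be sketched as follows. The Pearson formula is standard. The linear blue-white-red ramp is an assumption, since the source names the three colors but not how intermediate values are blended. Both function names are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # undefined for a constant dimension; treated as neutral here
    return sxy / math.sqrt(sxx * syy)

def correlation_color(r):
    """Map r in [-1, 1] to an RGB triple: blue (-1), white (0), red (+1)."""
    r = max(-1.0, min(1.0, r))
    fade = int(round(255 * (1 - abs(r))))  # assumed linear fade toward white
    return (255, fade, fade) if r >= 0 else (fade, fade, 255)
```

Because the coefficient is symmetric, `pearson(a, b)` equals `pearson(b, a)`, which is why only one triangle of the matrix needs to show values.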


Here are some screenshots.

E.Coli dataset

A default screen when the E.Coli dataset is loaded.

Wine dataset

A default screen when the Wine dataset is loaded.

Wine dataset

Swapping the Projection view to the Correlation view.

A simple description (how to use)

Here is a short explanation of how to use the system.

- Data loading: Click the "Browse..." button and select the data you want to load. Then click the "Start Loading" button located below the "Browse..." button.
- Navigation (works only in the Projection view):
  - Zooming: press the left mouse button to zoom in, or the right mouse button to zoom out.
  - Panning: hold the middle mouse button and move the mouse.
- Item selection (available in all views):
  - Single item selection (Ctrl + left mouse click): useful for selecting an individual item.
  - Range selection (Alt + left mouse button, then drag to create a region boundary): useful for selecting multiple items.
- Changing the pre-selected principal components: By default, the first principal component (PC1) and the second principal component (PC2) are mapped to the X-axis and Y-axis of the Projection view, respectively. In the options (located next to the Eigenvector view), the user can change the pre-selected PC1 and PC2. For example, to change the selection from PC2 to PC3, first click the PC2 check box to deselect it, then click the PC3 check box. The Projection view then updates to reflect the new selection: PC1 (X-axis) and PC3 (Y-axis).

Input file format

#0303# [file_format_indicator]
150 4 [row col]
class Sepal_Length Sepal_Width Petal_Length Petal_Width [class_indicator dimension_variables]
setosa 0.556 -0.250 0.864 0.917 [class_info values_of_each_item]
setosa 0.667 0.167 0.864 0.917
setosa 0.778 0.000 0.898 0.917
setosa 0.833 0.083 0.831 0.917
setosa 0.611 -0.333 0.864 0.917
virginica 0.111 0.167 -0.390 -0.417
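The layout above can be read with a short parser. This is a sketch under stated assumptions, not code taken from iPCA: it assumes fields are whitespace-separated, and it assumes the bracketed notes such as `[row col]` are annotations for the reader, so it strips a trailing `[...]` from each line if one is present. The function name `parse_ipca` and the `strict` flag are hypothetical.

```python
import re

# Assumption: a trailing "[...]" is an explanatory note, not data.
ANNOTATION = re.compile(r"\s*\[[^\]]*\]\s*$")

def parse_ipca(text, strict=True):
    """Parse iPCA-style input text into (dimension names, class labels, values)."""
    lines = [ANNOTATION.sub("", ln).strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln]
    if len(lines) < 3 or lines[0] != "#0303#":
        raise ValueError("missing #0303# format indicator or header lines")
    n_rows, n_cols = (int(tok) for tok in lines[1].split())
    header = lines[2].split()
    if len(header) != n_cols + 1:
        raise ValueError("header must hold the class column plus one name per dimension")
    names = header[1:]
    classes, values = [], []
    for ln in lines[3:]:
        tokens = ln.split()
        if len(tokens) != n_cols + 1:
            raise ValueError("wrong number of fields in row: " + ln)
        classes.append(tokens[0])
        values.append([float(tok) for tok in tokens[1:]])
    # strict=False allows excerpts that list fewer rows than the header declares.
    if strict and len(values) != n_rows:
        raise ValueError("declared %d rows, found %d" % (n_rows, len(values)))
    return names, classes, values
```

The excerpt shown above declares 150 rows but lists only six, so it would need `strict=False`; a complete file should pass the default check.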

- Variable and class names must not contain spaces. If a variable or class name in the dataset includes a space character, replace it with another character such as an underscore ("_").

- Categorical variables are not allowed in iPCA. If you have categorical variables, convert them to numerical form. Other sample datasets can be obtained from the UCI Machine Learning Repository.

- Sample datasets: Iris, Ecoli, Wine
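The two preparation rules above (no spaces in names, no categorical variables) can be applied when writing a file in this format. The sketch below is illustrative only: all three function names are hypothetical, integer coding is just one possible way to make a categorical variable numerical, and the three-decimal output simply mirrors the sample shown earlier.

```python
def sanitize_name(name):
    """Replace runs of whitespace in a variable or class name with underscores."""
    return "_".join(name.split())

def encode_categorical(column):
    """Map each distinct category to a number, in order of first appearance."""
    codes = {}
    encoded = [float(codes.setdefault(v, len(codes))) for v in column]
    return encoded, codes

def write_ipca(names, classes, values):
    """Serialize a dataset as text in the input format described above."""
    lines = [
        "#0303#",
        "%d %d" % (len(values), len(names)),
        " ".join(["class"] + [sanitize_name(n) for n in names]),
    ]
    for cls, row in zip(classes, values):
        lines.append(" ".join([sanitize_name(cls)] + ["%.3f" % v for v in row]))
    return "\n".join(lines) + "\n"
```

Integer codes impose an artificial ordering on the categories, which PCA will treat as meaningful distance. For unordered categories, one indicator column per category may be a better fit.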

Published Papers

  • Dong Hyun Jeong, Soo-Yeon Ji, Evan A Suma, Byunggu Yu and Remco Chang, Designing a collaborative visual analytics system to support users' continuous analytical processes, Human-centric Computing and Information Sciences, Volume 5, Issue 1, Springer, 2015 LINK.

  • Dong Hyun Jeong, Caroline Ziemkiewicz, William Ribarsky, and Remco Chang, Understanding Principal Component Analysis Using a Visual Analytics Tool, Technical Report, UNC Charlotte 2009. [PDF] (Presented at UKC 2009 - Mathematics: Fundamentals and Applications).

  • Dong Hyun Jeong, Caroline Ziemkiewicz, Brian Fisher, William Ribarsky, Remco Chang, iPCA: An Interactive System for PCA-based Visual Analytics, Computer Graphics Forum (Eurovis 2009). pp. 767-774, 2009. [PDF]

Available Systems

Please unzip after download.
iPCA_V1.5.0.6.msi.zip Windows Installer
iPCA_V1. Windows executable files
iPCA_V1. Source code, built with VC++ 8 and 9