Publication date: 07/08/2024

Robust PCA Outliers

Use the Robust PCA Outliers method in the Explore Outliers platform to identify outlier cells in correlated multivariate data. This method is useful because many other multivariate approaches identify only outlier rows, not the individual cells responsible. Before the method is applied, you can optionally center and scale the columns. The scaling factor is defined as follows:

max [Q(0.75) - Q(0.50), Q(0.50) - Q(0.25)] / [normalQuantile(0.75)]

where

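Q(p) is the pth quantile of the column

normalQuantile(0.75) is the 0.75 quantile of the standard normal distribution (approximately 0.6745)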

Note: If Q(0.75) or Q(0.25) is equal to the median, then more extreme quantiles are used until there is a nonzero range.
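The scaling factor above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names quantile and robust_scale are hypothetical, the linear-interpolation quantile rule is an assumption (the quantile definition that JMP uses is not stated here), and the fallback to more extreme quantiles described in the note is not implemented.

```python
from statistics import NormalDist


def quantile(sorted_vals, p):
    """Quantile by linear interpolation between order statistics.

    Assumption: this interpolation rule is chosen for illustration only.
    """
    n = len(sorted_vals)
    pos = p * (n - 1)
    lo = int(pos)
    hi = min(lo + 1, n - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def robust_scale(values):
    """max[Q(0.75) - Q(0.50), Q(0.50) - Q(0.25)] / normalQuantile(0.75)."""
    xs = sorted(values)
    median = quantile(xs, 0.50)
    upper = quantile(xs, 0.75) - median
    lower = median - quantile(xs, 0.25)
    # The fallback to more extreme quantiles when the range is zero
    # is intentionally left out of this sketch.
    return max(upper, lower) / NormalDist().inv_cdf(0.75)


print(robust_scale([0, 0, 1, 5, 9]))
```

Dividing by normalQuantile(0.75) puts the factor on the same footing as a standard deviation when the data are approximately normal, and taking the larger of the two half-ranges keeps the factor from collapsing when one side of the distribution is tightly bunched.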

After the data are centered and scaled, the Robust PCA Outliers method performs a sequence of singular value decompositions (SVDs) and thresholding steps that decompose the data matrix into a low-rank matrix and a sparse matrix of residuals. The thresholding is done so that the residuals are either very large (outliers) or very close to zero (non-outliers). The algorithm determines a matrix rank that captures the systematic variation while excluding both the outliers and small noise. Outliers that are not in the low-rank space are detected based on their residuals. See Candès et al. (2009) and Lin et al. (2013).

If there are missing values, they are initially replaced with zeros after the centering and scaling steps. Then, after each SVD iteration, the missing values are updated with their predicted values from the SVD.
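The decomposition can be illustrated with a small NumPy sketch. It is not the JMP implementation. It assumes a principal component pursuit formulation solved with an inexact augmented Lagrangian iteration, in the spirit of the cited papers. The function name robust_pca_sketch, the weight lam, the step parameters mu and rho, and the stopping tolerance are all assumptions chosen for illustration. Missing cells (NaN) start at zero and are refreshed from the low-rank fit after each SVD, as described above.

```python
import numpy as np


def shrink(M, tau):
    """Soft-threshold: pull every entry toward zero by tau."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)


def robust_pca_sketch(X, max_iter=500, tol=1e-7):
    """Split X into low-rank L plus sparse S (illustrative only).

    Assumes X is already centered and scaled and is not all zeros.
    """
    X = np.array(X, dtype=float)          # work on a copy
    missing = np.isnan(X)
    X[missing] = 0.0                      # missing cells start at zero
    m, n = X.shape
    lam = 1.0 / np.sqrt(max(m, n))        # assumed sparsity weight
    mu = 1.25 / np.linalg.norm(X, 2)      # assumed initial penalty
    rho = 1.5                             # assumed penalty growth rate
    norm_X = np.linalg.norm(X, "fro")
    S = np.zeros_like(X)
    Y = np.zeros_like(X)                  # Lagrange multipliers
    for _ in range(max_iter):
        # Low-rank step: SVD, then threshold the singular values.
        U, s, Vt = np.linalg.svd(X - S + Y / mu, full_matrices=False)
        L = (U * shrink(s, 1.0 / mu)) @ Vt
        # Missing cells take their predicted values from the SVD fit.
        X[missing] = L[missing]
        # Sparse step: threshold the residuals cell by cell.
        S = shrink(X - L + Y / mu, lam / mu)
        R = X - L - S
        Y = Y + mu * R
        mu = mu * rho
        if np.linalg.norm(R, "fro") <= tol * norm_X:
            break
    return L, S
```

In this sketch, cells whose entries in S are large in absolute value would be the candidate outliers, while entries near zero correspond to cells that the low-rank matrix explains well. Thresholding the singular values is what lets the iteration settle on a rank without it being specified in advance.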
