Multivariate Inliers and Outliers

This process calculates the Mahalanobis distance of each subject from the multivariate mean, based on the available data, to identify inliers and outliers in multivariate space. It also generates results by site so that you can see which sites are extreme in this multivariate space.

Mahalanobis distance is plotted on the log scale to make small scores easier to examine. The reference line is derived from a transformation of the mean of the approximate chi-square distribution.
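To make the calculation concrete, the following is a minimal sketch of computing Mahalanobis distances and a log-scale reference value with NumPy. It is not the process's own implementation. The function name `mahalanobis_distances`, the simulated data, and the use of a pseudo-inverse are assumptions for illustration. The exact transformation behind the reference line is not stated here, so the sketch assumes the log of the square root of the chi-square mean, where the mean equals the number of variables p.

```python
import numpy as np


def mahalanobis_distances(X):
    """Return the Mahalanobis distance of each row of X from the column means."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    # Assumption: a pseudo-inverse is used so a singular covariance does not fail.
    cov_inv = np.linalg.pinv(cov)
    # Squared distance per row: (x - mean)' * inv(S) * (x - mean)
    d2 = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)
    return np.sqrt(np.maximum(d2, 0.0))


# Simulated data: 200 hypothetical subjects, 4 variables, no missing values.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))

d = mahalanobis_distances(X)
log_d = np.log(d)  # values that would be plotted on the log scale

# The squared distance is approximately chi-square with p degrees of freedom,
# so its mean is p. Assumed transformation for the reference line: log(sqrt(p)).
p = X.shape[1]
reference = np.log(np.sqrt(p))
```

Subjects whose `log_d` lies far above `reference` would be candidate outliers, and subjects far below it would be candidate inliers.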

This process attempts to use as much data as possible. Along with sex and age, it takes all findings test codes by visit number and time point number (if available), as well as the frequencies of all event and intervention codes per subject. Doing so can lead to missing data, particularly for studies that do not appear to have a fixed number of visits or that have many dropouts. Because Mahalanobis distance cannot be calculated when data are missing, there is an option to delete variables with at least X% missing data, based on the selected population and filters (the default is 5%). Using the remaining variables, scores are computed for those subjects with complete data. The general strategy is to use as many variables as possible while letting a few early dropouts fall out of the analysis.
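The missing-data strategy described above can be sketched with pandas as follows. This is an illustration under stated assumptions, not the process's actual code. The function name `select_complete_data`, the column names, and the subject identifiers are hypothetical, and the table is assumed to already hold one row per subject and one column per variable.

```python
import numpy as np
import pandas as pd


def select_complete_data(df, max_missing_pct=5.0):
    """Drop variables with at least max_missing_pct percent missing values,
    then keep only the subjects with complete data on the remaining variables."""
    missing_pct = df.isna().mean() * 100.0
    kept = df.loc[:, missing_pct < max_missing_pct]
    return kept.dropna(axis=0, how="any")


# Hypothetical wide table: 40 subjects, one column per variable.
wide = pd.DataFrame(
    {
        "AGE": [34, 51, 47, 29] * 10,
        "LB_ALT_V1": [22.0, 30.0, 25.0, 41.0] * 10,
        # 50% missing: removed by the 5% threshold
        "LB_ALT_V9": [np.nan, 28.0, np.nan, 39.0] * 10,
        # 2.5% missing: kept, but the one affected subject is dropped
        "VS_SYSBP_V1": [120.0] * 39 + [np.nan],
    },
    index=[f"SUBJ-{i:03d}" for i in range(40)],
)

analysis = select_complete_data(wide, max_missing_pct=5.0)
```

In this example the late-visit variable is removed because half of its values are missing, and the single subject with a missing value on a retained variable falls out of the analysis.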

What do I need?

This process requires the following variables:

• DM (ARM, SITEID, COUNTRY, USUBJID). (AGE and SEX are used if available.)

• Findings domains require VISITNUM and xxSTRESN. (xxTPTNUM is used if available.)

• From events or interventions domains, xxDECOD is required.

Domains that fail to meet the aforementioned criteria are not used.
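The variable requirements listed above can be expressed as a simple eligibility check. The sketch below is illustrative only. The function `domain_is_usable` and its arguments are hypothetical, and the process may apply additional rules that are not described here.

```python
def domain_is_usable(domain_class, prefix, columns):
    """Return True if a domain has the variables listed above.

    domain_class: "dm", "findings", "events", or "interventions"
    prefix: two-letter domain code standing in for "xx" (for example "LB" or "AE")
    columns: the variable names present in the domain
    """
    cols = set(columns)
    if domain_class == "dm":
        required = {"ARM", "SITEID", "COUNTRY", "USUBJID"}
    elif domain_class == "findings":
        required = {"VISITNUM", f"{prefix}STRESN"}
    elif domain_class in ("events", "interventions"):
        required = {f"{prefix}DECOD"}
    else:
        return False
    # Optional variables (AGE, SEX, xxTPTNUM) do not affect eligibility.
    return required <= cols


# A findings domain without VISITNUM would not be used.
lb_ok = domain_is_usable("findings", "LB", ["USUBJID", "VISITNUM", "LBSTRESN"])
vs_ok = domain_is_usable("findings", "VS", ["USUBJID", "VSSTRESN"])
ae_ok = domain_is_usable("events", "AE", ["USUBJID", "AEDECOD"])
```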

Refer to Localization-Specific Value Specification for more information.

Output/Results

The output generated by this process is summarized in a tabbed report. Refer to the Multivariate Inliers and Outliers output documentation for detailed descriptions and guides to interpreting your results.