Statistics, Predictive Modeling and Data Mining with JMP®

Statistics is the discipline of collecting, describing and analyzing data to quantify variation and uncover useful relationships. It allows you to solve problems, reveal opportunities and make informed decisions in the face of uncertainty. Through the effective application of statistics, you can gain insight, foresight and the means for continuous learning and improvement, no matter what context you work within.

Whether your goal is description, explanation or prediction, you will appreciate the statistical discovery paradigm of JMP, which exploits the intrinsic synergy between visualization and modeling. No matter what the shape and size of your data, so long as it fits in memory, JMP will allow you to get the most from it, whatever your current level of statistical expertise.

JMP provides comprehensive facilities for univariate linear and nonlinear regression, for the more useful multivariate approaches to exploration, dimension reduction and modeling, and for the analysis of time series and categorical data. JMP and JMP Pro are intended to meet the statistical needs of most users most of the time, surfacing the various techniques and results in a way that you can easily grasp without compromising the power of the underlying algorithms. JMP Pro, the advanced analytics version of JMP, includes a rich set of sophisticated algorithms for building better models with your data. It lets you hold back data consistently to help build models whose predictions will generalize well.

With JMP, not only can you get your modeling done quickly and correctly, but you can also easily compare and contrast models built using different approaches, average their results and generate score code for new cases. Along the way you can identify key variables and optimize outcomes with or without noise in the inputs.

Through visual and interactive reports and profilers, JMP helps you communicate simple or complex findings to those who may not have an affinity with statistical methods, yet who need to understand and act upon your findings.

Regression

Look for associations between cholesterol loss (continuous) and both gender (nominal) and age (ordinal) with the Fit Y by X platform

The class of linear regression models is diverse and ubiquitous. JMP puts these powerful methods in the hands of practitioners of all skill levels, and in a form you can easily use.

Using Fit Y by X, you can test for and model dependencies between a single input and outcome. JMP unifies what is normally considered a disparate set of statistical approaches into a coherent, understandable whole and provides graphical output so you can understand results easily.

The Fit Model platform provides an environment for fitting simple or complex models with specified fixed and random effects and defined error terms.

Whatever your favored model-building approach, JMP provides a complete set of manual and automated methods, with appropriate diagnostics, to allow you to rapidly build most types of linear models. Specific fitting options focus your attention appropriately; JMP Pro extends the repertoire by adding Mixed Models (to correctly handle repeated measurements) and Generalized Regression (with regularized or penalized regression techniques like the Elastic Net that help identify X's that may have explanatory power). An “informative missing” approach allows the information in all your rows to contribute.
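For orientation, one common way of writing the Elastic Net criterion is shown below. The symbols (lambda for the overall penalty strength, alpha for the mix between the two penalties) follow a widely used textbook parameterization and are an assumption here; the parameterization used inside JMP Pro may differ.

```latex
\hat{\beta} = \arg\min_{\beta}\; \frac{1}{2n}\,\lVert y - X\beta \rVert_2^2
  + \lambda \left( \alpha\,\lVert \beta \rVert_1 + \frac{1-\alpha}{2}\,\lVert \beta \rVert_2^2 \right)
```

With alpha = 1 the penalty reduces to the Lasso, and with alpha = 0 it reduces to ridge regression.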

JMP lets you easily compare competing models. Multiple responses are handled in an integrated way, and the Profiler makes it simple to compare and contrast the interpretability and results of various fits. The Profiler also allows you to find settings to optimize your Y's, and Monte Carlo simulations help you assess how variation in the X's will be transmitted into the Y's.
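The idea of transmitting variation from the X's into the Y's can be sketched outside JMP in a few lines of Python. This is only an illustration, not JMP's simulator: the model `fitted_model`, its coefficients and the normal input distributions are all assumptions made up for the example.

```python
import random
import statistics

def simulate_response(model, input_specs, n_draws=10_000, seed=0):
    """Draw noisy inputs and return the resulting response values."""
    rng = random.Random(seed)
    responses = []
    for _ in range(n_draws):
        # Each input is drawn from a normal distribution around its setting.
        inputs = {name: rng.gauss(mean, sd) for name, (mean, sd) in input_specs.items()}
        responses.append(model(**inputs))
    return responses

# Hypothetical fitted model, invented for this sketch.
def fitted_model(x1, x2):
    return 3.0 + 2.0 * x1 - 0.5 * x2

# Input settings as (mean, standard deviation); values are illustrative.
ys = simulate_response(fitted_model, {"x1": (1.0, 0.1), "x2": (4.0, 0.2)})
print(statistics.mean(ys), statistics.stdev(ys))
```

The spread of `ys` shows how much of the input noise reaches the response at the chosen settings. Repeating the run at other settings is one way to look for a more robust operating point.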

The Nonlinear platform allows you to model nonlinear relationships. Nonlinear models use either standard least squares or a custom loss function. JMP provides a library of nonlinear model types needed for bioassay and pharmacokinetic studies, and does not require you to input starting values or auxiliary formulae. Grouping variables are supported, and you can quickly and easily isolate any subject effects using graphical displays. The custom loss function facility provides additional flexibility, allowing you to use, for example, iteratively reweighted least squares for robust regression.
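To make the idea of iteratively reweighted least squares concrete, here is a minimal Python sketch for a straight-line fit with Huber weights. It is not the Nonlinear platform's implementation: the Huber tuning constant 1.345, the scale estimate based on the median absolute residual, and the toy data are assumptions chosen for illustration.

```python
import statistics

def weighted_line_fit(xs, ys, ws):
    """Weighted least squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def irls_huber(xs, ys, k=1.345, n_iter=50):
    """Robust straight-line fit by iteratively reweighted least squares."""
    ws = [1.0] * len(xs)
    for _ in range(n_iter):
        a, b = weighted_line_fit(xs, ys, ws)
        resid = [y - (a + b * x) for x, y in zip(xs, ys)]
        # Robust scale estimate from the median absolute residual.
        scale = statistics.median(abs(r) for r in resid) / 0.6745
        cutoff = k * max(scale, 1e-12)
        # Huber weights: full weight for small residuals, reduced weight beyond the cutoff.
        ws = [1.0 if abs(r) <= cutoff else cutoff / abs(r) for r in resid]
    return a, b

# Toy data: a line with slope 2, small alternating noise and one gross outlier.
xs = list(range(10))
ys = [1 + 2 * x + (0.1 if x % 2 == 0 else -0.1) for x in xs]
ys[9] = 60.0

ols = weighted_line_fit(xs, ys, [1.0] * len(xs))
robust = irls_huber(xs, ys)
print("ordinary slope:", ols[1], "robust slope:", robust[1])
```

The ordinary fit is pulled strongly toward the outlier, while the reweighted fit stays close to the slope of the remaining points.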

Categorical Data

The Categorical platform in JMP provides tables, summaries and statistical tests of response data and multiple response data when the measured responses indicate membership of a particular category. Such data is generated in a variety of settings, including test results, classifying defects or side effects, and administering surveys.

Partly because of its diverse application, categorical data can be presented in a variety of formats. A particular strength of the Categorical platform is that it can handle this diversity without any need to reshape the data prior to exploration and analysis. One or more columns can be used to define the categories within and between which variation in the response is assessed, and the Categorical report contains the resulting charts of share and frequency, by category. Used in conjunction with the data filter in JMP, these charts provide quick and easy review of large-scale survey data. The report can also display the associated tabulations and cross tabulations, which can be quickly transposed for easier viewing or printing if needed.

Depending on the nature of the responses, you can also statistically address questions like:

  • Does the pattern of responses vary across sample categories, and has it changed over time?
  • For each response category, are the rates the same across sample categories?
  • How closely do the raters agree?
  • What is the relative risk of different treatments?

Use the Categorical platform to review adverse events and assess relative risk in a clinical trial.
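As a rough illustration of the relative risk question, the following Python sketch computes a relative risk and an approximate 95% confidence interval from event counts in two treatment groups. The counts are invented, and the log-scale (Katz) interval is one standard choice that is assumed here; it is not necessarily the method the Categorical platform reports.

```python
import math

def relative_risk(events_a, n_a, events_b, n_b, z=1.96):
    """Relative risk of group A versus group B with a log-scale interval."""
    risk_a = events_a / n_a
    risk_b = events_b / n_b
    rr = risk_a / risk_b
    # Standard error of log(RR) for two independent binomial samples.
    se = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    lo = math.exp(math.log(rr) - z * se)
    hi = math.exp(math.log(rr) + z * se)
    return rr, lo, hi

# Hypothetical adverse event counts: 30 of 200 on treatment, 15 of 200 on control.
rr, lo, hi = relative_risk(30, 200, 15, 200)
print(rr, lo, hi)
```

An interval that excludes 1 suggests the event rates differ between the two groups.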

Trees

Interactively build a simple decision tree with training and validation data.

The Partition platform in JMP enables you to find cuts or groupings within your inputs (X's) that can best predict the variation in an output (Y). X's and Y can both be either categorical or continuous. The process of splitting the data by finding an appropriate X and an appropriate grouping or cut-point for this X is recursive – you can continue it until you get a useful fit. The result is naturally represented as a tree, and you can also get important information about which X's contribute most to explaining the variation in Y.
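A single split step of the kind described above can be sketched in Python as follows. The sketch handles one continuous X and one continuous Y, and chooses the cut point that minimizes the total within-group sum of squares. That criterion is an assumption made for simplicity; the Partition platform may rank candidate splits differently.

```python
def sse(values):
    """Sum of squared deviations from the mean."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_cut(xs, ys):
    """Return (cut_point, total_sse) for the best binary split on x."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut is possible between equal x values
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (cut, total)
    return best

# Toy data with an obvious jump in y between x = 4 and x = 5.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [5, 6, 5, 6, 20, 21, 19, 20]
cut, total = best_cut(xs, ys)
print(cut, total)
```

Applying the same search again within each resulting group gives the recursive structure of a tree.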

Trees are robust to the presence of missing values, and accommodate any joint effects of X's directly. You can grow your tree using decision trees, bootstrap forests (JMP Pro only) or boosted trees (JMP Pro only). Note that simple decision trees are not likely to generalize well to new data, so if you need predictive power you should investigate JMP Pro.

Neural Networks

The Neural platform in JMP enables you to build fully connected neural networks with hidden nodes in one or two layers. Each node can have one of three different activation functions, and you can have any number of nodes in each layer.
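For readers new to the terminology, the following Python sketch shows the forward pass of a fully connected network with one hidden layer. The tanh activation, the linear output node and all weights are assumptions chosen for illustration; they are not taken from a fitted JMP model.

```python
import math

def forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """One hidden layer with tanh activations and a linear output node."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(weights, x)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias

# Two inputs, two hidden nodes, one output; all numbers are made up.
x = [1.0, 2.0]
hidden_weights = [[0.5, -0.25], [1.0, 0.0]]
hidden_biases = [0.0, 0.0]
output_weights = [2.0, 1.0]
prediction = forward(x, hidden_weights, hidden_biases, output_weights, output_bias=0.5)
print(prediction)
```

Fitting a network means choosing the weights and biases so that predictions like this one match the observed responses.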

JMP Pro allows you to automatically handle missing data, transform X's within the platform, and use boosting to help your network learn difficult cases. You can also apply one of four penalty methods when fitting.

Compare the effect of different neural architectures on the decision boundary from boosted neural fits.

Multivariate Interdependence Techniques

Use a Parallel Plot, PCA and Nonparametric Scatterplot Matrix to study the evolution in time of a complex industrial process.

Multivariate analyses can focus either on observations (rows) or on variables (columns), and may either treat all variables on an equal footing (interdependence techniques) or distinguish between effects (X's) and responses (Y's), as dependence techniques do. Whatever your analytical objective, JMP will work with you to get the job done. (See the Multivariate Dependence Techniques section for multivariate methods involving X's and Y's.)

In the multivariate context, it is vital to consider data quality, the identification and treatment of outliers, and the pattern of missing values. Typically, these issues need to be addressed iteratively as the analysis unfolds, and the interactivity of JMP is built for this way of working. For interdependence techniques, JMP provides principal components analysis (PCA), factor analysis, clustering, normal mixtures and self-organizing maps. Each uses the software's unfolding style of analysis, so that you can shape your approach according to what the data reveals to you.

The Multivariate platform is often the entry point into any analysis with many columns. It allows you to quickly assess the associations and parametric and nonparametric correlations between all pairs of numeric variables, identify outliers and impute missing values.

PCA lets you reduce the dimensionality of your description when correlations are present, and factor analysis lets you model variability among observed variables in terms of a smaller number of unobserved factors. The Factor Analysis platform allows multiple fits and rotations in one report, and conditional formatting allows you to suppress small values.
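The link between correlation and dimension reduction is easiest to see with just two variables. For two standardized variables with correlation r, the correlation matrix [[1, r], [r, 1]] has eigenvalues 1 + r and 1 - r, so the stronger the correlation, the more of the total variance the first principal component carries. The Python sketch below uses this closed form; the data are made up, and this is an illustration rather than JMP's PCA implementation.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def pca_two_variables(xs, ys):
    """PCA on the correlation matrix [[1, r], [r, 1]] of two variables."""
    r = pearson_r(xs, ys)
    # Closed-form eigenvalues of the 2 x 2 correlation matrix, largest first.
    eigenvalues = sorted([1 + r, 1 - r], reverse=True)
    # Share of the total variance (which is 2) carried by each component.
    explained = [ev / 2 for ev in eigenvalues]
    return r, eigenvalues, explained

r, eigenvalues, explained = pca_two_variables([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(r, eigenvalues, explained)
```

With more than two variables there is no such shortcut, and the eigenvalues are found numerically.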

Clustering, a key technique in unsupervised learning, forms subgroups so that cases in a particular subgroup are more alike than those in another subgroup. The Cluster platform in JMP lets you scale and transform variables before analysis, provides various distance measures, and includes hierarchical and K-means clustering. Hierarchical clustering produces a dendrogram you can manipulate interactively to decide on the most useful number of clusters.
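As a minimal sketch of the K-means idea, the Python code below alternates between assigning each point to its nearest center and moving each center to the mean of its points. The squared Euclidean distance, the naive choice of the first k points as starting centers and the toy data are assumptions for illustration; the Cluster platform offers more careful options.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, n_iter=100):
    """Basic K-means; returns (labels, centers)."""
    centers = [tuple(p) for p in points[:k]]  # naive start: the first k points
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(points, k=2)
print(labels, centers)
```

Because the result can depend on the starting centers, it is common to run the procedure from several starts and keep the best solution.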

Multivariate Dependence Techniques

For multivariate dependence techniques, JMP provides partial least squares (PLS) regression and discriminant analysis.

PLS is a versatile technique that can consume data of any shape, and with any number of X's and Y's. It is often applied in situations where linear regression is not viable because there are more X's than rows, but it can also be seen as a technique useful within predictive modeling generally. The PLS platform in JMP provides basic capabilities, but with JMP Pro there is also a PLS personality in the Fit Model platform that allows you to fit more complex models involving powers and interaction terms. With JMP Pro you can also impute missing values, and build PLS models using a choice of validation methods. JMP provides both the NIPALS and SIMPLS algorithms for fitting, and automated ways to find the most appropriate number of latent factors to include in the model. It provides all the usual diagnostics so you can check model adequacy. You can also quickly generate pruned PLS models with a reduced number of terms simply by making appropriate selections in graphical output.

The Discriminant platform allows you to understand which combination of X's helps to explain category membership of a Y. It provides linear, quadratic or regularized methods for discrimination, stepwise selection of X's if needed, and allows you to easily inspect uncertain or misclassified rows to decide what follow-up or remedial action is required.

This PLS model predicts water quality in the Blue Ridge ecoregion of the Savannah River basin.

Time Series

Use time series analysis to automatically fit a set of ARIMA and smoothing models and make a forecast from the best one.

The Time Series platform in JMP allows you to explore, model and forecast univariate time series. Your statistical modeling approach can be informed by the usual diagnostics, including plots of autocorrelations and partial autocorrelations, variograms, AR coefficients and spectral density plots.

You can build several ARIMA models for a time series with a range of parameters with a single click, and select the best model using various figures of merit such as AIC, SBC, MAPE and MAE. You can build transfer function models to model an output time series in terms of one or more input series, applying pre-whitening to the inputs if required. You can also generate the equivalent PROC ARIMA code to run your model in SAS if needed.
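Selecting among candidate models by AIC can be illustrated with a small Python sketch that compares a mean-only model with a first-order autoregression. Both models are fitted by conditional least squares, and the AIC is computed up to an additive constant under a Gaussian assumption. These simplifications, the function names and the toy series are assumptions for the example; the figures JMP reports are computed from its own fits and may differ.

```python
import math

def aic_least_squares(sse, n, n_params):
    """Gaussian AIC up to an additive constant: n*ln(SSE/n) + 2*k."""
    return n * math.log(sse / n) + 2 * n_params

def sse_mean_model(ys):
    """Residual sum of squares when every value is predicted by the mean."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def sse_ar1_model(prev, curr):
    """Residual sum of squares from regressing y[t] on y[t-1] with an intercept."""
    n = len(curr)
    mx, my = sum(prev) / n, sum(curr) / n
    sxx = sum((x - mx) ** 2 for x in prev)
    sxy = sum((x - mx) * (y - my) for x, y in zip(prev, curr))
    slope = sxy / sxx
    intercept = my - slope * mx
    return sum((y - intercept - slope * x) ** 2 for x, y in zip(prev, curr))

# Toy series with obvious serial dependence.
series = [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1, 2, 3, 4, 5, 6]
prev, curr = series[:-1], series[1:]
n = len(curr)

# Both models are scored on the same observations so the AIC values are comparable.
aic_mean = aic_least_squares(sse_mean_model(curr), n, n_params=1)
aic_ar1 = aic_least_squares(sse_ar1_model(prev, curr), n, n_params=2)
best = min([("mean only", aic_mean), ("AR(1)", aic_ar1)], key=lambda m: m[1])
print(best)
```

The lower AIC wins, so an extra parameter is kept only when it reduces the residual variation enough to pay for itself.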

The Time Series platform also contains a number of smoothing techniques for time series, including Holt exponential smoothing, seasonal exponential smoothing, and Winters' method.

In all cases you can produce interactive forecasts of the predicted future behavior, with confidence intervals.
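To give a feel for how a smoothing model turns history into a forecast, here is a minimal Python sketch of Holt's linear exponential smoothing. The smoothing constants, the simple initialization from the first two observations and the toy series are assumptions for illustration. The sketch returns point forecasts only and does not attempt the confidence intervals mentioned above.

```python
def holt_forecast(ys, alpha=0.5, beta=0.5, horizon=3):
    """Holt's linear method: smooth a level and a trend, then extrapolate."""
    # Simple initialization from the first two observations.
    level, trend = ys[0], ys[1] - ys[0]
    for y in ys[1:]:
        prev_level = level
        # New level blends the observation with the previous one-step forecast.
        level = alpha * y + (1 - alpha) * (prev_level + trend)
        # New trend blends the latest change in level with the previous trend.
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]

# A perfectly linear series is extrapolated along the same line.
forecasts = holt_forecast([1, 3, 5, 7, 9])
print(forecasts)
```

Larger values of alpha and beta make the forecast react faster to recent changes, at the cost of more sensitivity to noise.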
