Example of Partial Least Squares

Use the Partial Least Squares platform to build a model that predicts the amounts of three pollution compounds present in samples of water from the Baltic Sea. The three compounds of interest are:

• lignin sulfonate, which is pulp industry pollution

• humic acid, which is a natural forest product

• an optical whitener from detergent

The predictors are spectral emission intensities measured at a range of wavelengths (v1–v27). This example is from spectrometric calibration, which is an area where partial least squares is very effective.

For the purposes of calibrating the model, samples with known compositions are used. The calibration data consist of 16 samples of known concentrations of lignin sulfonate, humic acid, and detergent. Emission intensities are recorded at 27 equidistant wavelengths.

1. Select Help > Sample Data Folder and open Baltic.jmp.

Note: The data in the Baltic.jmp data table are reported in Umetrics (1995). The original source is Lindberg, Persson, and Wold (1983).

2. Select Analyze > Multivariate Methods > Partial Least Squares.

3. Assign ls, ha, and dt to the Y, Response role.

4. Assign Intensities, which contains the 27 intensity variables v1 through v27, to the X, Factor role.

5. Click OK.

The Partial Least Squares Model Launch control panel appears.

6. Select Leave-One-Out as the Validation Method.

7. Click Go.

Because the van der Voet test is a randomization test, your Prob > van der Voet T² values might differ slightly.

Figure 6.2 Partial Least Squares Report


The Root Mean PRESS (predicted residual sum of squares) Plot shows that Root Mean PRESS is minimized when the number of factors is 7. This is stated in the note beneath the Root Mean PRESS Plot. A report called NIPALS Fit with 7 Factors Using Fast SVD is produced. A portion of that report is shown in Figure 6.3.
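The idea behind a leave-one-out Root Mean PRESS value can be sketched in plain Python. This is only an illustration under stated assumptions: it handles a single response, centers but does not scale the data, uses a small made-up data set rather than Baltic.jmp, and the function names (fit_pls1, predict_pls1, root_mean_press) are hypothetical. JMP's own computation (multiple responses, its default scaling, Fast SVD) may differ in detail.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def fit_pls1(X, y, n_factors):
    """NIPALS-style PLS for one response (centered, not scaled)."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centered X
    f = [v - y_mean for v in y]                                 # centered y
    factors = []
    for _ in range(n_factors):
        # Weight vector: direction in X most covariant with the y residual.
        w = [dot([E[i][j] for i in range(n)], f) for j in range(m)]
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [dot(row, w) for row in E]                          # scores
        tt = dot(t, t)
        p = [dot([E[i][j] for i in range(n)], t) / tt for j in range(m)]
        q = dot(f, t) / tt
        # Deflate X and y before extracting the next factor.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        factors.append((w, p, q))
    return x_mean, y_mean, factors

def predict_pls1(model, x):
    x_mean, y_mean, factors = model
    e = [a - b for a, b in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in factors:
        t = dot(e, w)
        y_hat += q * t
        e = [a - t * b for a, b in zip(e, p)]
    return y_hat

def root_mean_press(X, y, n_factors):
    """Hold out each row in turn, refit, and accumulate squared errors."""
    press = 0.0
    for i in range(len(X)):
        model = fit_pls1(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_factors)
        press += (y[i] - predict_pls1(model, X[i])) ** 2
    return math.sqrt(press / len(X))

# Made-up data: y is an exact linear function of the two predictors.
X = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8], [6, 4]]
y = [2 * a - b + 1 for a, b in X]
rmp = {k: root_mean_press(X, y, k) for k in (1, 2)}
print(rmp)
```

Comparing the values in rmp across candidate numbers of factors mirrors what the Root Mean PRESS Plot does visually: the number of factors with the smallest value is the one the report highlights.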

The van der Voet T² statistic tests whether a model with a different number of factors differs significantly from the model with the minimum PRESS value. A common practice is to extract the smallest number of factors for which the van der Voet significance level exceeds 0.10 (SAS Institute Inc 2024f; Tobias 1995). Applying that rule here, you would fit a new model by entering 6 as the Number of Factors in the Model Launch panel.
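The selection rule described above can be written as a short function. This is a minimal sketch: the function name is hypothetical, and the p-values are invented for illustration (they are not taken from the Baltic.jmp report, and yours would vary anyway because the test is randomized).

```python
def smallest_adequate_factors(p_values, alpha=0.10):
    """Return the smallest number of factors whose van der Voet
    significance level exceeds alpha, or None if none does.

    p_values maps number of factors -> Prob > van der Voet T2.
    """
    for k in sorted(p_values):
        if p_values[k] > alpha:
            return k
    return None

# Hypothetical significance levels, NOT values from the actual report.
illustrative_p = {0: 0.000, 1: 0.001, 2: 0.004, 3: 0.012,
                  4: 0.031, 5: 0.068, 6: 0.214, 7: 1.000}
chosen = smallest_adequate_factors(illustrative_p)
print(chosen)
```

With these made-up values the rule picks a model smaller than the minimum-PRESS model, which is the same kind of trade-off the text describes.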

Figure 6.3 Seven Extracted Factors


8. Click the NIPALS Fit with 7 Factors Using Fast SVD red triangle and select Diagnostics Plots.

This adds a report showing actual by predicted plots and three reports showing residual plots. The Actual by Predicted Plot shows how closely the predicted compound amounts agree with the actual amounts.

Figure 6.4 Diagnostics Plots


9. Click the NIPALS Fit with 7 Factors Using Fast SVD red triangle and select VIP vs Coefficients Plot.

Figure 6.5 VIP vs Coefficients Plot


The VIP vs Coefficients plot helps identify variables that are influential relative to the fit for the various responses. For example, v23, v2, and v26 all have variable importance for projection (VIP) values that exceed 0.8 and relatively large coefficients.
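To make the 0.8 threshold concrete, the following sketch computes VIP values using the commonly cited formula, VIP_j = sqrt(m * sum_a(SS_a * w_aj^2) / sum_a(SS_a)), where m is the number of predictors, w_a is the unit-length weight vector for factor a, and SS_a is the response variation explained by factor a. The weights and SS values are made up for illustration, the function name is hypothetical, and it is an assumption that this matches the exact formula JMP uses.

```python
import math

def vip_scores(weights, explained_ss):
    """VIP per predictor from unit-length factor weights.

    weights[a][j] is the weight of predictor j on factor a;
    explained_ss[a] is the response variation explained by factor a.
    """
    m = len(weights[0])
    total = sum(explained_ss)
    return [
        math.sqrt(m * sum(ss * w[j] ** 2
                          for w, ss in zip(weights, explained_ss)) / total)
        for j in range(m)
    ]

# Hypothetical two-factor model with three predictors (not Baltic.jmp output).
names = ["x1", "x2", "x3"]
weights = [[0.6, 0.8, 0.0],
           [0.0, 0.6, 0.8]]
explained_ss = [3.0, 1.0]

vip = vip_scores(weights, explained_ss)
influential = [n for n, v in zip(names, vip) if v > 0.8]
print(dict(zip(names, vip)), influential)
```

Because the squared VIP values average to 1 under this formula, a cutoff such as 0.8 flags predictors that contribute somewhat less than the average as candidates for removal, while those above it are retained.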
