This section describes the reports that are available when you fit a basis function expansion model in the Functional Data Explorer platform.
The Model Controls report enables you to define parameters of models to compare in the Model Selection report. The appearance of the Model Controls report depends on the type of model that is fit.
Note: Model Controls are not available for Wavelet models.
When a B-Spline or P-Spline model is fit, you can specify the following parameters:
Number of Knots
Add, remove, or specify a range for the number of knots in each spline. The number of knots must be a positive integer.
Note: The maximum number of knots allowed for B-Spline models is one less than the maximum number of observations per function or the number of unique inputs. The maximum number of knots allowed for P-Spline models is two less than the number of unique inputs. If you specify a number larger than the maximum, a warning message appears.
Spline Degree
Add or remove spline degree fits from the Model Selection report.
When a Fourier Basis model is fit, you can specify the following parameters:
Number of Fourier Pairs
Add, remove, or specify a range for the number of Fourier pairs to compare.
Period
Change the period of the function.
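As an illustration of how the two Fourier controls interact, the following is a minimal sketch of a Fourier basis evaluated at a single input value. It assumes the standard form of a constant term followed by sine and cosine pairs at integer multiples of the base frequency; the exact parameterization used by the platform is not stated in this section, and the function name `fourier_basis` is hypothetical.

```python
import math


def fourier_basis(t, n_pairs, period):
    """Evaluate a constant term plus n_pairs sine/cosine pairs at input t.

    Assumed form (not confirmed by the platform documentation):
    1, sin(2*pi*k*t/period), cos(2*pi*k*t/period) for k = 1..n_pairs.
    """
    if n_pairs < 0:
        raise ValueError("n_pairs must be non-negative")
    if period <= 0:
        raise ValueError("period must be positive")
    row = [1.0]
    for k in range(1, n_pairs + 1):
        angle = 2.0 * math.pi * k * t / period
        row.append(math.sin(angle))
        row.append(math.cos(angle))
    return row
```

Under this assumption, each additional Fourier pair adds two basis functions, and every basis function repeats after one period.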
After you specify the model controls, click Go to view the updated models in the Model Selection report.
Tip: To specify the Model Controls prior to fitting a model, click the Functional Data Explorer red triangle and select Models > Model Controls. Then, select the desired model.
The Model Selection report contains an overall prediction plot, a grid of individual prediction plots, a solution path plot, and a table. The grid of individual prediction plots has the same layout and controls as the grid of individual plots in the Data Processing report. At most twenty plots are shown at a time. Drop-down menus and arrows enable you to view different groups of individual prediction plots.
The prediction plots show the raw data and prediction curves for the current model. If there is a validation set, the predicted curves are not shown for functions that are in the validation set. The curve in the overall prediction plot is a prediction of the mean curve. The curves in the individual prediction plots are prediction curves for each specific function. For B-Spline models, the overall prediction plot also displays the location of the knots. You can change the location of the knots by dragging the blue slider bars to different locations. To update the model reports according to the new knot locations, click the Update Models button. To reset the knots to their default locations, click the Reset Knots button.
For Wavelets models, there is also a Coefficients tab. This tab contains an overall coefficients plot and a grid of individual coefficients plots. The coefficients plots consist of dashed lines that represent the relative importance of the coefficients within each resolution. The placement of each line is determined by the input variable (horizontal axis) and the resolution number (vertical axis). These numbers appear in the column names of the Wavelets Coefficients table. For the individual plots, the color of each line is determined by the sign of the corresponding coefficient: blue indicates a positive coefficient and red indicates a negative coefficient. The length of each line is the corresponding coefficient scaled by the largest absolute coefficient value within its resolution. Therefore, the largest coefficient in each resolution has the longest line, and the lines get shorter as the coefficients get smaller. For the mean plot, the coefficients are averaged across functions, and the color and length of each line are determined by the averaged and scaled coefficient.
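The scaling rule for the coefficient lines can be sketched as follows. This is a minimal illustration, not the platform's implementation: the function names and the dictionary layout are hypothetical, and the handling of a zero coefficient (no color) is an assumption because the text covers only positive and negative coefficients.

```python
def scale_within_resolution(coefs_by_resolution):
    """Scale each coefficient by the largest absolute value in its resolution.

    Input: {resolution_number: [coefficients]} (hypothetical layout).
    Output: same layout, with signed values in the range [-1, 1].
    """
    scaled = {}
    for resolution, coefs in coefs_by_resolution.items():
        largest = max((abs(c) for c in coefs), default=0.0)
        if largest == 0.0:
            # Every coefficient is zero, so every line has zero length.
            scaled[resolution] = [0.0 for _ in coefs]
        else:
            scaled[resolution] = [c / largest for c in coefs]
    return scaled


def line_color(scaled_coef):
    """Blue for positive, red for negative; None for zero is an assumption."""
    if scaled_coef > 0:
        return "blue"
    if scaled_coef < 0:
        return "red"
    return None
```

Because the scaling is done separately within each resolution, line lengths are comparable within a resolution but not across resolutions.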
Note: The Coefficients plots are available only for the best fitting wavelets model.
The appearance of the solution path plot and associated table depends on the model type.
For B-Spline and P-Spline models, the solution path plot shows a model selection criterion plotted over the defined number of knots. There is a separate solution path for each spline degree. The Bayesian Information Criterion (BIC) is the default fitting criterion. Use the model red triangle option Model Selection to change the selection criterion. The current solution is designated by the dotted vertical line on the solution path plot. By default, the selected spline degree and number of knots correspond to the model that has the smallest model selection criterion value. To change the current model selection, drag the slider at the top of a vertical line or click a specific spline in the solution path plot or legend. These actions automatically update the prediction plots in the Model Selection report, as well as the information in all other reports.
The table below the solution path plot is the Fit Statistics table, which contains information about the current solution model. It shows the number of knots, the spline degree, the -2 Log Likelihood, the values for the AICc, BIC, and GCV model fitting criteria, and a value for the response standard deviation. The response standard deviation is defined as the residual sigma from the fitted model. When a P-Spline model is selected, the penalty parameter λ (Lambda) is also displayed.
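The default selection rule, choosing the model with the smallest criterion value, can be sketched in a few lines. The BIC and AICc formulas below are the standard textbook definitions in terms of the -2 Log Likelihood. How the platform counts parameters and observations is not stated in this section, so `n_params`, `n_obs`, and the candidate values are illustrative assumptions.

```python
import math


def bic(neg2_loglik, n_params, n_obs):
    """Standard BIC: -2 log-likelihood plus n_params * ln(n_obs)."""
    return neg2_loglik + n_params * math.log(n_obs)


def aicc(neg2_loglik, n_params, n_obs):
    """Standard AICc: AIC plus the small-sample correction term."""
    correction = (2.0 * n_params * (n_params + 1)) / (n_obs - n_params - 1)
    return neg2_loglik + 2.0 * n_params + correction


def best_model(candidates, n_obs, criterion=bic):
    """Return the candidate with the smallest criterion value."""
    return min(
        candidates,
        key=lambda m: criterion(m["neg2_loglik"], m["n_params"], n_obs),
    )


# Made-up candidates for illustration only (not output from any real fit).
example_candidates = [
    {"knots": 2, "degree": 3, "neg2_loglik": 120.0, "n_params": 6},
    {"knots": 4, "degree": 3, "neg2_loglik": 100.0, "n_params": 8},
    {"knots": 8, "degree": 3, "neg2_loglik": 98.0, "n_params": 12},
]
```

Calling `best_model(example_candidates, n_obs=50)` picks the candidate whose improvement in fit is not outweighed by the penalty for extra parameters. Passing `criterion=aicc` switches the rule in the same way that the Model Selection option switches the plotted criterion.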
For Fourier Basis models, the solution path plot shows a model selection criterion plotted across the number of Fourier pairs for a defined period. The Bayesian Information Criterion (BIC) is the default fitting criterion. Use the model red triangle option Model Selection to change the selection criterion. The current solution is designated by the dotted vertical line in the solution path plot. By default, the slider is placed at the number of Fourier pairs that corresponds to the model that has the smallest model selection criterion value. Drag the slider at the top of the dotted vertical line to change the number of Fourier pairs in the current model. Dragging the slider automatically updates the prediction plots in the Model Selection report, as well as the information in all other reports.
The table below the solution path plot is the Fit Statistics table, which contains information about the current solution model. It shows the number of Fourier pairs, the -2 Log Likelihood, the values for the AICc, BIC, and GCV model fitting criteria, and a value for the response standard deviation. The response standard deviation is defined as the residual sigma from the fitted model.
For Wavelets models, the solution path plot shows a model selection criterion plotted across the model number that defines the wavelets model. The Bayesian Information Criterion (BIC) is the default fitting criterion. Use the model red triangle option Model Selection to change the selection criterion. The current model selection is designated by the dotted vertical line on the solution path plot. By default, the selected model is the model that has the smallest model selection criterion value.
The table below the solution path plot shows the model numbers, corresponding model names, and current model selection. This table is sorted by the model selection criterion, with the best fitting model at the top. Select different wavelets models by dragging the slider at the top of the dotted vertical line or by selecting a model directly in the table. Selecting a different model automatically updates the prediction plots in the Model Selection report, as well as the information in all other reports.
The Diagnostic Plots report contains the Actual by Predicted plot and the Residual by Predicted plot. These plots help assess how well the current model fits the data. The Diagnostic Plots report is closed by default.
The Function Summaries report shows summaries from the Functional PCA for each level of the ID variable. The functional principal components associated with eigenvalues that explain more than 1% of the variation in the data are displayed by default. The mean, standard deviation, median, minimum, maximum, integrated difference, root integrated square error (RISE), and root integrated function square (RIFS) are also shown. The integrated difference and RISE summary values indicate how much the ID-specific function differs from the overall mean function. The RIFS summary value is used for optimal curve fitting. See Statistical Details for the Function Summaries Report.
Note: The integrated difference, RISE, and RIFS statistics are not available for Wavelet models.
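The three integrated summaries can be approximated numerically on a grid of input values. The sketch below assumes definitions inferred from the statistic names: the integral of the difference from the mean function, the square root of the integrated squared difference, and the square root of the integrated squared function. The authoritative formulas are in Statistical Details for the Function Summaries Report, and all function names here are hypothetical.

```python
import math


def trapezoid(y, x):
    """Trapezoidal-rule approximation of the integral of y over x."""
    return sum(
        (x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2.0
        for i in range(len(x) - 1)
    )


def function_summaries(x, f_i, f_mean):
    """Assumed definitions of integrated difference, RISE, and RIFS.

    x      : sorted grid of input values
    f_i    : fitted values of one ID-specific function on the grid
    f_mean : fitted values of the overall mean function on the grid
    """
    diff = [a - b for a, b in zip(f_i, f_mean)]
    return {
        "integrated_difference": trapezoid(diff, x),
        "rise": math.sqrt(trapezoid([d * d for d in diff], x)),
        "rifs": math.sqrt(trapezoid([a * a for a in f_i], x)),
    }
```

Under these assumed definitions, the integrated difference keeps its sign, so positive and negative deviations from the mean can cancel, whereas RISE measures the overall size of the deviation.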
The Basis Function Coefficients report shows a table of the estimated basis function coefficients and their standard deviations. These coefficients are common across all levels of the ID variable and are fixed estimates in the mixed model framework. To view standard errors and confidence intervals for the coefficients, right-click in the table and select Columns.
For Wavelet models, the estimated wavelet coefficients are shown for each level of the ID variable. There is a column for the father wavelet and then a series of columns for the remainder of the coefficients. Each coefficient column is identified by the corresponding resolution and location on the input domain. The coefficients table contains a sparse representation of the wavelet coefficients that is found using thresholding (Donoho, 1995).
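To show how thresholding produces a sparse set of coefficients, the following is a minimal sketch of soft thresholding, the operation associated with the cited reference. Whether the platform applies soft thresholding specifically, and how it chooses the threshold value, is not stated here, so both are assumptions and the function name is hypothetical.

```python
def soft_threshold(coefs, threshold):
    """Shrink each coefficient toward zero by threshold.

    Coefficients whose absolute value is at most threshold become exactly
    zero, which is what makes the resulting representation sparse.
    """
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    shrunk = []
    for c in coefs:
        magnitude = abs(c) - threshold
        if magnitude <= 0:
            shrunk.append(0.0)
        else:
            shrunk.append(magnitude if c > 0 else -magnitude)
    return shrunk
```

For example, with a threshold of 1.0, small coefficients such as -0.5 and 0.2 are set to zero, while larger ones are kept and reduced in magnitude by 1.0.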
The Random Coefficients by Function report shows a table of the estimated random coefficients for each basis function and functional process combination. These coefficients are unique to each level of the ID variable and are random effects estimates in the mixed model framework.
Note: The Random Coefficients by Function report is not available for Wavelet models.
Functional principal components analysis (functional PCA) is performed on the fitted functional model. The Functional PCA report lists the eigenvalues that correspond to each functional principal component (FPC) in order from largest to smallest. The percent of variation accounted for by each FPC and the cumulative percent are listed and shown in a bar chart. There is a graph of the mean function as well as a graph for each shape function. These are the values of the eigenfunctions.
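The percent and cumulative percent columns follow directly from the eigenvalues. The sketch below assumes that each percent is the eigenvalue divided by the sum of all eigenvalues, which is the usual PCA convention; the function name and the tuple layout are hypothetical.

```python
def variation_explained(eigenvalues):
    """Return (rank, eigenvalue, percent, cumulative percent) per FPC.

    Eigenvalues are sorted from largest to smallest, as in the report.
    """
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    rows = []
    cumulative = 0.0
    for rank, value in enumerate(ordered, start=1):
        percent = 100.0 * value / total
        cumulative += percent
        rows.append((rank, value, percent, cumulative))
    return rows
```

For eigenvalues 6, 3, and 1, the first FPC accounts for 60% of the variation and the first two together account for 90%.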
You can perform model selection in the Functional PCA report to refine the selected number of functional principal components. There is a solution path plot that shows the Bayesian Information Criterion (BIC) plotted versus the number of FPCs. The current number of FPCs is designated by the dotted vertical line in the solution path plot. Models with different numbers of FPCs can have similar fits. Therefore, the solution path plot provides zones, which are intervals of values of the BIC statistic. There is a green zone and a yellow zone. The green zone contains values in the interval from the minimum BIC to the minimum BIC plus 4. The yellow zone contains values in the interval from the minimum BIC plus 4 to the minimum BIC plus 10. By default, the model with the smallest number of FPCs within the green zone is selected. You can drag the slider at the top of the vertical line to change the number of FPCs. Dragging the slider automatically updates the other information in the Functional PCA report.
Note: Narrow zones relative to the full y-axis scale can be difficult to view on your plot. Zoom in on the y-axis to better visualize the zones.
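The zone rule and the default choice of the number of FPCs can be sketched as follows. The zone boundaries come from the description above; treating a value of exactly the minimum BIC plus 4 as green is an assumption, because the text assigns that boundary to both zones. The function names and the BIC values in the usage note are illustrative only.

```python
def classify_zones(bic_by_n_fpcs):
    """Label each number of FPCs as 'green', 'yellow', or 'outside'.

    bic_by_n_fpcs maps a number of FPCs to its BIC value.
    """
    best = min(bic_by_n_fpcs.values())
    zones = {}
    for n_fpcs, value in bic_by_n_fpcs.items():
        if value <= best + 4:  # boundary treated as green (assumption)
            zones[n_fpcs] = "green"
        elif value <= best + 10:
            zones[n_fpcs] = "yellow"
        else:
            zones[n_fpcs] = "outside"
    return zones


def default_n_fpcs(bic_by_n_fpcs):
    """Smallest number of FPCs whose BIC falls in the green zone."""
    zones = classify_zones(bic_by_n_fpcs)
    return min(n for n, zone in zones.items() if zone == "green")
```

With made-up BIC values `{1: 210.0, 2: 203.5, 3: 200.0, 4: 201.0, 5: 207.0}`, the minimum BIC occurs at three FPCs, but two FPCs is also in the green zone, so the sketch returns 2. This mirrors the preference for the simplest model among those with similar fits.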