This section describes the various types of models that can be fit in the Functional Data Explorer platform.
Basis function models expand the functional model and rewrite it as a linear combination of basis functions. In the Functional Data Explorer platform, you can fit a basis spline (B-Spline) model, a penalized basis spline (P-Spline) model, a Fourier Basis model, or a Wavelet model to the data. A Fourier Basis model is useful for periodic data. A periodic model assumes that the function finishes where it starts. See Fourier Basis Model.
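As an illustration of what a linear combination of basis functions means in practice, the following minimal sketch fits a Fourier basis to one periodic function by ordinary least squares. It is not the platform's implementation. The helper name `fourier_design`, the number of sine and cosine pairs, and the simulated data are all assumptions made for the example.

```python
import numpy as np

def fourier_design(x, n_pairs, period):
    """Design matrix with columns 1, sin(2*pi*k*x/T), cos(2*pi*k*x/T), k = 1..n_pairs."""
    cols = [np.ones_like(x)]
    for k in range(1, n_pairs + 1):
        w = 2.0 * np.pi * k / period
        cols.append(np.sin(w * x))
        cols.append(np.cos(w * x))
    return np.column_stack(cols)

# Simulated periodic function on [0, 1); the function ends where it starts.
x = np.linspace(0.0, 1.0, 200, endpoint=False)
y = 2.0 + 1.5 * np.sin(2 * np.pi * x) - 0.5 * np.cos(4 * np.pi * x)

# The fitted function is B @ coef: a linear combination of the basis columns.
B = fourier_design(x, n_pairs=2, period=1.0)
coef, *_ = np.linalg.lstsq(B, y, rcond=None)
fitted = B @ coef
```

The same pattern applies to other bases: only the columns of the design matrix change, while the coefficients are still estimated linearly.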
A Wavelet model is a type of basis function model that is useful for data that contain many peaks. Wavelet models require data to be on an evenly spaced grid. Several families of wavelet models are fit simultaneously, including Haar, Daubechies, Symlet, Coiflet, and Biorthogonal. These are all flexible functions whose shapes and types of peaks depend on their parameters. For more information about wavelets, see Nason (2008).
Direct models perform functional principal component analysis (FPCA) directly on the data, without fitting a basis function model first. The data are converted into a stacked matrix. Each row of the matrix corresponds to the full output function for one level of the ID variable, and each column of the matrix corresponds to a level of the input variable. Direct models obtain FPCA results by performing a matrix decomposition on the stacked matrix of functions. The type of matrix decomposition is determined by the model. Direct models are more flexible than basis function models and have reduced computation time, particularly for large data sets.
All of the direct methods in the Functional Data Explorer platform require that the input data be on an evenly spaced grid. If this is not the case, the first step of each method is to align the input data to be between 0 and 1 and then interpolate the observations to a common grid of input values.
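The alignment and interpolation step can be sketched as follows. This is a minimal illustration, not the platform's code. It assumes that the inputs are rescaled with the overall minimum and maximum across all functions, that linear interpolation is used, and that values outside a function's observed range are held at the nearest endpoint (the behavior of `np.interp`). The function name `stack_on_common_grid` is hypothetical.

```python
import numpy as np

def stack_on_common_grid(curves, n_grid=50):
    """curves maps an ID to (x, y) sequences.

    Returns the common grid on [0, 1], the list of IDs, and the stacked
    matrix with one row per ID and one column per grid point.
    """
    all_x = np.concatenate([np.asarray(x, dtype=float) for x, _ in curves.values()])
    lo, hi = all_x.min(), all_x.max()
    grid = np.linspace(0.0, 1.0, n_grid)
    ids = list(curves)
    rows = []
    for key in ids:
        x, y = curves[key]
        x = (np.asarray(x, dtype=float) - lo) / (hi - lo)  # align inputs to [0, 1]
        y = np.asarray(y, dtype=float)
        order = np.argsort(x)                              # np.interp needs increasing x
        rows.append(np.interp(grid, x[order], y[order]))
    return grid, ids, np.vstack(rows)

# Two functions observed at different, unevenly spaced inputs.
curves = {"a": ([0, 5, 10], [0, 5, 10]), "b": ([10, 0, 2], [1, 1, 1])}
grid, ids, X = stack_on_common_grid(curves, n_grid=11)
```

Here `X` has one row for each level of the ID variable and one column for each grid point, which matches the stacked matrix layout described for direct models.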
The Functional Data Explorer platform fits the following direct models:
The Direct Functional PCA method performs a singular value decomposition (SVD) on the stacked matrix of functions. The loadings of the SVD correspond to the shape functions. The singular values of the SVD correspond to the eigenvalues. The implementation of the Direct Functional PCA method is as follows:
1. Perform a singular value decomposition (SVD) on the matrix of stacked functions.
2. Smooth the first eigenfunction using a P-Spline model with a knot at each grid point.
3. Remove the first smoothed eigenfunction from the data and repeat step 1 to step 3 until a large amount of the variation in the data is explained.
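The three steps above can be sketched as a short loop. This is a rough illustration under stated assumptions, not the platform's algorithm: the mean function is subtracted first, a second-difference penalty smoother stands in for the P-Spline model with a knot at each grid point, and "a large amount of the variation" is taken to mean a user-chosen fraction (`target`). The names `difference_smoother` and `direct_fpca` and the parameters `lam` and `max_components` are hypothetical.

```python
import numpy as np

def difference_smoother(v, lam):
    """Penalized least squares smoother with a second-difference penalty
    (a stand-in for the P-Spline smoother; an assumption of this sketch)."""
    n = v.size
    D = np.diff(np.eye(n), n=2, axis=0)
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, v)

def direct_fpca(X, lam=1.0, target=0.95, max_components=10):
    R = X - X.mean(axis=0)               # residual matrix after removing the mean function
    total = np.sum(R ** 2)
    shapes, scores = [], []
    explained = 0.0
    while explained < target and len(shapes) < max_components:
        # Step 1: SVD of the current matrix of stacked functions.
        _, _, Vt = np.linalg.svd(R, full_matrices=False)
        # Step 2: smooth the first eigenfunction and rescale it to unit length.
        v = difference_smoother(Vt[0], lam)
        v /= np.linalg.norm(v)
        # Step 3: remove the smoothed eigenfunction from the data, then repeat.
        s = R @ v
        R = R - np.outer(s, v)
        shapes.append(v)
        scores.append(s)
        explained = 1.0 - np.sum(R ** 2) / total
    return np.array(shapes), np.column_stack(scores), explained

# Simulated functions: two smooth components plus a little noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 40)
X = (rng.normal(size=(30, 1)) * np.sin(2 * np.pi * t)
     + rng.normal(size=(30, 1)) * np.cos(2 * np.pi * t)
     + 0.01 * rng.normal(size=(30, 40)))
shapes, scores, explained = direct_fpca(X, lam=1.0, target=0.95)
```

Because each smoothed eigenfunction is rescaled to unit length before it is removed, the explained fraction never decreases from one pass to the next.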
The Penalized SVD method performs a penalized singular value decomposition (SVD) on the stacked matrix of functions. A penalized SVD places penalty parameters on the decomposition that zero out small values of the shape functions and scores. This method can reduce the contribution of noise to the model and increase interpretability. See Penalized SVD.
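To show how a penalty can zero out small values, the following sketch computes one sparse rank-one component by alternating soft-thresholding. It is a generic illustration and is not necessarily the algorithm that the platform uses. The fixed thresholds `pen_u` and `pen_v`, the function names, and the simulated data are assumptions of the example.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t and set entries with |z| <= t to zero."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sparse_rank_one(X, pen_u, pen_v, n_iter=100):
    """One penalized rank-one component: scores u, scale d, shape function v."""
    U, _, Vt = np.linalg.svd(X, full_matrices=False)
    u, v = U[:, 0], Vt[0]                      # start from the ordinary SVD
    for _ in range(n_iter):
        v_new = soft_threshold(X.T @ u, pen_v)
        if not np.any(v_new):                  # penalty too large: keep the last estimate
            break
        v = v_new / np.linalg.norm(v_new)
        u_new = soft_threshold(X @ v, pen_u)
        if not np.any(u_new):
            break
        u = u_new / np.linalg.norm(u_new)
    d = u @ X @ v
    return u, d, v

# Simulated rank-one signal that is nonzero only in a few rows and columns.
rng = np.random.default_rng(1)
u0 = np.zeros(8)
u0[:4] = 0.5
v0 = np.zeros(10)
v0[:3] = 1.0 / np.sqrt(3.0)
X = 5.0 * np.outer(u0, v0) + 0.01 * rng.normal(size=(8, 10))
u, d, v = sparse_rank_one(X, pen_u=0.1, pen_v=0.1)
```

In this example the entries of `u` and `v` that correspond to pure noise are set exactly to zero, which is the effect that makes the penalized shape functions and scores easier to interpret.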
The Nonnegative SVD method performs a nonnegative singular value decomposition (SVD) on the stacked matrix of functions. A nonnegative SVD constrains the matrix decomposition so that the scores and loadings are greater than or equal to zero. This ensures that the shape functions are nonnegative. This method is useful if you have functions that are strictly positive. See Nonnegative SVD.
The Penalized Nonnegative SVD method performs a penalized nonnegative singular value decomposition (SVD) on the stacked matrix of functions. A penalized nonnegative SVD combines the Penalized SVD and Nonnegative SVD methods to produce loadings and scores that are strictly nonnegative, but also zeroed out for small values. This method uses an adaptation of the algorithm in Lee et al. (2010) to perform a penalized nonnegative SVD for all of the dimensions at once.
The Multivariate Curve Resolution method performs a matrix decomposition on the stacked matrix of functions. This method decomposes the matrix into a matrix of mixing proportions and a matrix of nonnegative shape functions. This decomposition creates a mixture of shape functions for each individual function (level of the ID variable). This method is useful if you know that your functions are a combination of a specific number of components. This method is popular for analyzing spectral data in the chemistry field. See Multivariate Curve Resolution.
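A decomposition of this kind is often computed by alternating least squares. The sketch below is one such illustration and is not the platform's implementation. It assumes that nonnegativity is enforced by clipping, that the mixing weights are left unnormalized (they are not rescaled to sum to one), and that no row of the data is entirely zero. The initialization heuristic in `pick_initial_shapes` is a simplification chosen for this example, and all function names are hypothetical.

```python
import numpy as np

def pick_initial_shapes(X, k):
    """Start from k observed functions that are as dissimilar as possible
    (a simple heuristic assumed for this sketch)."""
    Z = X / np.linalg.norm(X, axis=1, keepdims=True)
    chosen = [int(np.argmin(Z @ Z.mean(axis=0)))]
    while len(chosen) < k:
        sim = (Z @ Z[chosen].T).max(axis=1)
        chosen.append(int(np.argmin(sim)))
    return X[chosen].copy()

def mcr_als(X, n_components, n_iter=50):
    """Approximate X by C @ S with nonnegative mixing weights C and shapes S."""
    S = pick_initial_shapes(X, n_components)
    for _ in range(n_iter):
        C = np.clip(X @ np.linalg.pinv(S), 0.0, None)   # update mixing weights
        S = np.clip(np.linalg.pinv(C) @ X, 0.0, None)   # update shape functions
    return C, S

# Simulated data: each function is a mixture of two nonnegative peaks.
t = np.linspace(0.0, 1.0, 50)
S0 = np.vstack([np.exp(-0.5 * ((t - 0.3) / 0.08) ** 2),
                np.exp(-0.5 * ((t - 0.7) / 0.08) ** 2)])
rng = np.random.default_rng(2)
C0 = rng.uniform(0.1, 1.0, size=(20, 2))
X = C0 @ S0
C, S = mcr_als(X, n_components=2)
rel_err = np.linalg.norm(X - C @ S) / np.linalg.norm(X)
```

Each row of `C` gives the weights that mix the rows of `S` to reproduce one function. The recovered shapes are determined only up to scaling and mixing among components, so they need not equal the simulated peaks in `S0` even when the reconstruction is accurate.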
This method performs the same type of matrix decomposition as the Multivariate Curve Resolution method, except that the shape functions can be negative.
Peak finding methods identify and summarize peaks in the data. This is useful for data such as chromatography data where data peaks are a characteristic of interest. The Automatic Peak Detection method uses continuous wavelet transformation (CWT) to find peaks automatically across all functions. See Du et al. (2006). This method finds the peak maximum, the peak half-widths, and the upper and lower limits of the ranges of the individual peaks.
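The idea behind wavelet-based peak detection can be illustrated with a simplified, single-scale sketch. It convolves one function with a Ricker (Mexican hat) wavelet and reports local maxima of the response that exceed a threshold. This is not the Automatic Peak Detection algorithm: it uses only one wavelet width, it assumes the function has more points than the wavelet, and it returns peak locations only, without half-widths or range limits. The function names and the parameters `width` and `min_height` are assumptions.

```python
import numpy as np

def ricker(points, a):
    """Ricker (Mexican hat) wavelet of width a sampled at `points` positions."""
    t = np.arange(points) - (points - 1) / 2.0
    amp = 2.0 / (np.sqrt(3.0 * a) * np.pi ** 0.25)
    return amp * (1.0 - (t / a) ** 2) * np.exp(-0.5 * (t / a) ** 2)

def cwt_peaks(y, width, min_height):
    """Indices of local maxima of the wavelet response that exceed min_height."""
    n_pts = 2 * int(5 * width) + 1             # odd length keeps the response centered
    response = np.convolve(y, ricker(n_pts, width), mode="same")
    mid = response[1:-1]
    is_max = (mid > response[:-2]) & (mid >= response[2:]) & (mid > min_height)
    return np.flatnonzero(is_max) + 1, response

# Simulated chromatogram-like function with peaks at positions 60 and 140.
idx = np.arange(200)
y = (np.exp(-0.5 * ((idx - 60) / 5.0) ** 2)
     + 0.6 * np.exp(-0.5 * ((idx - 140) / 5.0) ** 2))
peaks, response = cwt_peaks(y, width=5, min_height=0.5)
```

A multi-scale version would repeat the convolution over a range of widths and track maxima that persist across scales, which makes the detection less sensitive to the choice of a single width.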