The Data Processing red triangle menu in the Functional Data Explorer platform contains the following options:
Cleanup
A submenu of the following data cleanup options:
Remove Zeros
Removes observations with zero values. If there are no zeros in the data, an alert appears, indicating that no zero values were found.
Remove Value
Displays a specifications window that enables you to specify a value to remove from the data.
Remove Selected
Removes observations that correspond to rows that are selected in the data table.
Remove Unselected
Removes observations that correspond to rows that are not selected in the data table.
Caution: Remove Selected and Remove Unselected remove the row numbers. When Auto Recalc is enabled, you must add or delete rows before using these options.
Filter X
Removes X values that fall outside of a specified interval. When you select the Filter X option, you must specify Below and Above values. The X values that fall outside of the specified interval are not used for the analysis.
Filter Y
Removes Y values that fall outside of a specified interval. When you select the Filter Y option, you must specify Below and Above values. The Y values that fall outside of the specified interval are not used for the analysis.
Reduce
Reduces the data over the X values using one of the following techniques:
– Use the Grid tab to interpolate observations to a common grid of values. You can specify the grid size. By default, the grid size is the number of values in the longest function. This is also the maximum allowable grid size.
– Use the Bin tab to create a specified number of bins that are evenly spaced over the unique X values. For each function (or level of the ID, Function variable), the observations within a bin are averaged to produce a Y value for the corresponding bin level.
– Use the Thin tab to remove every Nth observation over the X values, where N is determined by the specified thinning rate. This is done for each function (or level of the ID, Function variable). By default, the thinning rate is 2, which removes half of the observations in each function.
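The three Reduce techniques above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the platform's implementation: the function names are hypothetical, the grid step assumes linear interpolation, the bin level is taken to be the bin midpoint, and thinning is read literally as dropping every Nth observation.

```python
from bisect import bisect_right

def interpolate_to_grid(x, y, grid):
    """Interpolate one function onto a common grid (assumes linear
    interpolation and x sorted in increasing order)."""
    values = []
    for g in grid:
        # Index of the segment [x[i], x[i+1]] that contains g.
        i = min(max(bisect_right(x, g) - 1, 0), len(x) - 2)
        t = (g - x[i]) / (x[i + 1] - x[i])
        values.append(y[i] + t * (y[i + 1] - y[i]))
    return values

def bin_average(x, y, n_bins):
    """Average Y within n_bins evenly spaced bins over the X range.
    Returns (bin midpoint, mean Y) pairs; the midpoint label is an assumption."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / n_bins
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for xi, yi in zip(x, y):
        k = min(int((xi - lo) / width), n_bins - 1)  # clamp the right edge
        sums[k] += yi
        counts[k] += 1
    return [(lo + (k + 0.5) * width, sums[k] / counts[k])
            for k in range(n_bins) if counts[k] > 0]

def thin(x, y, rate=2):
    """Drop every rate-th observation (a rate of 2 removes half of them)."""
    keep = [i for i in range(len(x)) if (i + 1) % rate != 0]
    return [x[i] for i in keep], [y[i] for i in keep]
```

Each helper operates on a single function; applying it per level of the ID variable mirrors the "for each function" wording above.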
Note: The Remove options exclude the specified observations from the analysis and modeling reports, but the observations remain unchanged in the data table.
Transform
A submenu of the following options to transform the data:
Center
Centers the output.
Standardize
Standardizes the output by centering and scaling the data to have mean 0 and variance 1.
Range 0 to 1
Scales the output to lie within the range of 0 and 1.
Square Root
Transforms the data by computing the square root of the output. The output values must be nonnegative.
Square
Transforms the data by computing the square of the output.
Log
Transforms the data by computing the natural logarithm of the output. The output values must be positive.
Exp
Transforms the data by computing the exponential function of the output.
Negation
Transforms the data by negating the output.
Logit
Transforms the data by computing the logit function of the output. The output values must be between 0 and 1.
Log X
Transforms the data by computing the natural logarithm of the input.
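For reference, a few of the Transform options can be written out as plain Python functions. This is a minimal sketch with hypothetical function names. It assumes that the statistics are computed over all output values passed in and that Standardize uses the sample standard deviation; the documentation above does not specify either choice.

```python
import math
import statistics

def center(y):
    """Subtract the mean so that the values average to 0."""
    m = statistics.fmean(y)
    return [v - m for v in y]

def standardize(y):
    """Center and scale to mean 0 and variance 1.
    Assumption: the sample standard deviation (n - 1 divisor) is used."""
    m = statistics.fmean(y)
    s = statistics.stdev(y)
    return [(v - m) / s for v in y]

def range_0_to_1(y):
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(y), max(y)
    return [(v - lo) / (hi - lo) for v in y]

def logit(y):
    """Logit transform log(p / (1 - p)); each value must lie strictly in (0, 1)."""
    return [math.log(v / (1.0 - v)) for v in y]
```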
Align
A submenu of the following options to align the input data:
Row Alignment
Replaces the input values with the row number.
Align Maximum
Aligns the functions using the observed maximum output value for each ID level. The input value associated with the observed maximum output value is set to zero for each ID level and the other input values are shifted up or down based on the difference between the observed maximum and zero.
Align Minimum
Aligns the functions using the observed minimum output value for each ID level. The input value associated with the observed minimum output value is set to zero for each ID level and the other input values are shifted up or down based on the difference between the observed minimum and zero.
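The shift described for Align Maximum and Align Minimum can be sketched as follows for a single ID level. The function name and the `use_max` flag are hypothetical; ties are resolved by taking the first extreme value, which is an assumption.

```python
def align_to_extremum(x, y, use_max=True):
    """Shift the inputs so the input at the observed maximum (or minimum)
    output becomes zero. Ties go to the first occurrence (assumption)."""
    pick = max if use_max else min
    idx = pick(range(len(y)), key=lambda i: y[i])
    shift = x[idx]
    return [xi - shift for xi in x]
```

For example, with inputs 1, 2, 3, 4 and outputs 0, 5, 9, 3, the maximum occurs at input 3, so the aligned inputs are -2, -1, 0, 1.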
Align 0 to 1
Aligns the output functions such that the range of the input values is 0 to 1.
Tip: Align 0 to 1 is particularly useful when you fit a P-Spline model.
Align by Function
Aligns the output functions such that each function starts at the overall minimum of the input values and ends at the overall maximum of the input values.
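One way to picture Align by Function is as a linear rescaling of each function's inputs onto the overall input range. The sketch below assumes that the mapping is linear, which the description above does not state, and uses a hypothetical function name and a dictionary keyed by ID level.

```python
def align_by_function(inputs_by_id):
    """Rescale each function's inputs so that it starts at the overall
    minimum and ends at the overall maximum input (linear map assumed)."""
    all_x = [xi for xs in inputs_by_id.values() for xi in xs]
    gmin, gmax = min(all_x), max(all_x)
    aligned = {}
    for key, xs in inputs_by_id.items():
        fmin, fmax = min(xs), max(xs)
        scale = (gmax - gmin) / (fmax - fmin)
        aligned[key] = [gmin + (xi - fmin) * scale for xi in xs]
    return aligned
```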
Dynamic Time Warping
(Available only when there is more than one function.) Aligns the output functions using dynamic time warping (DTW). DTW is a function alignment technique that finds an optimal warping to align two or more functions together. When you select the DTW option, a Select Reference Function window appears. Use this to select the reference function. The reference function is the function that the remaining functions are aligned to.
Once you select a reference function and click OK, a warping function plot is shown along with a list of the remaining query functions. On the warping function plot, the reference function is on the y-axis and the selected query function is on the x-axis. Deviations from the red diagonal line (y = x) indicate that the inputs of the query function have been warped for better alignment.
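To make the idea of a warping function concrete, here is the textbook dynamic programming form of DTW for one reference and one query function. It is a sketch only: the platform's step pattern, constraints, and distance measure are not described above, so the absolute difference cost and the three-way step rule used here are assumptions, and the function name is hypothetical.

```python
def dtw(reference, query):
    """Classic DTW between two sequences of output values.
    Returns the total alignment cost and the warping path as
    (reference index, query index) pairs."""
    n, m = len(reference), len(query)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(reference[i - 1] - query[j - 1])  # local cost (assumption)
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end to recover the warping path.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
        path.append((i - 1, j - 1))
    path.reverse()
    return cost[n][m], path
```

A path that stays on the diagonal (index pairs with equal entries) corresponds to the y = x line on the warping function plot; repeated indices on either side indicate warping.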
Spectral
A submenu of the following options that are useful for spectral data:
SNV
Applies the Standard Normal Variate method to the data. This method standardizes the output by centering and scaling each individual function (level of the ID variable) to have a mean of 0 and a standard deviation of 1.
MSC
Applies the Multiplicative Scatter Correction to the data. A simple linear regression is fit for each individual function (level of the ID variable) where the response is the output values for the function and the regressor is the output values for the mean function. The original output values, y_it, are then replaced by new values, y*_it, using the following equation:
where b_i is the slope obtained from the simple linear regression for function i. For more information, see Geladi et al. (1985).
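The two scatter corrections can be sketched in Python as follows. Two assumptions are made and should be checked against the platform: SNV is shown with the sample standard deviation, and MSC is shown in its commonly used form, (y_it - a_i) / b_i, where a_i and b_i are the intercept and slope of the regression of function i on the mean function. The function names are hypothetical, and all functions are assumed to share the same input grid.

```python
import statistics

def snv(y):
    """Standard Normal Variate for one function: mean 0, standard deviation 1.
    Assumption: sample standard deviation (n - 1 divisor)."""
    m = statistics.fmean(y)
    s = statistics.stdev(y)
    return [(v - m) / s for v in y]

def fit_line(regressor, response):
    """Least squares intercept and slope of response on regressor."""
    rbar = statistics.fmean(regressor)
    ybar = statistics.fmean(response)
    sxx = sum((r - rbar) ** 2 for r in regressor)
    sxy = sum((r - rbar) * (v - ybar) for r, v in zip(regressor, response))
    slope = sxy / sxx
    return ybar - slope * rbar, slope

def msc(functions):
    """Multiplicative Scatter Correction in the common (y - a_i) / b_i form
    (assumed here). `functions` is a list of equal-length output lists."""
    mean_function = [statistics.fmean(col) for col in zip(*functions)]
    corrected = []
    for y in functions:
        a, b = fit_line(mean_function, y)
        corrected.append([(v - a) / b for v in y])
    return corrected
```

When every function is an exact linear transformation of the mean function, this form of MSC maps all of them back onto the mean function.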
Savitzky-Golay
Provides options to use the Savitzky-Golay method. See Savitzky and Golay (1964).
Note: All options involving the Savitzky-Golay method require that the input data be on an evenly spaced grid and that at least one function contains 7 or more data points. If the data is not on an evenly spaced grid, it is automatically placed on an evenly spaced grid when you select a Savitzky-Golay option.
Filter
Applies a Savitzky-Golay filter to the data. This method fits local polynomials to several collections of points across the domain. The polynomials are fit using least squares and the number of points in each fit is determined by the bandwidth. When you select this option, several fits are made for polynomials of order 0, 1, and 2 and bandwidths up to 10. The best fitting models for each function are selected based on the AIC. The order of the polynomial and the bandwidth can be different for each function.
First Derivative
Applies a Savitzky-Golay filter to the data using only polynomials of order 2 or 3 and then takes the first derivative. Since the filter fits polynomials, the derivatives are computed analytically.
Second Derivative
Applies a Savitzky-Golay filter to the data using only polynomials of order 3 and then takes the second derivative. Since the filter fits polynomials, the derivatives are computed analytically.
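The Savitzky-Golay idea (fit a local polynomial by least squares, then evaluate the fit or its derivatives analytically) can be illustrated with the well-known fixed convolution weights for a 5-point window and a quadratic polynomial. This sketch does not select the order or bandwidth by AIC as described above, returns interior points only, and uses a hypothetical function name; it is meant to show the mechanics, not to reproduce the platform's results.

```python
def savgol_5pt_quadratic(y, spacing=1.0):
    """Savitzky-Golay smoothing, first derivative, and second derivative using
    a 5-point window and a local quadratic fit on an evenly spaced grid.
    Only interior points (index 2 to len(y) - 3) are returned."""
    smooth_w = [-3, 12, 17, 12, -3]   # divide by 35
    d1_w = [-2, -1, 0, 1, 2]          # divide by 10 * spacing
    d2_w = [2, -1, -2, -1, 2]         # divide by 7 * spacing ** 2
    smooth, d1, d2 = [], [], []
    for c in range(2, len(y) - 2):
        window = y[c - 2:c + 3]
        smooth.append(sum(w * v for w, v in zip(smooth_w, window)) / 35.0)
        d1.append(sum(w * v for w, v in zip(d1_w, window)) / (10.0 * spacing))
        d2.append(sum(w * v for w, v in zip(d2_w, window)) / (7.0 * spacing ** 2))
    return smooth, d1, d2
```

Because the local fit is quadratic, data that are exactly quadratic are reproduced without error, and the derivatives match the analytic ones.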
Baseline Correction
Subtracts a baseline function from each individual function. A baseline correction is used when there is a known trend, or baseline, that you want to remove. For example, this could be due to an artifact of how the data is measured. Usually, the information is in the peaks of the data, so these regions are not included in the baseline model.
When you select this option, a baseline correction window is shown. This window contains a selection plot that displays the data and a set of options to specify the baseline model. The baseline correction window contains the following options:
Baseline Model
Specifies the type of model for the baseline function. You can specify a linear, quadratic, cubic, two-parameter exponential, or three-parameter exponential model.
Correction Region
Specifies the region that the baseline function is subtracted from. You can subtract the baseline from the entire function region or from only the regions that were used to construct the baseline model.
Baseline Regions
Adds or removes a pair of blue vertical lines to the selection plot. The lines are initially on top of one another. Move the lines to specify regions of the data that you do not want to include in the baseline model. The region of the data that falls in between a pair of blue lines is not included in the baseline model.
Anchor Points
Adds or removes a red vertical line to the selection plot. This line specifies data points that are forced into the baseline model.
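As an illustration of how these options fit together, the sketch below fits a linear baseline to the points outside the excluded regions and subtracts it from the entire function. It covers only the linear Baseline Model and the whole-function Correction Region, ignores anchor points, and uses hypothetical function and parameter names.

```python
def baseline_correct_linear(x, y, excluded_regions):
    """Fit a least squares line to points outside the excluded (lo, hi)
    regions, then subtract that line from every point of the function."""
    def excluded(xi):
        return any(lo <= xi <= hi for lo, hi in excluded_regions)

    fit_x = [xi for xi in x if not excluded(xi)]
    fit_y = [yi for xi, yi in zip(x, y) if not excluded(xi)]
    xbar = sum(fit_x) / len(fit_x)
    ybar = sum(fit_y) / len(fit_y)
    sxx = sum((xi - xbar) ** 2 for xi in fit_x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(fit_x, fit_y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    # Subtract the fitted baseline from the whole function, peak included.
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
```

Excluding the peak region from the fit keeps the peak from pulling the baseline upward, so the peak height survives the subtraction.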
Target Functions
(Available only when there is more than one function.) Enables you to load a target function.
Load Targets
Shows a window that enables you to specify a target function. A target function is used for curve matching, where it is desirable for all of the functions to look like the target function, also known as a reference function or a golden curve.
If you specify a target function, the data from the function is not used in model fitting. When you specify a target function, there are additional options added to the FPC Profiler. See FPC Profiler.
Note: A target function must be loaded before any other preprocessing steps are performed.
Plot Warping Functions
Shows or hides the warping function plot. On by default.
Save Distance Matrix
Saves the distance matrix to a separate data table. The distance matrix can be useful for clustering the functions. The distance matrix data table contains a hierarchical clustering script.
Save Warping Functions
Saves the warping functions to a separate data table. Each row of the data table contains the DTW adjusted input variable, the original input variable, and the ID variable.