The Data Processing red triangle menu contains the following options:
Cleanup
A submenu of the following data cleanup options:
Remove Zeros
Removes observations with zero values. If there are no zeros in the data, an alert appears, indicating that no zero values were found.
Remove Value
Displays a specifications window that enables you to specify a value to remove from the data.
Remove Selected
Removes observations that correspond to rows that are selected in the data table.
Remove Unselected
Removes observations that correspond to rows that are not selected in the data table.
Caution: Remove Selected and Remove Unselected remove the row numbers. When Auto Recalc is enabled, you must add or delete rows before using these options.
Filter X
Removes X values that fall outside of a specified interval. When you select the Filter X option, you must specify Below and Above values. The X values that fall outside of the specified interval are not used for the analysis.
Filter Y
Removes Y values that fall outside of a specified interval. When you select the Filter Y option, you must specify Below and Above values. The Y values that fall outside of the specified interval are not used for the analysis.
Reduce
Reduces the data over the X values using one of the following techniques:
– Use the Grid tab to interpolate observations to a common grid of values. You can specify the grid size. By default, the grid size is half the number of unique input values and therefore reduces the number of total observations. If you are not interested in reducing the number of total observations, but simply want your observations to be on the same grid, specify the grid size to be the number of unique input values.
– Use the Bin tab to create a specified number of bins that are evenly spaced over the unique X values. For each function (or level of the ID, Function variable), the observations within a bin are averaged to produce a Y value for the corresponding bin level.
– Use the Thin tab to remove every Nth observation over the X values, where N is determined by the specified thinning rate. This is done for each function (or level of the ID, Function variable). By default, the thinning rate is 2, which removes half of the observations in each function.
Note: The Remove options exclude the specified observations from the analysis and modeling reports, but the observations remain unchanged in the data table.
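The three Reduce techniques can be illustrated with a minimal Python sketch. This is not the product's implementation: the function names are hypothetical, the Grid example assumes linear interpolation between neighboring points with unique X values, the Bin example assumes the bins span the X range of the supplied points (which must contain at least two distinct X values) and reports each bin at its center, and the Thin example reads "remove every Nth observation" literally.

```python
def interpolate_to_grid(points, grid):
    """Grid: linearly interpolate one function's (x, y) points onto a common grid.
    Assumes unique x values; grid values outside the data range are clamped."""
    pts = sorted(points)
    out = []
    for g in grid:
        if g <= pts[0][0]:
            out.append((g, pts[0][1]))
        elif g >= pts[-1][0]:
            out.append((g, pts[-1][1]))
        else:
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= g <= x1:
                    out.append((g, y0 + (y1 - y0) * (g - x0) / (x1 - x0)))
                    break
    return out


def bin_average(points, n_bins):
    """Bin: average y within evenly spaced bins over the x range.
    Empty bins are skipped; each bin is reported at its center (an assumption)."""
    xs = [x for x, _ in points]
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / n_bins
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, y in points:
        k = min(int((x - lo) / width), n_bins - 1)  # the largest x goes in the last bin
        sums[k] += y
        counts[k] += 1
    return [(lo + (k + 0.5) * width, sums[k] / counts[k])
            for k in range(n_bins) if counts[k]]


def thin(points, rate=2):
    """Thin: drop every rate-th observation (1-based position), keep the rest."""
    return [p for i, p in enumerate(points, start=1) if i % rate != 0]


# One function observed at four X values.
points = [(0, 0.0), (1, 2.0), (2, 4.0), (3, 6.0)]
print(interpolate_to_grid(points, [0.5, 1.5, 2.5]))
print(bin_average(points, 2))
print(thin(points))
```

In each case the function would be applied separately to every level of the ID, Function variable, matching the per-function behavior described above.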
Transform
A submenu of the following options to transform the data:
Center
Centers the output.
Standardize
Standardizes the output by centering and scaling the data to have mean 0 and variance 1.
Range 0 to 1
Scales the output to lie within the range of 0 to 1.
Square Root
Transforms the data by computing the square root of the output. The output values must be nonnegative.
Square
Transforms the data by computing the square of the output.
Log
Transforms the data by computing the natural logarithm of the output.
Exp
Transforms the data by computing the exponential function of the output.
Negation
Transforms the data by negating the output.
Logit
Transforms the data by computing the logit function of the output. The output values must be between 0 and 1.
Log X
Transforms the data by computing the natural logarithm of the input.
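As a rough illustration of what several of the Transform options compute, the following Python sketch applies them to a plain list of output values. The function names are hypothetical, and two details are assumptions rather than documented behavior: the statistics are computed over the values passed in (the text does not say whether they are pooled or per function), and Standardize uses the sample standard deviation.

```python
import math
from statistics import mean, stdev


def center(ys):
    """Center: subtract the mean from every output value."""
    m = mean(ys)
    return [y - m for y in ys]


def standardize(ys):
    """Standardize: center, then scale to unit variance.
    Uses the sample standard deviation (an assumption)."""
    m, s = mean(ys), stdev(ys)
    return [(y - m) / s for y in ys]


def range_0_to_1(ys):
    """Range 0 to 1: min-max scaling of the output values."""
    lo, hi = min(ys), max(ys)
    return [(y - lo) / (hi - lo) for y in ys]


def logit(p):
    """Logit: log(p / (1 - p)); defined only for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("logit requires a value strictly between 0 and 1")
    return math.log(p / (1.0 - p))


ys = [1.0, 2.0, 3.0]
print(center(ys))
print(standardize(ys))
print(range_0_to_1(ys))
print(logit(0.5))
```

The domain checks mirror the restrictions noted above: Square Root needs nonnegative output values, and Logit needs output values between 0 and 1.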
Align
A submenu of the following options to align the input data:
Row Alignment
Replaces the input values with the row number.
Align Maximum
Aligns the functions using the observed maximum output value for each ID level. The input value associated with the observed maximum output value is set to zero for each ID level, and the other input values for that ID level are shifted by the same amount.
Align Minimum
Aligns the functions using the observed minimum output value for each ID level. The input value associated with the observed minimum output value is set to zero for each ID level, and the other input values for that ID level are shifted by the same amount.
Align 0 to 1
Aligns the output functions such that the range of the input values is 0 to 1.
Tip: Align 0 to 1 is particularly useful when you fit a P-Spline model.
Align by Function
Aligns the output functions such that each function starts at the overall minimum of the input values and ends at the overall maximum of the input values.
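The shift-based and rescaling-based alignments above can be sketched in Python as follows. This is an illustration under stated assumptions, not the product's code: the function names are hypothetical, ties for the maximum or minimum are broken by taking the first occurrence, each function is assumed to have at least two distinct input values, and Align 0 to 1 and Align by Function are both assumed to rescale each function's inputs linearly.

```python
def align_extreme(points, use_max=True):
    """Align Maximum / Align Minimum: shift the inputs so that the input at the
    observed maximum (or minimum) output becomes zero."""
    pick = max if use_max else min
    x_ref, _ = pick(points, key=lambda p: p[1])
    return [(x - x_ref, y) for x, y in points]


def rescale_inputs(points, new_lo=0.0, new_hi=1.0):
    """Linearly map one function's input range onto [new_lo, new_hi].
    The defaults correspond to Align 0 to 1."""
    xs = [x for x, _ in points]
    lo, hi = min(xs), max(xs)
    scale = (new_hi - new_lo) / (hi - lo)
    return [(new_lo + (x - lo) * scale, y) for x, y in points]


def align_by_function(functions):
    """Align by Function: stretch every function to span the overall input range."""
    all_x = [x for pts in functions.values() for x, _ in pts]
    lo, hi = min(all_x), max(all_x)
    return {fid: rescale_inputs(pts, lo, hi) for fid, pts in functions.items()}


functions = {
    "A": [(2.0, 1.0), (3.0, 5.0), (4.0, 2.0)],
    "B": [(0.0, 3.0), (10.0, 4.0)],
}
print(align_extreme(functions["A"]))          # input at the peak becomes 0
print(rescale_inputs(functions["A"]))         # inputs now span 0 to 1
print(align_by_function(functions)["A"])      # inputs now span 0 to 10
```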
Dynamic Time Warping
(Available only when there is more than one function.) Aligns the output functions using dynamic time warping (DTW). DTW is a function alignment technique that finds an optimal warping to align two or more functions together. When you select the DTW option, a Select Reference Function window appears. Use this to select the reference function. The reference function is the function that the remaining functions are aligned to.
Once you select a reference function and click OK, a warping function plot is shown along with a list for the remaining query functions. On the warping function plot, the reference function is on the y-axis and the selected query function is on the x-axis. Deviations from the red diagonal line (y = x) indicate that the inputs of the query function have been warped for better alignment.
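To make the idea of a warping function concrete, here is a minimal Python sketch of textbook dynamic time warping between a reference and a query function. The exact algorithm, cost function, and constraints used by the platform are not stated in this section, so everything here is an assumption: the function name `dtw` is hypothetical, the local cost is the absolute difference of output values, and no window or slope constraints are applied.

```python
def dtw(reference, query):
    """Return (total cost, warping path) for two sequences of output values.
    The path is a list of (reference index, query index) pairs."""
    n, m = len(reference), len(query)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of reference[:i] with query[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(reference[i - 1] - query[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])

    # Backtrack from the end to recover which points were matched.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


reference = [0, 1, 2, 1, 0]
query = [0, 0, 1, 2, 1, 0]   # same shape, delayed by one step
distance, path = dtw(reference, query)
print(distance)
print(path)
```

Plotting the path with the query index on the x-axis and the reference index on the y-axis gives a curve analogous to the warping function plot: a path that stays on the diagonal means no warping was needed, and departures from the diagonal show where the query's inputs were stretched or compressed.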
Target Functions
(Available only when there is more than one function.) A submenu that enables you to load target functions.
Load Targets
Shows a window that enables you to specify a target function. A target function is used for curve matching, where it is desirable for all of the functions to look like the target function. You can also specify two target functions to compare the remaining curves to the “best” and “worst” case functions.
If you specify one or more target functions, the data from those functions are not used in model fitting. For each specified target function, two rows are added to the FPC Profiler. See FPC Profiler.
Note: Target functions must be loaded before any other preprocessing steps are performed.
Plot Warping Functions
Shows or hides the warping function plot. On by default.
Save Distance Matrix
Saves the distance matrix to a separate data table. The distance matrix can be useful for clustering the functions. The distance matrix data table contains a hierarchical clustering script.
Save Warping Functions
Saves the warping functions to a separate data table. Each row of the data table contains the DTW-adjusted input variable, the original input variable, and the ID variable.