Suppose that f(β) is the model. Then the Nonlinear platform attempts to minimize the sum of the loss functions defined as follows:
The second term is probably small if ρ is the squared residual because the sum of residuals is small. The term is zero if there is an intercept term. For least squares, this is the term that distinguishes Gauss-Newton from Newton-Raphson.