
Chapter 12
Nonlinear Least Squares Applications

In this chapter we consider several nonlinear least squares applications in detail, starting with the fast training of neural networks and their use in optimal design to generate surrogates of very expensive functionals. Then we consider several inverse problems related to the design of piezoelectric transducers, NMR and geophysical tomography. Not surprisingly, several of the applications lead to separable problems that are solved using variable projection.

12.1 Neural networks training

Neural networks are nonlinear parametric models. Training the network corresponds to fitting the parameters in these models in the least squares sense by using pre-classified data (a training set) and an optimization algorithm. Since the nonlinear least squares problem (NLLSQ) that results is one whose linear and nonlinear variables separate, the variable projection (VP) algorithm can be used, and it increases both the speed and the robustness of the training process [101, 103, 168, 182, 199, 227, 252, 253, 254], compared to traditional methods.

First we explain the necessary neural network concepts and notation that lead to the specific least squares problem that has to be solved to train the network. Then we explain and test a training algorithm based on variable projection. The algorithm is applicable to one-hidden-layer, fully connected neural network models using several types of activation functions. A generalization of VP developed and implemented by Golub and LeVeque [98] (see also [84, 85, 141]) can be used for the case of multiple outputs.

Neural network concepts

Neural networks are a convenient way to represent general nonlinear mappings between multidimensional spaces as superpositions of nonlinear functions, using so-called activation functions and hidden units. We discuss here multilayer, feed-forward, fully connected neural networks (NNs), although the techniques are applicable to more general ones, for instance, those with feedback loops.

The NNs we consider consist of three types of nodes arranged in layers: input, hidden and output layers. Each node can have several inputs and outputs, and they act on the input information in ways that depend on the layer type:

• Input node: no action.

• Hidden node:
  – Weighted sum of its inputs with a possible offset (bias) that can be incorporated as an additional input. If $w_i$ are the weights and $x_i$ the inputs, with $x_0$ corresponding to the bias, i.e., $x_0 = 1$, then
    $$y = \sum_{i=0}^{d} w_i x_i \equiv w^T x.$$
  – This linear output can be generalized by applying a nonlinear function $f(\cdot)$, called an activation function, so that the output is instead $z = f(y) = f(w^T x)$; see the sketch after this list. Observe that this provides a standardized way to handle multivariable inputs.

• Output node: it can weight and sum its inputs and additionally apply an activation function.
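To make the hidden-node computation concrete, here is a minimal Python/NumPy sketch; the function name and the tanh default are illustrative choices, not taken from the book. It folds the bias into the weight vector via a leading input $x_0 = 1$, forms the weighted sum $y = w^T x$, and applies an activation $f$.

```python
import numpy as np

def hidden_node_output(w, x, f=np.tanh):
    """Output z = f(w^T x) of one hidden node (illustrative sketch).

    The bias is handled as an extra input x0 = 1, so w has one more
    entry than x; f is any activation function (tanh is just a default).
    """
    x = np.concatenate(([1.0], np.asarray(x, dtype=float)))  # prepend x0 = 1 for the bias
    y = w @ x                                                 # weighted sum y = w^T x
    return f(y)                                               # nonlinear activation z = f(y)
```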
In a feed-forward NN, information flows in only one direction. A fully connected NN has every node in one layer connected to every node in the next layer, and there are no connections between nodes of the same layer. See Figure 12.1.1 for the general architecture of a NN.

Figure 12.1.1: Neural network architecture.

The notation used is as follows:

• The training set is defined by (input/output) pairs $x_{1,i}$ and $t_i$, $i = 1, \ldots, m$.

• At the $l$th layer, $x_{l,i}$ and $x_{l+1,i}$ denote the input and output vectors, whereas $y_{l,i}$ is used for the vector of weighted sums.

• The final output is $x_{L+1,i}$.

• Note that these vectors may have different lengths in different layers, given that the number of nodes can vary.

• At the $l$th layer, the weights for calculating the weighted sums are denoted by the vector $w_l$. The number of its elements is the product of the number of nodes in layer $l-1$ and the number of nodes in layer $l$.

• The total weight vector for the NN will be denoted by $W^T = (w_1^T \; w_2^T \; \cdots \; w_L^T)$.

Perceptron models

These are networks that use nonlinear activation functions, and they are usually applied to classification problems. To allow for a general mapping, one must consider successive transformations corresponding to several layers of adaptive parameters. The most common choice for the activation function is the logistic sigmoid function (see Figure 12.1.2), defined by
$$f(y) = \frac{1}{1 + e^{-y}}. \qquad (12.1.1)$$
Some of its properties...
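As a complement, here is a minimal Python/NumPy sketch of a one-hidden-layer, fully connected network with logistic-sigmoid hidden units as in equation (12.1.1), together with the residual vector whose least squares norm is minimized during training. The function and variable names are illustrative assumptions, not taken from the book; the output-layer weights enter the model linearly, which is the separable structure that the variable projection approach of this section exploits.

```python
import numpy as np

def sigmoid(y):
    """Logistic sigmoid of equation (12.1.1): f(y) = 1 / (1 + exp(-y))."""
    return 1.0 / (1.0 + np.exp(-y))

def one_hidden_layer_nn(W1, w2, X):
    """Forward pass of a fully connected network with one hidden layer (sketch).

    W1 : (n_hidden, d+1) hidden-layer weights, bias in the first column
    w2 : (n_hidden+1,)   output-layer weights, bias in the first entry
    X  : (m, d)          m training inputs, one per row
    Returns the m scalar network outputs.
    """
    m = X.shape[0]
    X1 = np.hstack([np.ones((m, 1)), X])   # prepend x0 = 1 for the hidden-layer biases
    Z = sigmoid(X1 @ W1.T)                 # hidden-unit outputs, shape (m, n_hidden)
    Z1 = np.hstack([np.ones((m, 1)), Z])   # prepend bias input for the output node
    return Z1 @ w2                         # output weights enter linearly

def training_residual(W1, w2, X, t):
    """Residual whose 2-norm is minimized when training the network."""
    return one_hidden_layer_nn(W1, w2, X) - t

# Tiny usage example with random weights and data (shapes only, not a real training set).
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))           # m = 20 samples, d = 3 inputs
t = rng.standard_normal(20)                # targets t_i
W1 = rng.standard_normal((5, 4))           # 5 hidden units, d + 1 = 4 weights each
w2 = rng.standard_normal(6)                # bias + 5 hidden-unit weights
print(np.linalg.norm(training_residual(W1, w2, X, t)))
```

For fixed hidden-layer weights W1 the model is linear in w2, so the optimal w2 follows from a linear least squares problem; variable projection eliminates these linear parameters and iterates only over the nonlinear ones, which is the source of the speed and robustness gains mentioned at the start of the section.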
