Appendix C
Advanced Calculus Background

C.1 Convergence rates

Definition 5. Let $x^*, x_k \in \mathbb{R}$ for $k = 0, 1, \ldots$ The sequence $\{x_k\}$ is said to converge to $x^*$ if
$$\lim_{k \to \infty} |x_k - x^*| = 0.$$
The convergence is
linear if there exist $c \in [0, 1)$ and an integer $K > 0$ such that, for $k \geq K$, $|x_{k+1} - x^*| \leq c\,|x_k - x^*|$;
superlinear if there exist a sequence $c_k \to 0$ and an integer $K > 0$ such that, for $k \geq K$, $|x_{k+1} - x^*| \leq c_k\,|x_k - x^*|$;
quadratic if there exist $c > 0$ and an integer $K > 0$ such that, for $k \geq K$, $|x_{k+1} - x^*| \leq c\,|x_k - x^*|^2$.
(In the quadratic case the constant $c$ need not be smaller than one: once the iterates are close enough to $x^*$, the squared error term drives the convergence.)

Definition 6. A locally convergent iterative algorithm converges to the correct answer only if the iteration starts close enough to it. A globally convergent iterative algorithm converges when started from almost any point. For minimization, global convergence in this sense is not to be confused with finding the global minimum of a functional on a compact domain (see below).

Global and local minima: $x^*$ is a global minimizer of a function $f : \mathbb{R}^n \to \mathbb{R}$ on a compact domain $D$ if $f(x^*) \leq f(x)$ for all $x \in D$. $x^*$ is a local minimizer inside a certain region, usually defined as an open ball of radius $\delta$ around $x^*$, if $f(x^*) \leq f(x)$ whenever $\|x - x^*\|_2 < \delta$.

C.2 Multivariable calculus

The gradient and Hessian of a scalar function of several variables $f(x)$ are a vector and a matrix, respectively, defined by
$$\nabla f(x) \equiv \left(\frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n}\right)^T, \qquad \nabla^2 f(x) \equiv \left[\frac{\partial^2 f}{\partial x_i\, \partial x_j}\right].$$
For a vector function of several variables $r(x) = (r_1(x), r_2(x), \ldots, r_m(x))^T$, with each $r_k(x) : \mathbb{R}^n \to \mathbb{R}$, we denote by $J(x)$ the Jacobian of $r(x)$ and by $G_k$ the Hessian of a component function $r_k$:
$$J(x) = \left[\frac{\partial r_i}{\partial x_j}\right], \qquad G_k(x) = \left[\frac{\partial^2 r_k}{\partial x_i\, \partial x_j}\right].$$

Definition 7. Descent direction $p$ of a function $f : \mathbb{R}^n \to \mathbb{R}$. $p$ is a descent direction at $x_c$ for $f(x)$ if, for sufficiently small and positive $\alpha$,
$$f(x_c + \alpha p) < f(x_c).$$
Equivalently, $p$ is a descent direction at $x_c$ if the directional derivative (the projection of the gradient onto the given direction) of $f(x)$ at $x_c$ in the direction $p$ is negative:
$$\nabla f(x_c)^T p < 0.$$

Theorem 8. Taylor's theorem for a scalar function. If $f : \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable, then, for some $t \in [0, 1]$,
$$f(x + p) = f(x) + \nabla f(x + tp)^T p.$$
If $f(x)$ is twice continuously differentiable, then, for some $t \in [0, 1]$,
$$f(x + p) = f(x) + \nabla f(x)^T p + \tfrac{1}{2}\, p^T \nabla^2 f(x + tp)\, p.$$
A necessary condition for $x^*$ to be a local minimizer of $f(x)$ is that it be a stationary point, i.e., that $\nabla f(x^*) = 0$. Sufficient conditions for $x^*$ to be a local minimizer are that $\nabla f(x^*) = 0$ and that $\nabla^2 f(x^*)$ be positive definite.

The derivative $DA(x)$ of an $m \times n$ nonlinear matrix function $A(x)$, where $x \in \mathbb{R}^k$, is a three-dimensional tensor formed by $k$ matrices (slabs) of dimension $m \times n$, each one containing the partial derivatives of the elements of $A$ with respect to one of the variables of the vector $x$. Thus, the second derivative of the vector function $r(x)$ is the three-dimensional tensor $G = [G_1, \ldots, G_m]$.

The derivative of the orthogonal projector $P_{A(x)}$ onto the column space of a differentiable $m \times n$ matrix function $A(x)$ of locally constant rank can be obtained as follows.

Lemma 9. (Lemma 4.1 in [101]) Let $A^\dagger(x)$ be the pseudoinverse of $A(x)$. Then $P_{A(x)} = A A^\dagger$ and
$$D P_{A(x)} = P^\perp_{A(x)}\, DA\, A^\dagger + \left(P^\perp_{A(x)}\, DA\, A^\dagger\right)^T,$$
where $P^\perp_{A(x)} = I - P_{A(x)}$.
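Lemma 9 lends itself to a quick numerical sanity check. The following minimal Python sketch (the matrix function $A(x)$ is an illustrative choice, not from the text) compares the lemma's expression for $DP_{A(x)}$ with a central finite-difference approximation in the simplest case of a scalar parameter, $k = 1$:

```python
import numpy as np

# Illustrative 3x2 matrix function A(x) of a scalar x (k = 1), chosen so that
# A(x) has locally constant rank 2 near x = 0.7; not an example from the text.
def A(x):
    return np.array([[1.0, x], [x, 1.0], [0.0, x * x]])

def dA(x):
    # For k = 1, DA(x) is a single m x n slab of elementwise partial derivatives.
    return np.array([[0.0, 1.0], [1.0, 0.0], [0.0, 2.0 * x]])

def P(x):
    # Orthogonal projector onto the column space of A(x): P = A A^dagger.
    return A(x) @ np.linalg.pinv(A(x))

x, h = 0.7, 1e-6
P_perp = np.eye(3) - P(x)
B = P_perp @ dA(x) @ np.linalg.pinv(A(x))
DP_lemma = B + B.T                         # Lemma 9: P_perp DA A^dagger + (...)^T
DP_fd = (P(x + h) - P(x - h)) / (2.0 * h)  # central finite-difference approximation
print(np.linalg.norm(DP_lemma - DP_fd))    # small (~1e-9): the two derivatives agree
```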
Definition 10. A function $f$ is Lipschitz continuous with constant $\gamma$ in a set $X \subseteq \mathbb{R}$ if
$$|f(x) - f(y)| \leq \gamma\, |x - y| \quad \forall x, y \in X. \qquad (C.2.1)$$
Lipschitz continuity is an intermediate concept between continuity and differentiability. The operator $V : \mathbb{R}^n \to \mathbb{R}^n$ is Lipschitz continuous with constant $\gamma$ in a set $X \subseteq \mathbb{R}^n$ if
$$\|V(x) - V(y)\|_2 \leq \gamma\, \|x - y\|_2 \quad \forall x, y \in X.$$
$V$ is contracting if it is Lipschitz continuous with constant less than unity, $\gamma < 1$. $V$ is uniformly monotone if there exists a positive $m$ such that
$$m\, \|x - y\|_2^2 \leq (V(x) - V(y))^T (x - y).$$
$V$ is vector Lipschitz if
$$|V(x) - V(y)| \leq A\, |x - y|,$$
where $A$ is a non-negative $n \times n$ matrix and the inequality is meant elementwise. $V$ is vector contracting if it is vector Lipschitz with $\|A\| < 1$.

Lagrange multipliers

Lagrange multipliers are used to find the local extrema of a function $f(x)$ of $n$ variables subject to $k$ equality constraints $g_i(x) = c_i$, by converting the constrained problem into an unconstrained one. Formally, new scalar variables $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_k)^T$, one for each constraint, are introduced, and a new function (the Lagrangian) is defined as
$$F(x, \lambda) = f(x) + \sum_{i=1}^{k} \lambda_i\, (g_i(x) - c_i).$$
The local extrema of this extended function $F$ occur at the points where its gradient is zero,
$$\nabla F(x, \lambda) = 0,$$
or equivalently,
$$\nabla_x F(x, \lambda) = 0, \qquad \nabla_\lambda F(x, \lambda) = 0.$$
This form encodes the constraints compactly, because
$$\nabla_{\lambda_i} F(x, \lambda) = 0 \iff g_i(x) = c_i.$$
An alternative to this method is to use the equality constraints to eliminate $k$ of the original variables, which leaves an unconstrained problem in $n - k$ variables. If the problem is large and sparse, this elimination may destroy the sparsity, and then the Lagrange multipliers approach is preferable. ...
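As a concrete illustration of the stationarity conditions above, here is a minimal Python sketch (the example problem is illustrative, not from the text): minimizing $f(x_1, x_2) = x_1^2 + x_2^2$ subject to the single constraint $x_1 + x_2 = 1$. Because $f$ is quadratic and $g$ is linear, $\nabla F = 0$ is a linear system, and its last row is exactly the constraint recovered by $\nabla_\lambda F = 0$.

```python
import numpy as np

# Illustrative problem (not from the text): minimize x1^2 + x2^2 s.t. x1 + x2 = 1.
# F(x, lam) = x1^2 + x2^2 + lam*(x1 + x2 - 1), so grad F = 0 reads:
#   2*x1       + lam = 0    (dF/dx1)
#        2*x2  + lam = 0    (dF/dx2)
#   x1 + x2          = 1    (dF/dlam = 0 reproduces the constraint g(x) = c)
M = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [1.0, 1.0, 0.0]])
b = np.array([0.0, 0.0, 1.0])
x1, x2, lam = np.linalg.solve(M, b)
print(x1, x2, lam)  # 0.5 0.5 -1.0: the constrained minimizer and its multiplier
```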
