Appendix C
Advanced Calculus Background

C.1 Convergence rates

Definition 5. Let $x^*, x_k \in \mathbb{R}$ for $k = 0, 1, \ldots$ The sequence $\{x_k\}$ is said to converge to $x^*$ if
$$\lim_{k \to \infty} |x_k - x^*| = 0.$$
The convergence is
linear if there exist $c \in [0, 1)$ and an integer $K > 0$ such that, for $k \geq K$, $|x_{k+1} - x^*| \leq c\,|x_k - x^*|$;
superlinear if there exist a sequence $c_k \to 0$ and an integer $K > 0$ such that, for $k \geq K$, $|x_{k+1} - x^*| \leq c_k\,|x_k - x^*|$;
quadratic if there exist $c > 0$ and an integer $K > 0$ such that, for $k \geq K$, $|x_{k+1} - x^*| \leq c\,|x_k - x^*|^2$.
(In the quadratic case the constant $c$ need not be smaller than one: once the iterates are close enough to $x^*$, the squared error term drives the convergence.)

Definition 6. A locally convergent iterative algorithm converges to the correct answer only if the iteration starts close enough to it. A globally convergent iterative algorithm converges when started from almost any point. For minimization, global convergence in this sense is not to be confused with finding the global minimum of a functional on a compact domain (see below).

Global and local minima: $x^*$ is a global minimizer of a function $f : \mathbb{R}^n \to \mathbb{R}$ on a compact domain $D$ if $f(x^*) \leq f(x)$ for all $x \in D$. $x^*$ is a local minimizer inside a certain region, usually defined as an open ball of radius $\delta$ around $x^*$, if $f(x^*) \leq f(x)$ whenever $\|x - x^*\|_2 < \delta$.

C.2 Multivariable calculus

The gradient and Hessian of a scalar function of several variables $f(x)$ are a vector and a matrix, respectively, defined by
$$\nabla f(x) \equiv \left(\frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n}\right)^T, \qquad \nabla^2 f(x) \equiv \left[\frac{\partial^2 f}{\partial x_i\, \partial x_j}\right].$$
For a vector function of several variables $r(x) = (r_1(x), r_2(x), \ldots, r_m(x))^T$, with each $r_k(x) : \mathbb{R}^n \to \mathbb{R}$, we denote by $J(x)$ the Jacobian of $r(x)$ and by $G_k$ the Hessian of a component function $r_k$:
$$J(x) = \left[\frac{\partial r_i}{\partial x_j}\right], \qquad G_k(x) = \left[\frac{\partial^2 r_k}{\partial x_i\, \partial x_j}\right].$$

Definition 7. Descent direction $p$ of a function $f : \mathbb{R}^n \to \mathbb{R}$. $p$ is a descent direction at $x_c$ for $f(x)$ if, for sufficiently small and positive $\alpha$,
$$f(x_c + \alpha p) < f(x_c).$$
Equivalently, $p$ is a descent direction at $x_c$ if the directional derivative (the projection of the gradient onto the given direction) of $f(x)$ at $x_c$ in the direction $p$ is negative:
$$\nabla f(x_c)^T p < 0.$$

Theorem 8. Taylor's theorem for a scalar function. If $f : \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable, then, for some $t \in [0, 1]$,
$$f(x + p) = f(x) + \nabla f(x + tp)^T p.$$
If $f(x)$ is twice continuously differentiable, then, for some $t \in [0, 1]$,
$$f(x + p) = f(x) + \nabla f(x)^T p + \tfrac{1}{2}\, p^T \nabla^2 f(x + tp)\, p.$$
A necessary condition for $x^*$ to be a local minimizer of $f(x)$ is that it be a stationary point, i.e., that $\nabla f(x^*) = 0$. Sufficient conditions for $x^*$ to be a local minimizer are that $\nabla f(x^*) = 0$ and that $\nabla^2 f(x^*)$ be positive definite.

The derivative $DA(x)$ of an $m \times n$ nonlinear matrix function $A(x)$, where $x \in \mathbb{R}^k$, is a three-dimensional tensor formed by $k$ matrices (slabs) of dimension $m \times n$, each one containing the partial derivatives of the elements of $A$ with respect to one of the variables of the vector $x$. Thus, the second derivative of the vector function $r(x)$ is the three-dimensional tensor $G = [G_1, \ldots, G_m]$.

The derivative of the orthogonal projector $P_{A(x)}$ onto the column space of a differentiable $m \times n$ matrix function $A(x)$ of locally constant rank can be obtained as follows.

Lemma 9. (Lemma 4.1 in [101]) Let $A^\dagger(x)$ be the pseudoinverse of $A(x)$. Then $P_{A(x)} = A A^\dagger$ and
$$D P_{A(x)} = P^\perp_{A(x)}\, DA\, A^\dagger + \left(P^\perp_{A(x)}\, DA\, A^\dagger\right)^T,$$
where $P^\perp_{A(x)} = I - P_{A(x)}$.
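Lemma 9 lends itself to a quick numerical sanity check. The following minimal Python sketch (the matrix function $A(x)$ is an illustrative choice, not from the text) compares the lemma's expression for $DP_{A(x)}$ with a central finite-difference approximation in the simplest case of a scalar parameter, $k = 1$:

```python
import numpy as np

# Illustrative 3x2 matrix function A(x) of a scalar x (k = 1), chosen so that
# A(x) has locally constant rank 2 near x = 0.7; not an example from the text.
def A(x):
    return np.array([[1.0, x], [x, 1.0], [0.0, x * x]])

def dA(x):
    # For k = 1, DA(x) is a single m x n slab of elementwise partial derivatives.
    return np.array([[0.0, 1.0], [1.0, 0.0], [0.0, 2.0 * x]])

def P(x):
    # Orthogonal projector onto the column space of A(x): P = A A^dagger.
    return A(x) @ np.linalg.pinv(A(x))

x, h = 0.7, 1e-6
P_perp = np.eye(3) - P(x)
B = P_perp @ dA(x) @ np.linalg.pinv(A(x))
DP_lemma = B + B.T                         # Lemma 9: P_perp DA A^dagger + (...)^T
DP_fd = (P(x + h) - P(x - h)) / (2.0 * h)  # central finite-difference approximation
print(np.linalg.norm(DP_lemma - DP_fd))    # small (~1e-9): the two derivatives agree
```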
Definition 10. A function $f$ is Lipschitz continuous with constant $\gamma$ in a set $X \subseteq \mathbb{R}$ if
$$|f(x) - f(y)| \leq \gamma\, |x - y| \quad \forall x, y \in X. \qquad (C.2.1)$$
Lipschitz continuity is an intermediate concept between continuity and differentiability. The operator $V : \mathbb{R}^n \to \mathbb{R}^n$ is Lipschitz continuous with constant $\gamma$ in a set $X \subseteq \mathbb{R}^n$ if
$$\|V(x) - V(y)\|_2 \leq \gamma\, \|x - y\|_2 \quad \forall x, y \in X.$$
$V$ is contracting if it is Lipschitz continuous with constant less than unity, $\gamma < 1$. $V$ is uniformly monotone if there exists a positive $m$ such that
$$m\, \|x - y\|_2^2 \leq (V(x) - V(y))^T (x - y).$$
$V$ is vector Lipschitz if
$$|V(x) - V(y)| \leq A\, |x - y|,$$
where $A$ is a non-negative $n \times n$ matrix and the inequality is meant elementwise. $V$ is vector contracting if it is vector Lipschitz with $\|A\| < 1$.

Lagrange multipliers

Lagrange multipliers are used to find the local extrema of a function $f(x)$ of $n$ variables subject to $k$ equality constraints $g_i(x) = c_i$, by converting the constrained problem into an unconstrained one. Formally, new scalar variables $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_k)^T$, one for each constraint, are introduced, and a new function (the Lagrangian) is defined as
$$F(x, \lambda) = f(x) + \sum_{i=1}^{k} \lambda_i\, (g_i(x) - c_i).$$
The local extrema of this extended function $F$ occur at the points where its gradient is zero,
$$\nabla F(x, \lambda) = 0,$$
or equivalently,
$$\nabla_x F(x, \lambda) = 0, \qquad \nabla_\lambda F(x, \lambda) = 0.$$
This form encodes the constraints compactly, because
$$\nabla_{\lambda_i} F(x, \lambda) = 0 \iff g_i(x) = c_i.$$
An alternative to this method is to use the equality constraints to eliminate $k$ of the original variables, which leaves an unconstrained problem in $n - k$ variables. If the problem is large and sparse, this elimination may destroy the sparsity, and then the Lagrange multipliers approach is preferable. ...
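As a concrete illustration of the stationarity conditions above, here is a minimal Python sketch (the example problem is illustrative, not from the text): minimizing $f(x_1, x_2) = x_1^2 + x_2^2$ subject to the single constraint $x_1 + x_2 = 1$. Because $f$ is quadratic and $g$ is linear, $\nabla F = 0$ is a linear system, and its last row is exactly the constraint recovered by $\nabla_\lambda F = 0$.

```python
import numpy as np

# Illustrative problem (not from the text): minimize x1^2 + x2^2 s.t. x1 + x2 = 1.
# F(x, lam) = x1^2 + x2^2 + lam*(x1 + x2 - 1), so grad F = 0 reads:
#   2*x1       + lam = 0    (dF/dx1)
#        2*x2  + lam = 0    (dF/dx2)
#   x1 + x2          = 1    (dF/dlam = 0 reproduces the constraint g(x) = c)
M = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [1.0, 1.0, 0.0]])
b = np.array([0.0, 0.0, 1.0])
x1, x2, lam = np.linalg.solve(M, b)
print(x1, x2, lam)  # 0.5 0.5 -1.0: the constrained minimizer and its multiplier
```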
