
Chapter 5
Direct Methods for Rank-Deficient Problems

The methods developed so far break down when the matrix A is rank deficient, i.e., when rank(A) = r < min(m, n). Given a tolerance ε, the numerical rank rε of A is defined through its singular values by

    σ1 ≥ · · · ≥ σrε > ε ≥ σrε+1 ≥ · · · ≥ 0.        (5.1.2)

In other words, given the tolerance ε, the singular values from σrε+1 onward are indistinguishable from zero. This criterion for rank determination should be used only if there is a clear gap between the consecutive singular values σrε and σrε+1, because the number rε must be robust with respect to perturbations of the singular values and the threshold. Often, one considers instead a criterion based on the normalized singular values, which is therefore scale free: the singular values with σi/σ1 ≤ τ are treated as zero, where the threshold τ should not be chosen smaller than the machine precision εM.

The perturbation E of the matrix A is in general not known; it may be due to errors in the underlying model, or to data errors, so that the elements eij have some statistical distribution (for details see [20] and [105]), or the error can be due to the numerical computations. The value of the tolerance ε should then be chosen appropriately: if the error is due to round-off we can choose ε ≈ εM ‖A‖∞; if the error in the model or in the data is larger than the round-off, then choose ε ≈ 10^{−p} ‖A‖∞ when the elements of A have p correct decimal digits. It is important to stress that all the previous choices are only valid if the errors in all the elements of A are roughly of the same size; see [230]. Golub and Van Loan [105], p. 262, give two options for estimating the numerical rank when there is no clear gap in the singular values, both based on a minimization process: one is used when an accurate solution of the LSQ problem is important, the other when minimizing the size of the residual matters more.

Example 65. Consider the n × n upper bidiagonal matrix A with all diagonal entries equal to 0.1 and all superdiagonal entries equal to 1. The singular values of A in the case n = 6 are

    σ1 = 1.088,  σ2 = 1.055,  σ3 = 1.007,  σ4 = 0.955,  σ5 = 0.915,  σ6 = 9.9 · 10^{−7},

i.e., the numerical rank with respect to any threshold between 10^{−6} and 10^{−1} is n − 1, whereas the mathematical rank is n. There is a distinct gap in the singular values.
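A minimal numerical check of Example 65, sketched here in Python/NumPy (an illustrative choice of tools; the threshold ε = 10^{−3} used below is just one admissible value inside the gap):

    import numpy as np

    n = 6
    # Matrix of Example 65: 0.1 on the diagonal, 1 on the superdiagonal.
    A = 0.1 * np.eye(n) + np.diag(np.ones(n - 1), k=1)

    s = np.linalg.svd(A, compute_uv=False)   # singular values, in descending order
    print(s)                                 # last value ~ 1e-6, the rest ~ 1

    eps = 1e-3                               # any threshold inside the gap works
    r_eps = int(np.sum(s > eps))
    print("numerical rank:", r_eps)          # n - 1 = 5

The same computation with a threshold outside the gap (say ε = 0.95) would give a rank that changes under tiny perturbations of the data, which is exactly why a clear gap is required.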
5.2 Peters-Wilkinson LU factorization

The LU factorization method described in Section 4.2 can be modified to compute the pseudoinverse of a rank-deficient matrix A ∈ R^{m×n} with rank(A) = r < min(m, n), and from it the minimum-norm solution of the least squares problem can be calculated. We recommend that this method be used with caution, since rank determination in an LU factorization can be risky because the singular values of A change during the elimination process; see [162] for recent results on rank-revealing LU factorizations.

The algorithm uses Gaussian elimination with complete pivoting, including linear independence checks. In the present case, the process stops when there is no pivot larger than a specified tolerance. The factorization is then

    Π1 A Π2 = L D U.

Both L and U have trapezoidal form of width r,

    L = ( L1 )          U = ( U1 , U2 ),
        ( L2 ) ,

where L1 and U1 are nonsingular r × r matrices, lower and upper triangular, respectively, with unit diagonal, and D is diagonal as in Section 4.2. Complete pivoting ensures that L and U will be well conditioned and that D reflects cond(A). A more convenient form of Π1 A Π2 is

    Π1 A Π2 = ( Ir ) L1 D U1 ( Ir , S ),
              ( T  )

where T = L2 L1^{-1} and S = U1^{-1} U2 are obtained using back-substitution on T L1 = L2 and U1 S = U2. The minimum-norm solution is then

    x∗ = A† b = Π2 ( Ir , S )† U1^{-1} D^{-1} L1^{-1} ( Ir )† Π1 b.
                                                      ( T  )

In [22] there is also a modification of the Peters-Wilkinson algorithm for sparse problems. The package LUSOL [90] implements a sparse LDU factorization with options for complete and rook pivoting, which also has rank-revealing properties while giving sparse factors. Rook pivoting is a strategy intermediate between full and partial pivoting: at each elimination step the pivot is chosen as an element that is the largest in both the row and the column to which it belongs. Experimentally, rook pivoting is comparable to full pivoting in precision and to partial pivoting in computational efficiency.

5.3 QR factorization with column permutations

The methods described in this section are particularly convenient for subset selection, when a basic solution is wanted. Given A and b, we want to find a permutation Π such that the first r = rank(A) columns of A Π are linearly independent, and also find a vector z ∈ R^r that solves

    min_z ‖b − Ar z‖2 ,      Ar = A Π ( Ir )
                                      ( 0  ) .

For a QR factorization with column permutations, orthogonal transformations such as Householder reflections are applied to zero out successive sub-columns, interlaced with appropriate column permutations that place the most linearly independent columns in front. At step k (after the transformation), the matrix has the form

    R^(k) = Qk^T A Πk = ( R11^(k)  R12^(k) )   k
                        (    0     A22^(k) )   m − k ,

with Qk^T = Hk · · · H1, and where Πk is the product of the permutation matrices used so far. In theory, one should obtain an upper trapezoidal matrix after k = rank(A) steps, while in practice – due to finite precision arithmetic – we must expect that A22^(k) ≠ 0 for all k. At each step k there are two important questions:

1. How to select the most linearly independent columns of A22^(k), i.e., what permutations should be performed.

2. How to decide when to stop the process because the numerical rank has been attained and R11^(k) contains the linearly independent information of A.

To formalize the second point we introduce the concept of a rank-revealing QR (RRQR) factorization, following [40]. Intuitively, this is the QR factorization of a permutation of A chosen to yield a "small" A22^(r), where r is the numerical rank.

Definition 66. Rank-revealing QR (RRQR) factorization. Let A ∈ R^{m×n} have singular values σ1 ≥ σ2 ≥ · · · ≥ σn ≥ 0, with a well-defined numerical rank r, i.e., with a clear gap between σr and σr+1. The factorization

    A Π = Q ( R11  R12 )   r
            (  0   R22 )   m − r ,                 (5.3.1)

with R11 upper triangular of order r, is an RRQR factorization of A if

    cond(R11) ≈ σ1/σr   and   ‖R22‖2 ≈ σr+1.

This means that R11 contains the linearly independent information of A and R22 is a submatrix of small norm, in which case the matrix

    W = Π ( R11^{-1} R12 )
          (      −I      )                         (5.3.2)

is a good first approximation of a basis for the null space of A. The null vectors can be improved by a few steps of subspace iteration, as shown in [42]. If after k steps the above two conditions of the RRQR factorization hold with r = k, the best possible factorization has been attained and the numerical rank is determined as r.
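To make Definition 66 concrete, here is an illustrative sketch based on SciPy's column-pivoted QR (the Businger-Golub strategy discussed below); the sizes, the exact rank r and the random test matrix are assumptions made purely for the demonstration:

    import numpy as np
    from scipy.linalg import qr

    rng = np.random.default_rng(1)
    m, n, r = 20, 8, 5                      # assumed sizes; A has exact rank r
    A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

    Q, R, p = qr(A, pivoting=True)          # A[:, p] = Q @ R
    R11, R12, R22 = R[:r, :r], R[:r, r:], R[r:, r:]
    print(np.linalg.norm(R22))              # ~ sigma_{r+1}, here zero up to round-off

    # Approximate null-space basis W from (5.3.2); A @ W should have small norm.
    Pi = np.eye(n)[:, p]                    # permutation matrix with A @ Pi = A[:, p]
    W = Pi @ np.vstack([np.linalg.solve(R11, R12), -np.eye(n - r)])
    print(np.linalg.norm(A @ W))            # equals ||R22||, small if rank revealing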
Hong and Pan [129] proved that there always exists a permutation matrix Π such that A Π = QR is an RRQR factorization of A if the numerical rank is known. The question is how to obtain an economic and reliable algorithm.

One can reformulate the RRQR conditions in terms of a bi-objective optimization process: the task is to find a permutation Π that maximizes σr(R11), the smallest singular value of R11, and at the same time minimizes σ1(R22), the norm of R22. The maximization ensures that the first r columns of Q span the range of A.

The pivoted QR factorization, a column pivoting strategy devised by Businger and Golub (see [105], p. 235), fulfills this requirement. At step k, the column of the rectangular matrix A22^(k) with largest norm is permuted to the first position, after which the Householder transformation that zeros all the elements of this sub-vector below the diagonal is applied. This algorithm is implemented in LINPACK and LAPACK and can be applied successfully in most cases if it is used with some safeguarding condition estimation (see, for example, [20], pp. 114ff.). Theoretically, this process should be continued up to k = rank(A). As the rank is unknown, the stopping criterion in the Businger-Golub algorithm is

    ‖A22^(k)‖2 ≤ εM ‖A‖2.

This criterion works well in most cases, but may fail occasionally: although a sufficiently small A22^(k) means that A is close to a matrix of rank k, it may happen that R11^(k) is nearly rank deficient without A22^(k) being very small, so that the algorithm should have stopped earlier.

Example 67. The following pathological example due to Kahan (see, for example, [20]) illustrates that the column-pivoted QR factorization algorithm can fail. Let A be the product of diag(1, s, . . . , s^{n−1}) and the n × n upper triangular matrix with ones on the diagonal and −c in every entry above the diagonal. (5.3.3)

Assume that n = 100, c = 0.2 and s = 0.9798. Then A is invariant under the previous algorithm and A22^(n−1) = ann = 0.133; however, A is almost rank deficient, with numerical rank n − 1, as can be seen from the singular values σn−1 = 0.1482 and σn = 0.368 · 10^{−8}. It was proved in [262] that σn−1 = s^{n−2} √(1 + c).
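Example 67 is easy to reproduce; the sketch below (Python/NumPy/SciPy, an illustrative choice) builds the Kahan matrix and compares its two smallest singular values with the trailing diagonal entry of the column-pivoted QR factor. In floating point the tied column norms may be broken by rounding, so an actual run may or may not perform exchanges:

    import numpy as np
    from scipy.linalg import qr

    n, c = 100, 0.2
    s = np.sqrt(1 - c**2)                        # = 0.9798, so c^2 + s^2 = 1
    K = np.triu(-c * np.ones((n, n)), k=1) + np.eye(n)
    A = np.diag(s ** np.arange(n)) @ K           # Kahan matrix (5.3.3)

    sv = np.linalg.svd(A, compute_uv=False)
    print(sv[-2], sv[-1])                        # ~0.148 and ~3.7e-9: numerical rank n - 1

    Q, R, p = qr(A, pivoting=True)               # column-pivoted (Businger-Golub) QR
    print(abs(R[-1, -1]))                        # in exact arithmetic ~0.13: the tiny
                                                 # singular value is not revealed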
Several hybrid algorithms that solve the second minimization have been designed. They use the Businger-Golub algorithm in a pre-processing stage and then compute another permutation matrix to improve on the rank-revealing aspect. The theoretical basis for most of these algorithms is the following bound from [42] on the norm of the lower right sub-block in the RRQR factorization:

    ‖R22‖2 ≤ ‖A Π Y‖2 ‖Y2^{-1}‖2 ,      Y = ( Y1 )   r × (n − r)
                                             ( Y2 )   (n − r) × (n − r) ,

where the linearly independent columns of Π Y are approximate null vectors of A. The norm of R22 will be small if one chooses a Y and a permutation Π such that Y2 is well conditioned.

Assume, for example, that rank(A) = n − 1, that the pivoting algorithm has been applied and a preliminary QR factorization has been obtained with R1 triangular. The null space of A is spanned by vn, the right singular vector corresponding to σn. An approximating vector y ≈ vn can be obtained by a couple of inverse iteration steps on R1^T R1 (see the subsection on the properties of the SVD). The permutation Π is then determined by moving the component of y that is largest in absolute value to the end, so that Y2 (here a scalar) is as large as possible in magnitude. This technique is then adapted to the general case rank(A) < n − 1; see [40] for details.

An efficient RRQR algorithm was developed by refining the pivoted QR factorization (see [14] for a survey and description). The algorithm can be found in [263] under TOMS/782; it starts with a pre-processing step using pivoted QR, followed by interlaced column permutations with retriangularization to obtain the rank-revealing condition. A more expensive algorithm to determine the r < n most linearly independent columns of A is an SVD-based algorithm developed by Golub, Klema and Stewart [97]. It combines subset selection using the SVD with the computation of the basic solution using a QR factorization (see [20], p. 113, for details).

Once the RRQR factorization (5.3.1) has been computed, the basic solution xB is easily obtained by multiplying b with Q1^T, followed by back-substitution with R11, which costs a total of 4mn − 2n² + r² flops. To compute the LSQ solution of minimal norm (following [20], p. 107), we must first compute the matrix W in (5.3.2), and then

    x∗ = xB − W xN ,      xN = W† xB ,

where xN is computed by solving the least squares problem min_z ‖W z − xB‖2 (the vector W xN = W W† xB is the component of xB in the null space of A, which we subtract from xB to arrive at the minimal-norm solution). The total cost of this approach is r²(n − r) + 2r(n − r)² + 2r² flops.
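The two-step computation just described – the basic solution first, then the minimum-norm solution via W – can be sketched as follows (Python/SciPy; the test problem, its sizes and the exact rank r are assumptions made for illustration):

    import numpy as np
    from scipy.linalg import qr, solve_triangular

    rng = np.random.default_rng(2)
    m, n, r = 30, 10, 6                          # assumed sizes; A has exact rank r
    A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
    b = rng.standard_normal(m)

    Q, R, p = qr(A, pivoting=True)               # A[:, p] = Q @ R
    R11, R12 = R[:r, :r], R[:r, r:]

    # Basic solution: r nonzeros, placed in the pivoted positions.
    z = solve_triangular(R11, Q[:, :r].T @ b)
    xB = np.zeros(n)
    xB[p[:r]] = z

    # Minimum-norm solution: subtract the component of xB in the null space of A.
    Pi = np.eye(n)[:, p]
    W = Pi @ np.vstack([solve_triangular(R11, R12), -np.eye(n - r)])
    xN = np.linalg.lstsq(W, xB, rcond=None)[0]   # xN = W† xB
    x_min = xB - W @ xN

    print(np.linalg.norm(A @ x_min - A @ xB))                        # ~0: same residual
    print(np.allclose(x_min, np.linalg.lstsq(A, b, rcond=None)[0]))  # min-norm solution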
5.4 UTV and VSV decompositions

A UTV decomposition is a slightly different type of decomposition, where the matrix A is written as the product of three matrices: two orthogonal ones and a middle upper or lower triangular (or block triangular) matrix. These decompositions fill a space somewhere between the pivoted QR factorization and the SVD. They provide orthonormal bases for the range and null space, as well as estimates of the singular values – and they are faster to compute and update than the SVD. The algorithms for computing a URV decomposition start with a QR step, followed by a rank-revealing stage in which the singular vectors corresponding to the smaller singular values are estimated. The URV decomposition, with an upper triangular middle matrix, takes the following form.

Definition 68. Given the m × n matrix A, a URV decomposition is a factorization of the form

    A = UR ( R ) VR^T ,      R = ( Rk  F )
           ( 0 )                 ( 0   G ) ,

where Rk is a k × k nonsingular upper triangular matrix and G is (n − k) × (n − k). If A has a well-defined gap between the singular values, σk+1 ≪ σk, then the URV decomposition is said to be rank revealing if

    σmin(Rk) = O(σk)   and   ‖(F^T, G^T)^T‖2 = O(σk+1).

There is also a ULV decomposition, where the middle matrix L is lower triangular – we skip the details here and refer instead to [78]. The QR factorization with column permutations can be considered a special URV decomposition, and the justification is the following theorem [129].

Theorem 69. For 0 < k < n define ck = √( k(n − k) + min(k, n − k) ). Then there exists a permutation matrix Π such that the pivoted QR factorization has the form

    A Π = Q ( R11  R12 )
            (  0   R22 ) ,

with a k × k upper triangular R11, σk(R11) ≥ σk(A)/ck and ‖R22‖2 ≤ ck σk+1(A).

The normal equations can also handle rank-deficient problems, as long as the numerical rank of A is well defined. The starting point is a Cholesky factorization of A^T A using symmetric pivoting,

    Π^T (A^T A) Π = R1^T R1 ,

where Π is a permutation matrix and R1 is the upper triangular (or trapezoidal) Cholesky factor; the numerical properties of this algorithm are discussed by Higham [127]. The next step is to compute a ULV decomposition of the matrix E R1 E, where the "exchange matrix" E consists of the columns of the identity matrix in reverse order,

    E R1 E = UL L VL^T ,      L = ( L11   0  )
                                  ( L21  L22 ) ,

which leads to the symmetric VSV decomposition of A^T A, having the form

    A^T A = V L^T L V^T ,      V = ( V1 , V2 ) = Π E VL

(the orthogonal factor UL is not used). The minimal-norm solution is then given by

    x∗ = V1 L11^{-1} L11^{-T} V1^T (A^T b).

We refer to [124] for more details about definitions and algorithms for symmetric VSV decompositions.

5.5 Bidiagonalization

We now describe a procedure, based on Householder transformations of the compound matrix ( b  A ), for computing a matrix bidiagonalization (see [21] for more details). The process is also used in the solution of total least squares problems. The same least squares problem structure can also be achieved with a Lanczos-type process, as will be defined for use in the LSQR algorithm in the next chapter, although the present method is more stable. For simplicity, we will assume first that A has full rank.

The objective is to use Householder transformations to compute orthogonal matrices UB ∈ R^{m×m} and VB ∈ R^{n×n} such that UB^T A VB is lower bidiagonal. For convenience, instead of bidiagonalizing A and then computing UB^T b, we apply the Householder transformations to the augmented matrix ( b  A ). This has the advantage that the transformed problem has a very simple right-hand side. We use a sequence of interleaved left and right Householder reflections, so that in step k = 1, . . . , n + 1 a left reflection zeros the elements of column k of the augmented matrix below the diagonal, and a right reflection zeros the elements of row k of A from element (k, k + 1) onward. The result is a decomposition in which UB^T ( b  A VB ) is zero below its first n + 1 rows, the leading (n + 1) × (n + 1) block being upper bidiagonal:

    ( β1  α1                 )
    (     β2  α2             )
    (          ·    ·        )
    (              βn    αn  )
    (                   βn+1 ) ,

with β1 = ‖b‖2. With x = VB y we now obtain

    min_x ‖A x − b‖2 = min_y ‖UB^T A VB y − UB^T b‖2 = min_y ‖B y − β1 e1‖2 ,      (5.5.1)

with the (n + 1) × n lower bidiagonal matrix

    B = ( α1                  )
        ( β2  α2              )
        (      ·    ·         )
        (          βn    αn   )
        (               βn+1  ) .

After k < n steps of the bidiagonal reduction one has computed orthogonal matrices Uk+1 and Vk whose columns are the first k + 1 and k columns of UB and VB, respectively, and Uk+1^T A Vk = Bk is the leading (k + 1) × k submatrix of B. Note that, as before for the QR factorization method, no significant additional storage is needed for this bidiagonalization, because the relevant information about the transformations can be stored in the empty spaces of A. The complete bidiagonal factorization applied only to A can be used as the first step of an algorithm for calculating the SVD – see the next section.

Another bidiagonalization algorithm, advantageous in the case m ≫ n, was first described in [150] and is known as R-bidiagonalization. Instead of bidiagonalizing the original matrix A, the algorithm starts with a QR factorization and then bidiagonalizes the resulting upper triangular matrix. In the second step the algorithm can take advantage of the special structure and use the less expensive Givens transformations. The number of flops changes from 4mn² − 4n³/3 for the standard bidiagonalization to 2mn² + 2n³ for R-bidiagonalization, so the break-even point is m = 5n/3.
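The Householder reduction of the compound matrix ( b  A ) described above is easy to prototype. Below is a minimal dense sketch in Python/NumPy (an illustrative choice; the helper names house and bidiag_augmented, the random test problem and the sizes are all assumptions, and the code does not exploit the compact storage mentioned above):

    import numpy as np

    def house(x):
        """Householder data (v, tau, alpha) with (I - tau*v*v^T) x = alpha*e1, |alpha| = ||x||_2."""
        x = np.asarray(x, dtype=float)
        normx = np.linalg.norm(x)
        if normx == 0.0:
            return np.zeros_like(x), 0.0, 0.0
        alpha = -np.copysign(normx, x[0])
        v = x.copy()
        v[0] -= alpha
        tau = 2.0 / np.dot(v, v)
        return v, tau, alpha

    def bidiag_augmented(A, b):
        """Lower bidiagonalization via the compound matrix (b A), as in Section 5.5.
        Returns beta1 (= ||b||_2 up to sign), the (n+1) x n lower bidiagonal B and V_B,
        so that min ||A x - b||_2 = min ||B y - beta1*e1||_2 with x = V_B y.
        Unoptimized sketch; assumes m > n and a full-rank A."""
        A = np.asarray(A, dtype=float)
        m, n = A.shape
        M = np.column_stack([np.asarray(b, dtype=float), A])    # augmented matrix (b A)
        V = np.eye(n)
        for k in range(n + 1):
            # left reflection: zero augmented column k below row k
            v, tau, alpha = house(M[k:, k])
            if tau != 0.0:
                M[k:, k:] -= tau * np.outer(v, v @ M[k:, k:])
            M[k, k], M[k+1:, k] = alpha, 0.0                     # beta_{k+1}, exact zeros
            # right reflection: zero A-row k to the right of its diagonal element
            if k < n - 1:
                w, tau, alpha = house(M[k, k+1:])
                if tau != 0.0:
                    M[:, k+1:] -= tau * np.outer(M[:, k+1:] @ w, w)
                    V[:, k:]   -= tau * np.outer(V[:, k:] @ w, w)
                M[k, k+1], M[k, k+2:] = alpha, 0.0               # alpha_{k+1}, exact zeros
        return M[0, 0], M[:n+1, 1:], V

    # usage: solve a small full-rank LSQ problem through the bidiagonal form
    rng = np.random.default_rng(0)
    A, b = rng.standard_normal((8, 4)), rng.standard_normal(8)
    beta1, B, V = bidiag_augmented(A, b)
    e1 = np.zeros(B.shape[0]); e1[0] = 1.0
    y = np.linalg.lstsq(B, beta1 * e1, rcond=None)[0]
    print(np.allclose(V @ y, np.linalg.lstsq(A, b, rcond=None)[0]))   # True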
The bidiagonalization can be stopped prematurely if a zero element appears in B. One then obtains a so-called core least squares problem, with important properties that simplify the solution of the LS problem; see [180]. In fact, if αk+1 = 0, the least squares problem takes the form

    min_y ‖ ( Bk   0  ) ( y1 )  −  ( β1 e1 ) ‖
          ‖ (  0   Ak ) ( y2 )     (   0   ) ‖2 .

Thus, it can be decomposed into two independent minimization problems:

    min_{y1} ‖Bk y1 − β1 e1‖2 ,      min_{y2} ‖Ak y2‖2 .

The first is the core subproblem; it has a full-rank matrix and hence a unique solution, which can be computed as above via QR with Givens rotations. The minimum of the second problem is obtained for y2 = 0. A similar argument can be used to decompose the problem (5.5.1) into two separable ones if an element βk+1 = 0.

Using the fact that at each step the bidiagonal reduction produces Bk, the leading submatrix of the final B, Paige and Saunders derived a technique called LSQR to solve general sparse rank-deficient least squares problems [179]. Essentially, it interleaves QR solution steps with an iterative formulation of the reduction to bidiagonal form in order to compute a sequence of solutions yk ∈ R^k of the reduced problem

    min_y ‖β1 e1 − Bk y‖2 .

By applying Givens rotations, this lower bidiagonal problem can be transformed into an equivalent upper bidiagonal one, which in the LSQR case has specific computational advantages: the sequence of residual norms is decreasing. One accepts xk = Vk yk as an approximate solution of the original least squares problem if

    ‖A^T rk‖2 / ( ‖A‖2 ‖rk‖2 )

is smaller than a given tolerance. The method is also known as partial least squares (PLS); see, for example, [256], [257].

5.6 SVD computations

We conclude this chapter with a brief discussion of the Golub-Reinsch-Kahan algorithm for computing the SVD; details can be found in [20] and [105]. The steps are:

1. Bidiagonalization of A by a finite number of Householder transformations from left and right,

       UB^T A VB = ( B )
                   ( 0 ) ,      B bidiagonal.

   This is the closest to a diagonal form achievable by a finite process.

2. Computation of the SVD of B by an iterative procedure, which produces

       UΣ^T B VΣ = Σ = diag(σ1, . . . , σn).

3. Finally, the SVD of A is

       A = U ( Σ ) V^T ,      with   U = UB ( UΣ    0   )   and   V = VB VΣ.
             ( 0 )                          ( 0   Im−n  )

For the second step, an algorithm that is equivalent to an implicit-shift QR iteration for B^T B is applied directly to B in order to zero its superdiagonal elements. Basically, the procedure consists of Givens rotations, applied alternately from the right and the left, that chase the nonzero off-diagonal elements down the matrix. The equivalence with the implicit-shift QR iteration guarantees the convergence, which is often cubic.
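Once the SVD is available, the minimum-norm solution of a rank-deficient LSQ problem (the "min solution" entries of Table 5.2 below) follows directly from the first r singular triplets. A small illustrative sketch (Python/NumPy; the test problem and the rank threshold are assumptions):

    import numpy as np

    rng = np.random.default_rng(3)
    m, n, r = 25, 10, 4                      # assumed sizes; A has exact rank r
    A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
    b = rng.standard_normal(m)

    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    tol = s[0] * max(m, n) * np.finfo(float).eps
    r_num = int(np.sum(s > tol))             # numerical rank from the singular values

    # Minimum-norm solution x* = V_r diag(1/s_r) U_r^T b, truncated at r_num.
    x_min = Vt[:r_num].T @ ((U[:, :r_num].T @ b) / s[:r_num])

    print(r_num)                                                       # 4
    print(np.allclose(x_min, np.linalg.lstsq(A, b, rcond=None)[0]))    # True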
As mentioned in the previous section, the Golub-Reinsch-Kahan algorithm can be extended with a pre-processing stage that computes a QR factorization of A, applies steps 1–3 to R, and finally updates U by replacing it with Q U. Table 5.1, with information taken from [105], compares the properties and costs of the SVD algorithms with and without R-bidiagonalization. The matrices Σ and Un correspond to the economical version of the SVD. For the LSQ problem, the matrix U need not be formed explicitly, because it can be applied to b as it is developed. The numbers in the table assume that the iterative stage 2 needs on average two steps per singular value.

    To obtain         SVD (G-R-K)       R-SVD             Used for
    Σ                 4mn² − 4n³/3      2mn² + 2n³        rank
    Σ, V, Un^T b      4mn² + 8n³        2mn² + 11n³       LSQ solution
    Σ, V, Un          14mn² + 8n³       6mn² + 20n³       pseudoinverse

    Table 5.1: Comparison of algorithms for computing the SVD: the standard Golub-Reinsch-Kahan algorithm and the variant with an initial QR factorization. The matrix Un consists of the first n columns of U.

If A is nearly rank deficient, this will be reflected in the SVD, and the relative accuracy of the smaller singular values will suffer. A variant of the Golub-Reinsch-Kahan SVD algorithm was described in [248]; this algorithm is called partial SVD, and it terminates the iterations in step 2 once it is guaranteed that all remaining singular values to be computed are smaller than a user-specified threshold.

Table 5.2, using information from [105], compares some of the methods discussed in this chapter for matrices of rank r < min(m, n).

    Method                Used for          Cost
    LU (Cholesky)         min. solution     2mn² − (2/3)n³
    QR Householder        factorization     4mnr − 2r²(m + n) + (4/3)r³
    QR Householder        basic solution    4mn − 2n² + r²
    QR Householder        min. solution     r²(n − r) + 2r(n − r)² + 2r²
    Bidiagonalization     min. solution     4mn² − (4/3)n³
    R-bidiagonalization   min. solution     2mn² + 2n³
    SVD (G-R-K)           min. solution     4mn² + 8n³
    R-SVD                 min. solution     2mn² + 11n³

    Table 5.2: Comparison of some methods to solve the LSQ problem in the rank-deficient case, where r is the rank of A. The LU and QR factorizations use permutations.
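A minimal sketch of the R-SVD variant compared in Table 5.1 – a QR factorization followed by the SVD of the small triangular factor – under the assumption m ≫ n (the sizes and the random test matrix are illustrative):

    import numpy as np
    from scipy.linalg import qr, svd

    rng = np.random.default_rng(4)
    m, n = 2000, 50                          # m >> n, where the R-variant pays off
    A = rng.standard_normal((m, n))

    Q, R = qr(A, mode='economic')            # A = Q R with R of size n x n
    Ur, s, Vt = svd(R)                       # SVD of the small triangular factor
    U = Q @ Ur                               # left singular vectors of A

    print(np.allclose(s, svd(A, compute_uv=False)))   # same singular values
    print(np.allclose((U * s) @ Vt, A))               # A = U diag(s) V^T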
