
5 Dependent Component Analysis

In this chapter, we discuss the relaxation of the BSS model by taking into account additional structures in the data and dependencies between components. Many researchers have taken an interest in this generalization, which is crucial for applications in real-world settings, where such situations are to be expected. Here, we will consider model indeterminacies as well as actual separation algorithms. For the latter, we will employ a technique that has been the basis of one of the first ICA algorithms [46], namely joint diagonalization (JD). It has become an important tool in ICA-based BSS and in BSS relying on second-order time decorrelation [28]. Its task is, given a set of commuting symmetric $n \times n$ matrices $C_i$, to find an orthogonal matrix $A$ such that $A^\top C_i A$ is diagonal for all $i$. This generalizes the eigenvalue decomposition ($i = 1$) and the generalized eigenvalue problem ($i = 2$), in which perfect factorization is always possible. Other extensions of the standard BSS model, such as the inclusion of singular matrices [91], will be omitted from the discussion.

5.1 Algebraic BSS and Multidimensional Generalizations

Considering the BSS model from equation (4.1), or a more general noisy version $x(t) = A s(t) + n(t)$, the data can be separated only if we put additional conditions on the sources, such as the following:

• They are stochastically independent: $p_s(s_1, \dots, s_n) = p_{s_1}(s_1) \cdots p_{s_n}(s_n)$.
• Each source is sparse, i.e. it contains a certain number of zeros or has a low $p$-norm for small $p$ and fixed 2-norm.
• $s(t)$ is stationary and, for all $\tau$, has diagonal autocovariances $E\big(s(t+\tau)\, s(t)^\top\big)$; here zero-mean $s(t)$ is assumed.

In the following, we will review BSS algorithms based on eigenvalue decomposition, JD, and generalizations. Any one of the above conditions will be referred to as a source condition, because we do not want to specialize to a single model. The additive noise $n(t)$ is modeled by a stationary, temporally and spatially white zero-mean process with variance $\sigma^2$. Moreover, we will not deal with the more complicated underdetermined case, so we assume that at most as many sources as sensors are to be extracted (i.e. $n \le m$). The signals $x(t)$ are observed, and the goal is to recover $A$ and $s(t)$. Having found $A$, $s(t)$ can be estimated by $A^\dagger x(t)$, which is optimal in the maximum-likelihood sense. Here $\dagger$ denotes the pseudoinverse of $A$, which equals the inverse in the case $m = n$. Thus the BSS task reduces to the estimation of the mixing matrix $A$, and hence the additive noise $n$ is often neglected (after whitening). Note that in the following we will assume that all signals are real-valued; extensions to the complex case are straightforward.

Approximate joint diagonalization

Many BSS algorithms employ joint diagonalization (JD) techniques on some source condition matrices to identify the mixing matrix. Given a set of symmetric matrices $\mathcal{C} := \{C_1, \dots, C_K\}$, JD amounts to minimizing the squared sum of the off-diagonal elements of $\hat{A}^\top C_i \hat{A}$, that is, minimizing

$$
f(\hat{A}) := \sum_{i=1}^{K} \big\| \hat{A}^\top C_i \hat{A} - \operatorname{diag}\big(\hat{A}^\top C_i \hat{A}\big) \big\|_F^2 \tag{5.1}
$$

with respect to the orthogonal matrix $\hat{A}$, where $\operatorname{diag}(C)$ produces a matrix in which all off-diagonal elements of $C$ have been set to zero, and where $\|C\|_F^2 := \operatorname{tr}(CC^\top)$ denotes the squared Frobenius norm. A global minimum $A$ of $f$ is called a joint diagonalizer of $\mathcal{C}$. Such a joint diagonalizer exists if and only if all elements of $\mathcal{C}$ commute.
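To make the objective in equation (5.1) concrete, the following short Python sketch (our own illustration; the function name `jd_cost` and the synthetic test matrices are not from the text) evaluates $f(\hat{A})$ and confirms that an exact joint diagonalizer exists when the $C_i$ are built with a shared orthogonal eigenbasis $A$, and therefore commute:

```python
import numpy as np

def jd_cost(A_hat, Cs):
    """Squared off-diagonal sum f(A_hat) from equation (5.1)."""
    total = 0.0
    for C in Cs:
        D = A_hat.T @ C @ A_hat
        # off-diagonal Frobenius mass: ||D - diag(D)||_F^2
        total += np.sum(D**2) - np.sum(np.diag(D)**2)
    return total

rng = np.random.default_rng(0)

# Build commuting symmetric matrices C_i = A D_i A^T with a shared
# orthogonal eigenbasis A (obtained from a QR decomposition).
n, K = 4, 5
A, _ = np.linalg.qr(rng.standard_normal((n, n)))
Cs = [A @ np.diag(rng.standard_normal(n)) @ A.T for _ in range(K)]

print(jd_cost(np.eye(n), Cs))  # generally > 0
print(jd_cost(A, Cs))          # ~ 0: A is a joint diagonalizer of the set
```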
Algorithms for performing joint diagonalization include gradient descent on $f(\hat{A})$, Jacobi-like iterative construction of $A$ by Givens rotations in two coordinates [42], an extension minimizing a logarithmic version of equation (5.1) [202], an alternating optimization scheme switching between column and diagonal optimization [292], and, more recently, a linear least-squares algorithm for diagonalization [297]. The latter three algorithms can also search for nonorthogonal matrices $A$. Note that in practice, minimization of the off-diagonal sums yields only an approximate joint diagonalizer: in the case of finite samples, the source condition matrices are estimates, so they only approximately share the same eigenstructure and do not fully commute. Hence $f(\hat{A})$ from equation (5.1) cannot be driven exactly to zero, only approximately. A minimal sketch of the Jacobi-like strategy is given below.

Table 5.1 BSS algorithms based on joint diagonalization (centered sources are assumed)
algorithm...
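As an illustration of the Jacobi-like strategy mentioned above, here is a sketch in the spirit of [42] (our own simplified version, not that algorithm verbatim): it sweeps over all coordinate pairs $(p, q)$ and applies, for each pair, the closed-form Givens rotation angle that minimizes the off-diagonal mass of all $C_i$ in that plane.

```python
import numpy as np

def joint_diagonalize(Cs, sweeps=50, tol=1e-12):
    """Approximate joint diagonalization of symmetric matrices by
    Jacobi-like sweeps of Givens rotations.

    Cs: sequence of symmetric (n, n) arrays.
    Returns an orthogonal A_hat such that A_hat.T @ C @ A_hat is
    approximately diagonal for every C in Cs.
    """
    C = np.stack([np.asarray(Ci, dtype=float) for Ci in Cs])
    K, n, _ = C.shape
    A_hat = np.eye(n)
    for _ in range(sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                a = C[:, p, q]                       # current off-diagonal entries
                b = 0.5 * (C[:, p, p] - C[:, q, q])  # half diagonal differences
                # After a rotation by angle t in the (p, q) plane, the new
                # off-diagonal entry is a_i cos(2t) + b_i sin(2t); the angle
                # below minimizes sum_i of its square in closed form.
                theta = 0.25 * np.arctan2(-2.0 * np.sum(a * b),
                                          np.sum(b * b - a * a))
                c, s = np.cos(theta), np.sin(theta)
                if abs(s) < tol:
                    continue                          # pair already diagonalized
                rotated = True
                G = np.eye(n)
                G[p, p] = G[q, q] = c
                G[p, q], G[q, p] = s, -s
                C = np.stack([G.T @ Ci @ G for Ci in C])  # C_i <- G^T C_i G
                A_hat = A_hat @ G
        if not rotated:
            break                                     # sweep converged
    return A_hat
```

For the exactly commuting matrices constructed in the previous sketch, `joint_diagonalize` recovers $A$ up to column permutation and sign, which are precisely the indeterminacies inherent in BSS; for finite-sample estimates it returns the approximate minimizer of equation (5.1) discussed above.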
