Notes from "Covariance Matching Estimation Techniques for Array Signal Processing Applications"

Daniel Eriksson

June 13, 2011

1 COMET

These notes are based on Sections 1-4 and Appendix A of [1], which describes the COMET (COvariance Matching Estimation Techniques) method for estimating the parameters of a signal model. The derivation of the method is based on the extended invariance principle (EXIP).

1.1 Data models

Here the model and the parameters that are going to be estimated are introduced, some notation used later is defined, and the required assumptions are specified. Note that the parameters defining the noise covariance enter linearly and are separated from the signal parameters, which enter non-linearly.

The data model considered is

    y(t) = A(\theta)x(t) + e(t)    (1)

where y(t) \in C^{m \times 1} and A = A(\theta) \in C^{m \times n}. It is assumed that:

- The emitter signals x(t) are random.
- The observation vectors \{y(t)\}, t = 1, 2, \ldots, are i.i.d. circular Gaussian random variables with zero mean, i.e. e^{i\phi}Z has the same probability distribution as Z for all real \phi; see for example [3].
- The emitter signals x(t) and the noise e(t) are uncorrelated, which gives

    R(\theta, \mu, \sigma) = E[y(t)y^*(t)] = R_s(\theta, \mu) + Q(\sigma)    (2)

  where R_s(\theta, \mu) = E[A x(t) x^*(t) A^*] and Q(\sigma) = E[e(t)e^*(t)].

The matrices R_s(\theta, \mu) and Q(\sigma) are linearly parameterized by \mu and \sigma, i.e. (in vectorized form, see (3) below)

    R_s(\theta, \mu) = \Psi(\theta)\mu, \qquad Q(\sigma) = \Sigma\sigma

where \Sigma is known and \Psi is a given function; \mu is referred to as the emitter signal covariance parameters and \sigma as the noise covariance parameters.
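To make the model concrete, here is a minimal numerical sketch of (1) and (2). The half-wavelength uniform linear array, the uncorrelated emitters and the spatially white noise are illustrative assumptions and not part of the general model in [1]; all names and numbers below are hypothetical.

```python
import numpy as np

# A minimal simulation of the data model y(t) = A(theta) x(t) + e(t).
rng = np.random.default_rng(0)
m, n, N = 6, 2, 1000                  # sensors, emitters, snapshots
theta = np.deg2rad([-10.0, 25.0])     # hypothetical emitter directions

# Steering matrix of a half-wavelength ULA (illustrative choice of A(theta))
A = np.exp(1j * np.pi * np.outer(np.arange(m), np.sin(theta)))

P = np.diag([1.0, 0.5])               # emitter covariance E[x x*] (the mu part)
sigma2 = 0.1                          # noise power (the sigma part)

# True covariance R = R_s + Q as in (2)
R = A @ P @ A.conj().T + sigma2 * np.eye(m)

# Circular Gaussian emitter signals and noise, snapshots as columns
X = (rng.standard_normal((n, N)) + 1j * rng.standard_normal((n, N))) / np.sqrt(2)
X = np.sqrt(np.diag(P))[:, None] * X
E = np.sqrt(sigma2 / 2) * (rng.standard_normal((m, N)) + 1j * rng.standard_normal((m, N)))
Y = A @ X + E                         # y(t), t = 1, ..., N
```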
The operator \mathrm{vec}(C) of a matrix C \in C^{n \times n} returns a vector where all the columns of C are stacked on top of each other,

    \mathrm{vec}(C) = \mathrm{vec}\begin{pmatrix} c_{11} & \cdots & c_{1n}\\ \vdots & & \vdots\\ c_{n1} & \cdots & c_{nn} \end{pmatrix} = \begin{pmatrix} c_{11}\\ \vdots\\ c_{n1}\\ c_{12}\\ \vdots\\ c_{nn} \end{pmatrix}.

Then (2) can be written as

    r(\theta, \mu, \sigma) = \mathrm{vec}\,R(\theta, \mu, \sigma) = \Psi(\theta)\mu + \Sigma\sigma = [\Psi(\theta)\;\;\Sigma]\begin{pmatrix}\mu\\ \sigma\end{pmatrix} \triangleq \Phi(\theta)\alpha    (3)

where \alpha = [\mu^T\;\sigma^T]^T collects the linear parameters parameterizing the covariances.

To ensure that the parameterization is uniquely identifiable, the following restrictions are necessary:

- One parameter per emitter signal, i.e. n = \dim x(t) = \dim\theta, where n is known.
- \Psi(\theta)\mu = \Psi(\theta_0)\mu_0 has the unique solution \theta = \theta_0, \mu = \mu_0.
- \Phi(\theta) has full rank at the true parameter values.
- The number of unknowns must not exceed the number of estimating equations available, i.e. n_\theta + n_\mu + n_\sigma \le m^2.

These restrictions are necessary but not sufficient. Sufficient conditions for identifiability are application specific, since they depend on the structure of the parameterization.
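The structure of \Phi(\theta) in (3) is application specific. As one hypothetical instance (an assumption for illustration, not the general model of [1]), take uncorrelated emitters on a uniform linear array with spatially white noise: \mu holds the n emitter powers, column i of \Psi(\theta) is \mathrm{vec}(a(\theta_i)a^*(\theta_i)), and \Sigma = \mathrm{vec}(I_m) with \sigma the single noise power.

```python
import numpy as np

def steering(m, theta):
    """Hypothetical half-wavelength ULA steering vectors, one column per emitter."""
    return np.exp(1j * np.pi * np.outer(np.arange(m), np.sin(theta)))

def Phi(m, theta):
    """Phi(theta) = [Psi(theta)  Sigma] of (3) for uncorrelated emitters and
    white noise: column i of Psi is vec(a_i a_i*), Sigma = vec(I)."""
    A = steering(m, theta)
    Psi = np.stack([np.outer(A[:, i], A[:, i].conj()).reshape(-1, order="F")
                    for i in range(A.shape[1])], axis=1)
    Sigma = np.eye(m).reshape(-1, 1, order="F")
    return np.hstack([Psi, Sigma])

m = 6
theta0 = np.deg2rad([-10.0, 25.0])
alpha0 = np.array([1.0, 0.5, 0.1])            # [mu; sigma]: powers and noise
r0 = Phi(m, theta0) @ alpha0                  # r = Phi(theta) alpha, eq (3)
R0 = r0.reshape(m, m, order="F")              # reconstruct R(theta, mu, sigma)
print(np.linalg.matrix_rank(Phi(m, theta0))) # full column rank n + 1 = 3
```

For this parameterization, \Phi(\theta_0) having full column rank corresponds to the third restriction above.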
1.2 Covariance Matching Estimation Technique

With the model and the necessary assumptions in place, a way of solving the estimation problem is formulated. Note that in the solution the complex-valued entries of R and \hat{R} are represented by real values through an invertible transformation.

Let

    \hat{r} = \mathrm{vec}\,\hat{R}    (4)

be a vector based on the sample covariance matrix

    \hat{R} = \frac{1}{N}\sum_{t=1}^{N} y(t)y^*(t)    (5)

and let r = \mathrm{vec}\,R. The vector \hat{r} has several related entries because \hat{R} is Hermitian with complex-valued off-diagonal entries. Form a vector \hat{\gamma}, where all entries are real valued, through the linear transformation

    \hat{\gamma} = J\,\mathrm{vec}\,\hat{R} = J\hat{r}    (6)

where J \in C^{m^2 \times m^2}, whose entries are 0, 1, \tfrac{1}{2} or \pm\tfrac{j}{2}, is invertible. The matrix J is designed so that the off-diagonal elements of \hat{R} are mapped through linear combinations to real-valued entries of \hat{\gamma} as

    \mathrm{Re}\,\hat{R}_{ij} = \tfrac{1}{2}(\hat{R}_{ij} + \hat{R}_{ji}), \qquad \mathrm{Im}\,\hat{R}_{ij} = \tfrac{1}{2j}(\hat{R}_{ij} - \hat{R}_{ji}).    (7)

The model (2) of the covariance matrix can now be expressed as

    \gamma(\theta, \alpha) = J\,\mathrm{vec}\,R(\theta, \alpha) = J r(\theta, \alpha)    (8)

which is a real vector representing the elements of R. This representation of \hat{R} as a real-valued vector \hat{\gamma} is an important property of the COMET method. The COMET estimates of \theta and \alpha are obtained by fitting the data \hat{\gamma} to the model (8) in a weighted least-squares sense,

    \min_{\theta,\alpha}\, [\hat{\gamma} - \gamma(\theta,\alpha)]^T \bar{C}^{-1} [\hat{\gamma} - \gamma(\theta,\alpha)] \triangleq \tilde{\gamma}^T(\theta,\alpha)\,\bar{C}^{-1}\,\tilde{\gamma}(\theta,\alpha)    (9)

where \bar{C} is the asymptotic covariance matrix of the residuals

    \tilde{\gamma} = \hat{\gamma} - \gamma.    (10)

An expression for \bar{C} is required and will be derived in the next section.
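Here is a sketch of one possible construction of J satisfying (6) and (7). The notes only specify J through its entries 0, 1, \tfrac12, \pm\tfrac{j}{2}, so the row ordering below is an arbitrary choice of this sketch.

```python
import numpy as np

def build_J(m):
    """Invertible J with entries 0, 1, 1/2, +-j/2 mapping vec(R) of a
    Hermitian R to the real vector of its diagonal, real and imaginary
    parts, implementing (7)."""
    J = np.zeros((m * m, m * m), dtype=complex)
    pos = lambda i, j: j * m + i          # column-major index of R[i, j] in vec(R)
    row = 0
    for i in range(m):
        J[row, pos(i, i)] = 1.0           # diagonal entries are already real
        row += 1
    for i in range(m):
        for j in range(i + 1, m):
            J[row, pos(i, j)] = 0.5       # Re R_ij = (R_ij + R_ji)/2
            J[row, pos(j, i)] = 0.5
            row += 1
            J[row, pos(i, j)] = -0.5j     # Im R_ij = (R_ij - R_ji)/(2j)
            J[row, pos(j, i)] = 0.5j
            row += 1
    return J

# Check on a random sample covariance (5): gamma_hat = J vec(R_hat) is real
rng = np.random.default_rng(1)
m, N = 4, 200
Y = (rng.standard_normal((m, N)) + 1j * rng.standard_normal((m, N))) / np.sqrt(2)
R_hat = Y @ Y.conj().T / N
J = build_J(m)
gamma_hat = J @ R_hat.reshape(-1, order="F")
print(np.max(np.abs(gamma_hat.imag)))     # ~0: gamma_hat is real valued
print(np.linalg.cond(J))                  # finite: J is invertible
```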
1.3 COMET: Detailed Formulas

The COMET estimator first estimates \hat{\theta} and then \hat{\alpha} as a function of \hat{\theta}. The estimated parameters are shown to be real valued and optimal. An important note is that the approach in deriving COMET is the approximation, based on EXIP, of replacing R by \hat{R}. Note that the method requires a sufficiently large number of samples N for the covariance matrices \hat{R} and \hat{Q} to be positive definite.

To find an expression for \bar{C}, the second-order moments of the covariance estimates are determined. The i-th m \times 1 subvector \hat{r}_i of the m^2 \times 1 vector \hat{r} is given by

    \hat{r}_i = \frac{1}{N}\sum_{t=1}^{N} y(t)y_i^*(t).    (11)

Since the signals y(t) are independent from snapshot to snapshot and circularly Gaussian distributed, the following relation is established:

    E[\hat{r}_i\hat{r}_j^*] = \frac{1}{N^2}\sum_{t=1}^{N}\sum_{s=1}^{N} E[y(t)y_i^*(t)\,y_j(s)y^*(s)]
    = \frac{1}{N^2}\sum_{t=1}^{N}\sum_{s=1}^{N}\Big( E[y(t)y_i^*(t)\,y_j(s)y^*(s)] - E[y(t)y_i^*(t)]\,E[y_j(s)y^*(s)] \Big) + r_i r_j^*.    (12)

Consider the first term in the last equality. All terms with t \ne s are zero, since the snapshots are independent. For t = s, the (p, k) element of E[y(t)y_i^*(t)\,y_j(t)y^*(t)] can be written (omitting t) as

    E[y_p y_i^* y_j y_k^*] = E[y_p y_i^*]E[y_j y_k^*] + E[y_p y_j]E[y_i^* y_k^*] + E[y_p y_k^*]E[y_i^* y_j]

where the second term is zero since the observations are circularly Gaussian, i.e. E[y_p y_j] = E[y_i^* y_k^*] = 0. In matrix form,

    E[y y_i^*\,y_j y^*] = E[y y_i^*]E[y_j y^*] + 0 + R_{ji}R = r_i r_j^* + R_{ji}R.    (13)

Using the result (13) in (12) gives

    E[\hat{r}_i\hat{r}_j^*] = \frac{1}{N^2}\sum_{t=1}^{N}\Big( E[y y_i^* y_j y^*] - E[y y_i^*]E[y_j y^*] \Big) + r_i r_j^* = \frac{1}{N^2}\sum_{t=1}^{N} R_{ji}R + r_i r_j^* = \frac{1}{N}R_{ji}R + r_i r_j^*.    (14)

Now the covariance matrix of the estimation error follows by using (14):

    E[(\hat{r}_i - r_i)(\hat{r}_j - r_j)^*] = E[\hat{r}_i\hat{r}_j^*] - E[\hat{r}_i]r_j^* - r_i E[\hat{r}_j^*] + r_i r_j^* = E[\hat{r}_i\hat{r}_j^*] - r_i r_j^* = \frac{1}{N}R_{ji}R    (15)

where E[\hat{r}] = r was used, which gives

    C \triangleq E[(\hat{r} - r)(\hat{r} - r)^*] = \frac{1}{N}R^T \otimes R.    (16)
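The covariance expression (16) can be verified by simulation. A minimal Monte Carlo sketch, assuming an arbitrary Hermitian positive definite R and circular Gaussian snapshots (all sizes illustrative):

```python
import numpy as np

# Monte Carlo check of (16): cov(vec R_hat) = (1/N) R^T kron R
rng = np.random.default_rng(2)
m, N, trials = 3, 50, 20000

G = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
R = G @ G.conj().T + m * np.eye(m)            # some fixed Hermitian PD R
L = np.linalg.cholesky(R)
r = R.reshape(-1, order="F")                  # r = vec R

C_mc = np.zeros((m * m, m * m), dtype=complex)
for _ in range(trials):
    Y = L @ (rng.standard_normal((m, N)) + 1j * rng.standard_normal((m, N))) / np.sqrt(2)
    r_hat = (Y @ Y.conj().T / N).reshape(-1, order="F")
    d = r_hat - r
    C_mc += np.outer(d, d.conj()) / trials    # E[(r_hat - r)(r_hat - r)*]

C_theory = np.kron(R.T, R) / N                # eq (16)
# relative error, small and limited by the Monte Carlo accuracy
print(np.max(np.abs(C_mc - C_theory)) / np.max(np.abs(C_theory)))
```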
The result (16) is used to define \bar{C} as

    \bar{C} \triangleq E[(\hat{\gamma} - \gamma)(\hat{\gamma} - \gamma)^*] = E[J(\hat{r} - r)(\hat{r} - r)^*J^*] = \frac{1}{N}J(R^T \otimes R)J^* = JCJ^*.    (17)

This shows how a consistent estimate of \bar{C} can be formed from the data. By normalizing (9), and using that \tilde{\gamma} is real so that \tilde{\gamma}^T = \tilde{\gamma}^* = \tilde{r}^*J^*, it can be rewritten as

    \frac{1}{N}\tilde{\gamma}^T(\theta,\alpha)\bar{C}^{-1}\tilde{\gamma}(\theta,\alpha) = \frac{1}{N}\tilde{r}^*(\theta,\alpha)J^*(J^*)^{-1}C^{-1}J^{-1}J\tilde{r}(\theta,\alpha) = \frac{1}{N}\tilde{r}^*(\theta,\alpha)C^{-1}\tilde{r}(\theta,\alpha) = \tilde{r}^*(\theta,\alpha)(R^T \otimes R)^{-1}\tilde{r}(\theta,\alpha).    (18)

The following rewriting is not essential for the derivation of COMET, but it yields a compact form of the criterion. Using the properties

    (B^T \otimes A)\,\mathrm{vec}\,X = \mathrm{vec}(AXB), \qquad (B \otimes A)^{-1} = B^{-1} \otimes A^{-1}    (19)

and writing \tilde{R} = \hat{R} - R, the cost function can be simplified as

    \tilde{r}^*(R^T \otimes R)^{-1}\tilde{r} = \tilde{r}^*(R^{-T} \otimes R^{-1})\,\mathrm{vec}\,\tilde{R} = \tilde{r}^*\,\mathrm{vec}(R^{-1}\tilde{R}R^{-1}).    (20)

This can be further simplified using \mathrm{vec}(A)^*\mathrm{vec}(B) = \mathrm{tr}(A^*B), which gives

    \tilde{r}^*\,\mathrm{vec}(R^{-1}\tilde{R}R^{-1}) = \mathrm{vec}(\tilde{R})^*\,\mathrm{vec}(R^{-1}\tilde{R}R^{-1}) = \mathrm{tr}(\tilde{R}R^{-1}\tilde{R}R^{-1}).    (21)

Since \hat{R} is a consistent estimate of R, according to EXIP a large-sample ML estimate of \theta and \alpha is obtained by minimizing the cost function (18) with R replaced by \hat{R},

    \tilde{r}^*(\theta,\alpha)(\hat{R}^T \otimes \hat{R})^{-1}\tilde{r}(\theta,\alpha) = \tilde{r}^*(\theta,\alpha)\,\hat{W}\,\tilde{r}(\theta,\alpha)    (22)

where \hat{W} = (\hat{R}^T \otimes \hat{R})^{-1}. The cost function in (21) and (22) is sometimes referred to as the generalized least-squares criterion and is a large-sample approximation of the ML criterion.

Recall that r(\theta,\alpha) is a linear function of \alpha and that \Phi(\theta) has full column rank. Then \theta and \alpha can be uniquely determined from a given R. First \hat{\alpha} is determined as a function of \theta:

    \min_\alpha\,\tilde{r}^*(\theta,\alpha)\hat{W}\tilde{r}(\theta,\alpha) = \min_\alpha\,\|\hat{W}^{1/2}\hat{r} - \hat{W}^{1/2}\Phi\alpha\|^2 \;\Rightarrow\; \hat{\alpha} = (\Phi^*\hat{W}\Phi)^{-1}\Phi^*\hat{W}\hat{r}    (23)

where \|\cdot\| denotes the Euclidean vector norm and \hat{W}^{1/2} is a Hermitian square-root factor of \hat{W}. Optimality of the estimate \hat{\alpha} can only be ensured if it is real valued, which is shown next. Since J is invertible, (23) can be written as

    \hat{\alpha} = \big[(J\Phi)^*(J^*)^{-1}\hat{W}J^{-1}(J\Phi)\big]^{-1}(J\Phi)^*(J^*)^{-1}\hat{W}J^{-1}(J\hat{r})    (24)

where J\hat{r} = \hat{\gamma} and J\Phi are real: Jr = J\Phi\alpha is real for real-valued \alpha, which implies that J\Phi is real valued.
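A sketch of the weighted least-squares step (23), assuming \theta is known and reusing the hypothetical ULA parameterization sketched in Section 1.1; in the full method, (23) is evaluated at \hat{\theta}. Note that \hat{\alpha} comes out numerically real, as the argument around (24) predicts.

```python
import numpy as np

# GLS estimate of alpha for known theta, eq (23)
rng = np.random.default_rng(3)
m, N = 6, 2000
theta0 = np.deg2rad([-10.0, 25.0])
alpha0 = np.array([1.0, 0.5, 0.1])

A = np.exp(1j * np.pi * np.outer(np.arange(m), np.sin(theta0)))
Psi = np.stack([np.outer(A[:, i], A[:, i].conj()).reshape(-1, order="F")
                for i in range(A.shape[1])], axis=1)
Phi = np.hstack([Psi, np.eye(m).reshape(-1, 1, order="F")])

R0 = (Phi @ alpha0).reshape(m, m, order="F")
L = np.linalg.cholesky(R0)
Y = L @ (rng.standard_normal((m, N)) + 1j * rng.standard_normal((m, N))) / np.sqrt(2)
R_hat = Y @ Y.conj().T / N
r_hat = R_hat.reshape(-1, order="F")

W_hat = np.linalg.inv(np.kron(R_hat.T, R_hat))             # W = (R^T kron R)^{-1}
alpha_hat = np.linalg.solve(Phi.conj().T @ W_hat @ Phi,
                            Phi.conj().T @ W_hat @ r_hat)  # eq (23)
print(alpha_hat)                          # close to alpha0
print(np.max(np.abs(alpha_hat.imag)))     # numerically negligible: real valued
```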
It remains to show that (J^*)^{-1}\hat{W}J^{-1} in (24) is real valued. Consider a sequence of random vectors z(t) that are circularly Gaussian distributed with zero mean and covariance matrix \hat{R}, which is held fixed. Let \hat{\rho} = \mathrm{vec}\big(\sum_{t=1}^{N} z(t)z^*(t)/N\big). In analogy with \hat{r}, the previous calculations leading to (6) and (17) imply that J\hat{\rho} is real valued, and hence so is its covariance matrix \frac{1}{N}J(\hat{R}^T \otimes \hat{R})J^*. This assures that (J^*)^{-1}\hat{W}J^{-1} = \big[J(\hat{R}^T \otimes \hat{R})J^*\big]^{-1} in (24) is also real valued, and thus \hat{\alpha} is real valued and optimality is ensured.

Now \hat{\theta} can be computed. Substituting the estimate \hat{\alpha} from (23) for \alpha gives

    \min_\alpha\,(\hat{r} - \Phi\alpha)^*\hat{W}(\hat{r} - \Phi\alpha) = \big[\hat{r} - \Phi(\Phi^*\hat{W}\Phi)^{-1}\Phi^*\hat{W}\hat{r}\big]^*\hat{W}\big[\hat{r} - \Phi(\Phi^*\hat{W}\Phi)^{-1}\Phi^*\hat{W}\hat{r}\big]    (25)

    = \big\|\big[I - \hat{W}^{1/2}\Phi(\Phi^*\hat{W}\Phi)^{-1}\Phi^*\hat{W}^{1/2}\big]\hat{W}^{1/2}\hat{r}\big\|^2 = \hat{r}^*\hat{W}^{1/2}\big[I - \hat{W}^{1/2}\Phi(\Phi^*\hat{W}\Phi)^{-1}\Phi^*\hat{W}^{1/2}\big]\hat{W}^{1/2}\hat{r}    (26)

where the last equality uses that I - \hat{W}^{1/2}\Phi(\Phi^*\hat{W}\Phi)^{-1}\Phi^*\hat{W}^{1/2} is a projection matrix. This can be summarized as

    \hat{\theta} = \arg\min_\theta\,\hat{r}^*\hat{W}^{1/2}\Pi^\perp_{\hat{W}^{1/2}\Phi}\hat{W}^{1/2}\hat{r}    (27)

where

    \Pi^\perp_{\hat{W}^{1/2}\Phi} = I - \Pi_{\hat{W}^{1/2}\Phi} = I - \hat{W}^{1/2}\Phi(\Phi^*\hat{W}\Phi)^{-1}\Phi^*\hat{W}^{1/2}.    (28)

This can be further simplified using

    \hat{W}^{1/2}\hat{r} = (\hat{R}^{-T/2} \otimes \hat{R}^{-1/2})\,\mathrm{vec}\,\hat{R} = \mathrm{vec}(\hat{R}^{-1/2}\hat{R}\hat{R}^{-1/2}) = \mathrm{vec}\,I, \qquad \hat{W}^{1/2}\,\mathrm{vec}\,I = \mathrm{vec}\,\hat{R}^{-1}    (29)

which results in the cost function whose minimizer yields the COMET estimate of the signal parameter vector,

    \hat{\theta} = \arg\min_\theta\,(\mathrm{vec}\,I)^*\Pi^\perp_{\hat{W}^{1/2}\Phi}(\mathrm{vec}\,I) = \arg\max_\theta\,(\mathrm{vec}\,I)^*\hat{W}^{1/2}\Phi(\Phi^*\hat{W}\Phi)^{-1}\Phi^*\hat{W}^{1/2}(\mathrm{vec}\,I)
    = \arg\max_\theta\,(\mathrm{vec}\,\hat{R}^{-1})^*\Phi(\Phi^*\hat{W}\Phi)^{-1}\Phi^*(\mathrm{vec}\,\hat{R}^{-1})    (30)

where minimizing the term with \Pi^\perp_{\hat{W}^{1/2}\Phi} corresponds to maximizing the term with \Pi_{\hat{W}^{1/2}\Phi}, since \Pi^\perp = I - \Pi and (\mathrm{vec}\,I)^*(\mathrm{vec}\,I) = m is constant. The maximization is solved using a Newton-type method. Once \hat{\theta} is obtained, it is used to compute \hat{\sigma} and \hat{\mu} via (24).
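A sketch of the concentrated criterion (30) for a single hypothetical emitter on a ULA. The notes solve the maximization with a Newton-type method; a coarse grid search is used here purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
m, N = 6, 2000
theta0, p0, s0 = np.deg2rad(12.0), 1.0, 0.1   # hypothetical true parameters

def Phi(theta):
    a = np.exp(1j * np.pi * np.arange(m) * np.sin(theta))
    return np.hstack([np.outer(a, a.conj()).reshape(-1, 1, order="F"),
                      np.eye(m).reshape(-1, 1, order="F")])

R0 = (Phi(theta0) @ np.array([p0, s0])).reshape(m, m, order="F")
L = np.linalg.cholesky(R0)
Y = L @ (rng.standard_normal((m, N)) + 1j * rng.standard_normal((m, N))) / np.sqrt(2)
R_hat = Y @ Y.conj().T / N

W_hat = np.linalg.inv(np.kron(R_hat.T, R_hat))
u = np.linalg.inv(R_hat).reshape(-1, order="F")   # vec(R_hat^{-1}), eq (29)

def criterion(theta):                             # eq (30), to be maximized
    P = Phi(theta)
    b = P.conj().T @ u
    return np.real(b.conj().T @ np.linalg.solve(P.conj().T @ W_hat @ P, b))

grid = np.deg2rad(np.linspace(-60, 60, 601))
theta_hat = grid[np.argmax([criterion(t) for t in grid])]
print(np.rad2deg(theta_hat))                      # close to 12 degrees
```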
1.4 Statistical Analysis and CRB

The focus is on analyzing \hat{\theta}. The analysis of the COMET estimate shows that its covariance equals the CRB. It is important that the number of samples N is sufficiently large for \hat{R}_s and \hat{Q} to be positive definite. There are several long derivations in this section, but in the end they simplify to a simple formula for the CRB. The CRB of the parameter \hat{\theta} is derived using the compact form (27) of the COMET cost function.

First, consistency of \hat{\theta} is considered. As N \to \infty, \hat{\theta} converges to the global minimum of the asymptotic cost function corresponding to (30). Using that \hat{W} \to W as N \to \infty and that

    \|Pa\|^2 = a^*Pa \quad \text{if } P \text{ is a projection matrix},    (31)

the criterion (27) can be written as

    \min_\theta\,\hat{r}^*\hat{W}^{1/2}\Pi^\perp_{\hat{W}^{1/2}\Phi}\hat{W}^{1/2}\hat{r} = \min_\theta\,\|\Pi^\perp_{\hat{W}^{1/2}\Phi}\hat{W}^{1/2}\hat{r}\|^2 \;\to\; \min_\theta\,\|\Pi^\perp_{W^{1/2}\Phi(\theta)}W^{1/2}r(\theta_0)\|^2 = \min_\theta\,\|\Pi^\perp_{W^{1/2}\Phi(\theta)}W^{1/2}\Phi(\theta_0)\alpha_0\|^2    (32)

where \theta_0, \alpha_0 denote the true values. Consistency of \hat{\theta} then follows from the unique-solution assumption.

Here follows the derivation of \mathrm{cov}(\hat{\theta}) = \mathrm{CRB}. The COMET estimate is given by the minimizing argument of (27),

    f(\theta) = \hat{r}^*\hat{W}^{1/2}\Pi^\perp_{\hat{W}^{1/2}\Phi}\hat{W}^{1/2}\hat{r}.    (33)

A Taylor expansion of the derivative of the cost function around \theta_0 gives

    0 = f'(\hat{\theta}) \approx f'(\theta_0) + f''(\theta_0)(\hat{\theta} - \theta_0) \;\Rightarrow\; \hat{\theta} - \theta_0 \approx -H^{-1}g    (34)

where g \triangleq f'(\theta_0) and H \triangleq \lim_{N\to\infty} f''(\theta)\big|_{\theta=\theta_0} > 0 is invertible in a neighbourhood of \theta = \theta_0.
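The consistency argument can be illustrated numerically: under the identifiability restrictions, the asymptotic cost in (32) is zero at \theta = \theta_0 and positive elsewhere. A sketch for the single-emitter ULA example (all setup choices hypothetical, as before):

```python
import numpy as np

m = 6
theta0 = np.deg2rad(12.0)
alpha0 = np.array([1.0, 0.1])

def Phi(theta):
    a = np.exp(1j * np.pi * np.arange(m) * np.sin(theta))
    return np.hstack([np.outer(a, a.conj()).reshape(-1, 1, order="F"),
                      np.eye(m).reshape(-1, 1, order="F")])

R0 = (Phi(theta0) @ alpha0).reshape(m, m, order="F")
W = np.linalg.inv(np.kron(R0.T, R0))            # limiting weight W
w, V = np.linalg.eigh(W)                        # Hermitian square root W^{1/2}
W_half = (V * np.sqrt(w)) @ V.conj().T
v = W_half @ Phi(theta0) @ alpha0               # W^{1/2} r(theta_0)

def asym_cost(theta):                           # eq (32)
    X = W_half @ Phi(theta)
    proj = X @ np.linalg.solve(X.conj().T @ X, X.conj().T @ v)
    return np.real(np.vdot(v - proj, v - proj)) # ||Pi_perp v||^2

for deg in [0.0, 6.0, 11.0, 12.0, 13.0, 20.0]:
    print(deg, asym_cost(np.deg2rad(deg)))      # minimum (= 0) at 12 degrees
```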
To differentiate (33), the following rule is used. Assume that \Pi^\perp_X = I - X(X^*X)^{-1}X^* is the projection matrix onto the orthogonal complement of the range of X, where X = X(\theta) and \dot{X} = dX/d\theta. Then the derivative of \Pi^\perp_X can be derived, for example as in [2],

    \frac{d\Pi^\perp_X}{d\theta} = -\dot{X}(X^*X)^{-1}X^* - X\frac{d(X^*X)^{-1}}{d\theta}X^* - X(X^*X)^{-1}\dot{X}^*
    = -\dot{X}(X^*X)^{-1}X^* + X(X^*X)^{-1}(\dot{X}^*X + X^*\dot{X})(X^*X)^{-1}X^* - X(X^*X)^{-1}\dot{X}^*
    = -\big[I - X(X^*X)^{-1}X^*\big]\dot{X}(X^*X)^{-1}X^* - X(X^*X)^{-1}\dot{X}^*\big[I - X(X^*X)^{-1}X^*\big]
    = -\Pi^\perp_X\dot{X}(X^*X)^{-1}X^* - X(X^*X)^{-1}\dot{X}^*\Pi^\perp_X.    (35)

With this result the derivative of (33) can be derived as

    \frac{\partial f}{\partial\theta_k} = \hat{r}^*\hat{W}^{1/2}\frac{\partial\Pi^\perp_{\hat{W}^{1/2}\Phi}}{\partial\theta_k}\hat{W}^{1/2}\hat{r} = -\hat{r}^*\hat{W}^{1/2}\Big[\Pi^\perp_{\hat{W}^{1/2}\Phi}\hat{W}^{1/2}\Phi_k(\Phi^*\hat{W}\Phi)^{-1}\Phi^*\hat{W}^{1/2} + (\ldots)^*\Big]\hat{W}^{1/2}\hat{r}    (36)

where \Phi_k = \partial\Phi/\partial\theta_k and (\ldots)^* denotes the conjugate transpose of the preceding term. Using that, for scalars,

    a^*Xb + b^*X^*a = 2\,\mathrm{Re}\{a^*Xb\}    (37)

and (23), equation (36) can be written as

    \frac{\partial f}{\partial\theta_k} = -2\,\mathrm{Re}\big\{\hat{r}^*\hat{W}^{1/2}\Pi^\perp_{\hat{W}^{1/2}\Phi}\hat{W}^{1/2}\Phi_k\hat{\alpha}\big\} = -2\,\hat{r}^*\hat{W}^{1/2}\Pi^\perp_{\hat{W}^{1/2}\Phi}\hat{W}^{1/2}\Phi_k\hat{\alpha}    (38)

where the last equality holds because the quantity is real valued, as shown next.
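The differentiation rule (35) is easy to verify by a finite-difference check. A sketch, assuming an arbitrary smooth full-rank matrix function X(\theta) (the linear form below is just a convenient hypothetical choice):

```python
import numpy as np

m2, q = 12, 3
rng = np.random.default_rng(5)
B0 = rng.standard_normal((m2, q)) + 1j * rng.standard_normal((m2, q))
B1 = rng.standard_normal((m2, q)) + 1j * rng.standard_normal((m2, q))

X = lambda t: B0 + t * B1             # a simple X(theta); dX/dtheta = B1

def Pi_perp(t):
    Xt = X(t)
    return np.eye(m2) - Xt @ np.linalg.solve(Xt.conj().T @ Xt, Xt.conj().T)

t, h = 0.3, 1e-6
fd = (Pi_perp(t + h) - Pi_perp(t - h)) / (2 * h)   # numerical derivative

Xt, dX = X(t), B1
G = np.linalg.solve(Xt.conj().T @ Xt, Xt.conj().T) # (X*X)^{-1} X*
term = Pi_perp(t) @ dX @ G
analytic = -term - term.conj().T                   # eq (35)
print(np.max(np.abs(fd - analytic)))               # ~1e-8 or smaller
```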
It can easily be shown that f'(\theta) is real valued by writing (38) as

    \frac{\partial f}{\partial\theta_k} = -2\,\hat{r}^*\hat{W}^{1/2}\big[I - \hat{W}^{1/2}\Phi(\Phi^*\hat{W}\Phi)^{-1}\Phi^*\hat{W}^{1/2}\big]\hat{W}^{1/2}\Phi_k\hat{\alpha}
    = -2\,\hat{r}^*\hat{W}\Phi_k\hat{\alpha} + 2\,\hat{r}^*\hat{W}\Phi(\Phi^*\hat{W}\Phi)^{-1}\Phi^*\hat{W}\Phi_k\hat{\alpha}
    = -2\,\hat{\gamma}^T(J^*)^{-1}\hat{W}J^{-1}(J\Phi_k)\hat{\alpha} + 2\,\hat{\gamma}^T(J^*)^{-1}\hat{W}J^{-1}(J\Phi)\big[(J\Phi)^*(J^*)^{-1}\hat{W}J^{-1}(J\Phi)\big]^{-1}(J\Phi)^*(J^*)^{-1}\hat{W}J^{-1}(J\Phi_k)\hat{\alpha}    (39)

where all factors are real valued, by the same arguments as when \hat{\alpha} was shown to be real. The term J\Phi_k is real since J\Phi is real.

The first steps of the computation of the Hessian \lim_{N\to\infty}\partial^2 f/\partial\theta_k\partial\theta_p are similar to (38). In the limit, replace \hat{W} by W, \hat{\alpha} by \alpha and \hat{r} by r = \Phi(\theta_0)\alpha, and differentiate the conjugate transpose of (38) (to which it is equal, according to (37)) with respect to \theta_k:

    \frac{\partial^2 f}{\partial\theta_k\partial\theta_p} = -2\,\alpha^*\frac{\partial^2\Phi^*}{\partial\theta_k\partial\theta_p}W^{1/2}\Pi^\perp_{W^{1/2}\Phi}W^{1/2}\hat{r} + 2\,\alpha^*\Phi_p^*W^{1/2}\Pi^\perp_{W^{1/2}\Phi}W^{1/2}\Phi_k(\Phi^*W\Phi)^{-1}\Phi^*W\hat{r} + 2\,\alpha^*\Phi_p^*W\Phi(\Phi^*W\Phi)^{-1}\Phi_k^*W^{1/2}\Pi^\perp_{W^{1/2}\Phi}W^{1/2}\hat{r}    (40)

which, when evaluated at \theta = \theta_0 and \hat{r} \to r = \Phi(\theta_0)\alpha, has vanishing first and last terms (they contain \Pi^\perp_{W^{1/2}\Phi}W^{1/2}r = \Pi^\perp_{W^{1/2}\Phi}W^{1/2}\Phi\alpha = 0), while (\Phi^*W\Phi)^{-1}\Phi^*Wr = \alpha in the middle term. Thus the Hessian can be written as

    H_{kp} = \lim_{N\to\infty}\frac{\partial^2 f}{\partial\theta_k\partial\theta_p} = 2\,\alpha^*\Phi_k^*W^{1/2}\Pi^\perp_{W^{1/2}\Phi}W^{1/2}\Phi_p\alpha.    (41)

In [1] it is shown that (38) can be simplified, since at \theta = \theta_0

    \hat{r}^*\hat{W}^{1/2}\Pi^\perp_{\hat{W}^{1/2}\Phi}\hat{W}^{1/2} = (\hat{r} - r)^*\hat{W}^{1/2}\Pi^\perp_{\hat{W}^{1/2}\Phi}\hat{W}^{1/2} = (\hat{r} - r)^*W^{1/2}\Pi^\perp_{W^{1/2}\Phi}W^{1/2} + o(1/\sqrt{N}) = O(1/\sqrt{N})    (42)

where the first equality uses r^*\hat{W}^{1/2}\Pi^\perp_{\hat{W}^{1/2}\Phi} = \alpha^*(\hat{W}^{1/2}\Phi)^*\Pi^\perp_{\hat{W}^{1/2}\Phi} = 0. This gives

    f'_k(\theta_0) \approx -2\,(\hat{r} - r)^*W^{1/2}\Pi^\perp_{W^{1/2}\Phi}W^{1/2}\Phi_k\alpha = -2\,\alpha^*\Phi_k^*W^{1/2}\Pi^\perp_{W^{1/2}\Phi}W^{1/2}(\hat{r} - r).    (43)
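The limiting Hessian expression (41) can be checked against a finite-difference second derivative of the asymptotic cost, here for the single-emitter example so that H is a scalar (\Phi_k is obtained numerically for brevity; all setup choices remain hypothetical):

```python
import numpy as np

m = 6
theta0 = np.deg2rad(12.0)
alpha0 = np.array([1.0, 0.1])

def Phi(theta):
    a = np.exp(1j * np.pi * np.arange(m) * np.sin(theta))
    return np.hstack([np.outer(a, a.conj()).reshape(-1, 1, order="F"),
                      np.eye(m).reshape(-1, 1, order="F")])

R0 = (Phi(theta0) @ alpha0).reshape(m, m, order="F")
w, V = np.linalg.eigh(np.linalg.inv(np.kron(R0.T, R0)))
W_half = (V * np.sqrt(w)) @ V.conj().T               # Hermitian W^{1/2}
v = W_half @ Phi(theta0) @ alpha0                    # W^{1/2} r(theta_0)

def asym_cost(theta):
    X = W_half @ Phi(theta)
    proj = X @ np.linalg.solve(X.conj().T @ X, X.conj().T @ v)
    return np.real(np.vdot(v - proj, v - proj))

h = 1e-4
H_fd = (asym_cost(theta0 + h) - 2 * asym_cost(theta0) + asym_cost(theta0 - h)) / h**2

X0 = W_half @ Phi(theta0)
Pi_perp = np.eye(m * m) - X0 @ np.linalg.solve(X0.conj().T @ X0, X0.conj().T)
dPhi = (Phi(theta0 + h) - Phi(theta0 - h)) / (2 * h)  # Phi_k, numerically
d = W_half @ dPhi @ alpha0
H_formula = 2 * np.real(np.vdot(d, Pi_perp @ d))      # eq (41)
print(H_fd, H_formula)                                # the two values agree
```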
For the following computation of the CRB, observe that

    E[f'_k(\theta_0)f'_p(\theta_0)] = 4\,\alpha^*\Phi_k^*W^{1/2}\Pi^\perp_{W^{1/2}\Phi}W^{1/2}\,E[(\hat{r}-r)(\hat{r}-r)^*]\,W^{1/2}\Pi^\perp_{W^{1/2}\Phi}W^{1/2}\Phi_p\alpha
    = 4\,\alpha^*\Phi_k^*W^{1/2}\Pi^\perp_{W^{1/2}\Phi}W^{1/2}\cdot\frac{1}{N}W^{-1}\cdot W^{1/2}\Pi^\perp_{W^{1/2}\Phi}W^{1/2}\Phi_p\alpha
    = \frac{4}{N}\,\alpha^*\Phi_k^*W^{1/2}\Pi^\perp_{W^{1/2}\Phi}\Pi^\perp_{W^{1/2}\Phi}W^{1/2}\Phi_p\alpha
    = \frac{4}{N}\,\alpha^*\Phi_k^*W^{1/2}\Pi^\perp_{W^{1/2}\Phi}W^{1/2}\Phi_p\alpha    (44)

where E[(\hat{r}-r)(\hat{r}-r)^*] = C = \frac{1}{N}(R^T \otimes R) = \frac{1}{N}W^{-1} from (16) was used. The covariance of \hat{\theta} can be computed from the sandwich formula (see [4])

    \mathrm{cov}(\hat{\theta}) = [f''(\theta_0)]^{-1}\,E[f'(\theta_0)f'^T(\theta_0)]\,[f''(\theta_0)]^{-1}    (45)

or, in normalized form,

    \mathrm{cov}(\hat{\theta}) = \frac{1}{N}\,H^{-1}\Big(\lim_{N\to\infty} N\,E[f'(\theta_0)f'^T(\theta_0)]\Big)H^{-1}.    (46)

From (41) and (44), N\,E[f'_k(\theta_0)f'_p(\theta_0)] = 2H_{kp}, so (46) gives \mathrm{cov}(\hat{\theta}) = \frac{2}{N}H^{-1}, i.e., element-wise,

    \big[\mathrm{CRB}^{-1}\big]_{kp} = \big[\mathrm{cov}(\hat{\theta})^{-1}\big]_{kp} = N\,\alpha^*\Phi_k^*W^{1/2}\Pi^\perp_{W^{1/2}\Phi}W^{1/2}\Phi_p\alpha.    (47)

When \Phi_p\alpha is evaluated at \alpha_0,

    \Phi_p\alpha_0 = \frac{\partial r}{\partial\theta_p} = \frac{\partial\,\mathrm{vec}\,R}{\partial\theta_p}    (48)

which gives the CRB

    \mathrm{CRB}^{-1} = N\left[\frac{\partial\,\mathrm{vec}\,R}{\partial\theta}\right]^*W^{1/2}\Pi^\perp_{W^{1/2}\Phi}W^{1/2}\left[\frac{\partial\,\mathrm{vec}\,R}{\partial\theta}\right].    (49)

The asymptotic distribution of \hat{\theta} is Gaussian and its covariance matrix equals the CRB. It is worth noting that this requires N to be sufficiently large for \hat{R} and \hat{Q} to be positive definite.
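A sketch evaluating the CRB expression (49) for the single-emitter example; \partial\,\mathrm{vec}\,R/\partial\theta is computed by numerical differentiation, and all modeling choices are the same hypothetical ones as before.

```python
import numpy as np

m, N = 6, 2000
theta0 = np.deg2rad(12.0)
alpha0 = np.array([1.0, 0.1])

def Phi(theta):
    a = np.exp(1j * np.pi * np.arange(m) * np.sin(theta))
    return np.hstack([np.outer(a, a.conj()).reshape(-1, 1, order="F"),
                      np.eye(m).reshape(-1, 1, order="F")])

def r(theta):
    return Phi(theta) @ alpha0                     # vec R(theta, alpha_0)

R0 = r(theta0).reshape(m, m, order="F")
w, V = np.linalg.eigh(np.linalg.inv(np.kron(R0.T, R0)))
W_half = (V * np.sqrt(w)) @ V.conj().T             # Hermitian W^{1/2}

X = W_half @ Phi(theta0)
Pi_perp = np.eye(m * m) - X @ np.linalg.solve(X.conj().T @ X, X.conj().T)

h = 1e-5
dr = (r(theta0 + h) - r(theta0 - h)) / (2 * h)     # d vecR / d theta, eq (48)
g = W_half @ dr
crb_inv = N * np.real(np.vdot(g, Pi_perp @ g))     # eq (49)
print("CRB(theta) =", 1.0 / crb_inv, "rad^2")
```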
References

[1] B. Ottersten, P. Stoica and R. Roy, "Covariance Matching Estimation Techniques for Array Signal Processing Applications", Digital Signal Processing, 1998.

[2] T. V. K. Chaitanya, "Notes for Asymptotic Analysis Examples", http://www.commsys.isy.liu.se/ade/chaitanya-notes.pdf, 2011.

[3] R. G. Gallager, "Circularly-Symmetric Gaussian Random Vectors", www.rle.mit.edu/rgallager/documents/circsymgauss.pdf, 2008.

[4] E. G. Larsson, "Key Concepts in Asymptotic Analysis", http://www.commsys.isy.liu.se/ade/notes-asymptotic-analysis.pdf, 2011.