On the Distribution of Roy's Largest Root

Written by Iain Johnstone and Boaz Nadler, 2015

Matlab code companion to the paper [3], available at arxiv.org/abs/1310.6581

  1. Introduction

  2. Rank One Alternative Hypotheses

  3. Distribution of Roy's Test

  4. Code

  5. References

1. Introduction

In a seminal work [1], S.N. Roy proposed the union-intersection principle as a method to derive test statistics for a variety of hypothesis testing problems.

In various settings involving multivariate Gaussian observations, most notably testing whether the population covariance matrix of given data equals a known matrix, the resulting test statistic is the largest eigenvalue, or characteristic root, of the $m\times m$ sample covariance matrix $H$ of the observed data.

In other multivariate problems, such as testing for the equality of the means of $p$ different Gaussian populations with the same but unknown covariance, the resulting test statistic is the largest root of a random matrix of the form $E^{-1}H$.

Both of these tests are known as Roy's Largest Root Test.

Roy's largest root complements other tests derived from different considerations, such as the sphericity test in the case of a single covariance matrix, and the Wilks Lambda, Hotelling-Lawley trace and Bartlett-Pillai trace tests in the case of two covariance matrices; see for example [4], [5].

Deriving simple approximations to the distribution of Roy's largest root test has been an open problem in multivariate analysis. Under the null, where under Gaussian assumptions both matrices $H$ and $E$ follow Wishart distributions with identity covariance, Roy's largest root is well approximated by the Tracy-Widom distribution, after proper centering and scaling (and a logarithmic transformation in the two-matrix case); see [2].

2. Rank One Alternative Hypotheses

Our focus here is on the distribution of Roy's largest root test under a rank-one alternative hypothesis. In particular, we consider the following two possible settings for $H$:

Signal Detection in Noise

Here we observe $n_H$ samples ${\bf x}_i$ of the form $$ {\bf x} = \sqrt{\lambda_H}\, s\, {\bf v} + \sigma \xi $$ where $\xi\sim \mathcal N(0,\Sigma)$, $s\sim \mathcal N(0,1)$, ${\bf v}\in\mathbb{R}^m$ is a fixed but unknown vector, $\sigma$ is the noise level, and $\lambda_H$ is the signal strength. Denote by $H$ the (un-normalized) covariance matrix of the observations, $$ H = \sum_{i=1}^{n_H} {\bf x}_i {\bf x}_i^T. $$

Given the observed ${\bf x}_i$'s, a fundamental problem in statistical signal processing is signal detection. In our case, it can be formulated as the following hypothesis testing problem: $$\mathcal H_0: \lambda_H=0 \ \mbox{(no signal)} \quad vs. \quad \mathcal H_1: \lambda_H>0 \ \mbox{(signal present)}. $$ If $\Sigma$ is known, the data can be whitened, and various signal detection methods based on the eigenvalues of $\Sigma^{-1}H$ can be devised; see for example [6] and references therein.
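
For concreteness, the following minimal Matlab sketch (our own illustration; all parameter values and variable names are arbitrary, not taken from the code package) simulates this model, forms $H$, and computes the largest eigenvalue of $\Sigma^{-1}H$:

    % Illustrative simulation of the rank-one signal model (values arbitrary).
    m = 10; nH = 50; lambdaH = 5; sigma = 1;
    v = randn(m,1); v = v/norm(v);            % fixed but unknown signal direction
    Sigma = eye(m);                           % known noise covariance
    s  = randn(1, nH);                        % signal amplitudes s_i ~ N(0,1)
    xi = chol(Sigma)' * randn(m, nH);         % noise xi_i ~ N(0, Sigma)
    X  = sqrt(lambdaH) * v * s + sigma * xi;  % columns are the observations x_i
    H  = X * X';                              % un-normalized covariance matrix
    ell1 = max(eig(H, Sigma));                % largest eigenvalue of Sigma^{-1} H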

When the noise covariance matrix $\Sigma$ is arbitrary and unknown, it is in general not possible to detect the presence of a signal without further assumptions. A standard setting, considered in [7] and [8], is that one has also collected $n_E$ noise-only observations ${\bf z}_j$ of the form ${\bf z} = \sigma \xi$. One then forms an estimate of the noise covariance matrix as $n_E^{-1}E$ with $$ E = \sum_{j=1}^{n_E} {\bf z}_j {\bf z}_j^T $$ and tests for the presence of a signal via the eigenvalues of $E^{-1}H$. One such test uses the largest eigenvalue of this matrix, namely Roy's largest root.
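
Continuing the sketch above, the noise-only samples yield $E$, and Roy's statistic is the largest generalized eigenvalue of the pair $(H, E)$; again, all values are illustrative:

    % Noise-only samples give E; Roy's statistic is the largest eigenvalue
    % of inv(E)*H, computed here as a generalized eigenvalue (values arbitrary).
    nE = 100;
    Z = sigma * chol(Sigma)' * randn(m, nE);  % noise-only observations z_j
    E = Z * Z';                               % un-normalized noise covariance estimate
    roy = max(eig(H, E));                     % Roy's largest root of E^{-1} H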

Note that in this signal detection setting, $E\sim W_m(n_E,\sigma^2\Sigma)$ and $H\sim W_m(n_H,\sigma^2\Sigma + \lambda_H {\bf v}{\bf v}^T)$. Namely, the alternative is a rank-one perturbation from the null.

To understand the power of Roy's test, we thus need to study the behavior of the largest eigenvalue of $H$, or of $E^{-1}H$, under rank-one alternatives.

MANOVA

Here there are $p$ groups, with $n_k$ observations in the $k$-th group. The assumed model is that in the $k$-th group the $m$-dimensional observations are of the form $$ {\bf x} = {\bf \mu}_k + \xi$$ where ${\bf \mu}_k$ is the unknown mean of the $k$-th group and $\xi\sim \mathcal N(0,\Sigma)$, with the noise covariance matrix $\Sigma$ common to all groups.

One hypothesis testing problem is whether all group means are equal, namely $$\mathcal H_0:\ \mbox{all }{\bf \mu}_k \mbox{ are equal}\quad vs. \quad \mathcal H_1:\ \mbox{ not all group means are equal} $$

Here it is standard to construct the between-group sum of squares matrix $H$ and the within-group sum of squares matrix $E$. The matrix $E\sim W_m(n_E,\Sigma)$, where $n_E = n-p$, and $n=\sum_k n_k$ is the total number of observations in all groups.
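
For concreteness, the following minimal Matlab sketch (our own illustration, with hypothetical group sizes and placeholder data) forms these two matrices and Roy's largest root:

    % Illustrative MANOVA sums of squares: between-group (H) and within-group (E).
    m = 5; p = 3; nk = [20 25 30];            % hypothetical dimensions and group sizes
    g = repelem(1:p, nk)';                    % group label of each observation
    X = randn(sum(nk), m);                    % placeholder data, one row per observation
    n = size(X, 1);
    xbar = mean(X, 1);                        % grand mean
    H = zeros(m); E = zeros(m);
    for k = 1:p
        Xk = X(g == k, :);                    % observations of group k
        dk = mean(Xk, 1) - xbar;
        H = H + size(Xk,1) * (dk' * dk);      % between-group sum of squares
        Ek = Xk - mean(Xk, 1);                % centered group data
        E = E + Ek' * Ek;                     % within-group sum of squares
    end
    roy = max(eig(H, E));                     % Roy's largest root; nH = p-1, nE = n-p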

If, under the alternative, all group means lie along the direction of some fixed vector ${\bf \mu}_0$, a setting known as concentrated non-centrality, then $H\sim W_m(n_H,\Sigma,\Omega)$, where $\Omega$ is a rank-one non-centrality matrix and $n_H=p-1$.

Again, to understand the power of Roy's test in this setting, one needs to study the distribution of the largest eigenvalue, now involving a non-central Wishart matrix.

3. Distribution of Roy's Test

Our main result is an approximate yet quite accurate expression for the distribution of Roy's test statistic in all of the settings described above. In contrast to classical statistics, which studies the behavior of the random variable of interest as the sample size (and/or dimension) tends to infinity, here we keep the sample size and dimension fixed and instead consider the limit of small noise, $\sigma\to0$.

For example, in the signal detection setting with a known noise covariance matrix, we obtain the following result:

Proposition 1: Let $H\sim W_m(n_H,\sigma^2 I + \lambda_H {\bf v}{\bf v}^T)$ where $\lambda_H>0$ and $\|{\bf v}\|=1$, and let $\ell_1(H)$ denote its largest eigenvalue. Then as $\sigma\to 0$, $$ \ell_1(H) = (\lambda_H+\sigma^2)\chi^2_{n_H} + \chi^2_{m-1}\sigma^2 + \frac{\chi^2_{n_H}\chi^2_{m-1}}{(\lambda_H+\sigma^2)\chi^2_{n_H}} \sigma^4 + o_p(\sigma^4) $$ where the three variates are all independent.
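
As a quick numerical sanity check, the expansion can be compared with direct simulation. The sketch below is our own illustration: it uses chi2rnd from Matlab's Statistics and Machine Learning Toolbox, and all parameter values are arbitrary choices, not values from the paper.

    % Illustrative Monte Carlo check of Proposition 1 (requires chi2rnd from
    % the Statistics and Machine Learning Toolbox; values are arbitrary).
    m = 10; nH = 50; lambdaH = 2; sigma = 0.1; T = 1e4;
    Sig = sigma^2 * eye(m); Sig(1,1) = Sig(1,1) + lambdaH;  % v = e1 w.l.o.g.
    R = chol(Sig)';
    ell = zeros(T,1); approx = zeros(T,1);
    for t = 1:T
        X = R * randn(m, nH);
        ell(t) = max(eig(X * X'));            % exact largest eigenvalue of H
        c1 = chi2rnd(nH); c2 = chi2rnd(m-1);  % independent chi-squared draws
        approx(t) = (lambdaH + sigma^2)*c1 + sigma^2*c2 ...
                  + sigma^4 * c1 * c2 / ((lambdaH + sigma^2) * c1);
    end
    fprintf('mean: %.3f (simulated) vs %.3f (expansion)\n', mean(ell), mean(approx));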

Similar propositions for the two-matrix case, and for the case where $H$ follows a non-central Wishart distribution, are described in [3]. All of these expressions involve simple linear combinations of standard random variables ($\chi^2$ or $F$) and can easily be evaluated numerically, typically by a single numerical integration.
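
As an illustration of such a one-dimensional integration, the following sketch (ours, not a routine from the package) evaluates the density of $a\chi^2_{n_H} + b\chi^2_{m-1}$, the two leading terms of the expansion above, as a convolution; chi2pdf requires the Statistics and Machine Learning Toolbox, and all constants are arbitrary.

    % Illustrative sketch: the pdf of a*chi2(nH) + b*chi2(m-1) via a single
    % numerical integration (a convolution of the two scaled chi-squared densities).
    a = 2; b = 0.25; nH = 50; m = 10;         % e.g. a = lambda+sigma^2, b = sigma^2
    fsum = @(x) integral(@(u) chi2pdf(u/a, nH)/a .* chi2pdf((x-u)/b, m-1)/b, 0, x);
    xgrid = linspace(50, 200, 151);
    pdfvals = arrayfun(fsum, xgrid);          % density evaluated on the grid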

4. Code

Matlab code can be found here.
The file contains, among others, the following useful routines:

pdf_Roy_EinvH_Central(x,nH,nE,m,lambda) - computes the pdf of the largest eigenvalue in the two-matrix case, with a central Wishart rank-one perturbation (case 3 in our paper).

pdf_Roy_EinvH_NonCentral(x,nH,nE,m,delta) - computes the pdf of the largest eigenvalue in the two-matrix case, with a non-central Wishart (case 4 in our paper).
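
A possible usage pattern is sketched below; the argument values are purely illustrative, and we assume scalar evaluation of x (hence the arrayfun), so consult the functions' help text for the exact conventions.

    % Hypothetical usage of the supplied routine (argument values arbitrary).
    nH = 5; nE = 40; m = 10; lambda = 10;
    xgrid = linspace(0, 4, 201);              % grid of largest-root values
    y = arrayfun(@(x) pdf_Roy_EinvH_Central(x, nH, nE, m, lambda), xgrid);
    plot(xgrid, y); xlabel('largest root of E^{-1}H'); ylabel('approximate pdf');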

We also provide scripts which reproduce the tables and figures in [3]. For example, sim_fig3_fig4.m reproduces Figures 3 and 4 in the Supplement.

5. References

  1. S. N. Roy, On a heuristic method of test construction and its use in multivariate analysis, Ann. Math. Statist., vol. 24, pp. 220-238, 1953.

  2. I. M. Johnstone, Multivariate analysis and Jacobi ensembles: largest eigenvalue, Tracy-Widom limits and rates of convergence, Annals of Statistics, vol. 36, no. 6, pp. 2638-2716, 2008.

  3. I. M. Johnstone and B. Nadler, Roy's Largest Root Test Under Rank-One Alternatives, 2015, arxiv.org/abs/1310.6581.

  4. R. J. Muirhead, Aspects of Multivariate Statistical Theory, Wiley, New York, 1982.

  5. R. A. Johnson and D. W. Wichern, Applied Multivariate Statistical Analysis, sixth edition, Prentice Hall, 2007.

  6. S. Kritchman and B. Nadler, Non-parametric detection of the number of signals: hypothesis testing and random matrix theory, IEEE Transactions on Signal Processing, vol. 57, no. 10, pp. 3930-3941, 2009.

  7. L. C. Zhao, P. R. Krishnaiah and Z. D. Bai, On detection of the number of signals when the noise covariance matrix is arbitrary, Journal of Multivariate Analysis, vol. 20, pp. 26-49, 1986.

  8. R. Rao Nadakuditi and J. W. Silverstein, Fundamental limit of sample generalized eigenvalue based detection of signals in noise using relatively few signal-bearing and noise-only samples, IEEE Journal of Selected Topics in Signal Processing, vol. 4, no. 3, pp. 468-480, 2010.