557: MATHEMATICAL STATISTICS II
RESULTS FROM CLASSICAL HYPOTHESIS TESTING

MOST POWERFUL TESTS

To construct and assess the quality of a statistical test, we consider the power function β(θ). Consider a family of tests C for testing H0 and H1 with corresponding subsets Θ0 and Θ1. The uniformly most powerful (UMP) test T is the test whose power function β(θ) dominates the power function β′(θ) of any other test T′ ∈ C at all θ ∈ Θ1:

    β(θ) ≥ β′(θ)   for all θ ∈ Θ1.

A test with power function β(θ) is unbiased if, for all θ0 ∈ Θ0 and θ1 ∈ Θ1,

    β(θ1) ≥ β(θ0).

A simple hypothesis is one which specifies the distribution of the data completely. Consider a parametric model f_X(x; θ) with parameter space Θ = {θ0, θ1}, and the test of

    H0 : θ = θ0   versus   H1 : θ = θ1.

Then both H0 and H1 are simple hypotheses.

Theorem (The Neyman-Pearson Lemma)
Consider a parametric model f_X(x; θ) with parameter space Θ = {θ0, θ1}. A test of

    H0 : θ = θ0   versus   H1 : θ = θ1

is required. Consider a test T with rejection region R that satisfies

    f_X(x; θ1) > k f_X(x; θ0)  ⟹  x ∈ R
    f_X(x; θ1) < k f_X(x; θ0)  ⟹  x ∉ R

for some k ≥ 0, and Pr[X ∈ R; θ0] = α. Then T is UMP in the class C_α of tests at level α. Further, if such a test exists with k > 0, then every UMP test at level α also has size α (that is, α is the least upper bound of the power function β(θ) over Θ0), and has rejection region identical to that of T, except perhaps on a set A with Pr[X ∈ A; θ0] = Pr[X ∈ A; θ1] = 0.

Proof
As Pr[X ∈ R; θ0] = α, the test T has size and level α. Let φ_R(x) be the test function for this test, and φ_R′(x) the test function for any other level-α test T′. Denote by β(θ) and β′(θ) the power functions of these two tests. Now

    g(x) = (φ_R(x) − φ_R′(x))(f_X(x; θ1) − k f_X(x; θ0)) ≥ 0

as

    x ∈ R ∩ R′    ⟹  φ_R(x) = φ_R′(x) = 1                                  ⟹  g(x) = 0
    x ∈ R ∩ R′ᶜ   ⟹  φ_R(x) = 1, φ_R′(x) = 0, f_X(x; θ1) ≥ k f_X(x; θ0)   ⟹  g(x) ≥ 0
    x ∈ Rᶜ ∩ R′   ⟹  φ_R(x) = 0, φ_R′(x) = 1, f_X(x; θ1) ≤ k f_X(x; θ0)   ⟹  g(x) ≥ 0
    x ∈ Rᶜ ∩ R′ᶜ  ⟹  φ_R(x) = φ_R′(x) = 0                                  ⟹  g(x) = 0.
Thus

    ∫_X (φ_R(x) − φ_R′(x))(f_X(x; θ1) − k f_X(x; θ0)) dx ≥ 0,

and this inequality can be written in terms of the power functions as

    (β(θ1) − β′(θ1)) − k(β(θ0) − β′(θ0)) ≥ 0.    (1)

As β(θ0) and β′(θ0) are bounded above by α, and β(θ0) = α as T is a size-α test, we have

    β(θ0) − β′(θ0) = α − β′(θ0) ≥ 0  ⟹  β(θ1) − β′(θ1) ≥ 0.

Thus β(θ1) ≥ β′(θ1), and hence T is UMP, as θ1 is the only point in Θ1 and the test T′ with power function β′ was chosen arbitrarily.

Now consider any UMP test T′ ∈ C_α. By the result above, T is UMP at level α, so β(θ1) = β′(θ1). In this case, if k > 0, we have from equation (1) that

    β(θ0) − β′(θ0) = α − β′(θ0) ≤ 0.

But, by assumption, T′ is a level-α test, so we also have α − β′(θ0) ≥ 0, and hence β′(θ0) = α; that is, T′ is also a size-α test. Therefore

    ∫_X (φ_R(x) − φ_R′(x))(f_X(x; θ1) − k f_X(x; θ0)) dx = 0    (2)

where the integrand in equation (2) is a non-negative function. Let 𝒜 be the collection of sets of probability (that is, density) zero under both f_X(x; θ0) and f_X(x; θ1); then, for any A ∈ 𝒜,

    ∫_A (φ_R(x) − φ_R′(x))(f_X(x; θ1) − k f_X(x; θ0)) dx = 0

irrespective of the nature of R′, so the functions φ_R(x) and φ_R′(x) may not be equal for x in such a set A. Apart from that specific case, the integral in equation (2) can only be zero if at least one of the two factors is zero for all x. The second factor cannot be identically zero for all x, as the densities must integrate to one. Thus, for all x ∈ X \ A, φ_R(x) = φ_R′(x), and hence R′ satisfies the same conditions as R.

Notes: To find the value k that appears in the Theorem, we need to compute Pr[X ∈ R; θ0] for a fixed level/size α. It is possible that, for given alternative hypotheses, no UMP test exists. Also, for discrete data, it may not be possible to solve the equation Pr[X ∈ R; θ0] = α for every value of α, and hence only specific values of α may be attained.
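As a concrete illustration (a hypothetical example, not taken from the notes): for X_1, ..., X_n iid N(θ, 1) with H0: θ = 0 versus H1: θ = 1, the likelihood ratio is an increasing function of the sample mean, so the Neyman-Pearson region reduces to {x̄ > c}, and c is found from the size requirement Pr[X̄ > c; θ0] = α. A minimal sketch using only the Python standard library:

```python
from math import sqrt
from statistics import NormalDist  # standard library only

# Hypothetical example: X_1, ..., X_n iid N(theta, 1), testing H0: theta = 0
# against H1: theta = 1.  The likelihood ratio f_X(x; 1) / f_X(x; 0) is an
# increasing function of the sample mean, so the Neyman-Pearson region
# {f_X(x; 1) > k f_X(x; 0)} is equivalent to {xbar > c}.

def np_cutoff(theta0, n, alpha):
    """Cutoff c with Pr[Xbar > c; theta0] = alpha, as Xbar ~ N(theta0, 1/n)."""
    return theta0 + NormalDist().inv_cdf(1 - alpha) / sqrt(n)

def power(c, theta, n):
    """Power function beta(theta) = Pr[Xbar > c; theta]."""
    return 1 - NormalDist(mu=theta, sigma=1 / sqrt(n)).cdf(c)

n, alpha = 25, 0.05
c = np_cutoff(0.0, n, alpha)
print(round(c, 4))                 # cutoff on the sample mean (about 0.329)
print(round(power(c, 0.0, n), 2))  # size = alpha = 0.05
print(round(power(c, 1.0, n), 2))  # power at theta1 = 1, close to 1
```

Note that the constant k of the Lemma never needs to be computed explicitly here: it is determined implicitly by the choice of c through the monotone relationship between the likelihood ratio and x̄.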
The test can be reformulated in terms of the statistic

    λ(x) = f_X(x; θ1) / f_X(x; θ0)

and the region R_λ ≡ {t ∈ ℝ⁺ : t > k}:

    x ∈ R  ⟺  λ(x) ∈ R_λ.

If T(X) is a sufficient statistic for θ, then by the factorization theorem

    f_X(x; θ1) / f_X(x; θ0) = g(t(x); θ1)h(x) / (g(t(x); θ0)h(x)) = g(t(x); θ1) / g(t(x); θ0)

so that

    λ(x) ∈ R_λ  ⟺  T(x) ∈ R_T,

say. Thus any test based on T(x) with critical region R_T is a UMP α-level test, where

    α = Pr[T(X) ∈ R_T; θ0].
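The reduction to a sufficient statistic can be checked numerically. In the hypothetical Bernoulli example below, T(x) = Σ x_i is sufficient, and the full-sample likelihood ratio agrees with the ratio computed from t alone, exactly as the factorization argument predicts:

```python
# Illustration with hypothetical numbers: for X_1, ..., X_n iid Bernoulli(theta),
# T(x) = sum(x) is sufficient, and the likelihood ratio f_X(x; theta1)/f_X(x; theta0)
# depends on x only through t = T(x).

def bern_joint(x, theta):
    """Joint pmf of a Bernoulli(theta) sample x (a tuple of 0s and 1s)."""
    p = 1.0
    for xi in x:
        p *= theta if xi == 1 else (1 - theta)
    return p

def lr_full(x, theta0, theta1):
    """Likelihood ratio computed from the full sample."""
    return bern_joint(x, theta1) / bern_joint(x, theta0)

def lr_sufficient(t, n, theta0, theta1):
    """The same ratio computed from the sufficient statistic t alone."""
    return (theta1 / theta0) ** t * ((1 - theta1) / (1 - theta0)) ** (n - t)

theta0, theta1 = 0.3, 0.6
for x in [(1, 0, 1, 1, 0), (0, 0, 0, 0, 0), (1, 1, 1, 1, 1)]:
    assert abs(lr_full(x, theta0, theta1)
               - lr_sufficient(sum(x), len(x), theta0, theta1)) < 1e-9
print("lambda(x) depends on x only through T(x) = sum(x)")
```

Since θ1 > θ0 here, the ratio is increasing in t, so the rejection region R_T is of the form {t > t0}.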
Composite Null Hypotheses

Often the null and alternative hypotheses do not specify the distribution of the data completely. For example, the specification

    H0 : θ = θ0   versus   H1 : θ ≠ θ0

could be of interest. If, in general, a UMP test of size α is required, then its power must equal the power of the most powerful test of

    H0 : θ = θ0   versus   H1 : θ = θ1

for all θ1 ∈ Θ1.

Models with a Monotone Likelihood Ratio

For one class of models, finding UMP tests for composite hypotheses is possible in general. A parametric family F of probability models indexed by parameter θ ∈ Θ has a monotone likelihood ratio if, for θ2 > θ1 and for x in the union of the supports of the two densities f_X(x; θ1) and f_X(x; θ2),

    λ(x) = f_X(x; θ2) / f_X(x; θ1)

is a monotone function of x.

Theorem (Karlin-Rubin Theorem)
Suppose that a test of the hypotheses

    H0 : θ ≤ θ0   versus   H1 : θ > θ0

is required. Suppose that T(X) is a sufficient statistic for θ, and that the family of densities f_T(t; θ) for θ ∈ Θ has a monotone (non-decreasing) likelihood ratio, that is, for θ2 ≥ θ1 and t2 ≥ t1,

    f_T(t2; θ2) f_T(t1; θ1) ≥ f_T(t1; θ2) f_T(t2; θ1).

Then for any t0, the test T with critical region R_T defined by

    T(x) > t0  ⟹  T(x) ∈ R_T
    T(x) ≤ t0  ⟹  T(x) ∉ R_T

is a UMP α-level test, where α = Pr[T(X) > t0; θ0].

Proof
Let β(θ) be the power function of T. Now, for t2 ≥ t1,

    f_T(t2; θ2) f_T(t1; θ1) ≥ f_T(t1; θ2) f_T(t2; θ1).    (3)

Integrating both sides with respect to t1 on (−∞, t2), we obtain

    f_T(t2; θ2) F_T(t2; θ1) ≥ F_T(t2; θ2) f_T(t2; θ1)  ⟹  f_T(t2; θ2) / f_T(t2; θ1) ≥ F_T(t2; θ2) / F_T(t2; θ1).
Alternatively, integrating both sides of equation (3) with respect to t2 on (t1, ∞), we similarly obtain

    f_T(t1; θ2) / f_T(t1; θ1) ≤ (1 − F_T(t1; θ2)) / (1 − F_T(t1; θ1)).

Setting t1 = t2 = t in these two inequalities yields

    F_T(t; θ2) / F_T(t; θ1) ≤ (1 − F_T(t; θ2)) / (1 − F_T(t; θ1))

which, on rearrangement, yields

    (1 − F_T(t; θ2)) / F_T(t; θ2) ≥ (1 − F_T(t; θ1)) / F_T(t; θ1)  ⟹  F_T(t; θ2) ≤ F_T(t; θ1)    (4)

as the function g(x) = (1 − x)/x is non-increasing for 0 < x < 1. Finally,

    β(θ2) − β(θ1) = Pr[T > t0; θ2] − Pr[T > t0; θ1]
                  = (1 − F_T(t0; θ2)) − (1 − F_T(t0; θ1))
                  = F_T(t0; θ1) − F_T(t0; θ2) ≥ 0

by equation (4), so β(θ) is non-decreasing in θ. Hence

    sup_{θ ≤ θ0} β(θ) = β(θ0) = Pr[T > t0; θ0] = α

so T is an α-level test.

Now let θ′ > θ0, and consider the simple hypotheses

    H′0 : θ = θ0   versus   H′1 : θ = θ′.

Let k be defined by

    k = inf_{t ∈ T0} f_T(t; θ′) / f_T(t; θ0)

where T0 = {t : t > t0, and f_T(t; θ′) > 0 or f_T(t; θ0) > 0}. Then

    T > t0  ⟺  f_T(T; θ′) / f_T(T; θ0) > k

so that, by the Neyman-Pearson Lemma, T is UMP for testing H′0 versus H′1: for any other test T′ of H′0 at level α with power function β′ satisfying β′(θ0) ≤ α, we have β(θ′) ≥ β′(θ′). But any α-level test T′′ of H0 also satisfies β′′(θ0) ≤ α, so taking T′ ≡ T′′ we can conclude that β(θ′) ≥ β′′(θ′). This inequality holds for all θ′ ∈ Θ1, so T must be UMP at level α.

Note: The theorem also covers the case where we are interested in the hypotheses

    H0 : θ ≥ θ0   versus   H1 : θ < θ0

and we have a non-increasing monotone likelihood ratio, that is, for θ2 ≥ θ1 and t2 ≥ t1,

    f_T(t2; θ2) f_T(t1; θ1) ≤ f_T(t1; θ2) f_T(t2; θ1).
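Both conditions of the Karlin-Rubin Theorem can be verified numerically in a hypothetical example: T ~ Binomial(n, θ) has a non-decreasing likelihood ratio in t, so the test "reject when T > t0" is UMP for H0: θ ≤ θ0 versus H1: θ > θ0. The sketch below checks the four-term MLR inequality on a grid and confirms that the power function is non-decreasing, as the proof establishes:

```python
from math import comb

# Hypothetical example: T ~ Binomial(n, theta).  We check the MLR inequality
# f(t2; th2) f(t1; th1) >= f(t1; th2) f(t2; th1) for th2 >= th1, t2 >= t1, and
# that the power function beta(theta) = Pr[T > t0; theta] is non-decreasing.

def pmf(t, n, theta):
    return comb(n, t) * theta**t * (1 - theta) ** (n - t)

def power(t0, n, theta):
    return sum(pmf(t, n, theta) for t in range(t0 + 1, n + 1))

n, t0 = 10, 6
for th1, th2 in [(0.2, 0.5), (0.4, 0.7)]:
    for t1 in range(n + 1):
        for t2 in range(t1, n + 1):
            # small tolerance guards against floating-point rounding
            assert (pmf(t2, n, th2) * pmf(t1, n, th1)
                    >= pmf(t1, n, th2) * pmf(t2, n, th1) - 1e-12)

betas = [power(t0, n, th / 20) for th in range(1, 20)]
assert all(b1 <= b2 for b1, b2 in zip(betas, betas[1:]))
print("MLR holds and beta(theta) is non-decreasing")
```

As the Notes following the Neyman-Pearson Lemma warn, this discrete family attains only specific sizes α = Pr[T > t0; θ0], one for each integer cutoff t0.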
THE LIKELIHOOD RATIO TEST

The Likelihood Ratio Test (LRT) statistic for testing

    H0 : θ ∈ Θ0   versus   H1 : θ ∈ Θ1

is based on the statistic

    λ_X(x) = sup_{θ ∈ Θ0} f_X(x; θ) / sup_{θ ∈ Θ} f_X(x; θ),

say, where Θ ≡ Θ0 ∪ Θ1; H0 is rejected if λ_X(x) is small enough, that is, λ_X(x) ≤ k for some k to be defined. Note that Θ is not necessarily the entire parameter space, just the union of Θ0 and Θ1.

Theorem
If T(X) is a sufficient statistic for θ, then

    λ_X(x) = λ_T(t(x)) = sup_{θ ∈ Θ0} f_T(t(x); θ) / sup_{θ ∈ Θ} f_T(t(x); θ)   for all x ∈ X.

Proof
As T(X) is sufficient, for any θ0, θ1,

    f_X(x; θ0) / f_X(x; θ1) = g(t(x); θ0)h(x) / (g(t(x); θ1)h(x)) = g(t(x); θ0) / g(t(x); θ1) = f_T(t(x); θ0) / f_T(t(x); θ1)

by the Neyman factorization theorem, where the last equality follows as the normalizing constants in numerator and denominator are identical. Hence, at the suprema, the LRT statistics are equal.

Union and Intersection Tests

The construction of union and intersection tests is useful when assessing the size and power of tests involving composite hypotheses. Suppose first that we require a test T for the null hypothesis expressed as

    H0 : θ ∈ Θ0 = ⋂_{γ ∈ G} Θ0γ

where the Θ0γ, γ ∈ G, are a collection of subsets of Θ. Suppose that Tγ is a test for the hypotheses

    H0γ : θ ∈ Θ0γ   versus   H1γ : θ ∉ Θ0γ

with test statistic Tγ(X) and critical region Rγ. Then the rejection region for T is

    R = ⋃_{γ ∈ G} Rγ,   that is,   T rejects H0 if x ∈ ⋃_{γ ∈ G} {x : Tγ(x) ∈ Rγ},

that is, if any one of the Tγ rejects H0γ. This test is termed a Union-Intersection Test (UIT).

Suppose now that we require a test T for the null hypothesis expressed as

    H0 : θ ∈ Θ0 = ⋃_{γ ∈ G} Θ0γ.
Then, by the same logic as above, the rejection region for T is

    R = ⋂_{γ ∈ G} Rγ,   that is,   T rejects H0 if x ∈ ⋂_{γ ∈ G} {x : Tγ(x) ∈ Rγ},

that is, if all of the Tγ reject H0γ. This test is termed an Intersection-Union Test (IUT). Note that if αγ is the size of the test of H0γ, then the IUT is a level-α test with

    α = sup_{γ ∈ G} αγ

as, for any θ ∈ Θ0, we have θ ∈ Θ0γ for some γ, and hence

    Pr[X ∈ R; θ] ≤ Pr[X ∈ Rγ; θ] ≤ αγ ≤ α.

Theorem
Consider testing

    H0 : θ ∈ Θ0 = ⋂_{γ ∈ G} Θ0γ   versus   H1 : θ ∉ Θ0

using the global likelihood ratio statistic

    λ(x) = sup_{θ ∈ Θ0} f_X(x; θ) / sup_{θ ∈ Θ} f_X(x; θ)

equipped with the usual critical region R ≡ {x : λ(x) < c}, and the collection of likelihood ratio statistics

    λγ(x) = sup_{θ ∈ Θ0γ} f_X(x; θ) / sup_{θ ∈ Θ} f_X(x; θ).

Define the statistic T(x) = inf_{γ ∈ G} λγ(x), and consider the critical region

    R_G ≡ {x : λγ(x) < c, some γ ∈ G} = {x : T(x) < c}.

Then
(a) T(x) ≥ λ(x) for all x;
(b) if βT and βλ are the power functions for the tests based on T(X) and λ(X) respectively, then βT(θ) ≤ βλ(θ) for all θ ∈ Θ;
(c) if the test based on λ(X) is an α-level test, then the test based on T(X) is also an α-level test.

Proof
For (a), as Θ0 ⊆ Θ0γ for each γ, we have

    λγ(x) ≥ λ(x) for each γ  ⟹  T(x) = inf_{γ ∈ G} λγ(x) ≥ λ(x).

Thus, for (b), for any θ,

    βT(θ) = Pr[T(X) < c; θ] ≤ Pr[λ(X) < c; θ] = βλ(θ).

Hence

    sup_{θ ∈ Θ0} βT(θ) ≤ sup_{θ ∈ Θ0} βλ(θ) ≤ α

which proves (c).
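The LRT construction can be made concrete in a hypothetical Gaussian example: for X_1, ..., X_n iid N(θ, 1) and H0: θ = θ0 versus H1: θ ≠ θ0, the supremum in the denominator of λ_X(x) is attained at the MLE x̄, the sum-of-squares terms cancel, and the statistic reduces to a closed form. A minimal sketch:

```python
from math import exp, log

# Hypothetical example: X_1, ..., X_n iid N(theta, 1), H0: theta = theta0
# versus H1: theta != theta0.  The denominator sup over Theta is attained at
# the MLE xbar, and the LRT statistic reduces to
#     lambda_X(x) = exp(-n (xbar - theta0)^2 / 2),
# so small lambda_X(x) corresponds to xbar far from theta0.

def lrt_normal_mean(x, theta0):
    n = len(x)
    xbar = sum(x) / n
    return exp(-n * (xbar - theta0) ** 2 / 2)

x = [0.3, -0.1, 0.8, 0.4, 0.2]   # made-up data
lam = lrt_normal_mean(x, 0.0)
assert 0.0 < lam <= 1.0

# -2 log lambda_X(x) = n (xbar - theta0)^2, the familiar chi-squared form
n, xbar = len(x), sum(x) / len(x)
assert abs(-2 * log(lam) - n * xbar**2) < 1e-12
print(round(lam, 4))
```

The rejection rule λ_X(x) ≤ k is then equivalent to |x̄ − θ0| exceeding a cutoff, so calibrating k reduces to a normal-tail computation.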
P-VALUES

Consider a test of a hypothesis H0 defined by region Θ0 of the parameter space. A p-value, p(x), is a test statistic such that 0 ≤ p(x) ≤ 1 for each x. A p-value is valid if, for every θ ∈ Θ0 and every 0 ≤ α ≤ 1,

    Pr[p(X) ≤ α; θ] ≤ α.

That is, a valid p-value is a test statistic that produces a test at level α of the form

    p(x) ≤ α  ⟹  x ∈ R
    p(x) > α  ⟹  x ∉ R.

The most common construction of a valid p-value is given by the following theorem.

Theorem
Suppose that T(X) is a test statistic constructed so that a large value of T(X) supports H1. Then the statistic p(x) given for each x ∈ X by

    p(x) = sup_{θ ∈ Θ0} Pr[T(X) ≥ T(x); θ] = sup_{θ ∈ Θ0} p_θ(x),    (5)

say, is a valid p-value.

Proof
For θ ∈ Θ0, we have

    p_θ(x) = Pr[T(X) ≥ T(x); θ] = Pr[−T(X) ≤ −T(x); θ] = F_θ(−T(x)) ≡ F_S(−T(x)),

say, defining F_S ≡ F_θ as the cdf of S = −T(X); clearly 0 ≤ p(x) ≤ 1 as 0 ≤ p_θ(x) ≤ 1 for all θ. This recalls a result from distribution theory: if X ∼ F_X with F_X continuous, then U = F_X(X) ∼ Uniform(0, 1). Suppressing the dependence on θ for convenience, define the random variable Y by

    Y = F_θ(−T(X)) ≡ F_S(S)   (= p_θ(X))

and let A_y ≡ {s : F_S(s) ≤ y}. If A_y is a half-closed interval (−∞, s_y], then

    F_Y(y) = Pr[Y ≤ y] = Pr[F_S(S) ≤ y] = Pr[S ∈ A_y] = F_S(s_y) ≤ y

by definition of A_y, as s_y ∈ A_y. If A_y is a half-open interval (−∞, s_y), then

    F_Y(y) = Pr[Y ≤ y] = Pr[F_S(S) ≤ y] = Pr[S ∈ A_y] = lim_{s ↑ s_y} F_S(s) ≤ y

by continuity of probability. Putting the components together, for 0 ≤ α ≤ 1,

    Pr[p_θ(X) ≤ α; θ] = Pr[Y ≤ α] ≤ α.

But by the definition in equation (5), p(x) ≥ p_θ(x), so

    Pr[p(X) ≤ α; θ] ≤ Pr[p_θ(X) ≤ α; θ] ≤ α

and the result follows.
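The validity inequality can be checked exactly in a hypothetical discrete example: for T ~ Binomial(n, θ) and H0: θ ≤ θ0, the probability Pr[T ≥ t; θ] is increasing in θ, so the sup in equation (5) is attained at θ0 and p(t) = Pr[T ≥ t; θ0]. The sketch below verifies Pr[p(T) ≤ α; θ0] ≤ α over a grid of α values; in the discrete case the attained level is typically strictly below α:

```python
from math import comb

# Hypothetical discrete example: T ~ Binomial(n, theta), large T supporting H1,
# p-value p(t) = Pr[T >= t; theta0] with the sup in (5) attained at theta0.

def pmf(t, n, theta):
    return comb(n, t) * theta**t * (1 - theta) ** (n - t)

def pvalue(t_obs, n, theta0):
    """p(t_obs) = Pr[T >= t_obs; theta0]."""
    return sum(pmf(t, n, theta0) for t in range(t_obs, n + 1))

n, theta0 = 12, 0.4
pvals = [pvalue(t, n, theta0) for t in range(n + 1)]
for alpha in [0.01, 0.05, 0.10, 0.25, 0.50, 0.90]:
    # Pr[p(T) <= alpha; theta0], summing the pmf over outcomes with small p-value
    attained = sum(pmf(t, n, theta0) for t in range(n + 1) if pvals[t] <= alpha)
    assert attained <= alpha + 1e-12
print("p(x) is a valid p-value on the alpha grid")
```

The test "reject when p(x) ≤ α" therefore has level α, and the slack between the attained level and α reflects the discreteness noted earlier for the Neyman-Pearson construction.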