On Independent Reference Priors
1 On Independent Reference Priors. Mi Hyun Lee. Dissertation submitted to the faculty of the Virginia Polytechnic Institute and State University in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Statistics. Dongchu Sun, Chairman; Eric P. Smith, Co-Chairman; George R. Terrell; Pang Du. December 2007, Blacksburg, Virginia. Keywords: Independent reference priors, Reference priors, Probability matching priors. Copyright 2007, Mi Hyun Lee
2 On Independent Reference Priors. Mi Hyun Lee. (ABSTRACT) In Bayesian inference, the choice of prior has been of great interest. Subjective priors are ideal if sufficient prior information is available; in practice, however, we often cannot collect enough such information, and objective priors are then a good substitute. In this dissertation, an independent reference prior, based on a class of objective priors, is examined. It is a reference prior derived by assuming that the parameters are independent. The independent reference prior introduced by Sun and Berger (1998) is extended and generalized. We provide an iterative algorithm to derive the general independent reference prior. We also propose a sufficient condition under which a closed form of the independent reference prior is obtained without going through the iterations of the algorithm. The independent reference prior is then shown to be useful with respect to invariance and the first order matching property. It is proven that the independent reference prior is invariant under a type of one-to-one transformation of the parameters. It is also shown to be a first order probability matching prior under a sufficient condition. We derive the independent reference priors in various examples, and observe that they are first order matching priors and the reference priors in most of the examples. We also study an independent reference prior in some types of non-regular cases considered by Ghosal (1997).
3 This dissertation is dedicated to the memory of my father in heaven.
4 Acknowledgments. This dissertation would never have been possible without the expert guidance of my committee members and the support of my family and friends. I wish to express my sincere gratitude to my esteemed advisor, Dr. Dongchu Sun, for persevering with me throughout the time it took me to complete this dissertation. I would also like to thank Dr. Eric P. Smith, who advised and supported me as my co-advisor. My thanks go out to Dr. George R. Terrell and Dr. Pang Du for their contributions. Many people on the faculty and staff of the Department of Statistics assisted and encouraged me in various ways; I thank them as well. I am grateful for the encouragement and support of my mother and late father. I must acknowledge as well my friends, who were always cheering me up.
5 Contents
1 Introduction
1.1 Overview
1.2 Literature Review
1.2.1 Constant Priors
1.2.2 Jeffreys Priors
1.2.3 Reference Priors
1.2.4 Independent Reference Priors
1.2.5 Probability Matching Priors
1.2.6 Non-regular Cases
1.3 Outline
2 Main Results for Independent Reference Priors
2.1 Notation
2.2 Independent Reference Priors
2.3 First Order Matching Priors
6 3 Examples
3.1 Binomial Model: Two Independent Samples
3.1.1 Two Binomial Proportions
3.2 Bivariate Binomial Model
3.3 Two Binomial Proportions with Pre-screen Test
3.3.1 Case I
3.3.2 Case II
3.4 Exponential Model: Two Independent Samples
3.5 Gamma Model
3.6 Inverse Gaussian Model
3.7 Lognormal Model
3.8 Normal Model
3.9 Normal Model: Two Independent Samples
3.9.1 Unequal Variances
3.9.2 Equal Variances
3.9.3 Behrens-Fisher Problem
3.9.4 Fieller-Creasy Problem
3.10 Bivariate Normal Model
3.10.1 Commonly Used Parameters
3.10.2 Cholesky Decomposition
7 3.10.3 Orthogonal Parameterizations
3.11 Poisson Model: Two Independent Samples
3.12 Weibull Model
3.13 Weibull Model: Two Independent Samples with the Same Shape Parameter
3.14 One-way Random Effects ANOVA Model
3.15 Two-parameter Exponential Family
3.15.1 Proper Two-parameter Dispersion Model
3.15.2 Typical Examples
3.16 Student-t Regression Model
3.17 Zero-inflated Poisson (ZIP) Model
4 Non-regular Cases
4.1 Setup
4.2 Examples
4.2.1 Location-scale Family
4.2.2 Truncated Weibull Model
5 Summary and Future Work
Bibliography
Vita
8 List of Figures
3.1 Diagram for Case I
3.2 Diagram for Case II
9 List of Tables
3.1 Mean Squared Errors
3.2 Frequentist Coverage Probabilities of One-sided 5% Posterior Credible Interval for δ
3.3 Frequentist Coverage Probabilities of One-sided 50% Posterior Credible Interval for δ
3.4 Frequentist Coverage Probabilities of One-sided 95% Posterior Credible Interval for δ
10 Chapter 1 Introduction
1.1 Overview
In Bayesian inference, the selection of the prior has been of great interest, and various kinds of priors have been proposed. There are two categories of priors, distinguished by the amount of prior information available: subjective priors and objective priors (or noninformative priors). If sufficient prior information is available, subjective priors can be a good choice. Unfortunately, in practice we often cannot collect enough information. Then noninformative or objective priors, which are derived only from the assumed model and the available data, can be used as a substitute for subjective priors. Thus the use of noninformative or objective priors has increased in Bayesian analysis. Many kinds of noninformative priors have been developed: constant priors [Laplace (1812)], Jeffreys priors [Jeffreys (1961)], reference priors [Bernardo (1979), Berger and Bernardo (1992)], independent reference priors [Sun and Berger (1998)], probability matching priors [Datta and Mukerjee (2004)], and noninformative priors in non-regular cases
11 Mi Hyun Lee, Chapter 1. Introduction
[Ghosal and Samanta (1997), Ghosal (1997), Ghosal (1999)]. We review them in detail in Section 1.2. We study an independent reference prior, which originated in Sun and Berger (1998). It is a reference prior derived under the assumption that the parameters are independent. In many practical problems we can obtain partial prior information, such as independence of the parameters, and independent reference priors can be used in such situations. In this dissertation, the independent reference prior introduced by Sun and Berger (1998) is extended and generalized. We consider multiple groups of parameters, while Sun and Berger (1998) used two groups. An iterative algorithm to compute the general independent reference prior is proposed. A mild sufficient condition under which the result of the iterative algorithm can be inferred without going through the iterations is also provided. The independent reference prior possesses invariance and the first order matching property. We prove that our independent reference prior is invariant under a type of one-to-one reparameterization in which the Jacobian matrix is diagonal. A sufficient condition under which the independent reference prior is a first order matching prior is given. The independent reference priors are then derived in numerous examples. It turns out that they are first order matching priors and the reference priors in most of the examples. Additionally, we present an iterative algorithm to obtain an independent reference prior in some types of non-regular cases where the support of the data is either monotonically increasing or decreasing in a non-regular type parameter. It is verified that the independent reference prior is a first order matching prior under a sufficient condition. Some examples are also given.
12 1.2 Literature Review
The history of objective priors is described in this section. Constant priors [Section 1.2.1], Jeffreys priors [Section 1.2.2], reference priors [Section 1.2.3], independent reference priors [Section 1.2.4], probability matching priors [Section 1.2.5], and objective priors in non-regular cases [Section 1.2.6] are reviewed.
1.2.1 Constant Priors
Objective priors began with the constant prior (or flat prior), which is simply proportional to 1. Laplace (1812) employed it for Bayesian analysis. The constant prior is very simple and easy to use; however, it is not invariant under one-to-one transformations of the parameters.
1.2.2 Jeffreys Priors
Jeffreys (1961) proposed a rule for deriving a prior which is invariant under any one-to-one reparameterization. It is called the Jeffreys-rule prior, which is still one of the popular objective priors. The Jeffreys-rule prior is proportional to the positive square root of the determinant of the Fisher information matrix defined in (1.1). The Fisher information is a measure of the amount of information about the parameters provided by the data from the model. Datta and Ghosh (1996) pointed out that the Jeffreys-rule prior performs satisfactorily in one-parameter cases but poorly in multi-parameter cases: an inconsistent Bayes estimator or an unreasonable posterior is produced in some multi-parameter examples. Thus the use of the Jeffreys-rule prior is somewhat controversial in multi-parameter cases. Jeffreys (1961) recommended an independence Jeffreys prior, which can correct the deficiencies of the Jeffreys-rule prior in multi-parameter cases. It is the product of the Jeffreys-rule priors
13 for each group of parameters, with the other groups of parameters held fixed.
1.2.3 Reference Priors
Bernardo (1979) introduced the reference prior, which fixes the deficiencies of the Jeffreys-rule prior in multi-parameter problems. The ad hoc modifications required for the Jeffreys-rule prior in multi-parameter situations are not necessary for the reference prior. Bernardo (1979) separated the parameters into parameters of interest and nuisance parameters and considered the parameters sequentially in the process of deriving a reference prior; the reference prior is therefore more successful in multi-parameter cases. A reference prior is defined as a prior which asymptotically maximizes the expected information about the parameters provided by the data from the model, which is the same as the expected Kullback-Leibler divergence between the posterior and the prior. The reference prior thus has minimal influence, since the data have maximal influence on the inference. Bernardo (1979) introduced only the basic idea of reference priors and posteriors, without the mathematical details of their construction. The idea of Bernardo (1979) was broadened and generalized by Berger and Bernardo (1992). They divided the parameters into two or more groups according to their order of inferential importance, and provided an in-depth description of mathematical methods to derive a reference prior. The reference prior method is now described in detail. We start with the notation necessary to explain the method. Consider a parametric family of distributions whose density is given by f(x; θ) for the data X ∈ 𝒳, where θ ∈ Θ ⊆ IR^p is a p-dimensional
14 unknown parameter vector which can be decomposed into m sub-groups, θ = (θ_1, ..., θ_m). Here θ_i = (θ_i1, ..., θ_ip_i) ∈ Θ_i ⊆ IR^p_i, Θ = Θ_1 × ⋯ × Θ_m, and p_1 + ⋯ + p_m = p. We define the Fisher information matrix of θ,
Σ(θ) = ( -E_θ[ ∂² log f(X; θ) / ∂θ_i ∂θ_j ] ), i, j = 1, ..., m, (1.1)
where E_θ denotes expectation over X given θ. We will often write Σ instead of Σ(θ). Also define, for j = 1, ..., m,
θ_[j] = (θ_1, ..., θ_j), θ_[~j] = (θ_j+1, ..., θ_m),
where θ_[~0] = θ and θ_[0] is vacuous. Let Z_t = {X_1, ..., X_t} be the random variable that would arise from t conditionally independent replications of the original experiment. Then Z_t has density
p(z_t | θ) = ∏_{i=1}^t f(x_i; θ). (1.2)
First we see how to develop a reference prior for regular cases, in the sense that p(z_t | θ) given by (1.2) is asymptotically normally distributed. Assume that Σ is invertible and let S = Σ^{-1}. Write S in partitioned form as
S = ( A_11  A_12  ⋯  A_1m
      A_12^t  A_22  ⋯  A_2m
      ⋮              ⋮
      A_1m^t  A_2m^t  ⋯  A_mm )
15 so that A_ij is p_i × p_j, and define S_j to be the upper left (∑_{k=1}^j p_k) × (∑_{k=1}^j p_k) corner of S, with S_m ≡ S and H_j ≡ S_j^{-1}. Then the matrices h_j, defined to be the lower right p_j × p_j corner of H_j, j = 1, ..., m, will be of central importance. Note that h_1 ≡ H_1 = A_11, and if S is a block diagonal matrix, that is, A_ij = 0 for all i ≠ j, then h_j ≡ A_jj, j = 1, ..., m. Finally, if Θ^l ⊆ Θ, we define
Θ^l(θ_[j]) = {θ_j+1 : (θ_[j], θ_j+1, θ_[~j+1]) ∈ Θ^l for some θ_[~j+1]}.
|A| denotes the determinant of A, and 1_Ω(y) equals 1 if y ∈ Ω, and 0 otherwise. The reference prior method for regular cases can be described in four steps.
1. Choose a nested sequence Θ^1 ⊆ Θ^2 ⊆ ⋯ of compact subsets of Θ such that ∪_{l=1}^∞ Θ^l = Θ. This step is not necessary if the reference priors turn out to be proper.
2. Order the coordinates (θ_1, ..., θ_m). The order should typically be according to inferential importance; in particular, the first group of parameters should be of interest. Note that (θ_1, ..., θ_m) is assumed to be ordered for convenience of notation.
3. To start, define
π^l_m(θ_[~m-1] | θ_[m-1]) = π^l_m(θ_m | θ_[m-1]) = |h_m(θ)|^{1/2} 1_{Θ^l(θ_[m-1])}(θ_m) / ∫_{Θ^l(θ_[m-1])} |h_m(θ)|^{1/2} dθ_m.
For j = m-1, ..., 1, define
π^l_j(θ_[~j-1] | θ_[j-1]) = π^l_j+1(θ_[~j] | θ_[j]) exp{ (1/2) E^l_j[log |h_j(θ)| | θ_[j]] } 1_{Θ^l(θ_[j-1])}(θ_j) / ∫_{Θ^l(θ_[j-1])} exp{ (1/2) E^l_j[log |h_j(θ)| | θ_[j]] } dθ_j,
where
E^l_j[g(θ) | θ_[j]] = ∫_{ {θ_[~j] : (θ_[j], θ_[~j]) ∈ Θ^l} } g(θ) π^l_j+1(θ_[~j] | θ_[j]) dθ_[~j].
16 For j = 1, write π^l_1(θ) = π^l_1(θ_[~0] | θ_[0]).
4. Define a reference prior π(θ) as any prior for which
E^{X,l}[ D(π^l(θ | X), π(θ | X)) ] → 0 as l → ∞,
where the Kullback-Leibler divergence between two densities g and h on Θ is
D(g, h) = ∫_Θ g(θ) log[ g(θ) / h(θ) ] dθ,
and E^{X,l} is expectation with respect to
p^l(x) = ∫_Θ f(x; θ) π^l(θ) dθ.
Typically, π(θ) is determined by the simple relation
π(θ) = lim_{l→∞} π^l(θ) / π^l(θ*),
where θ* is an interior point of Θ.
A reference prior clearly depends on the grouping and the ordering of the parameters. Thus Berger and Bernardo (1992) recommended deriving a reference prior by considering one parameter per group in Step 2. We call such a reference prior a one-at-a-time reference prior. One-at-a-time reference priors, however, still depend on the order of inferential importance of the parameters. Note that it can easily be shown that a reference prior is equivalent to the Jeffreys-rule prior in one-parameter cases. Datta and Ghosh (1996) provided another expression for h_j(θ), j = 1, ..., m. Write the Fisher information matrix of θ in partitioned form as Σ = ((Σ_ij)), i, j = 1, ..., m.
17 Also write, for j = 0, 1, ..., m-1,
Σ_[~jj] = ((Σ_ik)), i, k = j+1, ..., m.
Then
h_j(θ) = |Σ_[~(j-1)(j-1)]| / |Σ_[~jj]|, j = 1, ..., m,
where |Σ_[~mm]| = 1.
Next, the reference prior method for non-regular cases, which was proposed by Berger and Bernardo (1992), is shown. Only Step 3 differs from the regular cases, so we describe only Step 3'.
3'. For j = m, m-1, ..., 1, iteratively compute densities
π^l_j(θ_[~j-1] | θ_[j-1]) ∝ π^l_j+1(θ_[~j] | θ_[j]) h^l_j(θ_j | θ_[j-1]),
where π^l_m+1 ≡ 1 and h^l_j is computed by the following two steps.
3'a: Define p_t(θ_j | θ_[j-1]) as
p_t(θ_j | θ_[j-1]) ∝ exp{ ∫ p(z_t | θ_[j]) log p(θ_j | z_t, θ_[j-1]) dz_t }, (1.3)
where
p(z_t | θ_[j]) = ∫ p(z_t | θ) π^l_j+1(θ_[~j] | θ_[j]) dθ_[~j],
p(θ_j | z_t, θ_[j-1]) ∝ p(z_t | θ_[j]) p_t(θ_j | θ_[j-1]).
3'b: Assuming the limit exists, define
h^l_j(θ_j | θ_[j-1]) = lim_{t→∞} p_t(θ_j | θ_[j-1]). (1.4)
For j = 1, write π^l_1(θ) = π^l_1(θ_[~0] | θ_[0]).
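The determinant expression of Datta and Ghosh (1996) quoted above can be checked numerically. The sketch below (my own construction, one parameter per group so that every p_i = 1, with an arbitrary positive-definite matrix standing in for Σ) computes h_j both from the inverse-submatrix definition (S = Σ^{-1}, S_j the upper-left j × j corner, H_j = S_j^{-1}, h_j the lower-right entry of H_j) and from the ratio of trailing-block determinants; the two agree.

```python
# Numerical sketch of the Datta-Ghosh expression h_j = |Sigma_[~(j-1)(j-1)]| / |Sigma_[~jj]|,
# one parameter per group, using a made-up positive-definite Sigma.

def det(a):
    # cofactor expansion along the first row; fine for tiny matrices
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** c * a[0][c] * det([r[:c] + r[c + 1:] for r in a[1:]])
               for c in range(len(a)))

def inv(a):
    # inverse via adjugate / determinant
    n = len(a)
    if n == 1:
        return [[1.0 / a[0][0]]]
    d = det(a)
    return [[(-1) ** (i + j)
             * det([r[:i] + r[i + 1:] for k, r in enumerate(a) if k != j]) / d
             for j in range(n)] for i in range(n)]

sigma = [[4.0, 1.0, 0.5],
         [1.0, 3.0, 0.2],
         [0.5, 0.2, 2.0]]                 # an arbitrary "information matrix"
m = len(sigma)
S = inv(sigma)                            # S = Sigma^{-1}
trail = lambda k: [row[k:] for row in sigma[k:]]   # Sigma_[~kk]

hs, ratios = [], []
for j in range(1, m + 1):
    S_j = [row[:j] for row in S[:j]]      # upper-left j x j corner of S
    hs.append(inv(S_j)[j - 1][j - 1])     # lower-right entry of H_j = S_j^{-1}
    ratios.append(det(trail(j - 1)) / (det(trail(j)) if j < m else 1.0))

print([round(a - b, 10) for a, b in zip(hs, ratios)])
```

A pleasant by-product of the ratio form is that the h_j telescope: their product equals |Σ|.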
18 Berger and Bernardo (1992) pointed out that in practice it is very hard to compute the p_t given by (1.3) and to find their limit in (1.4). Berger (1992) [attributed to Ghosh and Mukerjee (1992)] introduced a reverse reference prior, which is obtained by reversing the roles of the interest parameters and the nuisance parameters when deriving a reference prior, in order to satisfy the probability matching criterion when the parameters are orthogonal. We explain the probability matching criterion in Section 1.2.5. According to Datta and Ghosh (1996), the invariance of the reference prior is valid under a type of one-to-one reparameterization where the Jacobian matrix is upper triangular. The reverse reference prior, however, does not remain invariant under every one-to-one reparameterization. Datta and M. Ghosh (1995) compared reference priors and reverse reference priors, and provided a sufficient condition under which the two priors agree.
1.2.4 Independent Reference Priors
Sun and Berger (1998) derived conditional reference priors when partial information is available. They considered three situations: when a subjective conditional prior density is given, two methods to find a marginal reference prior were described; when a subjective marginal prior is known, a conditional reference prior was proposed; and when two groups of parameters are assumed to be independent, an independent reference prior was defined. The independent reference prior is our main focus in this dissertation. In most examples of Bayesian inference, the reference priors are expressed as the product of marginal reference priors. If we have information on the independence of the groups of parameters, we can surely use an independent reference prior, which does not depend on the order of inferential
19 importance of the groups of parameters, instead of a reference prior, which depends on it. Assuming the independence of two groups of parameters θ_1 and θ_2, Sun and Berger (1998) suggested the following iterative algorithm to derive an independent reference prior. Note that Σ = Σ(θ_1, θ_2) is the Fisher information matrix of (θ_1, θ_2), Σ_11 = Σ_11(θ_1, θ_2) is the Fisher information matrix of θ_1 given that θ_2 is held fixed, and Σ_22 = Σ_22(θ_1, θ_2) is the Fisher information matrix of θ_2 given that θ_1 is held fixed.
Algorithm A:
Step 0. Choose any initial nonzero marginal prior density for θ_2, π^(0)_2(θ_2) say.
Step 1. Define an interim prior density for θ_1 by
π^(1)_1(θ_1) ∝ exp{ (1/2) ∫ π^(0)_2(θ_2) log |Σ_11| dθ_2 }.
Step 2. Define an interim prior density for θ_2 by
π^(1)_2(θ_2) ∝ exp{ (1/2) ∫ π^(1)_1(θ_1) log |Σ_22| dθ_1 }.
Replace π^(0)_2 in Step 0 by π^(1)_2 and repeat Steps 1 and 2 to obtain π^(2)_1 and π^(2)_2. Consequently, we generate two sequences {π^(l)_1}_{l≥1} and {π^(l)_2}_{l≥1}. The desired marginal reference priors will be the limits
π^R_i = lim_{l→∞} π^(l)_i, i = 1, 2,
if the limits exist. Sun and Berger (1998) pointed out that, in applying the iterative algorithm, it may be necessary to operate on compact sets and then let the sets grow, as in the reference prior method. They also established a sufficient condition under which the result of the algorithm can be inferred without going through the iterations.
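To illustrate how the iteration behaves, here is a small discretized sketch of Algorithm A (my own grid construction, not from the dissertation) on two independent Binomial(n_i, p_i) samples, where Σ_11 = n_1/(p_1(1-p_1)) and Σ_22 = n_2/(p_2(1-p_2)). Because each block is free of the other parameter, the iteration stabilizes after a single pass at the product of the Jeffreys marginals, Beta(1/2, 1/2) for each p_i.

```python
import math

# Grid version of Algorithm A for two independent binomial samples.
N = 200                                    # grid resolution on (0, 1)
grid = [(k + 0.5) / N for k in range(N)]
n1, n2 = 10, 15

def normalize(w):
    s = sum(w)
    return [x / s for x in w]

def step(other_prior, info):
    # pi_new(t) ∝ exp{ (1/2) * sum_u other(u) * log info(t, u) }   (grid analogue)
    out = [math.exp(0.5 * sum(w * math.log(info(t, u))
                              for w, u in zip(other_prior, grid)))
           for t in grid]
    return normalize(out)

pi2 = normalize([1.0] * N)                 # Step 0: flat start for p2
for _ in range(2):                         # two full sweeps of Steps 1-2
    pi1 = step(pi2, lambda p1, p2: n1 / (p1 * (1 - p1)))   # uses Sigma_11
    pi2 = step(pi1, lambda p2, p1: n2 / (p2 * (1 - p2)))   # uses Sigma_22

jeffreys = normalize([1 / math.sqrt(p * (1 - p)) for p in grid])
print(max(abs(a - b) for a, b in zip(pi1, jeffreys)))  # essentially zero
```

For models in which Σ_11 genuinely depends on θ_2, the expectation inside the exponent matters at every sweep, and (as noted above) the interim priors may be improper, forcing the computation onto growing compact sets.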
20 1.2.5 Probability Matching Priors
The concept of probability matching priors is quite different from that of the previous objective priors. Welch and Peers (1963) introduced the basic idea of probability matching priors, and Datta and Mukerjee (2004) summarized and discussed various probability matching priors. Priors satisfying the criterion that the frequentist coverage probabilities of Bayesian credible sets agree asymptotically, up to a certain order, with the Bayesian coverage probabilities of those sets are called probability matching priors, or simply matching priors. In other words, the difference between the frequentist confidence sets and the Bayesian credible sets should be asymptotically small. There are several matching criteria: for example, the matching might be carried out through posterior quantiles, distribution functions, highest posterior density regions, inversion of certain test statistics, or prediction. For each matching criterion, differential equations which matching priors must satisfy have been derived. Matching priors for posterior quantiles are the most popular, and first and second order matching priors are the most widely used among them. We consider only first order matching priors in this dissertation. The differential equation which a first order matching prior must satisfy was introduced by Datta and J. K. Ghosh (1995) and revisited by Datta and Mukerjee (2004). Following Chapter 2 of Datta and Mukerjee (2004), matching priors for posterior quantiles are defined as follows. Consider priors π(·) for which the relation
P_θ{ψ ≤ ψ^(1-α)(π, X)} = 1 - α + o(n^{-r/2}) (1.5)
holds for r = 1, 2, ..., and for each α ∈ (0, 1). Here n is the sample size, θ = (θ_1, ..., θ_m), where θ_i ∈ Θ_i ⊆ IR, is an unknown parameter vector, ψ = ψ(θ) is a one-dimensional parametric function of interest, P_θ{·} is the frequentist probability measure under θ, and ψ^(1-α)(π, X)
21 is the (1-α)th posterior quantile of ψ under π(·), given the data X. Priors satisfying (1.5) for r = 1 are called first order matching priors for ψ. A first order matching prior π^M for ψ must satisfy the differential equation
∂/∂θ_1 (ξ_1 π) + ⋯ + ∂/∂θ_m (ξ_m π) = 0, (1.6)
where
ξ = (ξ_1, ..., ξ_m)^t = Σ^{-1} ∇ψ / (∇ψ^t Σ^{-1} ∇ψ)^{1/2}, (1.7)
with ∇ψ = (∂ψ/∂θ_1, ..., ∂ψ/∂θ_m)^t, and Σ is the Fisher information matrix of θ = (θ_1, ..., θ_m). By Welch and Peers (1963), the Jeffreys-rule prior is a first order matching prior in one-dimensional parameter cases; thus a reference prior is also a first order matching prior in one-parameter cases. Recall that the reverse reference prior was introduced by Berger (1992) to meet the matching criterion under orthogonality. Datta and M. Ghosh (1995) reaffirmed that a reverse reference prior must be a matching prior under orthogonal parameterizations, but that a reference prior need not be, even under orthogonality. By Datta and Ghosh (1996), a probability matching prior was shown to be invariant under any one-to-one reparameterization.
1.2.6 Non-regular Cases
The concept of and algorithm for reference priors in non-regular cases were proposed by Bernardo (1979) and Berger and Bernardo (1992); they are, however, hard to apply in practice. See Section 1.2.3 for details. Ghosal and Samanta (1997) obtained a reference prior in one-parameter non-regular cases, such as a one-parameter family of discontinuous densities where the support of the data is
22 either monotonically decreasing or increasing in the parameter. They derived a reference prior by asymptotically maximizing the expected Kullback-Leibler divergence between the prior and the posterior. Ghosal (1997) proposed a reference prior in multi-parameter non-regular cases, such as a multi-parameter family of discontinuous densities where some regular type parameters are added to the one-parameter family of discontinuous densities used by Ghosal and Samanta (1997). The reference prior was computed through two procedures when nuisance parameters exist: one procedure adapted the reference prior method proposed by Berger and Bernardo (1992), and the other was an extension of the reference prior method provided by Ghosal and Samanta (1997). The differential equations which first order matching priors in one- and multi-parameter non-regular cases must satisfy were derived by Ghosal (1999), who considered the one- and multi-parameter families of discontinuous densities used by Ghosal and Samanta (1997) and Ghosal (1997).
1.3 Outline
This dissertation is organized as follows. In Chapter 2, a general independent reference prior is developed by extending the results of Sun and Berger (1998). Invariance under a type of one-to-one transformation of the parameters is proven, and the first order matching property is also obtained. The independent reference priors are derived in various examples of probability distributions in Chapter 3. We compare the independent reference priors with the reference priors.
23 We also check whether the independent reference priors satisfy the first order matching criterion. In Chapter 4, an independent reference prior in some types of non-regular cases is derived. We construct a sufficient condition under which the independent reference prior agrees with a first order matching prior. The independent reference priors are computed in some examples. We summarize and propose future work in Chapter 5.
24 Chapter 2 Main Results for Independent Reference Priors
2.1 Notation
Consider a parametric family of distributions whose density is given by f(x; θ) for the data X ∈ 𝒳, where θ ∈ Θ ⊆ IR^p is a p-dimensional unknown parameter vector which can be decomposed into m sub-vectors,
θ = (θ_1, ..., θ_m). (2.1)
Here θ_i = (θ_i1, ..., θ_ip_i) ∈ Θ_i ⊆ IR^p_i, where Θ = Θ_1 × ⋯ × Θ_m and p_1 + ⋯ + p_m = p. We define the partitioned Fisher information matrix of θ,
Σ(θ) = ( -E_θ[ ∂² log f(X; θ) / ∂θ_i ∂θ_j ] ), i, j = 1, ..., m, (2.2)
where E_θ denotes expectation over X given θ. We will often write Σ instead of Σ(θ).
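As a tiny sanity check of the partitioned Fisher information definition above, consider a single Bernoulli(p) observation (a one-group, one-parameter case), for which the expected negative second derivative of log f is known to be 1/(p(1-p)). The sketch below (illustrative only; the finite-difference step h is a tuning choice) takes the expectation exactly over the two outcomes and differentiates numerically.

```python
import math

# Check: for X ~ Bernoulli(p), -E[ d^2/dp^2 log f(X; p) ] = 1/(p(1-p)).

def logf(x, p):
    return x * math.log(p) + (1 - x) * math.log(1 - p)

def fisher_info(p, h=1e-4):
    total = 0.0
    for x in (0, 1):                       # exact expectation over X
        w = p if x == 1 else 1 - p
        d2 = (logf(x, p + h) - 2 * logf(x, p) + logf(x, p - h)) / h ** 2
        total += w * (-d2)                 # expectation of -d^2/dp^2 log f
    return total

p = 0.3
print(fisher_info(p), 1 / (p * (1 - p)))   # the two agree closely
```

The square root of this quantity is the Jeffreys-rule prior of Section 1.2.2, here proportional to p^{-1/2}(1-p)^{-1/2}, i.e. a Beta(1/2, 1/2) density.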
25 Mi Hyun Lee, Chapter 2. Main Results for Independent Reference Priors
2.2 Independent Reference Priors
In this section we provide an independent reference prior by generalizing the results of Sun and Berger (1998). We consider more groups of parameters than the two groups considered by Sun and Berger (1998). We propose an iterative algorithm to find the general independent reference prior and obtain a mild sufficient condition under which its result can be deduced without going through the iterations, so that a closed form of the independent reference prior is derived. The invariance of independent reference priors under a type of one-to-one reparameterization where the Jacobian matrix is diagonal is proven. A sufficient condition under which the independent reference prior agrees with a first order matching prior is proposed. Thus two desired features of independent reference priors are achieved. Numerous examples are given in Chapter 3. We study an independent reference prior in some types of non-regular cases in Chapter 4.
We now present an iterative algorithm to derive an independent reference prior for θ = (θ_1, ..., θ_m). It is an extension of Algorithm A proposed by Sun and Berger (1998). To begin with, we note that Σ^c_ii is the matrix obtained by removing the rows and columns corresponding to θ_i from Σ, and θ^c_i = (θ_1, ..., θ_i-1, θ_i+1, ..., θ_m), i = 1, ..., m.
Algorithm B:
Step 0. Choose any initial nonzero marginal prior densities for θ_i, namely π^(0)_i(θ_i), for all i = 1, ..., m.
Step 1. Define an interim prior density for θ_1 by
π^(1)_1(θ_1) ∝ exp{ (1/2) ∫ ∏_{k=2}^m π^(0)_k(θ_k) log( |Σ| / |Σ^c_11| ) dθ^c_1 }.
26 Step i. For i = 2, ..., m, define interim prior densities for θ_i by
π^(1)_i(θ_i) ∝ exp{ (1/2) ∫ [ ∏_{k=1}^{i-1} π^(1)_k(θ_k) ] [ ∏_{k=i+1}^m π^(0)_k(θ_k) ] log( |Σ| / |Σ^c_ii| ) dθ^c_i }.
Replace π^(0)_i(θ_i), i = 1, ..., m, in Step 0 by π^(1)_i(θ_i), i = 1, ..., m, and repeat Steps 1 through i to obtain π^(2)_i(θ_i) for i = 1, ..., m. Consequently, the sequences of marginal priors {π^(l)_i(θ_i) : i = 1, ..., m}_{l≥1} are generated. The marginal reference priors for θ_i will be the limits
π^R_i(θ_i) = lim_{l→∞} π^(l)_i(θ_i), i = 1, ..., m,
if the limits exist.
In practice, the interim priors {π^(l)_i(θ_i) : i = 1, ..., m}_{l≥1} could be improper. In such cases one might need to implement the algorithm on compact sets, as recommended by Sun and Berger (1998). Choose an increasing sequence {Θ^j_i}_{j≥1} of compact subsets of Θ_i such that
∪_{j=1}^∞ Θ^j_i = Θ_i, i = 1, ..., m.
We could then use the following algorithm.
Algorithm B':
Step 0. For fixed j, choose any initial proper marginal prior densities for θ_i on Θ^j_i, namely π^(0)_ij(θ_i), for all i = 1, ..., m.
Step 1. Define an interim prior density for θ_1 on Θ^j_1 by
π^(1)_1j(θ_1) ∝ exp{ (1/2) ∫_{∏_{h=2}^m Θ^j_h} ∏_{k=2}^m π^(0)_kj(θ_k) log( |Σ| / |Σ^c_11| ) dθ^c_1 }.
27 Step i. For i = 2, ..., m, define interim prior densities for θ_i on Θ^j_i by
π^(1)_ij(θ_i) ∝ exp{ (1/2) ∫_{∏_{h≠i} Θ^j_h} [ ∏_{k=1}^{i-1} π^(1)_kj(θ_k) ] [ ∏_{k=i+1}^m π^(0)_kj(θ_k) ] log( |Σ| / |Σ^c_ii| ) dθ^c_i }.
Replace π^(0)_ij(θ_i), i = 1, ..., m, in Step 0 by π^(1)_ij(θ_i), i = 1, ..., m, and repeat Steps 1 through i to obtain π^(2)_ij(θ_i) for i = 1, ..., m. Consequently, we have sequences of marginal priors {π^(l)_ij(θ_i) : i = 1, ..., m}_{j,l≥1}. Let θ^0_i be an interior point of Θ_i, i = 1, ..., m. The marginal reference priors for θ_i will be the limits
π^R_i(θ_i) = lim_{j→∞} lim_{l→∞} π^(l)_ij(θ_i) / π^(l)_ij(θ^0_i), i = 1, ..., m,
if these limits exist.
The convergence of the iterations is not guaranteed, in which case we might not obtain a closed form of the independent reference prior. Here we have found a sufficient condition for deriving an independent reference prior without going through the iterations.
Theorem 2.1 Suppose, for all i = 1, ..., m,
|Σ| / |Σ^c_ii| = f_1i(θ_i) f_2i(θ^c_i), (2.3)
where θ^c_i = (θ_1, ..., θ_i-1, θ_i+1, ..., θ_m), Σ is the Fisher information matrix of θ = (θ_1, ..., θ_m), and Σ^c_ii is the matrix derived by removing the rows and columns corresponding to θ_i from Σ. Then the independent reference prior for θ = (θ_1, ..., θ_m) is
π^R(θ) = ∏_{i=1}^m π^R_i(θ_i), (2.4)
where the marginal reference priors for θ_i, i = 1, ..., m, are
π^R_i(θ_i) ∝ √(f_1i(θ_i)). (2.5)
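Before turning to the proof, note that the factorization condition (2.3) is easy to probe numerically: g_i(θ_i, θ^c_i) = |Σ|/|Σ^c_ii| factors into f_1i(θ_i) f_2i(θ^c_i) exactly when log g_i is additively separable, which is equivalent to the cross-ratio identity g(a, c) g(b, d) = g(a, d) g(b, c). The sketch below (my own construction) applies this test to the N(μ, σ²) model with θ_1 = μ and θ_2 = σ, whose Fisher information is diag(1/σ², 2/σ²), and to a non-separable counterexample.

```python
import random

# Cross-ratio test for separability g(a, c)*g(b, d) == g(a, d)*g(b, c).
def separable(g, trials=200, tol=1e-9, seed=0):
    rnd = random.Random(seed)
    for _ in range(trials):
        a, b, c, d = (rnd.uniform(0.5, 2.0) for _ in range(4))
        if abs(g(a, c) * g(b, d) - g(a, d) * g(b, c)) > tol:
            return False
    return True

# N(mu, sigma^2) with Sigma = diag(1/sigma^2, 2/sigma^2):
g1 = lambda mu, sigma: 1.0 / sigma ** 2      # |Sigma| / |Sigma^c_11| = 1/sigma^2
g2 = lambda sigma, mu: 2.0 / sigma ** 2      # |Sigma| / |Sigma^c_22| = 2/sigma^2

print(separable(g1), separable(g2))          # both factor, so the closed form applies
print(separable(lambda a, b: a + b))         # a + b does not factor
```

Here g_1 factors with f_11(μ) = 1 and g_2 with f_12(σ) = 2/σ², so the marginal reference priors are π^R_1(μ) ∝ 1 and π^R_2(σ) ∝ 1/σ, recovering the familiar prior π(μ, σ) ∝ 1/σ without any iteration.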
28 Proof. It can easily be seen that, under Condition (2.3), π^R_i(θ_i), i = 1, ..., m, do not depend on the choices of π^(0)_i(θ_i), i = 1, ..., m, in Step 0. Hence the marginal reference priors for θ_i, i = 1, ..., m, have the form (2.5), and the independent reference prior for θ is given by (2.4).
In the next corollary, an independent reference prior is derived under orthogonality; it is shown to be the same as the independent reference prior in (2.4).
Corollary 2.1 Consider the following Fisher information matrix of θ = (θ_1, ..., θ_m):
Σ = diag( f_11(θ_1) f_21(θ^c_1), ..., f_1m(θ_m) f_2m(θ^c_m) ).
Then the independent reference prior for θ is the same as (2.4).
Proof. It is clear that, for all i = 1, ..., m,
|Σ| / |Σ^c_ii| = f_1i(θ_i) f_2i(θ^c_i),
which satisfies Condition (2.3).
We now prove that the independent reference prior given by (2.4) is invariant under a type of one-to-one transformation of the parameters where the Jacobian matrix is diagonal.
Theorem 2.2 For any i = 1, ..., m, let η_i = g_i(θ_i) be a one-to-one transformation of θ_i. Then, under Condition (2.3), the independent reference prior for η = (η_1, ..., η_m) is of the form
π^R(η) = ∏_{i=1}^m π^R_i(η_i), (2.6)
where the marginal reference priors for η_i, i = 1, ..., m, are
π^R_i(η_i) ∝ √( f_1i(g_i^{-1}(η_i)) ) |∂g_i^{-1}(η_i)/∂η_i|. (2.7)
29 Proof. The Fisher information matrix of η is given by H = T Σ T, where Σ is the Fisher information matrix of θ = (θ_1, ..., θ_m) and
T = diag( ∂g_1^{-1}(η_1)/∂η_1, ..., ∂g_m^{-1}(η_m)/∂η_m ).
The matrix H^c_ii, derived by removing the rows and columns corresponding to η_i from H, is of the form H^c_ii = T^c_ii Σ^c_ii T^c_ii, where Σ^c_ii is the matrix derived by removing the rows and columns corresponding to θ_i from Σ and
T^c_ii = diag( ∂g_1^{-1}(η_1)/∂η_1, ..., ∂g_{i-1}^{-1}(η_{i-1})/∂η_{i-1}, ∂g_{i+1}^{-1}(η_{i+1})/∂η_{i+1}, ..., ∂g_m^{-1}(η_m)/∂η_m ).
Thus
|H| = |Σ| ∏_{k=1}^m ( ∂g_k^{-1}(η_k)/∂η_k )², |H^c_ii| = |Σ^c_ii| ∏_{j≠i} ( ∂g_j^{-1}(η_j)/∂η_j )².
From Condition (2.3), it can be shown that
|H| / |H^c_ii| = ( ∂g_i^{-1}(η_i)/∂η_i )² |Σ| / |Σ^c_ii| = ( ∂g_i^{-1}(η_i)/∂η_i )² f_1i(g_i^{-1}(η_i)) f_2i(g_i^{-1}(η_i)^c),
where
g_i^{-1}(η_i)^c = ( g_1^{-1}(η_1), ..., g_{i-1}^{-1}(η_{i-1}), g_{i+1}^{-1}(η_{i+1}), ..., g_m^{-1}(η_m) ).
30 Thus we can write |H| / |H^c_ii| = h_1i(η_i) h_2i(η^c_i), where
h_1i(η_i) = f_1i(g_i^{-1}(η_i)) ( ∂g_i^{-1}(η_i)/∂η_i )², h_2i(η^c_i) = f_2i(g_i^{-1}(η_i)^c).
Hence, by Theorem 2.1, the independent reference prior for η is
π^R(η) = ∏_{i=1}^m π^R_i(η_i),
where the marginal reference priors for η_i, i = 1, ..., m, are
π^R_i(η_i) ∝ √(h_1i(η_i)) = √( f_1i(g_i^{-1}(η_i)) ) |∂g_i^{-1}(η_i)/∂η_i|.
The result then follows.
2.3 First Order Matching Priors
We propose a sufficient condition under which the independent reference prior given by (2.4) is a first order matching prior. Thus we can easily prove that the independent reference prior is a first order matching prior without solving the differential equation given by (1.6).
Theorem 2.3 Let θ = (θ_1, ..., θ_m), where θ_i ∈ Θ_i ⊆ IR. For fixed i = 1, ..., m, assume, for all j = 1, ..., m,
|Σ| / |Σ^c_ij| = f_1i(θ_i) f_2i(θ^c_i) if j = i; √(f_1j(θ_j)) √(f_2i(θ^c_i)) f_3j(θ^c_j) if j ≠ i, (2.8)
31 where θ^c_l = (θ_1, ..., θ_l-1, θ_l+1, ..., θ_m), l = i, j, Σ is the Fisher information matrix of θ, Σ^c_ij is the matrix derived by removing the i-th row and j-th column from Σ, and f_1j, f_2i, and f_3j, j ≠ i, are positive functions of their arguments. Then the independent reference prior π^R(θ) for θ given by (2.4) is a first order matching prior for θ_i.
Proof. For fixed i, let ψ = ψ(θ) = θ_i. By (2.8.3) of Datta and Mukerjee (2004), a first order matching prior π^M = π^M(θ_1, ..., θ_m) for ψ satisfies the differential equation
∂/∂θ_1 (ξ_1 π) + ⋯ + ∂/∂θ_m (ξ_m π) = 0, (2.9)
where
ξ = (ξ_1, ..., ξ_m)^t = Σ^{-1} ∇ψ / (∇ψ^t Σ^{-1} ∇ψ)^{1/2}, (2.10)
with ∇ψ = (∂ψ/∂θ_1, ..., ∂ψ/∂θ_m)^t and Σ the Fisher information matrix of θ = (θ_1, ..., θ_m). We need to show that the reference prior π^R(θ) given by (2.4) satisfies equation (2.9). It is easy to see that ∇ψ = (0, ..., 0, 1, 0, ..., 0)^t, where the 1 is the i-th element of ∇ψ, and
ξ = (ξ_1, ..., ξ_m)^t = { (-1)^{i+1} |Σ^c_i1|, ..., |Σ^c_ii|, ..., (-1)^{i+m} |Σ^c_im| }^t / √( |Σ| |Σ^c_ii| ).
From Condition (2.8), for j = 1, ..., m,
ξ_j = 1 / √( f_1i(θ_i) f_2i(θ^c_i) ) if j = i; (-1)^{i+j} √(f_1i(θ_i)) / ( √(f_1j(θ_j)) f_3j(θ^c_j) ) if j ≠ i.
Thus the differential equation (2.9) becomes
∂/∂θ_i [ π(θ) / √( f_1i(θ_i) f_2i(θ^c_i) ) ] + ∑_{j≠i} (-1)^{i+j} ∂/∂θ_j [ √(f_1i(θ_i)) π(θ) / ( √(f_1j(θ_j)) f_3j(θ^c_j) ) ] = 0.
Now, substituting π^R(θ) ∝ Π_{k=1}^m √(f_{1k}(θ_k)), it can be shown that

    ∂/∂θ_i [ π^R(θ) / √(f_{1i}(θ_i) f_{2i}(θ_i^c)) ] + Σ_{j=1, j≠i}^m (−1)^{i+j} ∂/∂θ_j [ √(f_{1i}(θ_i)) π^R(θ) / {√(f_{1j}(θ_j)) f_{3j}(θ_j^c)} ]

    = ∂/∂θ_i [ Π_{k=1}^m √(f_{1k}(θ_k)) / √(f_{1i}(θ_i) f_{2i}(θ_i^c)) ] + Σ_{j=1, j≠i}^m (−1)^{i+j} ∂/∂θ_j [ √(f_{1i}(θ_i)) Π_{k=1}^m √(f_{1k}(θ_k)) / {√(f_{1j}(θ_j)) f_{3j}(θ_j^c)} ]

    = ∂/∂θ_i [ Π_{k=1, k≠i}^m √(f_{1k}(θ_k)) / √(f_{2i}(θ_i^c)) ] + Σ_{j=1, j≠i}^m (−1)^{i+j} ∂/∂θ_j [ √(f_{1i}(θ_i)) Π_{k=1, k≠j}^m √(f_{1k}(θ_k)) / f_{3j}(θ_j^c) ]

    = 0,

since no bracketed quantity depends on its variable of differentiation: θ_i does not appear in the first bracket (f_{2i}(θ_i^c) is free of θ_i), and θ_j does not appear in the j-th bracket (f_{3j}(θ_j^c) is free of θ_j). Hence the independent reference prior π^R(θ) for θ given by (2.4) is a solution of the differential equation (2.9). The result then holds.

Corollary 2.1. Suppose that, in Condition (2.8), Σ^c_{ij} = 0 for some j ≠ i. Then the independent reference prior π^R(θ) for θ given by (2.4) is a first order matching prior for θ_i.

Proof. Clearly, if Σ^c_{ij} = 0, then ξ_j = 0 for that j ≠ i. Thus ∂(ξ_j π)/∂θ_j = 0 for any π. The result then follows.
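The dissertation contains no code, but the key algebraic step in the proof — that, for ψ = θ_i, the vector ξ in (2.10) reduces to the cofactor expression {|Σ| Σ^c_{ii}}^{−1/2} ((−1)^{i+1} Σ^c_{i1}, ..., (−1)^{i+m} Σ^c_{im})' — can be checked numerically. Below is a minimal Python sketch for m = 3, with an arbitrary symmetric positive definite matrix standing in for the Fisher information Σ; all names and values are illustrative, and indices in the code are 0-based.

```python
from math import sqrt

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def det3(m):
    return (m[0][0] * det2([[m[1][1], m[1][2]], [m[2][1], m[2][2]]])
            - m[0][1] * det2([[m[1][0], m[1][2]], [m[2][0], m[2][2]]])
            + m[0][2] * det2([[m[1][0], m[1][1]], [m[2][0], m[2][1]]]))

def minor(m, i, j):
    # Determinant of m with row i and column j removed (0-based indices).
    sub = [[m[r][c] for c in range(3) if c != j] for r in range(3) if r != i]
    return det2(sub)

def solve3(m, b):
    # Cramer's rule for the 3x3 linear system m x = b.
    d = det3(m)
    return [det3([[b[r] if c == col else m[r][c] for c in range(3)]
                  for r in range(3)]) / d
            for col in range(3)]

# An arbitrary symmetric positive definite "Fisher information" matrix.
sigma = [[4.0, 1.0, 0.5], [1.0, 3.0, 0.2], [0.5, 0.2, 2.0]]
i = 1  # interest parameter: theta_2 (0-based index 1)
psi1 = [1.0 if r == i else 0.0 for r in range(3)]

w = solve3(sigma, psi1)      # Sigma^{-1} psi_1
norm = sqrt(w[i])            # sqrt(psi_1' Sigma^{-1} psi_1)
xi_direct = [wj / norm for wj in w]

d = det3(sigma)
xi_cofactor = [(-1) ** (i + j) * minor(sigma, i, j) / sqrt(d * minor(sigma, i, i))
               for j in range(3)]
print(xi_direct)
print(xi_cofactor)
```

The two vectors agree because (Σ^{−1})_{ji} = (−1)^{i+j} Σ^c_{ij}/|Σ| and ψ_1' Σ^{−1} ψ_1 = Σ^c_{ii}/|Σ|, which is exactly the identity the proof relies on.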
Chapter 3

Examples

In this chapter, various examples of commonly used probability distributions are studied. We derive the independent reference priors by employing Theorem 2.1 and compare them with the reference priors. We also verify whether the independent reference priors are first order matching priors by applying Theorem 2.3. Consequently, the independent reference priors are shown to be the reference priors and first order matching priors in most of the examples. Note that most of the probability density functions, Fisher information matrices, and reference priors in this chapter were provided by Yang and Berger (1997), unless other references are cited.

3.1 Binomial Model: Two Independent Samples

For fixed n_1 and n_2, let X_1 and X_2 be independent binomial random variables with parameters (n_1, p_1) and (n_2, p_2), respectively. Then the joint density of (X_1, X_2) is

    f(x_1, x_2 | p_1, p_2) = (n_1 choose x_1) p_1^{x_1} (1 − p_1)^{n_1 − x_1} (n_2 choose x_2) p_2^{x_2} (1 − p_2)^{n_2 − x_2}          (3.1)
Mi Hyun Lee — Chapter 3. Examples

for x_i ∈ {0, 1, ..., n_i}, i = 1, 2. Here 0 < p_1, p_2 < 1. The Fisher information matrix of (p_1, p_2) is

    Σ(p_1, p_2) = diag( n_1 / {p_1(1 − p_1)},  n_2 / {p_2(1 − p_2)} ).          (3.2)

Hence the marginal reference priors for p_1 and p_2 are

    π^R(p_1) ∝ 1 / √(p_1(1 − p_1)),   p_1 ∈ (0, 1),          (3.3)
    π^R(p_2) ∝ 1 / √(p_2(1 − p_2)),   p_2 ∈ (0, 1),          (3.4)

and the independent reference prior for (p_1, p_2) is

    π^R(p_1, p_2) ∝ 1 / √(p_1(1 − p_1) p_2(1 − p_2)).          (3.5)

It is a first order matching prior for p_1 and p_2, and it is the reference prior for (p_1, p_2) when one of the parameters p_1 or p_2 is of interest and the other is a nuisance parameter.

3.1.1 Two Binomial Proportions

Sun and Berger (1998) conducted an objective Bayesian analysis by using the independent reference prior for the log-odds ratio of two binomial proportions in the example of a clinical trial: ECMO (extracorporeal membrane oxygenation). The ECMO example is described here: n_1 patients are given standard therapy and n_2 patients are treated with ECMO. The probability of success under standard therapy is p_1 and the probability of success under ECMO is p_2. Let X_1 be the number of survivors from standard therapy and X_2 be the number of survivors from ECMO. Then X_1 is a binomial random variable with parameters (n_1, p_1) and, independently, X_2 is a binomial random variable with parameters (n_2, p_2). The main interest is to compare the two treatments. Then the log-odds ratio of p_1 and p_2,
defined as δ = η_2 − η_1 with η_i = log[p_i/(1 − p_i)], i = 1, 2, is used for comparing them, with η_1 treated as a nuisance parameter. Under the assumption of the independence of δ and η_1, Sun and Berger (1998) obtained the marginal reference priors for δ (∈ IR) and η_1 (∈ IR), which are given by

    π^R(δ) ∝ exp( −(1/(2π)) ∫_0^1 {t(1 − t)}^{−1/2} log[ 1 + (n_2/n_1) {(1 − t) e^{−δ/2} + t e^{δ/2}}^2 ] dt ),          (3.6)
    π^R(η_1) = e^{η_1/2} / {π (1 + e^{η_1})}.          (3.7)

Consequently, the independent reference prior for (δ, η_1) is

    π^R(δ, η_1) ∝ h(δ) e^{η_1/2} / (1 + e^{η_1}),          (3.8)

where

    h(δ) = exp( −(1/(2π)) ∫_0^1 {t(1 − t)}^{−1/2} log[ 1 + (n_2/n_1) {(1 − t) e^{−δ/2} + t e^{δ/2}}^2 ] dt ).          (3.9)

Now we compare four priors for (δ, η_1) with respect to the frequentist matching property for δ and the mean squared errors of the Bayes estimators of δ, through simulation studies. The frequentist matching property is investigated by observing the frequentist coverage probabilities of the posterior credible interval for δ. The four priors considered here are the constant prior, the Jeffreys-rule prior, the Cauchy prior, and the independent reference prior given by (3.8). First we compute the joint likelihood of (δ, η_1), which is given by

    L^N(δ, η_1) = (n_1 choose x_1) ( e^{η_1}/(1 + e^{η_1}) )^{x_1} ( 1/(1 + e^{η_1}) )^{n_1 − x_1} (n_2 choose x_2) ( e^{δ+η_1}/(1 + e^{δ+η_1}) )^{x_2} ( 1/(1 + e^{δ+η_1}) )^{n_2 − x_2},

since the likelihood of (p_1, p_2) is given by (3.1) with p_1 = e^{η_1}/(1 + e^{η_1}) and p_2 = e^{δ+η_1}/(1 + e^{δ+η_1}).

We also derive the priors for (δ, η_1). The prior for (δ, η_1) corresponding to the constant prior for (p_1, p_2) is

    π^C(δ, η_1) ∝ e^{δ + 2η_1} / { (1 + e^{η_1})^2 (1 + e^{δ+η_1})^2 },          (3.10)
the Jeffreys-rule prior for (δ, η_1) is

    π^J(δ, η_1) ∝ [ e^{δ + 2η_1} / { (1 + e^{η_1})^2 (1 + e^{δ+η_1})^2 } ]^{0.5},          (3.11)

and the Cauchy prior for (δ, η_1) is

    π^A(δ, η_1) ∝ 1 / { (1 + δ^2)(1 + η_1^2) },          (3.12)

by assuming the independence of δ and η_1.

We then obtain the marginal posterior density functions for δ using the four priors. Let n_S = x_1 + x_2 and n_F = n_1 + n_2 − n_S. By using the transformations η_1 = log( t/(1 − t) ) and δ = log( u/(1 − u) ), the marginal posterior density function for δ using the constant prior (3.10) is

    π^C(δ | x_1, x_2) = (e^δ)^{x_2+1} ∫_0^1 t^{n_S+1} (1 − t)^{n_F+1} (1 − t + e^δ t)^{−(n_2+2)} dt
                        / ∫_0^1 ∫_0^1 (u/(1 − u))^{x_2+1} {u(1 − u)}^{−1} t^{n_S+1} (1 − t)^{n_F+1} (1 − t + (u/(1 − u)) t)^{−(n_2+2)} dt du,

the marginal posterior density for δ using the Jeffreys-rule prior (3.11) is

    π^J(δ | x_1, x_2) = (e^δ)^{x_2+0.5} ∫_0^1 t^{n_S} (1 − t)^{n_F} (1 − t + e^δ t)^{−(n_2+1)} dt
                        / ∫_0^1 ∫_0^1 (u/(1 − u))^{x_2+0.5} {u(1 − u)}^{−1} t^{n_S} (1 − t)^{n_F} (1 − t + (u/(1 − u)) t)^{−(n_2+1)} dt du,

the marginal posterior density function for δ using the Cauchy prior (3.12) is

    π^A(δ | x_1, x_2) = { (e^δ)^{x_2} / (1 + δ^2) } ∫_0^1 [ t^{n_S} (1 − t)^{n_F} / { t(1 − t) (1 + {log(t/(1 − t))}^2) } ] (1 − t + e^δ t)^{−n_2} dt
                        / ∫_0^1 ∫_0^1 [ (u/(1 − u))^{x_2} / { u(1 − u) (1 + {log(u/(1 − u))}^2) } ] [ t^{n_S} (1 − t)^{n_F} / { t(1 − t) (1 + {log(t/(1 − t))}^2) } ] (1 − t + (u/(1 − u)) t)^{−n_2} dt du,

and the marginal posterior density for δ using the independent reference prior (3.8) is

    π^R(δ | x_1, x_2) = (e^δ)^{x_2} h(δ) ∫_0^1 t^{n_S−0.5} (1 − t)^{n_F−0.5} (1 − t + e^δ t)^{−n_2} dt
                        / ∫_0^1 ∫_0^1 (u/(1 − u))^{x_2} h(log(u/(1 − u))) {u(1 − u)}^{−1} t^{n_S−0.5} (1 − t)^{n_F−0.5} (1 − t + (u/(1 − u)) t)^{−n_2} dt du,

where h(·) is given by (3.9).
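As a numerical aside (the original contains no code; this sketch assumes the form of (3.9) exactly as displayed above, including the squared bracket and the minus sign), h(δ) can be evaluated by a one-dimensional midpoint rule. The substitution t = sin²φ absorbs the endpoint singularity of {t(1 − t)}^{−1/2}. With n_1 = n_2 the integrand at δ = 0 is constant and equal to log 2, so h(0) = exp(−log 2 / 2) = 1/√2, which gives a convenient check:

```python
from math import sin, log, exp, pi, sqrt

def h(delta, n1, n2, m=2000):
    # Midpoint rule for (3.9) after the substitution t = sin^2(phi):
    #   int_0^1 {t(1-t)}^{-1/2} g(t) dt = 2 * int_0^{pi/2} g(sin^2 phi) dphi,
    # which removes the endpoint singularity.
    step = (pi / 2) / m
    total = 0.0
    for k in range(m):
        t = sin((k + 0.5) * step) ** 2
        g = log(1.0 + (n2 / n1) * ((1 - t) * exp(-delta / 2)
                                   + t * exp(delta / 2)) ** 2)
        total += g * step
    integral = 2.0 * total
    return exp(-integral / (2.0 * pi))

# With n1 = n2, h(0) should equal exp(-log(2)/2) = 1/sqrt(2).
print(h(0.0, 10, 10), 1 / sqrt(2))
```

Once h is available pointwise, the prior (3.8) and the posterior π^R(δ | x_1, x_2) above can be evaluated on a grid of δ values.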
Mean Squared Errors

Below are the analytical results for obtaining the mean squared error of the Bayes estimator of δ. Under the squared error loss function L(θ, a) = (θ − a)^2, the Bayes estimator of δ, δ̂_l ≡ δ̂_l(x_1, x_2), is its posterior mean, which is computed as ∫ δ π^l(δ | x_1, x_2) dδ, l = C, J, A, R. Thus, by letting δ = log( u/(1 − u) ), each δ̂_l is a ratio of two double integrals over (0, 1)^2: the denominator is the normalizing double integral of the corresponding marginal posterior density above, and the numerator has the same integrand multiplied by log( u/(1 − u) ). For the constant prior,

    δ̂_C = ∫_0^1 ∫_0^1 log( u/(1 − u) ) (u/(1 − u))^{x_2+1} {u(1 − u)}^{−1} t^{n_S+1} (1 − t)^{n_F+1} (1 − t + (u/(1 − u)) t)^{−(n_2+2)} dt du
           / ∫_0^1 ∫_0^1 (u/(1 − u))^{x_2+1} {u(1 − u)}^{−1} t^{n_S+1} (1 − t)^{n_F+1} (1 − t + (u/(1 − u)) t)^{−(n_2+2)} dt du,

and δ̂_J, δ̂_A, and δ̂_R are obtained in the same way from the denominators of π^J(δ | x_1, x_2), π^A(δ | x_1, x_2), and π^R(δ | x_1, x_2), respectively, where h(·) is given by (3.9).

Hence the mean squared error is given by

    MSE_l = E[ (δ̂_l − δ)^2 ] = Σ_{x_1=0}^{n_1} Σ_{x_2=0}^{n_2} (δ̂_l − δ)^2 L^N(δ, η_1),          (3.13)

for l = C, J, A, R.

Computing (3.13) is straightforward if n_1 and n_2 are small. However, the following approximation is proposed for large n_1 and n_2. For fixed (δ, η_1), p_1 = e^{η_1}/(1 + e^{η_1}) and p_2 = e^{δ+η_1}/(1 + e^{δ+η_1}) are obtained. Then, for fixed n_1 and n_2, two sets of independent binomial random variables x_1^{(k)} and x_2^{(k)} with parameters (n_1, p_1) and (n_2, p_2), respectively, are generated for k = 1, ..., K. For the simulated x_1^{(k)} and x_2^{(k)}, δ̂_l^{(k)}, l = C, J, A, R, can be calculated by using the above
equations. Then (δ̂_l^{(k)} − δ)^2, k = 1, ..., K, are computed for l = C, J, A, R. Thus the estimate of MSE_l is

    MSÊ_l = (1/K) Σ_{k=1}^K (δ̂_l^{(k)} − δ)^2,

for l = C, J, A, R. The results are shown in Table 3.1 at the end of Section 3.1. A prior which has small mean squared errors is desirable. We considered small (n_1 = n_2 = 10) and large (n_1 = n_2 = 50) sample sizes. We then chose several values of δ and η_1 and ran K = 5000 replicates for each set of (δ, η_1). It is observed that the mean squared errors obtained by using the Jeffreys-rule prior and the independent reference prior are larger than those using the constant prior and the Cauchy prior, for both small and large samples. However, the differences are much smaller for large samples. Thus the constant prior and the Cauchy prior might perform better than the Jeffreys-rule prior and the independent reference prior in the inference of δ, with respect to the mean squared errors.

Frequentist Coverage Probabilities

We explain how to compute the frequentist coverage probability of the one-sided posterior credible interval for δ. For any α ∈ (0, 1), let q_α^l(x_1, x_2) be the posterior α-quantile of δ, i.e., P(δ ≤ q_α^l(x_1, x_2) | x_1, x_2) = α, for l = C, J, A, R. Then the frequentist coverage probability of the one-sided (α × 100)% posterior credible interval (−∞, q_α^l(x_1, x_2)) is defined, for l = C, J, A, R, as

    P_{(δ,η_1)}( δ ≤ q_α^l(x_1, x_2) ) = Σ_{x_1=0}^{n_1} Σ_{x_2=0}^{n_2} I{ δ ≤ q_α^l(x_1, x_2) } L^N(δ, η_1),

where I{·} is the indicator function. It is desired that the frequentist coverage probability be close to α. It could be difficult to compute the frequentist coverage probability if q_α^l(x_1, x_2),
l = C, J, A, R, is found first. Alternatively, we first consider, for fixed (δ, η_1),

    { (x_1, x_2) : δ ≤ q_α^l(x_1, x_2) } = { (x_1, x_2) : ∫_{−∞}^δ π^l(δ' | x_1, x_2) dδ' < α },

for l = C, J, A, R. Then the frequentist coverage probability can be approximated as follows. For the simulated x_1^{(k)} and x_2^{(k)}, k = 1, ..., K, which are generated as in the previous section on mean squared errors, the posterior density function π^l(δ | x_1^{(k)}, x_2^{(k)}), l = C, J, A, R, can be computed. Then the estimate of P_{(δ,η_1)}( δ ≤ q_α^l(x_1, x_2) ) is given by

    P̂_{(δ,η_1)}( δ ≤ q_α^l(x_1, x_2) ) = (1/K) Σ_{k=1}^K I{ ∫_{−∞}^δ π^l(δ' | x_1^{(k)}, x_2^{(k)}) dδ' < α },

for l = C, J, A, R. It is shown that, by letting δ' = log( u/(1 − u) ), each cumulative probability ∫_{−∞}^δ π^l(δ' | x_1, x_2) dδ' equals the ratio obtained from the corresponding marginal posterior density above when the u-integration in the numerator is restricted to (0, e^δ/(1 + e^δ)). For instance,

    ∫_{−∞}^δ π^C(δ' | x_1, x_2) dδ' = ∫_0^{e^δ/(1+e^δ)} ∫_0^1 (u/(1 − u))^{x_2+1} {u(1 − u)}^{−1} t^{n_S+1} (1 − t)^{n_F+1} (1 − t + (u/(1 − u)) t)^{−(n_2+2)} dt du
                                      / ∫_0^1 ∫_0^1 (u/(1 − u))^{x_2+1} {u(1 − u)}^{−1} t^{n_S+1} (1 − t)^{n_F+1} (1 − t + (u/(1 − u)) t)^{−(n_2+2)} dt du,

and the expressions for l = J, A, R are obtained in the same way from π^J, π^A, and π^R,
where h(·) is given by (3.9). The output is given in Tables 3.2 and 3.3 at the end of Section 3.1, which display the frequentist coverage probabilities of the one-sided (α × 100)% posterior credible interval for δ for α = 0.05 and α = 0.50, respectively. Recall that we want a prior whose frequentist coverage probabilities are close to α. We considered small (n_1 = n_2 = 10) and large (n_1 = n_2 = 50) sample sizes. We then chose several values of δ and η_1 and ran K = 5000 replicates for each set of (δ, η_1). From Tables 3.2 and 3.3, it is roughly seen that the frequentist coverage probabilities computed by using the Jeffreys-rule prior and the independent reference prior are much closer to α than those using the constant prior and the Cauchy prior, for both small and large samples. It is also observed that the frequentist coverage probabilities derived by using the constant prior are closer to α than those using the Cauchy prior. It is clear that the frequentist coverage probabilities are consistently much closer to α for large samples than for small samples. Hence the Jeffreys-rule prior and the independent reference prior could be better priors for the inference of δ than the constant prior, which in turn is better than the Cauchy prior, with respect to the frequentist matching property. This conclusion is the opposite of the one obtained when considering the mean squared errors.
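The dissertation reports no code, but the coverage computation just described can be sketched in Python for the constant prior π^C. The grid size, replicate count K, seed, and parameter values below are arbitrary illustrative choices, and a plain midpoint rule stands in for whatever numerical integration was actually used.

```python
import random
from math import exp, log

def log_odds(v):
    return log(v / (1.0 - v))

def posterior_cdf_at_true_delta(delta, x1, x2, n1, n2, m=60):
    # P(delta' <= delta | x1, x2) under the constant prior (3.10): the same
    # double integral as the marginal posterior, with the u-integration in
    # the numerator restricted to (0, e^delta / (1 + e^delta)).
    nS, nF = x1 + x2, n1 + n2 - x1 - x2
    cut = exp(delta) / (1.0 + exp(delta))
    num = den = 0.0
    for iu in range(m):
        u = (iu + 0.5) / m
        for jt in range(m):
            t = (jt + 0.5) / m
            lw = ((x2 + 1) * log_odds(u) - log(u * (1.0 - u))
                  + (nS + 1) * log(t) + (nF + 1) * log(1.0 - t)
                  - (n2 + 2) * log(1.0 - t + t * u / (1.0 - u)))
            w = exp(lw)
            den += w
            if u < cut:
                num += w
    return num / den

def estimated_coverage(alpha, delta, eta1, n1, n2, K=200, seed=2):
    # Proportion of simulated (x1, x2) pairs whose posterior CDF at the
    # true delta falls below alpha -- the estimate of P(delta <= q_alpha).
    rng = random.Random(seed)
    p1 = exp(eta1) / (1.0 + exp(eta1))
    p2 = exp(delta + eta1) / (1.0 + exp(delta + eta1))
    hits = 0
    for _ in range(K):
        x1 = sum(rng.random() < p1 for _ in range(n1))
        x2 = sum(rng.random() < p2 for _ in range(n2))
        if posterior_cdf_at_true_delta(delta, x1, x2, n1, n2) < alpha:
            hits += 1
    return hits / K

cov = estimated_coverage(alpha=0.25, delta=0.0, eta1=0.0, n1=10, n2=10)
print(cov)
```

The same loop, with the exponents changed as in the other marginal posterior densities, would give the corresponding estimates for π^J, π^A, and π^R.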
Table 3.1: Mean Squared Errors. (Columns: δ, η_1, and π^C, π^J, π^A, π^R, for n_1 = n_2 = 10 and for n_1 = n_2 = 50.)

Table 3.2: Frequentist Coverage Probabilities of the One-sided 5% Posterior Credible Interval for δ. (Columns: δ, η_1, and π^C, π^J, π^A, π^R, for n_1 = n_2 = 10 and for n_1 = n_2 = 50.)

Table 3.3: Frequentist Coverage Probabilities of the One-sided 50% Posterior Credible Interval for δ. (Columns: δ, η_1, and π^C, π^J, π^A, π^R, for n_1 = n_2 = 10 and for n_1 = n_2 = 50.)
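For completeness, here is a sketch of how a single mean-squared-error entry in the spirit of Table 3.1 could be estimated for the constant prior π^C. The grid size, replicate count K, seed, and the pair (δ, η_1) are hypothetical illustrative choices; this does not reproduce the numerical entries of the table.

```python
import random
from math import exp, log

def log_odds(v):
    return log(v / (1.0 - v))

def bayes_estimate(x1, x2, n1, n2, m=60):
    # Posterior mean of delta under the constant prior (3.10): the ratio of
    # the two double integrals defining delta_hat_C, evaluated by a midpoint
    # rule on an m x m grid over (u, t) in (0, 1)^2.
    nS, nF = x1 + x2, n1 + n2 - x1 - x2
    num = den = 0.0
    for iu in range(m):
        u = (iu + 0.5) / m
        for jt in range(m):
            t = (jt + 0.5) / m
            lw = ((x2 + 1) * log_odds(u) - log(u * (1.0 - u))
                  + (nS + 1) * log(t) + (nF + 1) * log(1.0 - t)
                  - (n2 + 2) * log(1.0 - t + t * u / (1.0 - u)))
            w = exp(lw)
            den += w
            num += log_odds(u) * w
    return num / den

def estimated_mse(delta, eta1, n1, n2, K=200, seed=1):
    # Monte Carlo estimate of MSE_C as in (3.13): simulate (x1, x2), compute
    # the Bayes estimate for each replicate, and average the squared errors.
    rng = random.Random(seed)
    p1 = exp(eta1) / (1.0 + exp(eta1))
    p2 = exp(delta + eta1) / (1.0 + exp(delta + eta1))
    total = 0.0
    for _ in range(K):
        x1 = sum(rng.random() < p1 for _ in range(n1))
        x2 = sum(rng.random() < p2 for _ in range(n2))
        total += (bayes_estimate(x1, x2, n1, n2) - delta) ** 2
    return total / K

mse = estimated_mse(delta=0.0, eta1=0.0, n1=10, n2=10)
print(mse)
```

By symmetry, bayes_estimate(5, 5, 10, 10) is (numerically) zero, which is a useful sanity check on the quadrature.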
ENGR 691/692 Section 66 (Fall 06): Machine Learning Assigned: August 30 Homework 1: Bayesian Decision Theory (solutions) Due: September 13
ENGR 69/69 Section 66 (Fall 06): Machine Learning Assigned: August 30 Homework : Bayesian Decision Theory (solutions) Due: Septemer 3 Prolem : ( pts) Let the conditional densities for a two-category one-dimensional
5. Choice under Uncertainty
5. Choice under Uncertainty Daisuke Oyama Microeconomics I May 23, 2018 Formulations von Neumann-Morgenstern (1944/1947) X: Set of prizes Π: Set of probability distributions on X : Preference relation
9.09. # 1. Area inside the oval limaçon r = cos θ. To graph, start with θ = 0 so r = 6. Compute dr
9.9 #. Area inside the oval limaçon r = + cos. To graph, start with = so r =. Compute d = sin. Interesting points are where d vanishes, or at =,,, etc. For these values of we compute r:,,, and the values
Lecture 15 - Root System Axiomatics
Lecture 15 - Root System Axiomatics Nov 1, 01 In this lecture we examine root systems from an axiomatic point of view. 1 Reflections If v R n, then it determines a hyperplane, denoted P v, through the
Homework 8 Model Solution Section
MATH 004 Homework Solution Homework 8 Model Solution Section 14.5 14.6. 14.5. Use the Chain Rule to find dz where z cosx + 4y), x 5t 4, y 1 t. dz dx + dy y sinx + 4y)0t + 4) sinx + 4y) 1t ) 0t + 4t ) sinx
ω ω ω ω ω ω+2 ω ω+2 + ω ω ω ω+2 + ω ω+1 ω ω+2 2 ω ω ω ω ω ω ω ω+1 ω ω2 ω ω2 + ω ω ω2 + ω ω ω ω2 + ω ω+1 ω ω2 + ω ω+1 + ω ω ω ω2 + ω
0 1 2 3 4 5 6 ω ω + 1 ω + 2 ω + 3 ω + 4 ω2 ω2 + 1 ω2 + 2 ω2 + 3 ω3 ω3 + 1 ω3 + 2 ω4 ω4 + 1 ω5 ω 2 ω 2 + 1 ω 2 + 2 ω 2 + ω ω 2 + ω + 1 ω 2 + ω2 ω 2 2 ω 2 2 + 1 ω 2 2 + ω ω 2 3 ω 3 ω 3 + 1 ω 3 + ω ω 3 +
Problem Set 9 Solutions. θ + 1. θ 2 + cotθ ( ) sinθ e iφ is an eigenfunction of the ˆ L 2 operator. / θ 2. φ 2. sin 2 θ φ 2. ( ) = e iφ. = e iφ cosθ.
Chemistry 362 Dr Jean M Standard Problem Set 9 Solutions The ˆ L 2 operator is defined as Verify that the angular wavefunction Y θ,φ) Also verify that the eigenvalue is given by 2! 2 & L ˆ 2! 2 2 θ 2 +
Statistics 104: Quantitative Methods for Economics Formula and Theorem Review
Harvard College Statistics 104: Quantitative Methods for Economics Formula and Theorem Review Tommy MacWilliam, 13 tmacwilliam@college.harvard.edu March 10, 2011 Contents 1 Introduction to Data 5 1.1 Sample
Bounding Nonsplitting Enumeration Degrees
Bounding Nonsplitting Enumeration Degrees Thomas F. Kent Andrea Sorbi Università degli Studi di Siena Italia July 18, 2007 Goal: Introduce a form of Σ 0 2-permitting for the enumeration degrees. Till now,
Chapter 3: Ordinal Numbers
Chapter 3: Ordinal Numbers There are two kinds of number.. Ordinal numbers (0th), st, 2nd, 3rd, 4th, 5th,..., ω, ω +,... ω2, ω2+,... ω 2... answers to the question What position is... in a sequence? What
Figure A.2: MPC and MPCP Age Profiles (estimating ρ, ρ = 2, φ = 0.03)..
Supplemental Material (not for publication) Persistent vs. Permanent Income Shocks in the Buffer-Stock Model Jeppe Druedahl Thomas H. Jørgensen May, A Additional Figures and Tables Figure A.: Wealth and
Estimation for ARMA Processes with Stable Noise. Matt Calder & Richard A. Davis Colorado State University
Estimation for ARMA Processes with Stable Noise Matt Calder & Richard A. Davis Colorado State University rdavis@stat.colostate.edu 1 ARMA processes with stable noise Review of M-estimation Examples of
Lecture 12: Pseudo likelihood approach
Lecture 12: Pseudo likelihood approach Pseudo MLE Let X 1,...,X n be a random sample from a pdf in a family indexed by two parameters θ and π with likelihood l(θ,π). The method of pseudo MLE may be viewed
Appendix S1 1. ( z) α βc. dβ β δ β
Appendix S1 1 Proof of Lemma 1. Taking first and second partial derivatives of the expected profit function, as expressed in Eq. (7), with respect to l: Π Π ( z, λ, l) l θ + s ( s + h ) g ( t) dt λ Ω(
Tutorial on Multinomial Logistic Regression
Tutorial on Multinomial Logistic Regression Javier R Movellan June 19, 2013 1 1 General Model The inputs are n-dimensional vectors the outputs are c-dimensional vectors The training sample consist of m
Optimal Parameter in Hermitian and Skew-Hermitian Splitting Method for Certain Two-by-Two Block Matrices
Optimal Parameter in Hermitian and Skew-Hermitian Splitting Method for Certain Two-by-Two Block Matrices Chi-Kwong Li Department of Mathematics The College of William and Mary Williamsburg, Virginia 23187-8795