Objective Priors for the Bivariate Normal Model with Multivariate Generalizations

James O. Berger
Duke University, Durham, NC 27708, USA
e-mail: berger@stat.duke.edu
and
Dongchu Sun
University of Missouri-Columbia, Columbia, MO 65211, USA
e-mail: dsun@stat.missouri.edu

March 3, 2006

Summary

Study of the bivariate normal distribution raises the full range of issues involving objective Bayesian inference, including the different types of objective priors (e.g., Jeffreys, invariant, reference, matching), the different modes of inference (e.g., Bayesian, frequentist, fiducial), and the criteria involved in deciding on optimal objective priors (e.g., ease of computation, frequentist performance, marginalization paradoxes). Summary recommendations as to optimal objective priors are made for a variety of inferences involving the bivariate normal distribution. In the course of the investigation, a variety of surprising results were found, including the availability of objective priors that yield exact frequentist inferences for many functions of the bivariate normal parameters, including the correlation coefficient. Several generalizations to the multivariate normal distribution are given.

Some key words: Reference priors, matching priors, Jeffreys priors, right-Haar prior, fiducial inference, frequentist coverage, marginalization paradox, rejection sampling, constructive posterior distributions.

This research was supported by the National Science Foundation, under grants DMS-00365 and SES-03553, and the National Institutes of Health, under grants R0-CA00760 and R0-MH0748. We are grateful to Fei Liu for performing the numerical frequentist coverage computations, to Xiaoyan Lin for computing the matching priors in Table 6, and to Susie Bayarri for helpful discussions.
Contents

1 Introduction and Prior Distributions
  1.1 Notation and Problem Statement
  1.2 Parameter Marginal Fiducial/Posterior Distributions
  1.3 Other Quantities For Which Exact Matching is Possible
  1.4 General Parameters and Priors
    1.4.1 Recommended priors
    1.4.2 Reference Priors
    1.4.3 Other studied priors
2 Computation
  2.1 Marginal Posteriors of $(\sigma_1, \sigma_2, \rho)$ under the Priors $\pi_{R\rho}$, $\pi_{R\sigma_1}$, $\tilde\pi_{R\sigma_1}$, $\pi_S$, and $\pi_{MS}$
  2.2 Computation under $\pi_{ab}$
  2.3 Computation under $\pi_{R\lambda}$
3 Comparisons of Priors via Frequentist Matching
  3.1 Frequentist Coverage Probabilities and Exact Matching
    3.1.1 Preliminaries
    3.1.2 Credible intervals for $\mu_1$, $\mu_2$ under $\pi_{ab}$
    3.1.3 Credible intervals for a class of functions of $(\sigma_1, \sigma_2, \rho)$, including $\sigma_1$, $\sigma_2$, $\rho$, $\rho\sigma_2/\sigma_1$, $\sigma_2\sqrt{1-\rho^2}$, $|\Sigma|$, $\rho\sigma_1\sigma_2$, $\sigma_2/\sigma_1$, and $\rho/(\sigma_1\sqrt{1-\rho^2})$
    3.1.4 Coverage Probabilities for $\mu_1/\sigma_1$, $\mu_2/\sigma_2$, and $\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2$ under $\pi_{ab}$
  3.2 First Order Asymptotic Matching
    3.2.1 The Fisher Information
    3.2.2 Differential Equations of the First Order Matching Priors
  3.3 Numerically Computed Coverage and Summary Recommendations
4 Multivariate Normal
  4.1 Other Prior Generalizations
  4.2 Posterior Computation
Appendix A: Derivation of Reference Priors
  A.1 When $\mu_1$ Is of Interest
  A.2 When $\rho$ Is of Interest
  A.3 When $\sigma_1$ Is of Interest
  A.4 When Other Parameters Are of Interest
  A.5 When the Eigenvalues of $\Sigma$ Are of Interest
Appendix B: Proofs
1 Introduction and Prior Distributions

1.1 Notation and Problem Statement

The bivariate normal distribution of $(x_1, x_2)$ has mean parameters $(\mu_1, \mu_2)$ and covariance matrix
$$\Sigma = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix},$$
where $\rho$ is the correlation between $x_1$ and $x_2$. The density of $(x_1, x_2)$ is
$$f(x_1, x_2 \mid \mu_1, \mu_2, \sigma_1, \sigma_2, \rho) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\Big\{ -\frac{1}{2(1-\rho^2)}\Big[ \frac{(x_1-\mu_1)^2}{\sigma_1^2} + \frac{(x_2-\mu_2)^2}{\sigma_2^2} - \frac{2\rho(x_1-\mu_1)(x_2-\mu_2)}{\sigma_1\sigma_2} \Big] \Big\}. \qquad (1)$$
The data consist of an independent random sample $X = \{x_k = (x_{1k}, x_{2k}),\ k = 1, \ldots, n\}$ of size $n \ge 3$, for which the sufficient statistics are $\bar x = (\bar x_1, \bar x_2)$ and
$$S = \sum_{k=1}^n (x_k - \bar x)(x_k - \bar x)' = \begin{pmatrix} s_1^2 & r s_1 s_2 \\ r s_1 s_2 & s_2^2 \end{pmatrix}, \qquad (2)$$
where, for $i, j = 1, 2$,
$$\bar x_i = \frac{1}{n}\sum_{k=1}^n x_{ik}, \qquad s_{ij} = \sum_{k=1}^n (x_{ik} - \bar x_i)(x_{jk} - \bar x_j), \qquad s_i^2 = s_{ii}, \qquad r = \frac{s_{12}}{s_1 s_2}.$$
We will denote prior densities as $\pi(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho)$, and the corresponding posterior densities as $\pi(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho \mid X)$, all with respect to $d\mu_1\, d\mu_2\, d\sigma_1\, d\sigma_2\, d\rho$.

We consider objective inference for parameters of the bivariate normal distribution and functions of these parameters, with special focus on the development of objective confidence or credible sets. The many results in the paper can be grouped into the following categories.

1. Addressing historical issues and conjectures.
2. Obtaining optimal objective Bayesian analyses.
3. Obtaining frequentist confidence sets (often exact but, at least, asymptotically optimal).
4. Providing simple computational implementations.

The historical issues relate to fiducial inference for the correlation parameter $\rho$, and conjectures as to whether the fiducial distribution could be derived in a Bayesian fashion. This is addressed in Section 1.2, where it is also observed that the fiducial distribution yields an exact frequentist confidence set for $\rho$, which seems to have been unrecognized. The method introduced to show that exact frequentist confidence sets for $\rho$ are available is also utilized to find exact frequentist confidence sets for a variety of other parameters of
the bivariate normal distribution. These results are summarized in Section 1.3. It should be stressed that these are exact even for the minimum sample size of $n = 3$, in contrast to many commonly used confidence procedures that are simply asymptotically correct.

A large number of potential objective prior distributions are considered in the paper. For ease of use, our recommendations as to the priors to use, depending on the desired inference, are summarized in Section 1.4. Often, the posteriors for the recommended priors are essentially available in computational closed form, allowing direct Monte Carlo simulation. These computation-ready representations of the posteriors (which we call constructive posterior distributions) are also given in these introductory sections. Section 2 provides simple accept-reject schemes for computing with the recommended priors when the posteriors are not amenable to direct simulation. Section 3 and the appendices develop the needed theory (derivation of reference priors, derivation of constructive posterior distributions, proofs of exact frequentist coverage, and proofs of asymptotic frequentist coverage) and also present various simulations that were conducted to enable summary recommendations to be made.

The recommended objective priors for the bivariate normal distribution have extensions to the multivariate normal distribution. These extensions, and algorithms for computing with the resulting posteriors, are given in Section 4.

Technical Issue: We will assume that $|\rho| < 1$ and $|r| < 1$ in virtually all expressions and results that follow. This is because, if either equals 1 in absolute value, then $\rho = \{\text{sign of } r\}$ with probability 1 (either frequentist or Bayesian posterior, as relevant). Indeed, the situation then essentially collapses to the univariate version of the problem, in which the randomness is (say) in $f(x_1 \mid \mu_1, \sigma_1)$, with $\mu_2$ and $\sigma_2$ (or $\bar x_2$ and $s_2$) being determined as $\mu_2 = \bar x_2 + (\mu_1 - \bar x_1)\, s_2/s_1$ and $\sigma_2 = \sigma_1 s_2 / s_1$.
Since the one-dimensional normal problem is standard, we will not explicitly present results for this degenerate case.

1.2 Parameter Marginal Fiducial/Posterior Distributions

The bivariate normal distribution has been extensively studied from frequentist, fiducial and objective Bayesian perspectives. Table 1 summarizes the fiducial distributions for the five parameters of the bivariate normal distribution; these were found in Fisher (1930, 1956) and Pratt (1963). The table also lists objective prior distributions whose marginal posterior distribution for the indicated parameter equals the fiducial distribution. The results for the objective priors are easy consequences of expressions in Geisser and Cornfield (1963), except for that for $\rho$, which is discussed further below.

Fiducial/posterior distributions are represented in the table as constructive random distributions, i.e., by a description of how to simulate from them. Thus, to simulate from the fiducial distribution of $\sigma_1$ given the data (actually, only $s_1$ is needed), one draws independent $\chi^{2*}_{n-1}$ random variables and simply computes the corresponding $s_1/\sqrt{\chi^{2*}_{n-1}}$; this yields an independent sample from the fiducial/posterior distribution of $\sigma_1$. The $*$ in the notation is to distinguish the constructive random distribution from later expressions
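To make the constructive recipe concrete, here is a minimal sketch in Python (our own illustration, not code from the paper; the function name, seed, and numerical inputs are ours):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigma1_fiducial_draws(s1, n, size=10000):
    """Constructive fiducial/posterior draws of sigma_1: sigma_1* = s1 / sqrt(chi2_{n-1})."""
    chi2 = rng.chisquare(n - 1, size=size)  # independent chi-squared draws
    return s1 / np.sqrt(chi2)

# Example: s1 = 2.0 observed from a sample of size n = 20
draws = sigma1_fiducial_draws(s1=2.0, n=20, size=50000)
```

Each element of `draws` is an independent draw from the fiducial/posterior of $\sigma_1$; quantiles of `draws` give one-sided credible limits directly.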
Table 1: Fiducial/posterior distributions and certain objective priors (the right-Haar prior, $\pi_H$, and the Jeffreys-rule prior, $\pi_J$) that yield these as posteriors. Here $Z^*$ is a standard normal random variable, and $\chi^{2*}_{n-1}$ and $\chi^{2*}_{n-2}$ are chi-squared random variables with the indicated degrees of freedom, all random variables being independent.

parameter | objective prior | fiducial and posterior distribution
$\rho$ | $\pi_H = \frac{1}{\sigma_1^2(1-\rho^2)}$ | $\rho^* = \psi\Big( \frac{Z^* + \sqrt{\chi^{2*}_{n-2}}\, r/\sqrt{1-r^2}}{\sqrt{\chi^{2*}_{n-1}}} \Big)$, $\quad \psi(x) = \frac{x}{\sqrt{1+x^2}}$
$\mu_1$ | $\pi_J = \frac{1}{\sigma_1^2\sigma_2^2(1-\rho^2)^2}$ and $\pi_H$ | $\bar x_1 + Z^*\, \frac{s_1}{\sqrt{n\,\chi^{2*}_{n-1}}}$
$\mu_2$ | $\pi_J$ and $\pi_H$ | $\bar x_2 + Z^*\, \frac{s_2}{\sqrt{n\,\chi^{2*}_{n-1}}}$
$\sigma_1^2$ | $\pi_J$ and $\pi_H$ | $\frac{s_1^2}{\chi^{2*}_{n-1}}$
$\sigma_2^2$ | $\pi_J$ | $\frac{s_2^2(1-r^2)}{\chi^{2*}_{n}}\Big[ 1 + \frac{\big(Z^* + \sqrt{\chi^{2*}_{n}}\, r/\sqrt{1-r^2}\big)^2}{\chi^{2*}_{n-1}} \Big]$

involving randomness in the data. We find such constructive random distributions to be much more useful in today's computing environment than are, say, analytic formulas for the corresponding densities. Also, these representations will be seen to be crucial in proving exact frequentist matching, as discussed below.

Table 1 also summarizes frequentist inference concerning these parameters, in the following sense: suppose the indicated fiducial/posterior distribution is used to create one-sided credible intervals $(\theta_L, \theta_\alpha(X)]$, where $\theta_L$ is the lower limit of the relevant parameter space ($-\infty$, $-1$, or $0$ for the five parameters above), and $\theta_\alpha(X)$ is the posterior quantile of the parameter $\theta$ of interest, defined by
$$P(\theta < \theta_\alpha(X) \mid X) = \alpha. \qquad (3)$$
Here $\theta$ is the random variable. Of interest is the frequentist coverage of the corresponding confidence interval, i.e.,
$$C(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho) = P(\theta < \theta_\alpha(X) \mid \mu_1, \mu_2, \sigma_1, \sigma_2, \rho). \qquad (4)$$
Here $X$ is the random variable. The closer $C(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho)$ is to the nominal $\alpha$, the better the procedure (and corresponding objective prior) is judged to be.

The interesting feature of the fiducial/posterior distributions in Table 1 is that they all yield credible sets that have exact frequentist coverage of $\alpha$. When this is the case, the fiducial/objective Bayesian posterior will be called exact frequentist matching.
This is a very desirable situation (see Bayarri & Berger 2004 for general discussion and the many earlier references), in that one has an exact frequentist procedure that is also known to have good conditional performance, since it also is a Bayesian credible interval. As an example, to obtain an accurate approximation to the exact $100(1-\alpha)\%$ equal-tailed frequentist confidence set for $\rho$, one can employ the following simple algorithm:

Draw independent $Z^* \sim N(0, 1)$, $\chi^{2*}_{n-1}$ and $\chi^{2*}_{n-2}$.
Set $\rho^* = \frac{Y^*}{\sqrt{1+Y^{*2}}}$, where $Y^* = \frac{1}{\sqrt{\chi^{2*}_{n-1}}}\Big( Z^* + \sqrt{\chi^{2*}_{n-2}}\, \frac{r}{\sqrt{1-r^2}} \Big)$.

Repeat the process 10,000 times.

Use the $\alpha/2$ upper and lower quantiles of these generated $\rho^*$ to form the desired confidence limits.

We highlight the results about $\rho$ in Table 1 because they appear to be new, in two senses:

1. We have not been able to find a reference in the literature to the fact that the fiducial/objective posterior distribution of $\rho$ is exact frequentist matching (proved here in Theorem 5). Indeed, standard statistical software utilizes various approximations to arrive at frequentist confidence sets for $\rho$, missing the fact that a simple exact confidence set exists. In contrast, it has long been known that the other fiducial/posterior distributions in Table 1 are exact matching. It was, of course, known that exact frequentist confidence procedures for $\rho$ can be constructed (cf. Exercise 54, Chapter 6 of Lehmann 1986), but explicit expressions do not seem to be available.

2. Geisser and Cornfield (1963) studied the question of whether the fiducial distribution of $\rho$ could be reproduced as an objective Bayesian posterior, and they concluded that this was most likely not possible. The strongest evidence for this arose from Brillinger (1962), which used results from Lindley (1961) and a difficult analytic argument to show that there does not exist a prior $\pi(\rho)$ such that the fiducial density of $\rho$ equals $f(r \mid \rho)\,\pi(\rho)$ (normalized), where $f(r \mid \rho)$ is the density of $r$ given $\rho$. Since the fiducial distribution of $\rho$ depends only on $r$, it was certainly reasonable to speculate that, if it were not possible to derive this distribution from the density of $r$ and a prior, then it would not be possible to do so in general. The above result, of course, shows that this speculation was incorrect. What Brillinger's result does show is that the fiducial/posterior distribution for $\rho$ provides another example of the marginalization paradox (Dawid et al. 1973).
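The confidence-set algorithm for $\rho$ above can be sketched as follows (our Python illustration; the function name and Monte Carlo size are assumptions, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(1)

def rho_confidence_interval(r, n, alpha=0.05, draws=10000):
    """Equal-tailed 100(1-alpha)% limits for rho from the constructive posterior:
    rho* = Y*/sqrt(1+Y*^2), Y* = (Z* + sqrt(chi2_{n-2}) r/sqrt(1-r^2)) / sqrt(chi2_{n-1})."""
    z = rng.standard_normal(draws)
    c1 = rng.chisquare(n - 1, size=draws)
    c2 = rng.chisquare(n - 2, size=draws)
    y = (z + np.sqrt(c2) * r / np.sqrt(1 - r**2)) / np.sqrt(c1)
    rho = y / np.sqrt(1 + y**2)
    return np.quantile(rho, [alpha / 2, 1 - alpha / 2])

lo, hi = rho_confidence_interval(r=0.6, n=30)
```

For $r = 0.6$ and $n = 30$ the resulting interval is broadly comparable to the usual Fisher-$z$ approximation, but the procedure has exact coverage for any $n \ge 3$.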
Any proper prior has the property that, if the marginal posterior distribution for a parameter $\theta$ depends only on a statistic $T$, whose distribution in turn depends only on $\theta$, then that posterior can be derived from the distribution of $T$ together with the marginal prior for $\theta$. If this is a basic property of any proper Bayesian procedure, then it would seem paradoxical for the property to be violated, as happens for the fiducial/posterior distribution for $\rho$.

This presumption is not without its own controversies, however. For instance, in the bivariate normal problem, there is probably no proper prior distribution that yields a marginal posterior distribution of $\rho$ which depends only on $r$, so the relevance of an unattainable property of proper priors could be questioned. In any case, this situation provides an interesting philosophical conundrum of a type that we have not previously seen: a complete fiducial/objective Bayesian/frequentist unification can be obtained for inference about the usual parameters of the bivariate normal distribution, but only if violation of the marginalization paradox is accepted. We will shortly introduce a prior distribution that avoids the marginalization paradox for $\rho$, but which is not exactly
frequentist matching. We, alas, know of no way to adjudicate between the competing goals of exact frequentist matching and avoidance of the marginalization paradox, and so will simply present both as possible objective Bayesian approaches.

1.3 Other Quantities For Which Exact Matching is Possible

Often, it is functions of the parameters of the bivariate normal distribution that are of interest, and it is natural to ask which other important quantities have objective Bayesian posteriors that are exact frequentist matching. First, it is clear that any monotonic function of one of the parameters also has an exact matching posterior, obtained by simple transformation of the posterior in Table 1. Thus the exact matching posterior for $\sigma_1$ (written constructively) is simply $s_1/\sqrt{\chi^{2*}_{n-1}}$.

Interestingly, the approach that was developed to establish that the fiducial/posterior distribution of $\rho$ is exact frequentist matching also turned out to be applicable to a number of other functions of parameters. Table 2 gives a summary of the exact matching posteriors, of which we are aware, for a variety of functions of parameters of the bivariate normal distribution. Proofs are given in Section 3.1. Some of these results are already known, of course.

Table 2 also lists the objective prior distributions that yield the indicated objective posterior. For completeness, we repeat the entries of Table 1, since the new table indicates all the priors that result in the indicated posterior. The notation $\pi_{ab}$ in the table stands for the important class of prior densities (a subclass of the generalized Wishart distributions of Brown et al. 1994)
$$\pi_{ab}(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho) = \frac{1}{\sigma_1^{3-a}\, \sigma_2^{2-b}\, (1-\rho^2)^{2-b/2}}. \qquad (5)$$
Special cases of this class are $\pi_J = \pi_{10}$, $\pi_H = \pi_{12}$, the independence Jeffreys prior $\pi_{IJ} = \pi_{21} = \frac{1}{\sigma_1\sigma_2(1-\rho^2)^{3/2}}$, and $\pi_{RO}$, which has $a = b = 1$. The independence Jeffreys prior follows from using a constant prior for the means, and then the Jeffreys prior for the covariance matrix with means given.
We will see that $\pi_{IJ}$ is virtually never optimal, while $\pi_J$ is very often optimal, in contradiction to the common perception that the independence Jeffreys prior is better than the Jeffreys prior. $\pi_{RO}$ is useful for certain parameters.

There is one other interesting phenomenon here, namely that the posterior in the table for $\xi = \mu_1/\sigma_1$ is known to be subject to a marginalization paradox in the univariate normal case; this was established in Stone & Dawid (1972). Indeed, their same proof shows that the posterior for the bivariate case is also subject to a marginalization paradox: namely, it depends only on the statistic $t = \bar x_1/s_1$, whose distribution in turn depends only on $\xi$, yet there is no prior distribution for $\xi$ which will yield the posterior in the table from just the likelihood of $t$. Bernardo (1979) showed, in the univariate normal case, that the reference prior is not subject to a marginalization paradox, and the same would almost certainly be true for a one-at-a-time reference prior for $\xi$, were it to be found. Thus this appears to be another situation in which one must choose between exact matching and avoidance of a marginalization paradox.
Table 2: Parameters with exact matching priors and associated constructive posteriors. Here $Z^*$ is a standard normal random variable, and $\chi^{2*}_{n-1}$ and $\chi^{2*}_{n-2}$ are chi-squared random variables with the indicated degrees of freedom, all random variables being independent.

parameter | prior | posterior
$\mu_1$ | $\pi_{1b}$, any $b$ (including $\pi_J$ and $\pi_H$) | $\bar x_1 + Z^*\, \frac{s_1}{\sqrt{n\,\chi^{2*}_{n-1}}}$
$\mu_2$ | $\pi_J = \pi_{10}$ | $\bar x_2 + Z^*\, \frac{s_2}{\sqrt{n\,\chi^{2*}_{n-1}}}$
$d'(\mu_1, \mu_2)'$, $d \in \mathbb{R}^2$ | $\pi_J = \pi_{10}$ and $\tilde\pi_H$ (see Table 4) | $d'(\bar x_1, \bar x_2)' + Z^*\, \sqrt{\frac{d'Sd}{n\,\chi^{2*}_{n-1}}}$
$\sigma_1^2$ | $\pi_{1b}$, any $b$ (including $\pi_J$ and $\pi_H$) | $\frac{s_1^2}{\chi^{2*}_{n-1}}$
$d'\Sigma d$ | $\pi_J = \pi_{10}$ and $\tilde\pi_H$ (see Table 4) | $\frac{d'Sd}{\chi^{2*}_{n-1}}$
$\frac{\mu_1}{\sigma_1}$ | $\pi_{1b}$, any $b$ (including $\pi_J$ and $\pi_H$) | $\frac{Z^*}{\sqrt{n}} + \sqrt{\chi^{2*}_{n-1}}\, \frac{\bar x_1}{s_1}$
$\rho$ | $\pi_H = \pi_{12}$ | $\psi\Big( \frac{Z^* + \sqrt{\chi^{2*}_{n-2}}\, r/\sqrt{1-r^2}}{\sqrt{\chi^{2*}_{n-1}}} \Big)$, $\quad \psi(y) = \frac{y}{\sqrt{1+y^2}}$
$\eta_3 = \frac{-\rho}{\sigma_1\sqrt{1-\rho^2}}$ | $\pi_{a2}$, any $a$ (including $\pi_H$) | $\frac{1}{s_1}\Big( Z^* - \sqrt{\chi^{2*}_{n-2}}\, \frac{r}{\sqrt{1-r^2}} \Big)$
$\frac{\rho\sigma_2}{\sigma_1}$ | $\pi_{a2}$, any $a$ (including $\pi_H$) | $\frac{s_2}{s_1}\Big( r + Z^*\, \frac{\sqrt{1-r^2}}{\sqrt{\chi^{2*}_{n-2}}} \Big)$
$\sigma_2\sqrt{1-\rho^2}$ | $\pi_{a2}$, any $a$ (including $\pi_H$) | $\frac{s_2\sqrt{1-r^2}}{\sqrt{\chi^{2*}_{n-2}}}$
$|\Sigma| = \sigma_1^2\sigma_2^2(1-\rho^2)$ | $\pi_H = \pi_{12}$ and $\pi_{IJ} = \pi_{21}$ | $\frac{s_1^2 s_2^2(1-r^2)}{\chi^{2*}_{n-1}\,\chi^{2*}_{n-2}}$

1.4 General Parameters and Priors

It is actually rare to have exact matching priors for parameters of interest. Also, one is often interested in very complex functions of parameters (e.g., predictive distributions) and/or joint distributions of parameters. For such problems it is important to have a general objective prior that seems to perform reasonably well for all quantities of interest. Furthermore, it is unappealing to many Bayesians to change the prior according to which parameter is declared to be of interest, and an objective prior that performs well overall is often sought.

1.4.1 Recommended priors

In addition to $\pi_J$ and $\pi_H$, we will also recommend the following priors for various purposes:

1. $\pi_{R\rho} \propto \frac{1}{\sigma_1\sigma_2(1-\rho^2)}$.
2. $\pi_{R\sigma_1} \propto \frac{\sqrt{1+\rho^2}}{\sigma_1\sigma_2(1-\rho^2)}$.
3. $\pi_{R\lambda} \propto \frac{\sigma_1^2 - \sigma_2^2}{\sigma_1\sigma_2(1-\rho^2)\,\sqrt{(\sigma_1^2-\sigma_2^2)^2 + 4\sigma_1^2\sigma_2^2\rho^2}}$.
The first prior was developed in Lindley (1965), using certain notions of transformation to constant information, and was studied extensively in Bayarri (1981), where it was shown to be a reference prior (see the next section). This will be our recommendation for use as a general-purpose objective prior. Also, it is a prior that avoids the marginalization paradox for $\rho$, and so is the recommended prior for $\rho$ if one is more concerned with avoiding the marginalization paradox than having exact frequentist matching. The prior $\pi_{R\sigma_1}$ will be suggested for use with inferences concerning $\sigma_{12}$, and $\pi_{R\lambda}$ for use with inferences concerning eigenvalues.

With these definitions, we can make our summary recommendations. Table 3 gives the four objective priors that are recommended for use, and indicates for which parameters (or functions thereof) they are recommended. These recommendations are based on three criteria:

1. The degree of frequentist matching. Having exact matching is viewed as best according to this criterion. A minimal requirement is first order asymptotic matching, discussed in Section 3.2. Among priors with this latter property, further discrimination is obtained through numerical study of the degree of matching in the most challenging case, namely when $n = 3$; this is discussed in Section 3.3.

2. Being a one-at-a-time reference prior (and hence avoiding the marginalization paradox) is desirable, because of its logical justification.

3. Ease of computation.

Table 3: Recommendations of objective priors for various parameters in the bivariate normal model. A box $[\,\cdot\,]$ indicates that, for this parameter, the posterior will not be exact frequentist matching. For $\mu_2$ and parameters with $\sigma_1$ replaced by $\sigma_2$, use the right-Haar prior with the variances interchanged.
Prior | parameter
$\pi_{R\rho}$ | $\rho$, general use
$\pi_H$ | $\mu_1$, $\sigma_1$, $\mu_1/\sigma_1$, $|\Sigma|$, $\rho\sigma_2/\sigma_1$, $\sigma_2\sqrt{1-\rho^2}$, $\eta_3 = \frac{-\rho}{\sigma_1\sqrt{1-\rho^2}}$
$\tilde\pi_H$ (see Table 4) | $d'(\mu_1, \mu_2)'$, $d'\Sigma d$
$\pi_{R\lambda}$ | $[\,\mathrm{ch}_{\max}(\Sigma)\,]$
$\pi_{R\sigma_1}$ | $[\,\sigma_{12} = \rho\sigma_1\sigma_2\,]$

The inferences involving the non-boxed parameters in Table 3 are given in essentially closed form in Table 2, and so are computationally simple, and are exact frequentist matching. Furthermore, with the exception of $\mu_1/\sigma_1$ and $\eta_3$, it will be seen that the non-boxed parameters have the indicated priors as one-at-a-time reference priors, so all three criteria
point to the indicated recommendation. For $\mu_1/\sigma_1$ and $\eta_3$, we provisionally go with the frequentist matching prior, but the marginalization paradox mentioned earlier suggests that this is not as clear a conclusion.

Computation for the boxed parameters in Table 3 is still quite easy for the indicated priors, as will be seen in Section 2. As these are not exact matching, first order matching and numerically investigated matching are studied in Sections 3.2 and 3.3, respectively. The rationales for the recommended priors for the boxed parameters are also given there.

1.4.2 Reference Priors

This paper began with an effort to derive and catalogue the possible reference priors for the bivariate normal distribution. The reference prior theory (cf. Bernardo, 1979, and Berger and Bernardo, 1992a, 1992b) has arguably been the most successful technique for deriving objective priors. Reference priors depend on (i) specification of a parameter of interest; (ii) specification of nuisance parameters; (iii) specification of a grouping of parameters; and (iv) ordering of the groupings. These are all conveyed by the shorthand notation used in Table 4. The following are examples:

$\{(\mu_1, \mu_2), (\sigma_1, \sigma_2, \rho)\}$ indicates that $(\mu_1, \mu_2)$ is the parameter of interest, with the remaining parameters being nuisance parameters, and there are two groupings with the indicated ordering. The corresponding reference prior is also the independence Jeffreys prior, $\pi_{IJ}$, mentioned earlier.

$\{\rho, \sigma_1, \sigma_2, \mu_1, \mu_2\}$ indicates that $\rho$ is the parameter of interest, with the rest being nuisance parameters; each parameter is its own group; and the parameters have the indicated order of importance.
$\{\lambda_1, \lambda_2, \vartheta, \mu_1, \mu_2\}$ introduces the eigenvalues $\lambda_1 > \lambda_2$ of $\Sigma$ as being primarily of interest, with $\vartheta$ (the angle defining the orthogonal matrix that diagonalizes $\Sigma$), $\mu_1$, and $\mu_2$ being the nuisance parameters.

Based on experience with numerous examples, the reference priors that are typically judged to be best are one-at-a-time reference priors, in which each parameter is listed separately as its own group. Hence we will focus on these priors. It turns out to be the case that, for the one-at-a-time reference priors, the ordering of $\mu_1$ and $\mu_2$ among the variables is irrelevant, i.e., any placement of those two variables in the ordering will result in the same reference prior. Hence, if $\mu_1$ and $\mu_2$ are omitted from a listing in Table 4, the resulting reference prior is to be viewed as any one-at-a-time reference prior with the indicated ordering of other variables, with the $\mu_i$ being inserted anywhere in the ordering.

We are interested in finding reference priors (preferably one-at-a-time) for the parameters $\mu_1$, $\mu_2$, $\sigma_1$, $\sigma_2$, $\rho$, $\theta_1 = \rho\sigma_2/\sigma_1$, $\theta_2 = \sigma_2\sqrt{1-\rho^2}$, $\theta_3 = |\Sigma| = \sigma_1^2\sigma_2^2(1-\rho^2)$, $\sigma_{12} = \rho\sigma_1\sigma_2$, $\theta_4 = \sigma_1/\sigma_2$, $\eta_3 = \frac{-\rho}{\sigma_1\sqrt{1-\rho^2}}$, $\theta_5 = \mu_1/\sigma_1$, $\theta_6 = \mu_2/\sigma_2$, $\theta_7 = d'\Sigma d$ (with $d = (d_1, d_2)'$ not proportional to $(0, 1)'$), $d'(\mu_1, \mu_2)'$, and $\lambda_1 = \mathrm{ch}_{\max}(\Sigma)$. Table 4 gives one-at-a-time reference priors for all these
parameters (i.e., the parameter appears as the first entry in the parameter ordering; recall that the $\mu_i$ could go anywhere), except $\eta_3$, $\sigma_{12}$, and $\mu_i/\sigma_i$; finding one-at-a-time reference priors for these parameters requires a much more technical analysis than we utilize here. We do not explicitly list the reference priors for $\sigma_2$ in the table, since they can be found by simply switching $\sigma_1$ with $\sigma_2$ in the various expressions.

Table 4: Reference priors for the bivariate normal model, where $\theta_8 = \rho\sigma_1/\sigma_2$, $\theta_9 = \sigma_1\sigma_2$, $\tilde\mu_1 = d'(\mu_1, \mu_2)'$, $\tilde\sigma_1^2 = \theta_7$, $\tilde\rho = d'\Sigma(0, 1)'/(\sigma_2\sqrt{\theta_7})$, $\tilde\theta_1 = \tilde\rho\sigma_2/\tilde\sigma_1$, and $\tilde\theta_2 = \sigma_2\sqrt{1-\tilde\rho^2}$; $\{\{\cdot\}\}$ indicates that any ordering of the parameters would yield the same reference prior.

prior | $\pi(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho)$ | for parameter ordering | has form (5) with
$\pi_J$ | $\frac{1}{\sigma_1^2\sigma_2^2(1-\rho^2)^2}$ | $\{(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho)\}$ | $(a, b) = (1, 0)$
$\pi_{IJ}$ | $\frac{1}{\sigma_1\sigma_2(1-\rho^2)^{3/2}}$ | $\{(\mu_1, \mu_2), (\sigma_1, \sigma_2, \rho)\}$ | $(a, b) = (2, 1)$
$\pi_{R\rho}$ | $\frac{1}{\sigma_1\sigma_2(1-\rho^2)}$ | $\{\rho, \sigma_1, \sigma_2\}$, $\{\theta_4, \theta_9, \rho\}$ |
$\pi_{R\sigma_1}$ | $\frac{\sqrt{1+\rho^2}}{\sigma_1\sigma_2(1-\rho^2)}$ | $\{\sigma_1, \sigma_2, \rho\}$ |
$\tilde\pi_{R\sigma_1}$ | $\frac{1}{\sigma_1\sigma_2(1-\rho^2)\sqrt{2-\rho^2}}$ | $\{\sigma_1, \rho, \sigma_2\}$, $\{\sigma_1, \eta_3, \theta_2\}$ |
$\pi_{RO}$ | $\frac{1}{\sigma_1^2\sigma_2(1-\rho^2)^{3/2}}$ | $\{\sigma_1, \theta_2, \eta_3\}$ | $(a, b) = (1, 1)$
$\pi_{R\lambda}$ | $\frac{\sigma_1^2 - \sigma_2^2}{\sigma_1\sigma_2(1-\rho^2)\sqrt{(\sigma_1^2-\sigma_2^2)^2 + 4\sigma_1^2\sigma_2^2\rho^2}}$ | $\{\lambda_1, \lambda_2, \vartheta\}$ |
$\pi_H$ | $\frac{1}{\sigma_1^2(1-\rho^2)}$ | $\{\{\sigma_1, \theta_1, \theta_2\}\}$, $\{\{|\Sigma|, \theta_8, \theta_1\}\}$ | $(a, b) = (1, 2)$
$\tilde\pi_H$ | $\frac{1}{\tilde\sigma_1^2(1-\tilde\rho^2)}\, d\tilde\mu_1\, d\mu_2\, d\tilde\sigma_1\, d\sigma_2\, d\tilde\rho$ | $\{\{d'(\mu_1, \mu_2)', \mu_2, \theta_7, \tilde\theta_1, \tilde\theta_2\}\}$ |

For most of the parameters, the proofs that these are the indicated reference priors are given in Appendix A. We mention here the proof for $d'(\mu_1, \mu_2)'$ and $\theta_7 = d'\Sigma d$, because the formulation will be needed later. Indeed, consider the transformation
$$Y = \begin{pmatrix} d_1 & d_2 \\ 0 & 1 \end{pmatrix} X \sim N\left( \begin{pmatrix} d'(\mu_1, \mu_2)' \\ \mu_2 \end{pmatrix}, \begin{pmatrix} d'\Sigma d & d'\Sigma(0, 1)' \\ d'\Sigma(0, 1)' & \sigma_2^2 \end{pmatrix} \right). \qquad (6)$$
For this new bivariate normal problem, we can simply apply the previous result, showing that the right-Haar prior is one-at-a-time matching for the new first-coordinate mean $\tilde\mu_1 = d'(\mu_1, \mu_2)'$ and first-coordinate variance $\tilde\sigma_1^2 = \theta_7$. Note that the transformed $\mu_2$ and $\sigma_2$ are the same as the originals.

Note that the non-Jeffreys reference priors are denoted by $\pi_{R\theta}$, with $\theta$ referring to the parameter of interest. Discussion of the various choices, further notational details, and derivations are given in the Appendix. Note that $\pi_{R\sigma_1}$ and $\tilde\pi_{R\sigma_1}$ are almost the same; they
differ only by the bounded functions $g_1(\rho) = \sqrt{1+\rho^2}$ and $g_2(\rho) = 1/\sqrt{2-\rho^2}$, which are shown to be very similar in Figure 1.

1.4.3 Other studied priors

Finally, we should mention two other priors that have been suggested, the most common of which is the scale prior
$$\pi_S \propto \frac{1}{\sigma_1\sigma_2}. \qquad (7)$$
The motivation that is often given for this prior is that it is standard to use $1/\sigma_i$ as the prior for a standard deviation, while $-1 < \rho < 1$ lies in a bounded set, and so one can use a constant prior in $\rho$. A related prior is $\pi_{MS} \propto \frac{1}{\sigma_1\sigma_2\sqrt{1-\rho^2}}$, whose power of $(1-\rho^2)$ is between that of $\pi_S$ and $\pi_{R\rho}$.

2 Computation

In this paper, a constant prior is always used for $(\mu_1, \mu_2)$, so that
$$[\mu_1, \mu_2 \mid \Sigma, X] \sim N\big( (\bar x_1, \bar x_2)',\ \tfrac{1}{n}\Sigma \big). \qquad (8)$$
Generation from this conditional posterior distribution is standard, so the challenge of simulation from the posterior distribution requires only sampling from the marginal posterior of $(\sigma_1, \sigma_2, \rho)$ given $X$. The marginal likelihood of $(\sigma_1, \sigma_2, \rho)$ satisfies
$$L_1(\sigma_1, \sigma_2, \rho) \propto \big\{ \sigma_1^2\sigma_2^2(1-\rho^2) \big\}^{-(n-1)/2}\, \mathrm{etr}\big( -\tfrac{1}{2}\Sigma^{-1}S \big). \qquad (9)$$
It is immediate that, under the priors $\pi_J$ and $\pi_{IJ}$, the marginal posteriors of $\Sigma$ are Inverse Wishart$(S, n)$ and Inverse Wishart$(S, n-1)$, respectively. The following sections deal with the more challenging priors.

2.1 Marginal Posteriors of $(\sigma_1, \sigma_2, \rho)$ under the Priors $\pi_{R\rho}$, $\pi_{R\sigma_1}$, $\tilde\pi_{R\sigma_1}$, $\pi_S$, and $\pi_{MS}$

For these priors, an independent sample from the marginal posterior $\pi(\sigma_1, \sigma_2, \rho \mid X)$ can be obtained by the following acceptance-rejection algorithm (cf. Robert & Casella 2004 for discussion of accept-reject algorithms).

Simulation Step: Generate $(\sigma_1, \sigma_2, \rho)$ from the independence Jeffreys posterior $\pi_{IJ}(\sigma_1, \sigma_2, \rho \mid X)$ (the Inverse Wishart$(S, n-1)$ distribution) and, independently, sample $u \sim$ Uniform$(0, 1)$.
Rejection Step: Suppose
$$M \equiv \sup_{\sigma_1, \sigma_2, \rho} \frac{\pi(\sigma_1, \sigma_2, \rho)}{\pi_{IJ}(\sigma_1, \sigma_2, \rho)} < \infty.$$
If $u \le \pi(\sigma_1, \sigma_2, \rho)/[M\, \pi_{IJ}(\sigma_1, \sigma_2, \rho)]$, report $(\sigma_1, \sigma_2, \rho)$; otherwise, go back to the Simulation Step.

If $M = \infty$, one would instead need to use the (generally less efficient) Metropolis-Hastings algorithm. Luckily, for the primary priors of interest, $M$ is finite and, indeed, close to 1, which leads to very efficient algorithms. For each of the priors listed in Table 5, the key ratio, $\pi/\pi_{IJ}$, is listed in the table, along with the upper bound $M$, the Rejection Step, and the resulting acceptance probability for $\rho = 0.80, 0.95, 0.99$. The rejection algorithm is quite efficient for sampling these posteriors. Indeed, for $\rho$ near $0$, the algorithms accept with probability near one and, even for large $\rho$, the acceptance probabilities are very reasonable for the priors $\pi_{R\rho}$, $\pi_{R\sigma_1}$, and $\tilde\pi_{R\sigma_1}$. For large $\rho$, the algorithm is less efficient for the posteriors under the priors $\pi_S$ and $\pi_{MS}$, but even these acceptance rates may well be fine in practice, given the simplicity of the algorithm.

Table 5: Ratio $\pi/\pi_{IJ}$, upper bound $M$, rejection step, and acceptance probability for $\rho = 0.80, 0.95, 0.99$, when $\pi = \pi_{R\rho}$, $\pi_{R\sigma_1}$, $\tilde\pi_{R\sigma_1}$, $\pi_S$, and $\pi_{MS}$.

Prior $\pi$ | Ratio $\pi/\pi_{IJ}$ | Bound $M$ | Rejection Step | Acceptance Probability ($\rho = .80$, $.95$, $.99$)
$\pi_{R\rho}$ | $(1-\rho^2)^{1/2}$ | $1$ | $u > (1-\rho^2)^{1/2}$ | $0.6000$, $0.3122$, $0.1410$
$\pi_{R\sigma_1}$ | $(1-\rho^4)^{1/2}$ | $1$ | $u > (1-\rho^4)^{1/2}$ | $0.7684$, $0.4307$, $0.1985$
$\tilde\pi_{R\sigma_1}$ | $\big(\frac{1-\rho^2}{2-\rho^2}\big)^{1/2}$ | $\frac{1}{\sqrt{2}}$ | $u > \big(\frac{2(1-\rho^2)}{2-\rho^2}\big)^{1/2}$ | $0.7276$, $0.4215$, $0.1975$
$\pi_S$ | $(1-\rho^2)^{3/2}$ | $1$ | $u > (1-\rho^2)^{3/2}$ | $0.2160$, $0.0304$, $0.0028$
$\pi_{MS}$ | $(1-\rho^2)$ | $1$ | $u > (1-\rho^2)$ | $0.3600$, $0.0975$, $0.0199$

2.2 Computation under $\pi_{ab}$

The most interesting prior of this form, besides the Jeffreys and independence Jeffreys priors, is the right-Haar prior $\pi_H$, although other priors such as $\pi_{11}$ arise as reference priors and hence are potentially of interest. While Tables 1 and 2 gave an explicit form for the most important marginal posteriors arising from priors of this form, it is of considerable interest that essentially closed form generation from the full posterior of any prior of this form is possible (see, e.g., Brown et al. 1994). This is briefly reviewed in this section, since the
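The Simulation and Rejection Steps for $\pi_{R\rho}$ can be sketched as follows (our Python illustration; it assumes SciPy's `invwishart` parameterization coincides with the Inverse Wishart$(S, n-1)$ used here, and the function name is ours):

```python
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(2)

def sample_pi_Rrho_posterior(S, n, size=2000):
    """Accept-reject draws of (sigma1^2, sigma2^2, rho) from the pi_Rrho posterior.
    Proposal: the independence-Jeffreys posterior, Inverse Wishart(S, n-1).
    Accept with probability sqrt(1 - rho^2)  (ratio pi_Rrho/pi_IJ, with M = 1)."""
    out = []
    while len(out) < size:
        Sigma = invwishart.rvs(df=n - 1, scale=S, random_state=rng)
        rho = Sigma[0, 1] / np.sqrt(Sigma[0, 0] * Sigma[1, 1])
        if rng.uniform() <= np.sqrt(1 - rho**2):   # Rejection Step
            out.append((Sigma[0, 0], Sigma[1, 1], rho))
    return np.array(out)

S = np.array([[4.0, 1.8], [1.8, 2.0]])  # example sufficient statistic, n = 25
draws = sample_pi_Rrho_posterior(S, n=25)
```

With $r \approx 0.64$ here, the acceptance rate is roughly $0.75$, consistent with the $\rho = 0.80$ column of Table 5.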
expressions for the resulting constructive posteriors are needed for later results on frequentist coverage.

It is most convenient to work with the transformed variables
$$\eta_1 = \frac{1}{\sigma_1}, \qquad \eta_2 = \frac{1}{\sigma_2\sqrt{1-\rho^2}}, \qquad \eta_3 = \frac{-\rho}{\sigma_1\sqrt{1-\rho^2}}. \qquad (10)$$
This parameterization gives a type of Cholesky decomposition of the precision matrix $\Sigma^{-1}$,
$$\Sigma^{-1} = \begin{pmatrix} \eta_1 & \eta_3 \\ 0 & \eta_2 \end{pmatrix} \begin{pmatrix} \eta_1 & 0 \\ \eta_3 & \eta_2 \end{pmatrix}, \qquad (11)$$
which accounts for the simplicity of the ensuing computations.

Fact 1 (a) The inverse transformations are
$$\sigma_1 = \frac{1}{\eta_1}, \qquad \sigma_2 = \frac{\sqrt{\eta_1^2 + \eta_3^2}}{\eta_1\eta_2}, \qquad \rho = \frac{-\eta_3}{\sqrt{\eta_1^2 + \eta_3^2}}. \qquad (12)$$
(b) The Hessian matrix is
$$H \equiv \frac{\partial(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho)}{\partial(\mu_1, \mu_2, \eta_1, \eta_2, \eta_3)} = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & -\frac{1}{\eta_1^2} & 0 & 0 \\ 0 & 0 & -\frac{\eta_3^2}{\eta_1^2\eta_2\sqrt{\eta_1^2+\eta_3^2}} & -\frac{\sqrt{\eta_1^2+\eta_3^2}}{\eta_1\eta_2^2} & \frac{\eta_3}{\eta_1\eta_2\sqrt{\eta_1^2+\eta_3^2}} \\ 0 & 0 & \frac{\eta_1\eta_3}{(\eta_1^2+\eta_3^2)^{3/2}} & 0 & -\frac{\eta_1^2}{(\eta_1^2+\eta_3^2)^{3/2}} \end{pmatrix}.$$
(c) The Jacobians between the two parameterizations are
$$J_1 \equiv \left| \frac{\partial(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho)}{\partial(\mu_1, \mu_2, \eta_1, \eta_2, \eta_3)} \right| = \frac{1}{\eta_1\eta_2^2(\eta_1^2+\eta_3^2)}, \qquad (13)$$
$$J_2 \equiv \left| \frac{\partial(\mu_1, \mu_2, \eta_1, \eta_2, \eta_3)}{\partial(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho)} \right| = \frac{1}{\sigma_1^3\sigma_2^2(1-\rho^2)^2}. \qquad (14)$$
(d) The prior $\pi_{ab}$ of (5) for $(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho)$ transforms to the extended conjugate class of priors for $(\mu_1, \mu_2, \eta_1, \eta_2, \eta_3)$,
$$\pi_{ab}(\mu_1, \mu_2, \eta_1, \eta_2, \eta_3) = \frac{1}{\eta_1^a\, \eta_2^b}. \qquad (15)$$
The following results are immediate.
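As a quick sanity check (ours, not part of the paper), the transformation (10), the decomposition (11), and the inverse map (12) can be verified numerically at an arbitrary parameter value:

```python
import numpy as np

sigma1, sigma2, rho = 1.5, 0.7, -0.4

# Forward map (10)
eta1 = 1 / sigma1
eta2 = 1 / (sigma2 * np.sqrt(1 - rho**2))
eta3 = -rho / (sigma1 * np.sqrt(1 - rho**2))

# Identity (11): Sigma^{-1} = Psi Psi' with Psi = [[eta1, eta3], [0, eta2]]
Psi = np.array([[eta1, eta3], [0.0, eta2]])
Sigma = np.array([[sigma1**2, rho * sigma1 * sigma2],
                  [rho * sigma1 * sigma2, sigma2**2]])
assert np.allclose(Psi @ Psi.T, np.linalg.inv(Sigma))

# Inverse map (12)
assert np.isclose(1 / eta1, sigma1)
assert np.isclose(np.sqrt(eta1**2 + eta3**2) / (eta1 * eta2), sigma2)
assert np.isclose(-eta3 / np.sqrt(eta1**2 + eta3**2), rho)
```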
Lemma 1 (a) Under the prior $\pi_{ab}$, the conditional posterior of $(\mu_1, \mu_2)$ given $(\eta_1, \eta_2, \eta_3)$ remains the same as (8), and the marginal posterior density of $(\eta_1, \eta_2, \eta_3)$, w.r.t. $d\eta_1\, d\eta_2\, d\eta_3$, is
$$[\eta_1, \eta_2, \eta_3 \mid X] \propto \eta_1^{n-a-1}\, \eta_2^{n-b-1} \exp\Big\{ -\frac{1}{2}\Big[ \eta_1^2 s_1^2 + \eta_2^2 s_2^2(1-r^2) + s_1^2\Big( \eta_3 + \eta_2\, \frac{r s_2}{s_1} \Big)^2 \Big] \Big\}. \qquad (16)$$
(b) The conditional posterior distribution of $\eta_3$, given $(\eta_1, \eta_2; X)$, is $N\big( -\eta_2\, r s_2/s_1,\ 1/s_1^2 \big)$.
(c) The marginal posterior distributions of $\eta_1^2$ and $\eta_2^2$ are independent, and
$$\eta_1^2 \mid X \sim \mathrm{Gamma}\Big( \frac{n-a}{2}, \frac{s_1^2}{2} \Big); \qquad \eta_2^2 \mid X \sim \mathrm{Gamma}\Big( \frac{n-b}{2}, \frac{s_2^2(1-r^2)}{2} \Big).$$

We next present the constructive posteriors of $(\eta_1, \eta_2, \eta_3)$, and from these derive the constructive posteriors of $(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho)$; later we consider other functions of these parameters. Recall that these constructive posteriors are not only useful for simulation, but will be the key to proving exact frequentist matching results. We will use a star to represent a random draw from the implied distribution; thus $\mu_1^*$ will represent a random draw from its posterior distribution, $(Z_1^*, Z_2^*, Z_3^*)$ will be independent draws from the standard normal distribution, and $\chi^{2*}_{n-a}$ and $\chi^{2*}_{n-b}$ will be independent draws from chi-squared distributions with the indicated degrees of freedom.

Fact 2 (a) The constructive posterior of $(\eta_1, \eta_2, \eta_3)$ given $X$ can be expressed as
$$\eta_1^* = \frac{\sqrt{\chi^{2*}_{n-a}}}{s_1}, \qquad (17)$$
$$\eta_2^* = \frac{\sqrt{\chi^{2*}_{n-b}}}{s_2\sqrt{1-r^2}}, \qquad (18)$$
$$\eta_3^* = \frac{Z_3^*}{s_1} - \eta_2^*\, \frac{r s_2}{s_1} = \frac{1}{s_1}\Big( Z_3^* - \sqrt{\chi^{2*}_{n-b}}\, \frac{r}{\sqrt{1-r^2}} \Big). \qquad (19)$$
(b) The constructive posterior of $(\sigma_1^2, \sigma_2^2, \rho)$ given $X$ can be expressed as
$$\sigma_1^{2*} = \frac{1}{\eta_1^{*2}} = \frac{s_1^2}{\chi^{2*}_{n-a}}, \qquad (20)$$
$$\sigma_2^{2*} = \frac{\eta_1^{*2} + \eta_3^{*2}}{\eta_1^{*2}\,\eta_2^{*2}} = \frac{s_2^2(1-r^2)}{\chi^{2*}_{n-b}}\Big[ 1 + \frac{\big( Z_3^* - \sqrt{\chi^{2*}_{n-b}}\, r/\sqrt{1-r^2} \big)^2}{\chi^{2*}_{n-a}} \Big], \qquad (21)$$
$$\rho^* = \frac{-\eta_3^*}{\sqrt{\eta_1^{*2} + \eta_3^{*2}}} = \psi(Y^*), \qquad \psi(x) = \frac{x}{\sqrt{1+x^2}}, \qquad Y^* = \frac{1}{\sqrt{\chi^{2*}_{n-a}}}\Big( Z_3^* + \sqrt{\chi^{2*}_{n-b}}\, \frac{r}{\sqrt{1-r^2}} \Big). \qquad (22)$$
Proof. Part (a) follows immediately from Lemma 1. For Part (b), (20) follows from (17). Clearly
$$\frac{\eta_1^{*2} + \eta_3^{*2}}{\eta_1^{*2}} = \frac{1}{1-\rho^{*2}} = 1 + \frac{\big( Z_3^* - \sqrt{\chi^{2*}_{n-b}}\, r/\sqrt{1-r^2} \big)^2}{\chi^{2*}_{n-a}},$$
so that (21) follows from (17) and (18). For Part (c), the constructive posterior of $\rho$ can be written as
$$\rho^* = \frac{-\eta_3^*}{\sqrt{\eta_1^{*2} + \eta_3^{*2}}} = \frac{\sqrt{\chi^{2*}_{n-b}}\, r/\sqrt{1-r^2} - Z_3^*}{\sqrt{\chi^{2*}_{n-a} + \big( \sqrt{\chi^{2*}_{n-b}}\, r/\sqrt{1-r^2} - Z_3^* \big)^2}},$$
which, after some algebra, results in (22). Note that the constructive posterior of $\rho$ depends only on $r$. $\square$

Fact 3 The constructive posteriors for $\mu_1$ and $\mu_2$ can be written
$$\mu_1^* = \bar x_1 + Z_1^*\, \frac{s_1}{\sqrt{n\, \chi^{2*}_{n-a}}} = \bar x_1 + t^*_{n-a}\, \frac{s_1}{\sqrt{n(n-a)}}, \qquad (23)$$
$$\mu_2^* = \bar x_2 + \frac{\rho^*\sigma_2^*}{\sigma_1^*}\,(\mu_1^* - \bar x_1) + Z_2^*\, \frac{\sigma_2^*\sqrt{1-\rho^{*2}}}{\sqrt{n}} \qquad (24)$$
$$= \bar x_2 + \frac{r s_2}{\sqrt{n(n-a)}}\, t^*_{n-a} + \Big( Z_2^* - \frac{Z_3^*\, t^*_{n-a}}{\sqrt{n-a}} \Big)\, \frac{s_2\sqrt{1-r^2}}{\sqrt{n\, \chi^{2*}_{n-b}}} \qquad (25)$$
$$= \bar x_2 + \frac{r s_2}{\sqrt{n(n-a)}}\, t^*_{n-a} + t^*_{n-b}\, \Big[ \Big( 1 + \frac{t^{*2}_{n-a}}{n-a} \Big)\, \frac{s_2^2(1-r^2)}{n(n-b)} \Big]^{1/2}, \qquad (26)$$
where $t^*_{n-a}$ and $t^*_{n-b}$ are standard $t$ random variables with the indicated degrees of freedom.

Proof. Since $\mu_1 \mid \Sigma, X \sim N(\bar x_1, \sigma_1^2/n)$, we have $\mu_1^* = \bar x_1 + Z_1^*\, \sigma_1^*/\sqrt{n}$. Substituting (20) into this expression yields (23). Because
$$\mu_2 \mid \mu_1, \Sigma, X \sim N\Big( \bar x_2 + \frac{\rho\sigma_2}{\sigma_1}(\mu_1 - \bar x_1),\ \frac{\sigma_2^2(1-\rho^2)}{n} \Big),$$
(24) holds. Substituting (17)-(20) and (23) into (24) yields (25). Expression (26) follows by noting that, for any fixed $t$, $Z_2^* - Z_3^*\, t \stackrel{d}{=} Z^*\sqrt{1+t^2}$, where $Z^*$ is also standard normal. The result is immediate. $\square$
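Facts 2 and 3 together give an essentially closed-form sampler for the full posterior under $\pi_{ab}$. A Python sketch of ours follows (the function name and the numerical inputs are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

def posterior_draw_pi_ab(xbar1, xbar2, s1, s2, r, n, a, b, size=10000):
    """Constructive posterior draws of (mu1, mu2, sigma1^2, sigma2^2, rho)
    under pi_ab, following (17)-(21) and (23)-(24)."""
    Z1, Z2, Z3 = rng.standard_normal((3, size))
    chi_a = rng.chisquare(n - a, size)
    chi_b = rng.chisquare(n - b, size)
    eta1 = np.sqrt(chi_a) / s1                                   # (17)
    eta2 = np.sqrt(chi_b) / (s2 * np.sqrt(1 - r**2))             # (18)
    eta3 = (Z3 - np.sqrt(chi_b) * r / np.sqrt(1 - r**2)) / s1    # (19)
    sig1sq = 1 / eta1**2                                         # (20)
    sig2sq = (eta1**2 + eta3**2) / (eta1**2 * eta2**2)           # (21)
    rho = -eta3 / np.sqrt(eta1**2 + eta3**2)                     # (22)
    mu1 = xbar1 + Z1 * np.sqrt(sig1sq / n)                       # (23)
    mu2 = (xbar2 + rho * np.sqrt(sig2sq / sig1sq) * (mu1 - xbar1)
           + Z2 * np.sqrt(sig2sq * (1 - rho**2) / n))            # (24)
    return mu1, mu2, sig1sq, sig2sq, rho

# Right-Haar prior corresponds to (a, b) = (1, 2)
mu1, mu2, s1sq, s2sq, rho = posterior_draw_pi_ab(
    0.3, -0.1, 1.2, 0.9, 0.5, n=40, a=1, b=2)
```

Because every draw is independent, posterior quantiles of any function of the five parameters follow immediately from these arrays.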
2.3 Computation under $\pi_{R\lambda}$

Here is a Metropolis–Hastings algorithm, from Berger et al. (2005), to generate a dependent sample of size $m$ from the marginal posterior of $(\sigma_1^2,\sigma_2^2,\rho)$ given $X$ based on the prior $\pi_{R\lambda}$, namely $(\sigma^2_{1,k},\sigma^2_{2,k},\rho_k)$, $k=1,\ldots,m$. At stage $k$, sample
$$\tilde\Sigma = \begin{pmatrix} \tilde\sigma_1^2 & \tilde\rho\,\tilde\sigma_1\tilde\sigma_2\\ \tilde\rho\,\tilde\sigma_1\tilde\sigma_2 & \tilde\sigma_2^2 \end{pmatrix} \;\sim\; \text{Inverse-Wishart}(S,\, n).$$
Then set
$$(\sigma^2_{1,k},\sigma^2_{2,k},\rho_k) = \begin{cases} (\tilde\sigma_1^2,\tilde\sigma_2^2,\tilde\rho), & \text{with probability } \alpha_k,\\ (\sigma^2_{1,k-1},\sigma^2_{2,k-1},\rho_{k-1}), & \text{otherwise}, \end{cases}$$
where
$$\alpha_k = \min\Bigg\{1,\; \frac{\tilde\sigma_1^2\tilde\sigma_2^2(1-\tilde\rho^2)\,\big[(\sigma^2_{1,k-1}-\sigma^2_{2,k-1})^2 + 4\rho_{k-1}^2\,\sigma^2_{1,k-1}\sigma^2_{2,k-1}\big]^{1/2}}{\sigma^2_{1,k-1}\sigma^2_{2,k-1}(1-\rho_{k-1}^2)\,\big[(\tilde\sigma_1^2-\tilde\sigma_2^2)^2 + 4\tilde\rho^2\,\tilde\sigma_1^2\tilde\sigma_2^2\big]^{1/2}}\Bigg\}.$$

3 Comparisons of Priors via Frequentist Matching

3.1 Frequentist Coverage Probabilities and Exact Matching

In this subsection we compare the frequentist properties of posterior credible intervals for various quantities under the priors $\pi_{ab}$, given in (15) and (25). As is customary in such comparisons, we study one-sided intervals $(\theta_L, q_\alpha(x)]$ for a parameter $\theta$, where $\theta_L$ is the lower bound on the parameter $\theta$ (e.g., $0$ or $-\infty$) and $q_\alpha(x)$ is the posterior quantile of $\theta$, defined by $P(\theta < q_\alpha(x)\,|\,x) = \alpha$. Of interest is the frequentist coverage of the corresponding confidence interval, i.e., $P(\theta < q_\alpha(X)\,|\,\mu_1,\mu_2,\sigma_1,\sigma_2,\rho)$. The closer this coverage is to the nominal $\alpha$, the better the procedure, and the corresponding objective prior, is judged to be. Here is a simple illustration.

Theorem 1. (a) The posterior $\alpha$ quantiles of $\mu_1$ and $\sigma_1^2$ are, respectively,
$$\mu_1(\alpha) = \bar x_1 + t_{n-a}(\alpha)\Big[\frac{s_1^2}{n(n-a)}\Big]^{1/2},\qquad \sigma_1^2(\alpha) = \frac{s_1^2}{\chi^2_{n-a}(1-\alpha)},$$
where $t_{n-a}(\alpha)$ and $\chi^2_{n-a}(1-\alpha)$ are quantiles of the indicated $t$ and $\chi^2$ distributions.
(b) The frequentist coverage probabilities of the one-sided credible intervals $(-\infty,\mu_1(\alpha)]$ and $(0,\sigma_1^2(\alpha)]$ for $\mu_1$ and $\sigma_1^2$, respectively, are
$$P\Big(t_{n-1} > \Big[\frac{n-1}{n-a}\Big]^{1/2} t_{n-a}(1-\alpha)\Big),\qquad P\big(\chi^2_{n-1} > \chi^2_{n-a}(1-\alpha)\big),$$
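A minimal implementation of this sampler is sketched below (function names are ours). The inverse-Wishart proposal is drawn through a Bartlett decomposition, and the acceptance ratio is coded as the weight ratio $w(\tilde\Sigma)/w(\Sigma_{k-1})$ with $w(\Sigma)=|\Sigma|/(\lambda_1-\lambda_2)$ — an assumption about how $\pi_{R\lambda}$ reweights the inverse-Wishart proposal, chosen to be consistent with the acceptance probability displayed above (note $\lambda_1-\lambda_2 = [(\sigma_1^2-\sigma_2^2)^2+4\rho^2\sigma_1^2\sigma_2^2]^{1/2}$):

```python
import math
import random

def inv2(m):
    # inverse of a symmetric 2x2 matrix [[a, b], [b, c]]
    a, b, c = m[0][0], m[0][1], m[1][1]
    d = a * c - b * b
    return [[c / d, -b / d], [-b / d, a / d]]

def rinvwishart2(S, df, rng):
    """Sigma ~ Inverse-Wishart(S, df) for 2x2 S: draw W ~ Wishart(S^{-1}, df)
    by the Bartlett decomposition and return W^{-1}."""
    V = inv2(S)
    l11 = math.sqrt(V[0][0])
    l21 = V[0][1] / l11
    l22 = math.sqrt(V[1][1] - l21 * l21)          # Cholesky factor of V
    a11 = math.sqrt(rng.gammavariate(df / 2.0, 2.0))
    a22 = math.sqrt(rng.gammavariate((df - 1) / 2.0, 2.0))
    a21 = rng.gauss(0.0, 1.0)
    t11 = l11 * a11                                # T = L A, W = T T'
    t21 = l21 * a11 + l22 * a21
    t22 = l22 * a22
    W = [[t11 * t11, t11 * t21], [t11 * t21, t21 * t21 + t22 * t22]]
    return inv2(W)

def weight(s1sq, s2sq, rho):
    # assumed target weight w(Sigma) = |Sigma| / (lambda_1 - lambda_2)
    det = s1sq * s2sq * (1.0 - rho * rho)
    gap = math.sqrt((s1sq - s2sq) ** 2 + 4.0 * rho * rho * s1sq * s2sq)
    return det / gap

def mh_r_lambda(S, n, m, seed=0):
    """Dependent sample of (sigma1^2, sigma2^2, rho) targeting the posterior
    under pi_R-lambda, using Inverse-Wishart(S, n) independence proposals."""
    rng = random.Random(seed)
    out, cur = [], None
    for _ in range(m):
        G = rinvwishart2(S, n, rng)
        prop = (G[0][0], G[1][1], G[0][1] / math.sqrt(G[0][0] * G[1][1]))
        if cur is None or rng.random() < min(1.0, weight(*prop) / weight(*cur)):
            cur = prop
        out.append(cur)
    return out
```

The sketch is independent of the rest of the paper's code; only the chain $(\sigma^2_{1,k},\sigma^2_{2,k},\rho_k)$ is returned.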
which do not depend on the parameters and equal $\alpha$ (i.e., are frequentist matching) if and only if $a=1$.

Proof. Part (a) follows from (20) and (23). For part (b), in regard to $\sigma_1^2$, the frequentist coverage of this interval (now with $s_1^2$ considered random and the other quantities fixed) is clearly
$$C(\mu_1,\mu_2,\sigma_1,\sigma_2,\rho) = P\Big(\sigma_1^2 < \frac{s_1^2}{\chi^2_{n-a}(1-\alpha)}\;\Big|\;\mu_1,\mu_2,\sigma_1,\sigma_2,\rho\Big) = P\Big(\frac{s_1^2}{\sigma_1^2} > \chi^2_{n-a}(1-\alpha)\;\Big|\;\mu_1,\mu_2,\sigma_1,\sigma_2,\rho\Big),\qquad(27)$$
which equals $\alpha$ only when $a=1$, since the frequentist distribution of $s_1^2/\sigma_1^2$ is $\chi^2$ with $n-1$ degrees of freedom. The proof of exact matching for the credible intervals for $\mu_1$ is virtually the same.

Corollary 1. If one considers the transformed bivariate normal problem in (6), the same results can be applied to the transformed mean (a linear contrast) $d^t\mu = d_1\mu_1+d_2\mu_2$, with $d=(d_1,d_2)^t$ not proportional to $(0,1)^t$, and to the variance $d^t\Sigma d$.

It follows from Corollary 1 that one obtains exact matching for contrasts or transformed variances from either the transformed right-Haar prior $\tilde\pi_H$ of Table 4 or the Jeffreys prior, since both have $a=1$. Interestingly, one can use the Jeffreys prior in the original parameterization, since the Jeffreys prior is invariant to transformations. This result for contrasts and transformed variances was first found for the Jeffreys prior by Geisser & Cornfield (1963).

3.1.1 Preliminaries

To prove frequentist matching for more complicated $\theta$, the following technical lemmas will be repeatedly utilized. The first lemma is from (3d.2.8) in Rao (1973). The proof of the second lemma is straightforward and is omitted.

Lemma 2. For $n\ge 3$ and given $(\sigma_1,\sigma_2,\rho)$, the following three random variables are independent and have the indicated distributions:
$$T_1 = \frac{s_1}{\sigma_2\sqrt{1-\rho^2}}\Big[\frac{r s_2}{s_1} - \frac{\rho\sigma_2}{\sigma_1}\Big] \;\sim\; N(0,1),\qquad(28)$$
$$T_3 = \frac{s_2^2(1-r^2)}{\sigma_2^2(1-\rho^2)} \;\sim\; \chi^2_{n-2},\qquad(29)$$
$$T_5 = \frac{s_1^2}{\sigma_1^2} \;\sim\; \chi^2_{n-1}.\qquad(30)$$

Lemma 3. Let $Y_\alpha$ denote the $\alpha$ quantile of a random variable $Y$.
(a) If $g$ is a monotonically increasing function, $[g(Y)]_\alpha = g(Y_\alpha)$ for any $\alpha\in(0,1)$.
(b) For any $a>0$ and $b\in\mathbb{R}$, $(aY+b)_\alpha = aY_\alpha + b$.
(c) If $W$ is a positive random variable, $(WY)_\alpha \ge 0$ if and only if $Y_\alpha \ge 0$.
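The coverage expressions in Theorem 1(b) involve only $t$ and $\chi^2$ distributions, so the matching claim can be checked directly by Monte Carlo. A small sketch (ours, pure Python; chi-squared draws simulated as Gamma(df/2, scale 2), quantiles taken empirically) for the $\sigma_1^2$ interval:

```python
import random

def coverage_sigma1_sq(n, a, alpha, m=100_000, seed=1):
    """Monte Carlo estimate of P(chi2_{n-1} > chi2_{n-a}(1-alpha)), the
    frequentist coverage of the interval (0, sigma1^2(alpha)] in Theorem 1(b)."""
    rng = random.Random(seed)
    post = sorted(rng.gammavariate((n - a) / 2.0, 2.0) for _ in range(m))
    q = post[int((1.0 - alpha) * m)]        # empirical (1-alpha)-quantile of chi2_{n-a}
    hits = sum(rng.gammavariate((n - 1) / 2.0, 2.0) > q for _ in range(m))
    return hits / m
```

With $a=1$ the estimate sits at the nominal level, while $a=2$ gives visible over-coverage, illustrating that matching holds only for $a=1$.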
In the remainder of the paper we use, without further comment, the notation that a star appended to a random variable denotes randomness arising from the constructive posterior, while a random variable without a star refers to randomness arising from the frequentist distribution of a statistic. And, again, the $Z_i$ denote standard normal random variables, while $t_m$ and $\chi^2_m$ are, respectively, $t$ and chi-squared random variables with $m$ degrees of freedom; $\chi_m$ denotes the positive square root of $\chi^2_m$. Whenever several of these occur in an expression, they are all independent, except that random variables of the same type and with the same index refer to the same random variable. Finally, we reserve quantile notation for posterior quantiles, i.e., quantiles with respect to the starred distributions. Thus a quantile such as $\big[\,Z_3^* + \big(\rho\,\chi_{n-1}/\sqrt{1-\rho^2} + Z_2\big)\chi^*_{n-b}/\chi_{n-2}\,\big]_\alpha$ would be computed based on the joint distribution of $(Z_3^*,\chi^{2*}_{n-b})$, while holding $\rho$, $\chi_{n-1}$, $\chi_{n-2}$, $Z_2$, and $n$ fixed.

3.1.2 Credible intervals for $\mu_2-\mu_1$ under $\pi_{ab}$

Although Corollary 1 showed which priors yield optimal confidence sets for contrasts, we will also be studying the extent to which other priors fail to yield credible sets having good frequentist coverage. This is important for our later recommendation of a good overall objective prior for the bivariate normal problem. We specifically study the performance of credible sets for $\mu_2-\mu_1$ under various priors, and the following theorem provides the needed result about coverage.

Theorem 2. (a) The posterior $\alpha$ quantile of $\mu_2-\mu_1$ is given by
$$q_\alpha(X) = \bar x_2 - \bar x_1 + \Bigg[\frac{(r s_2 - s_1)\,t^*_{n-a}}{\sqrt{n(n-a)}} + t^*_{n-b}\Big(\frac{s_2^2(1-r^2)}{n(n-b)}\Big)^{1/2}\Big(1+\frac{t^{*2}_{n-a}}{n-a}\Big)^{1/2}\Bigg]_\alpha.\qquad(31)$$
(b) The frequentist coverage probability depends only on
$$\tau_1 = \frac{\sigma_1-\rho\sigma_2}{(\sigma_1^2+\sigma_2^2-2\rho\sigma_1\sigma_2)^{1/2}}\qquad(32)$$
and is given by the expression
$$P\big(\mu_2-\mu_1 < q_\alpha(X)\,|\,\mu_1,\mu_2,\sigma_1,\sigma_2,\rho\big) = P\Bigg(Z_1 < \Big[\big(\sqrt{1-\tau_1^2}\,Z_2 - \tau_1\chi_{n-1}\big)\frac{t^*_{n-a}}{\sqrt{n-a}} + \sqrt{1-\tau_1^2}\,\chi_{n-2}\,\Big(1+\frac{t^{*2}_{n-a}}{n-a}\Big)^{1/2}\frac{t^*_{n-b}}{\sqrt{n-b}}\Big]_\alpha\Bigg),\qquad(33)$$
where the probability is computed according to the joint distribution of $(Z_1, Z_2, \chi^2_{n-1}, \chi^2_{n-2})$.

Proof. Part (a) follows from the constructive posteriors for the $\mu_i$ given in (23) and (26).
Noting that $\sqrt n\,\big[\mu_2-\mu_1 - (\bar X_2-\bar X_1)\big]/(\sigma_1^2+\sigma_2^2-2\rho\sigma_1\sigma_2)^{1/2}$ is standard normal, it follows that
$$P\big(\mu_2-\mu_1 < q_\alpha(X)\big) = P\Bigg(Z_1 < \frac{\sqrt n}{(\sigma_1^2+\sigma_2^2-2\rho\sigma_1\sigma_2)^{1/2}}\Big[\frac{(r s_2-s_1)\,t^*_{n-a}}{\sqrt{n(n-a)}} + t^*_{n-b}\Big(\frac{s_2^2(1-r^2)}{n(n-b)}\Big)^{1/2}\Big(1+\frac{t^{*2}_{n-a}}{n-a}\Big)^{1/2}\Big]_\alpha\Bigg),$$
where, henceforth, we omit the conditioning on the parameters. The sampling distributions in (28)–(30), together with algebra, directly yield the conclusion.

It is quite surprising that the coverage probability of these confidence intervals depends only on the single parameter $\tau_1$, but it is also quite welcome, in that coverage probabilities can then be graphed simply as a function of $\tau_1$ over the interval $[-1,1]$. Because $t^*_{n-a}$ and $-t^*_{n-a}$ have the same distribution, it is clear that the coverage is the same for $\tau_1$ and $-\tau_1$, so that it suffices to graph the coverage over the positive unit interval.

Corollary 2. Let $F_{n-b}$ denote the cumulative distribution function of $t_{n-b}$, and let $f_{n-a}$ be the density function of $t_{n-a}$. If $|\tau_1|<1$, formula (33) is equivalent to
$$P\big(\mu_2-\mu_1 < q_\alpha(X)\,|\,\mu_1,\mu_2,\sigma_1,\sigma_2,\rho\big) = P\big(g(Z_1,Z_2,\chi^2_{n-1},\chi^2_{n-2}) < \alpha\big),\qquad(34)$$
where
$$g(Z_1,Z_2,\chi^2_{n-1},\chi^2_{n-2}) = \int_{-\infty}^{\infty} F_{n-b}\Bigg(\frac{\sqrt{n-b}\,\big[Z_1 - \big(\sqrt{1-\tau_1^2}\,Z_2-\tau_1\chi_{n-1}\big)\,t/\sqrt{n-a}\,\big]}{\sqrt{1-\tau_1^2}\,\chi_{n-2}\,\big(1+t^2/(n-a)\big)^{1/2}}\Bigg)\,f_{n-a}(t)\,dt.\qquad(35)$$

Proof. It is enough to see that, for any random variables $Z$ and $W^*$, if $W^*_\alpha$ is the $\alpha$ quantile of $W^*$, then $P(Z < W^*_\alpha) = P\big(P(W^* < Z\,|\,Z) < \alpha\big)$.

This expression is more convenient for Monte Carlo computation, since one can simply generate draws $(Z_1,Z_2,\chi^2_{n-1},\chi^2_{n-2})$, numerically compute $g(Z_1,Z_2,\chi^2_{n-1},\chi^2_{n-2})$ for each draw, and record the fraction of the time that $g$ is less than $\alpha$. In contrast, one must determine posterior quantiles at each Monte Carlo step when using (33), a more difficult enterprise.

3.1.3 Credible intervals for a class of functions of $(\sigma_1,\sigma_2,\rho)$, including $\sigma_1$, $\sigma_2$, $\rho$, $\rho\sigma_2/\sigma_1$, $\sigma_2^2(1-\rho^2)$, $|\Sigma|$, $\rho\sigma_1\sigma_2$, $\sigma_2^2/\sigma_1^2$, and $\rho/(\sigma_1\sqrt{1-\rho^2})$

We consider one-sided credible intervals for $\sigma_1$, $\sigma_2$, and $\rho$, and for functions of the form
$$\theta = \sigma_1^{d_1}\sigma_2^{d_2}\,g(\rho),\qquad(36)$$
for $d_1,d_2\in\mathbb{R}$ and some function $g$. We also consider a class of scale-invariant priors for $(\mu_1,\mu_2,\sigma_1,\sigma_2,\rho)$,
$$\pi(\mu_1,\mu_2,\sigma_1,\sigma_2,\rho) \;\propto\; \frac{h(\rho)}{\sigma_1^{c_1}\sigma_2^{c_2}},\qquad(37)$$
for some $c_1,c_2\in\mathbb{R}$ and a positive function $h$. The proof of the following theorem is given in Appendix B.

Theorem 3. Denote the $\alpha$ posterior quantile of $\theta$ under the prior (37) by $\theta_\alpha(X)$. For any fixed $(\mu_1,\mu_2,\sigma_1,\sigma_2,\rho)$, the frequentist coverage of the credible interval $(\theta_L,\theta_\alpha(X)]$ depends only on $\rho$.
Here $\theta_L$ is the lower boundary of the parameter space for $\theta$.
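Theorem 3 implies that, for priors of the form (37), coverage for such $\theta$ is a function of $\rho$ alone; and, as noted in the introduction, there is an objective prior yielding exact frequentist inference for $\rho$. Both points can be probed numerically with the constructive posterior of $\rho$ from Fact 2. The self-contained sketch below (ours; helper names are not from the paper; chi-squared via Gamma draws, quantiles taken empirically) estimates the frequentist coverage of the one-sided posterior interval for $\rho$ under $(a,b)=(1,2)$:

```python
import math
import random

def rho_draw(n, r, a=1, b=2, rng=random):
    # Constructive draw rho* = psi(Y*) of Fact 2, psi(y) = y / sqrt(1 + y^2)
    z3 = rng.gauss(0.0, 1.0)
    c_a = rng.gammavariate((n - a) / 2.0, 2.0)   # chi2_{n-a}
    c_b = rng.gammavariate((n - b) / 2.0, 2.0)   # chi2_{n-b}
    y = (z3 + r * math.sqrt(c_b) / math.sqrt(1.0 - r * r)) / math.sqrt(c_a)
    return y / math.sqrt(1.0 + y * y)

def coverage_rho(n, rho, alpha, n_rep=1000, n_post=250, seed=7):
    """Frequentist coverage of the one-sided interval (-1, rho_alpha] for rho,
    estimated by simulating data sets and empirical posterior quantiles."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_rep):
        xs, ys = [], []
        for _ in range(n):       # bivariate normal pairs with correlation rho
            z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            xs.append(z1)
            ys.append(rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
        mx, my = sum(xs) / n, sum(ys) / n
        sxx = sum((x - mx) ** 2 for x in xs)
        syy = sum((y - my) ** 2 for y in ys)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        r = sxy / math.sqrt(sxx * syy)
        draws = sorted(rho_draw(n, r, rng=rng) for _ in range(n_post))
        hits += rho < draws[int(alpha * n_post) - 1]   # empirical alpha-quantile
    return hits / n_rep
```

At a nominal level such as $\alpha=0.9$, the estimate is consistent with exact matching, in line with the theory developed in this subsection.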
Remark Note that the priors π ab including π J, π IJ, π RO, and π H and π Rσ, π Rσ, π Rρ, π S in Table 4 are all of the form 37. From Theorem 3, if we use any of these priors, the frequentist coverage probabilities of the credible intervals for any of the parameters such as σ, σ, ρ, σ /σ, ρσ /σ, ρσ σ, Σ = σσ ρ will depend only on ρ. This will be useful in the numerical comparison. We next consider the posterior quantiles of σ is given in Appendix B. and their coverage probabilities. The proof Theorem 4 a For any α, the posterior α quantile of σ [ σ α = s r + Z 3 n a has the expression ] r. 38 r α b For any α 0,, ξ = µ, µ, σ, σ, and ρ,, the frequentist coverage probability of the credible interval 0, σ α is P σ < σ α ξ, ρ [ = P < n ρ + Z 3 Z 3 n a c As ρ, the left hand side of 39 goes to P which is α if and only if a =. Remark If ρ = 0, n P σ < σ α ξ, ρ = 0 = P n < ρ + n ρ ] n n α ρ. 39, 40 n a α < [ + Z 3 Z ] 3. n a n α If b = and ρ = 0, P σ < σ α ξ, ρ = 0 > P n < = α. α To study the frequentist coverage of the credible intervals of ρ, note that ψ, defined in 3, is invertible, and ψ ρ = ρ, ρ <. 4 ρ The following theorem, whose proof is given in Appendix B, is the key result about exact frequentist matching for ρ that was discussed in the introduction. 0
Theorem 5 Let ρ α = ρ α X be the α posterior quantile of ρ. a For ψ defined in 3, the posterior α quantile of ρ is b For any α 0,, ξ = µ, µ, σ, σ, and ρ,, P ρ < ρ α ξ, ρ = P Z 3 + n a n a Also, P ρ < ρ α ξ, ρ = P ρ Z 3 + ρ ρ α = ψy α. 4 n n > r > ψ ρ r α ρ Z 3 + ρ c For any α 0,, any ξ = µ, µ, σ, σ, and any ρ,, n a ρ. 43 α ρ.44 P ρ < ρ α ξ, ρ = α. 45 if and only if the right Haar prior is used, i.e., a, b =,. Corollary 3 a For any ξ = µ, µ, σ, σ, P 0 < ρ α ξ, ρ = 0 = P Z3 n < Z 3 = P t n < α n n b t α { = α, if b =, < α, if b <. 46 b As ρ, n P ρ < ρ α ξ, ρ P n F n,n < n a Fn a,. 47 n b α c As ρ, P ρ < ρ α ξ, ρ P n n F n,n > n a Fn a,. 48 n b α We next consider six functions of σ, σ, ρ, namely the regression coefficient, conditional variance, the generalized variance or the determinant of the covariance matrix Σ, the ratio of two variances, covariance, and η 3, i.e., θ = ρσ σ, θ = σ ρ, θ 3 = Σ = σ σ ρ, θ 4 = σ σ, σ = ρσ σ, η 3 = ρ. 49 σ ρ Note that all these functions are of the form 36. From Theorem 3, under any of the priors π J, π IJ, π Rσ, π Rρ, π RO, π H, π S, the frequentist coverage probabilities of credible intervals for any of the functions in 49 will depend only on ρ. We first give results for θ in the following theorem, whose proof is given in Appendix B.
Theorem 6 a The random posterior of θ = ρσ /σ has the expression θ = r s s Z 3 r s s. b For any α 0,, let θ α be the α posterior quantile of θ. For any ξ = µ, µ, σ, σ, and any ρ,, P θ < θ α ξ, ρ = P t n < n n b t α, 50 which does not depend on ρ. Furthermore, 50 equals α if and only if b =. Theorem 7 a The random posterior of θ = σ ρ = /η has the expression θ = s r. 5 b For any α 0,, let θ α be the α posterior quantile of θ. For any ξ = µ, µ, σ, σ, and any ρ,, P θ < θ α ξ, ρ = P n <, 5 α which does not depend on ρ. Furthermore, 5 equals α if and only if b =. Proof. Since θ = /η, part a follows from 8 directly. From 9, [ σ P θ < θ α ξ, ρ = P σ ρ < ρ ] n which implies part b. α ξ, ρ, The next theorem gives the results for the determinant of Σ = /η η = σ σ ρ. Theorem 8 a The random posterior of θ 3 = Σ is θ 3 = η η = S n a. b For any α 0,, the α posterior quantile of θ 3 is θ 3 α = S n a α. c For any ξ = µ, µ, σ, σ, and any ρ,, the frequentist coverage probability of the α credible interval of θ 3 is P Σ < S n a α ξ, ρ = P n n > n a α. 53 d The coverage probability in c does not depend on ρ, and equals α if a, b =, or a, b =,.
Proof. Parts a and b follow from 7 and 8 directly. It follows from 9 and 30 that S / Σ = n n. Part c holds. Part d is obvious. It is of interesting that both priors π IJ and π H are exact matching priors for Σ. The next theorem, whose proof is in Appendix B, gives the results for the ratio of two variances. Theorem 9 a The random posterior of θ 4 = σ /σ is θ4 = σ σ = s r s [ n a Z + 3 ] r. r b For any α 0,, let θ 4 α be the posterior α quantile of θ 4. For any ξ = µ, µ, σ, σ, and ρ,, the frequentist coverage probability of the credible interval 0, θ 4 α is P θ 4 < θ 4 α ξ, ρ = P G > ρ, 54 where G = ρ n n [ n a Z + 3 Z 3 ρ + n ρ ] n n. 55 α c As ρ, the left hand side of 54 goes to where P θ 4 < θ 4 α ξ, ρ P G α > 0, 56 G = Z 3 Z 3. 57 n Furthermore, the right hand side of 56 is α if and only if b =. The following theorems give the results for σ, the covariance of x and x, and the parameter η 3. The proofs can be found in Appendix B. Theorem 0 a The random posterior of σ = ρσ σ has the expression σ = Z 3 r s s r. r n a b For any α 0,, let σ α be the α posterior quantile of σ. For any ξ = µ, µ, σ, σ, and any ρ,, P σ < σ α ξ, ρ Z3 = P n ρ ρ n n < Z 3 ρ ρ n a n n α ρ. 58
Theorem a For any α 0,, let η 3 α be the α posterior quantile of η 3. For any ξ = µ, µ, σ, σ, and any ρ,, Z 3 + ρ P η 3 < η3 ρ α ξ, ρ = P n < n b 59 equals α for any < ρ < if and only if b =. Z3 + ρ ρ n α ρ. 59 3..4 Coverage Probabilities for µ /σ, µ /σ, and σ + σ ρσ σ under π ab There are many other interesting functions of µ, µ, σ, σ, ρ not of the form 36. For example, θ 5 = µ σ, θ 6 = µ σ, θ 7 = σ + σ ρσ σ. 60 In the next theorem, whose proof is given in Appendix B, consider θ 5 = µ /σ. Theorem a The random posterior of θ 5 = µ /σ has the expression n a θ5 = Z + x. n s b For any α 0,, denote the α posterior quantile of θ 5 by θ 5 α. The frequentist coverage of the credible interval, θ 5 α depends on θ 5 only, and equals P θ 5 < θ 5 α µ, µ, σ, σ, ρ = P Z θ 5 n n < c The coverage probability equals α if and only if a =. Z θ 5 n θ 5. 6 α n a The next theorem, whose proof is given in Appendix B, considers θ 6 = µ /σ. Theorem 3 θ 6 = µ σ a The random posterior of θ 6 = µ /σ can be expressed as = Z + n x s r [ + Z 3 n a r r ] /. 6 b For any α 0,, denote the α posterior quantile of θ 6 by θ 6 α. The frequentist coverage probability of the credible interval, θ 6 α depends on θ 6 and ρ and is given by P θ 6 < θ 6 α µ, µ, σ, σ, ρ = P θ 6 < θ 6 α θ 6, ρ, 63 4
where θ6 = Z + n θ 6 + Z n n a n n a ρ + G ρ + ρ n n 64 and G is given by 57. c as ρ, lim P θ 6 < θ6 Z θ 6 n α θ 6, ρ = P ρ n < Z θ 6 n θ 6, 65 n a α which equals α for any θ 6 if and only if a =. The next theorem, whose proof is given in Appendix B, considers θ 7 = σ + σ ρσ σ. Theorem 4 a The random posterior of θ 7 has the expression, θ 7 = s r + s n a + Z 3 n a s r r s s s. 66 b For any α 0,, denote the α posterior quantile of θ 7 by θ 7 α. The frequentist coverage of the α credible interval 0, θ 7 α is given by where τ is given by 3 and P θ 7 < θ 7 α µ, µ, σ, σ, ρ = P < G 4 α τ, G 4 = τ n + n n a τ τ Z 3 n Z 3 n n. 67 c As τ, lim P θ 7 < θ7 α θ = P θ which equals α if and only if a =. n <, 68 n a α 3. First Order Asymptotic Matching In this subsection, we derive the equations for finding the first order matching priors for each of the parameters µ, µ, σ, σ, ρ. Let I be the Fisher information of ζ = ζ,, ζ p = µ, µ, σ, σ, ρ and ω = ωζ be a function of ζ. Define the gradient vector by ω = ζ ω, ζ ω,, ζ p ω t. Define δ δ,, δ p t = I ω ω T I ω. 69 5
Following Datta & Ghosh 995, if the posterior probability of a one-sided credibility interval for ω and its frequentist coverage probability agree up to on /, the prior π should satisfy the equation p i= ζ i δ i π = 0. 70 Any positive solution π of such equation is called a first order matching prior for ω. 3.. The Fisher Information The Fisher information matrix of µ, µ, σ, σ, ρ is given by ρ σ σ σ 0 0 0 ρ σ σ 0 0 0 σ ρ I = ρ 0 0 ρ ρ σ σ σ σ ρ 0 0 ρ σ σ σ 0 0 ρ σ ρ σ ρ +ρ σ ρ, 7 The inverse of I is σ ρσ σ 0 0 0 ρσ σ σ V = I 0 0 0 = 0 0 σ ρ σ σ ρ ρ σ. 7 0 0 ρ σ σ σ ρ ρ σ 0 0 ρ ρ σ ρ ρ σ ρ The determinant of I is /{σ 4 σ 4 ρ 4 }, so the Jeffreys prior is π J σ σ ρ. 73 The independent Jeffreys prior, treating µ, µ and σ, σ, ρ as being independent, is then π IJ. 74 σ σ ρ 3/ 3.. Differential Equations of the First Order Matching Priors Table 6 gives the quantities δ,, δ 5 in the partial differential equations 69 for each of the indicated parameters. The table also gives the general form of the solutions to these partial differential equations, when they can be solved in closed form. Three illustrations follow. Illustration. a To see the differential equation of the first order matching prior π for µ, note first that µ =, 0, 0, 0, 0. Thus V µ = σ, ρσ σ, 0, 0, 0 and δ δ, δ, δ 3, δ 4, δ 5 = V µ µ V µ = σ, ρσ, 0, 0, 0, 6
Table 6: Quantities δ,, δ 5 in partial differential equation 69 for ω = µ, µ, σ, σ, ρ, d µ = d µ + d µ, θ = ρσ σ, θ = σ ρ, θ 3 = Σ = σ σ ρ, θ 4 = σ, σ σ = ρσ σ, η 3 = ρ, θ σ ρ 5 = µ σ, θ 6 = µ σ, θ 7 = σ + σ ρσ σ and their possible solutions. Here means a general solution is quite complicated, g,,, is some positive and differentiable function, h = d σ + d ρσ σ / d σ + d d ρσ σ + d σ, h = d ρσ σ + d σ/ d σ + d d ρσ σ + d σ, and h 35 = ρ [ρσ +σ σ σ ]/[ θ 7 ]. ω δ δ δ 3 δ 4 δ 5 solution µ σ ρσ 0 0 0 σ gµ ρµ σ, σ, σ, ρ µ ρσ σ 0 0 0 σ gµ ρµ σ, σ, σ, ρ σ 0 0 σ σ ρ ρ gµ ρ, µ, σ ρ, ρ σ σ 0 0 σ ρ ρ ρ 0 0 ρσ ρ σ ρ ρ σ gµ ρ, µ, σ ρ, ρσ ρ * d µ h h 0 0 0 g µ µ σ d ρσ +d σ σ d σ +d ρσ, σ, σ, ρ θ 0 0 0 ρ ρ σ ρ 3/ σ gµ, µ, σ, ρ σ θ 0 0 0 θ 3 0 0 θ 4 0 0 σ 0 0 σ ρ σ ρσ η 3 0 0 0 θ 5 θ 6 σ + µ σ ρσ + µ σ ρσ + µ σ σ + µ σ θ 7 0 0 ρ σ ρ ρ * ρ σ σ 0 gµ σ, µ, σ σ, ρ 0 gµ, µ, σ σ, ρ +ρ gµ, µ, ρ +ρ σ, σ ρσ ρ +ρ +ρ ρ ρ σ ρ ρ ρ µ + µ σ µ σ ρ σ + µ σ µ σ ρ σ + µ σ µ + µ σ σ σ ρσ θ7 σ ρσ σ µ ρ ρ σ + µ σ µ ρ ρ σ + µ σ σ ρ gµ, µ, σ, ρ σ θ7 h 35 * * * 7
so equation 69 becomes σ π + ρσ π = 0. 75 µ µ b Any prior for µ, µ, σ, σ, ρ which does not depend on µ, µ would be a matching prior for µ. A general solution of 75 is of the form gµ ρµ σ σ, σ, σ, ρ for some positive and differentiable function g,,,. Illustration. a To see the differential equation of the first order matching prior π for σ, note first that σ = 0, 0,, 0, 0. Thus V σ = 0, 0, σ, ρ σ σ, σ ρ ρ and δ δ, δ, δ 3, δ 4, δ 5 = V σ σ V σ = 0, 0, σ, ρ σ, ρ ρ, so equation 69 becomes σ π + ρ σ π + ρ ρ π = 0. 76 σ σ ρ b Any prior for µ, µ, σ, σ, ρ which does not depend on σ, σ, ρ would be a matching prior for σ. A general solution of 76 is of the form gµ ρ, µ, σ ρ ρ, for some positive and differentiable function g,,,. σ ρ Illustration 3. a To see the differential equation of the first order matching prior π for ρ, note first that σ = 0, 0, 0, 0,. Thus V ρ = 0, 0, σ ρ ρ, σ ρ ρ, ρ and δ δ, δ, δ 3, δ 4, δ 5 = V ρ ρ V ρ = 0, 0, ρσ, ρσ, ρ, 77 so equation 69 becomes σ ρσ π + σ ρσ π + ρ π = 0. 78 ρ The solution to this differential equation is rather complicated and is omitted. For each of the ten objective priors π J, π IJ, π Rρ, π Rσ, π RO, π Rλ, π H, π S, π MS, and π Rσ, we also examine if it is a first order matching prior for each of the parameters such as µ, µ, σ, σ, ρ, θ = ρσ σ, θ = σ ρ, θ 3 = Σ = σ σ ρ, σ = ρσ σ, θ 4 = σ, σ η 3 = ρ, θ σ ρ 5 = µ σ, θ 6 = µ σ, and θ 7 = σ + σ ρσ σ. The results are listed in Table 7. For example, π J is a first order matching prior for µ, µ, σ, σ, σ σ, θ 4, θ 5, θ 6, and θ 7, but not for θ, θ 3, σ, and η 3. 8