Greek Statistical Institute
Proceedings of the 20th Panhellenic Statistics Conference (2007), pp. 461-468

CUMULANT PLOTS FOR THE THREE-PARAMETER GAMMA DISTRIBUTION

Ioannis A. Koutrouvelis
Department of Engineering Sciences, University of Patras
koutrouv@upatras.gr

ABSTRACT

The families of two- and three-parameter gamma distributions are widely used in reliability and life testing, material strength, meteorology and other scientific fields where there is a need to describe data with positive skewness. Cumulant plots are investigated in this paper for exploring the applicability of the three-parameter gamma distribution. One of the plots is furnished with a band of 95% simultaneous confidence level that is based on both asymptotic and finite-sample results. Simulated samples from gamma, lognormal and inverse Gaussian distributions are used to investigate the properties of this plot. In addition, the plots are applied to real data and compared with the probability plot.

1. INTRODUCTION

Cumulant plots for exploring the appropriateness of probability models were introduced by Koutrouvelis (1994) in connection with the normal distribution. Koutrouvelis (2006) investigated plots for both location-scale models and distributions containing a shape parameter, and furnished the plots with confidence bands that are based on the asymptotic behaviour of the empirical moment generating function. These plots are exploratory procedures and cannot play the role of goodness-of-fit tests, since the confidence level applies to each individual point of the plot and not to all points simultaneously.

In this paper we investigate graphical testing procedures for the three-parameter gamma (G3) distribution based on the cumulant generating function. Chambers et al. (1983) attempted to use quantile plots as exploratory tests for the two-parameter gamma distribution but encountered serious difficulties.
These difficulties become worse when quantile plots are used to explore the appropriateness of the G3 model (a Pearson type III distribution). One of the plots proposed in this paper is furnished with a simultaneous 95% confidence band that is based both on asymptotic results for the empirical moment generating function and on finite-sample results for the supremum of a standardized version of the residuals obtained after fitting the gamma distribution by a mixed-moments procedure (Koutrouvelis and Canavos, 1999). This graphical procedure is equivalent to a goodness-of-fit test at the 5% level.
Section 2 introduces the notation and lists the steps for constructing the exploratory cumulant plot for the G3 distribution. The proposed plot with a simultaneous 95% confidence band is presented in Section 3 and is applied to a set of real data. Section 4 contains the results of simulation experiments for the size and the power of the resulting goodness-of-fit test for the G3 model with lognormal and inverse Gaussian alternatives.

2. EXPLORATORY CUMULANT PLOT FOR THE G3 DISTRIBUTION

The three-parameter gamma distribution, $G3(\alpha, \beta, \gamma)$, has shape parameter $\alpha$ ($\alpha > 0$), scale parameter $\beta$ ($\beta > 0$) and location parameter $\gamma$ ($-\infty < \gamma < \infty$). Its moment generating and cumulant generating functions are given by

$$\psi(t) = \exp(\gamma t)\,(1 - \beta t)^{-\alpha} \quad\text{and}\quad K(t) = \ln\psi(t) = \gamma t - \alpha \ln(1 - \beta t) \quad\text{for } t < 1/\beta,$$

respectively. We consider the standardized version of the cumulant generating function (SCGF)

$$\eta(t) = \frac{K(t/\beta)}{t/\beta} = \gamma - \alpha\beta\,\frac{\ln(1 - t)}{t}, \quad t < 1,$$

and its sample counterpart, the empirical SCGF (ESCGF), defined by

$$\eta_{nj}(t) = \frac{k_n(t/\hat\beta_j)}{t/\hat\beta_j} = \frac{\ln\psi_n(t/\hat\beta_j)}{t/\hat\beta_j}. \tag{1}$$

In (1), $\psi_n(t)$ denotes the empirical moment generating function (EMGF)

$$\psi_n(t) = \frac{1}{n}\sum_{i=1}^{n} \exp(t x_i)$$

and $\hat\beta_j$ the estimate of $\beta$ corresponding to the $j$th iteration ($j \ge 0$) of the mixed-moments (MXM) method of estimation, which starts with the conditional moments (CM) estimates of $\alpha$ and $\beta$ and uses equispaced points $t_k = -k\tau$ ($k = 1, 2, \ldots, r$), where the spacing $\tau$ depends on the CM estimate of $\alpha$ (Koutrouvelis and Canavos, 1999).

The ESCGF possesses the following important properties for each $j \ge 0$:

(a) It is equivariant under location-scale transformations of the data.

(b) $\eta_{nj}(t) \to \bar{x}$ as $t \to 0$.

(c) Under the $G3(\alpha, \beta, \gamma)$ distribution and for $a \le t \le b < 0$, where $a$, $b$ are negative numbers with $a < b$, it follows that

$$\sqrt{n}\,\Bigl[\eta_{nj}(t) - \gamma + \alpha\hat\beta_j\,\frac{\ln(1 - t)}{t}\Bigr] \to Z(t) \quad (\text{as } n \to \infty),$$

where $Z(t)$ is a zero-mean Gaussian process with covariance kernel

$$C(s, t) = \frac{\beta^2}{st}\left\{\left[\frac{(1 - s)(1 - t)}{1 - (s + t)}\right]^{\alpha} - 1\right\}.$$
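As an illustration of the quantities defined in this section, the following minimal Python sketch evaluates the EMGF, the ESCGF of equation (1), the theoretical SCGF and the covariance kernel. The function names (`emgf`, `escgf`, `scgf`, `cov_kernel`) are hypothetical, and the scale estimate `beta_hat` is assumed to be supplied by the caller, since the MXM iteration itself is not reproduced here.

```python
import math

def emgf(x, t):
    """Empirical moment generating function: mean of exp(t * x_i)."""
    return sum(math.exp(t * xi) for xi in x) / len(x)

def escgf(x, t, beta_hat):
    """Empirical SCGF, eq. (1): ln psi_n(u) / u with u = t / beta_hat, t != 0."""
    u = t / beta_hat
    return math.log(emgf(x, u)) / u

def scgf(t, alpha, beta, gamma):
    """Theoretical SCGF of G3(alpha, beta, gamma), valid for t < 1, t != 0."""
    return gamma - alpha * beta * math.log1p(-t) / t

def cov_kernel(s, t, alpha, beta):
    """Covariance kernel C(s, t) of the limiting Gaussian process (s, t < 0)."""
    ratio = (1.0 - s) * (1.0 - t) / (1.0 - (s + t))
    return beta ** 2 / (s * t) * (ratio ** alpha - 1.0)

if __name__ == "__main__":
    import random
    random.seed(1)
    # Illustrative parameter values only; they do not come from the paper.
    alpha, beta, gamma = 2.0, 1.5, 0.5
    x = [gamma + random.gammavariate(alpha, beta) for _ in range(2000)]
    for k in range(1, 6):
        t = -0.25 * k
        # The true beta is used in place of an estimate, for illustration.
        print(t, escgf(x, t, beta), scgf(t, alpha, beta, gamma))
```

For a large simulated G3 sample the printed ESCGF values should lie close to the theoretical SCGF, which is the behaviour the cumulant plot relies on.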
Let $\hat\beta$ be the final MXM estimate of $\beta$ and $\eta_n(t)$ the corresponding ESCGF. According to the last property, the vector $\varepsilon_n = (\varepsilon_n(t_1), \varepsilon_n(t_2), \ldots, \varepsilon_n(t_r))'$ of errors in the regression model

$$\eta_n(t_k) = \gamma - \alpha\hat\beta\,\frac{\ln(1 - t_k)}{t_k} + \varepsilon_n(t_k), \quad k = 1, 2, \ldots, r, \tag{2}$$

will have an asymptotic normal distribution with mean 0 and variance-covariance matrix with elements $\sigma_{kl} = (1/n)\,C(t_k, t_l)$. Koutrouvelis (2006) utilized the above properties and proposed an exploratory graphical procedure for testing $G3(\alpha, \beta, \gamma)$ which contains the following steps:

(i) Apply ordinary least squares (OLS) in model (2) to obtain the estimates $\hat\alpha$, $\hat\gamma$.

(ii) Plot the values $\eta_n(t_k)$ against $\lambda_k = \hat\beta \ln(1 - t_k)/t_k$ for $k = 1, 2, \ldots, r$ and draw the line $\hat\eta(t) = \hat\gamma - \hat\alpha\hat\beta \ln(1 - t)/t$.

(iii) Draw a line through the points $L(t_k) = \hat\eta(t_k) - 1.96\,\hat s_k$ and a line through the points $U(t_k) = \hat\eta(t_k) + 1.96\,\hat s_k$, where $\hat s_k$ is the estimated asymptotic standard deviation of the $k$th residual $e_n(t_k)$ after fitting the regression model. Letting $X$ be the design matrix, $H = X(X'X)^{-1}X'$ the hat matrix and $\Sigma$ the asymptotic variance-covariance matrix of the error terms in model (2), we find

$$\hat s_k = \bigl[k\text{th diagonal element of } (I - \hat H)\,\hat\Sigma\,(I - \hat H)\bigr]^{1/2},$$

where $\hat H$ and $\hat\Sigma$ are the estimates of $H$ and $\Sigma$ obtained by replacing the parameters $\alpha$, $\beta$ with their MXM estimates.

(iv) Judge the appropriateness of G3 according to whether $L(t_k) \le \eta_n(t_k) \le U(t_k)$ for every $k = 1, 2, \ldots, r$.

Note that the band $(L(t), U(t))$ is not a simultaneous 95% confidence band; it consists of individual 95% confidence intervals.

3. CUMULANT PLOT FOR THE G3 MODEL WITH A SIMULTANEOUS 95% CONFIDENCE BAND

In order to furnish the cumulant plot with simultaneous confidence bands, we need the distribution of $\sup\{|e_n(t_k)|/\hat s_k;\ 1 \le k \le r\}$. Under the assumption of a G3 population, this distribution depends on the sample size $n$, the true value of the shape parameter $\alpha$ and the number and location of the points $t_k$.
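The exploratory steps of Section 2 and the standardized supremum statistic introduced above can be sketched in Python as follows. This is an illustrative sketch and not the author's implementation: the function name `exploratory_fit` is hypothetical, the MXM estimates `alpha_mxm` and `beta_mxm` are assumed to be supplied by the caller, and the hat matrix is written in the closed form that holds for a straight-line fit with intercept.

```python
import math

def cov_kernel(s, t, alpha, beta):
    """Kernel C(s, t) from Section 2, for negative s and t."""
    ratio = (1.0 - s) * (1.0 - t) / (1.0 - (s + t))
    return beta ** 2 / (s * t) * (ratio ** alpha - 1.0)

def exploratory_fit(eta, t, alpha_mxm, beta_mxm, n, z=1.96):
    """OLS fit of model (2), pointwise band and standardized sup statistic.

    eta                 : ESCGF values eta_n(t_k)
    t                   : negative points t_k
    alpha_mxm, beta_mxm : MXM estimates (assumed supplied by the caller)
    n                   : sample size
    """
    r = len(t)
    # Regressor lambda_k; model (2) reads eta = gamma - alpha * lambda + error.
    lam = [beta_mxm * math.log1p(-tk) / tk for tk in t]
    lam_bar = sum(lam) / r
    eta_bar = sum(eta) / r
    sxx = sum((l - lam_bar) ** 2 for l in lam)
    slope = sum((l - lam_bar) * (e - eta_bar) for l, e in zip(lam, eta)) / sxx
    gamma_hat = eta_bar - slope * lam_bar
    alpha_hat = -slope
    fitted = [gamma_hat + slope * l for l in lam]
    resid = [e - f for e, f in zip(eta, fitted)]
    # M = I - H, with H the hat matrix of a straight-line fit.
    m = [[(1.0 if k == l else 0.0) - 1.0 / r
          - (lam[k] - lam_bar) * (lam[l] - lam_bar) / sxx
          for l in range(r)] for k in range(r)]
    # Sigma_kl = C(t_k, t_l) / n, evaluated at the supplied estimates.
    sigma = [[cov_kernel(t[k], t[l], alpha_mxm, beta_mxm) / n
              for l in range(r)] for k in range(r)]
    # s_k = square root of the kth diagonal element of M Sigma M (M symmetric).
    s = [math.sqrt(sum(m[k][i] * sigma[i][j] * m[k][j]
                       for i in range(r) for j in range(r)))
         for k in range(r)]
    lower = [f - z * sk for f, sk in zip(fitted, s)]
    upper = [f + z * sk for f, sk in zip(fitted, s)]
    sup_stat = max(abs(e) / sk for e, sk in zip(resid, s))
    return {"alpha_hat": alpha_hat, "gamma_hat": gamma_hat, "fitted": fitted,
            "s": s, "lower": lower, "upper": upper, "sup_stat": sup_stat}
```

With `z=1.96` the returned `lower` and `upper` values give the pointwise band of step (iii); `sup_stat` is the quantity whose distribution determines the simultaneous band.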
The distributions for $r = 20$ equispaced points $t_k = -k\tau$ were investigated through simulation experiments with 50 000 samples of sizes $n$ = 20, 25, 30, 40, 50, 100, 200, 300, 500, 1000 drawn from the G3 model with $\beta = 1$, $\gamma = 0$ and various values of $\alpha$ in the interval [0.05, 10.0], and with 5 values of the spacing $\tau$ between the points $t_k$ ($\tau$ = 4, 2, 1, 0.50, 0.25).
For each $\tau$, stepwise regression was used to fit empirical models for the 0.95 and 0.99 quantiles of the distribution of $\sup\{|e_n(t_k)|/\hat s_k;\ 1 \le k \le 20\}$ to the simulation results for the various combinations of the parameters $n$ and $\alpha$. We report here only the results for the 0.95 quantile. The fitted models for $q_{0.95}(\tau, \alpha)$ and the corresponding coefficients of determination ($R^2$) are

q 0. 95 0. 604 ( 4, α ) =. 454 0. 0055 nα ( R ) = 99. 0%), n n + 0. 00071 + 0. 180α α 1. 186 6. 11. 54 0. 064 q0. 95 (, α) =. 169 + + ( R = 99. 39%), nα n n α 5. 55 1. 081 1. 45 0. 073 q0. 95 ( 1, α) =. 161 + + ( R = 99. 0%), n nα n α 6. 10 q0. 95 ( 0. 5, α ) =. 306 0. 1407α + 0. 176α ( R = 98. 87%), n q 5. 99 0. 94 ( 0. 5, α) =. 171 + + + 0. 305α + 0. 084 α n 4. 88 0. 95 n nα 0. 033 n α 1. 79 0. 015α 0. 00013n ( R = 99. 18%). α

It should be noted that the fitted quantile regression models eliminate the need for tables of critical values.

The simultaneous 95% band for testing the appropriateness of $G3(\alpha, \beta, \gamma)$ is found by first choosing a value $\tau(\hat\alpha_0)$ among the 5 values of the spacing considered, according to the initial estimate $\hat\alpha_0$ of the shape parameter. Specifically, the procedure uses $\tau = 4$ if $\hat\alpha_0 \le 0.20$, $\tau = 2$ if $0.20 < \hat\alpha_0 \le 0.35$, $\tau = 1$ if $0.35 < \hat\alpha_0 \le 0.75$, $\tau = 0.5$ if $0.75 < \hat\alpha_0 \le 1.5$ and $\tau = 0.25$ if $\hat\alpha_0 > 1.5$. Then the band is given by

$$(L_S(t_k), U_S(t_k)) = \hat\eta(t_k) \pm \hat q_{0.95}(\tau(\hat\alpha_0), \hat\alpha)\,\hat s_k,$$

where $\hat\eta(t_k)$ and $\hat s_k$ were given in Section 2, and $\hat\alpha$ is the final MXM estimate of $\alpha$. If $\eta_n(t_k)$ falls outside the band $(L_S(t_k), U_S(t_k))$ for at least one $k$ ($k = 1, 2, \ldots, 20$), the hypothesis of the G3 model can be rejected at the 5% level of significance. This plotting procedure is equivalent to a 5% goodness-of-fit test for the $G3(\alpha, \beta, \gamma)$ distribution. Guidance for choosing the value $\tau(\hat\alpha_0)$ was given by the
performance of the test in a series of initial simulation experiments, with the 5% nominal level.

As an example, a data set containing 46 active repair times (in h) for an airborne communication transceiver (Von Alven, 1964) was analysed. The G3 cumulant plot obtained by the previous procedure is given in Figure 1. It can be seen that the data points (dots) fall inside the 95% simultaneous confidence band. Therefore, the three-parameter gamma distribution cannot be rejected at the 5% level of significance.

[Figure 1. G3 cumulant plot, active repair times (n = 46). Plotted variables: HTAN, HTAHUT, HTAL, HTAU; axes: Y-Data against X.]

A clearer picture is obtained by plotting, against t, the curves of half the bandwidth and of the absolute residuals after applying OLS in model (2). This plot is shown in Figure 2.

[Figure 2. Absolute residuals (ABS RES) and precision of the 95% simultaneous bands (BNDWIDTH/2), active repair times (n = 46); axes: Y-Data against T.]

The probability plot of these data with bands containing individual 95% confidence intervals (obtained from Minitab, Release 14) is shown in Figure 3. In
contrast to the plots of Figures 1 and 2, this plot indicates that the G3 model may not be appropriate. However, the plots are not comparable, since the 5% error rate in the plot of Figure 3 applies to each individual plotting position. Simultaneous bands in probability plots for the G3 distribution are not available, but they would necessarily contain the individual 95% confidence intervals. Therefore, if simultaneous 95% confidence bands for probability plots were available and drawn in Figure 3, all points could lie within the bands, as in Figure 1.

[Figure 3. Probability plot of active repair times (n = 46), 3-parameter gamma, 95% CI; axes: Percent against X-DATA - Threshold. Legend: Shape 0.7086, Scale 4.810, Thresh 0.198, N 46, AD 0.68, P-Value *.]

4. SIMULATION RESULTS FOR THE GOODNESS-OF-FIT TEST

To judge the performance of the goodness-of-fit test implied by the plotting procedure of Section 3, we ran simulation experiments with 1000 samples drawn from the $G3(\alpha, \beta = 1, \gamma = 0)$ distribution and from three-parameter lognormal (LN3) and inverse Gaussian (IG3) distributions with scale parameter 1 and location parameter 0. Table 1 shows the results for various sample sizes $n$ and values of the skewness coefficient $\alpha_3$ of the parent population used in the simulations. The actual level of the test is close to the nominal level for G3 distributions having $\alpha_3 > 1$ (equivalently $\alpha < 4$) and sample sizes $n \ge 25$. The test cannot differentiate between the G3 and the other skewed distributions when the sample is small and the actual distribution has a small degree of skewness. This is to be expected, because the distributions considered differ most in their tails, with the differences growing as the degree of skewness increases.
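The correspondence between skewness and shape used above (skewness greater than 1 being equivalent to a shape parameter below 4) follows from the standard result that the skewness of a gamma distribution is $2/\sqrt{\alpha}$, whatever its location and scale. A minimal Python sketch of the conversion, with hypothetical function names:

```python
import math

def gamma_skewness(alpha):
    """Skewness coefficient of a gamma distribution with shape alpha."""
    return 2.0 / math.sqrt(alpha)

def shape_from_skewness(a3):
    """Shape parameter of the gamma distribution with skewness a3 > 0."""
    return 4.0 / a3 ** 2

if __name__ == "__main__":
    # Shape values chosen for illustration.
    for alpha in (4.0, 2.0, 1.0, 0.5, 0.25, 0.1):
        print(alpha, round(gamma_skewness(alpha), 2))
```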
As the sample size or the skewness of the parent population gets larger, the power against the lognormal and inverse Gaussian alternatives increases towards 1, indicating that the proposed test is consistent against these alternatives. The proposed test can easily be adapted to test for the two-parameter gamma distribution, i.e. the G3 with location parameter known to be 0.
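The decision rule behind the test evaluated in this section can be sketched in Python as follows. The sketch rests on stated assumptions: the spacing thresholds are read as 0.20, 0.35, 0.75 and 1.5 with spacings 4, 2, 1, 0.5 and 0.25, the points are taken as $t_k = -k\tau$ so that they are negative as the asymptotic result of Section 2 requires, and the critical value `q95` is supplied by the caller because the fitted quantile models are not reproduced here. All function names are hypothetical.

```python
def select_tau(alpha0):
    """Spacing tau chosen from the initial shape estimate alpha0 (assumed thresholds)."""
    if alpha0 <= 0.20:
        return 4.0
    if alpha0 <= 0.35:
        return 2.0
    if alpha0 <= 0.75:
        return 1.0
    if alpha0 <= 1.5:
        return 0.5
    return 0.25

def points(tau, r=20):
    """Equispaced negative points t_k = -k * tau, k = 1, ..., r."""
    return [-k * tau for k in range(1, r + 1)]

def rejects_g3(eta, fitted, s, q95):
    """True if some eta_n(t_k) lies outside fitted_k +/- q95 * s_k."""
    return any(abs(e - f) > q95 * sk for e, f, sk in zip(eta, fitted, s))
```

Here `eta`, `fitted` and `s` stand for the ESCGF values, the fitted line and the estimated standard deviations at the points returned by `points(select_tau(alpha0))`.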
Table 1. Simulation results of the 5% test for the G3(α, β, γ)

α3      n = 25   n = 50   n = 100  n = 500  n = 1000

1000 samples from G3(α, β = 1, γ = 0)
1.00    0.03     0.01     0.03     0.034    0.047
1.41    0.038    0.049    0.06     0.046    0.040
2.00    0.074    0.051    0.053    0.057    0.043
2.83    0.069    0.058    0.051    0.044    0.05
4.00    0.06     0.058    0.054    0.056    0.055
6.32    0.054    0.056    0.048    0.048    0.057

1000 samples from IG3(α, β = 1, γ = 0)
1.00    0.09     0.03     0.07     0.07     0.030
1.41    0.07     0.05     0.075    0.97     0.538
2.00    0.057    0.10     0.14     0.76     0.936
2.83    0.13     0.5      0.405    0.963    0.999
4.00    0.9      0.37     0.65     1.000    1.000
6.32    0.301    0.508    0.779    1.000    1.000

1000 samples from LN3(ζ = 0, τ, γ = 0)
1.00    0.00     0.04     0.03     0.05     0.05
1.41    0.09     0.053    0.074    0.190    0.371
2.00    0.06     0.106    0.184    0.705    0.933
2.83    0.099    0.187    0.339    0.909    0.998
4.00    0.151    0.60     0.486    0.984    1.000
6.32    0.06     0.383    0.610    0.999    1.000

SUMMARY

The families of the two-parameter and three-parameter gamma distributions are widely used in reliability and lifetime analysis, statistical hydrology, strength of materials, meteorology and other scientific fields where there is a need to describe data with positive skewness. In this paper, cumulant plots are examined for exploring the suitability of the three-parameter gamma distribution. One of the plots is furnished with a confidence band of simultaneous level 95%, which is based on the asymptotic behaviour of the empirical moment generating function and on finite-sample results. Simulated samples from the gamma, lognormal and inverse Gaussian distributions are used to determine the properties of this plot. The plots are also applied to real data, and the proposed plots are compared with probability plots.

REFERENCES

Chambers J.M., Cleveland W.S., Kleiner B. and Tukey P.A. (1983). Graphical Methods for Data Analysis. Duxbury Press, Belmont.

Koutrouvelis I.A. (1994). Graphical methods for the analysis of univariate data (in Greek). Proceedings of the 7th Panhellenic Statistics Conference (Nicosia), pp. 150-157.

Koutrouvelis I.A. (2006). Plots for selecting an appropriate distribution model (in Greek). Proceedings of the 19th Panhellenic Statistics Conference (Kastoria), in press.

Koutrouvelis I.A. and Canavos G.C. (1999). Estimation in the Pearson type 3 distribution. Water Resources Research, 35, pp. 2693-2704.

MINITAB, Release 14 (2003). Minitab, Inc.

Von Alven W.H. (1964). Reliability Engineering, by ARINC. Prentice-Hall, Inc., Englewood Cliffs.