CUMULANT PLOTS FOR THE THREE-PARAMETER GAMMA DISTRIBUTION


Greek Statistical Institute
Proceedings of the 20th Panhellenic Statistics Conference (2007), pp. 461-468

Ioannis A. Koutrouvelis
Department of Engineering Sciences, University of Patras
koutrouv@upatras.gr

ABSTRACT

The families of two- and three-parameter gamma distributions are widely used in reliability and life testing, material strength, meteorology and other scientific fields where there is a need to describe data with positive skewness. Cumulant plots are investigated in this paper for exploring the applicability of the three-parameter gamma distribution. One of the plots is furnished with a band of 95% simultaneous confidence level that is based on both asymptotic and finite-sample results. Simulated samples from gamma, lognormal and inverse Gaussian distributions are used to investigate the properties of this plot. In addition, the plots are applied to real data and compared with the probability plot.

1. INTRODUCTION

Cumulant plots for exploring the appropriateness of probability models were introduced by Koutrouvelis (1994) in connection with the normal distribution. Koutrouvelis (2006) investigated plots for both location-scale models and distributions containing a shape parameter, and furnished the plots with confidence bands based on the asymptotic behaviour of the empirical moment generating function. These plots are exploratory procedures and cannot play the role of goodness-of-fit tests, since the confidence level applies to each individual point of the plot and not to all points simultaneously.

In this paper we investigate graphical testing procedures for the three-parameter gamma (G3) distribution based on the cumulant generating function. Chambers et al. (1983) attempted to use quantile plots as exploratory tests for the two-parameter gamma distribution but encountered serious difficulties.
These difficulties become more severe when quantile plots are used to explore the appropriateness of the G3 model (a Pearson type III distribution). One of the plots proposed in this paper is furnished with a simultaneous 95% confidence band that is based both on asymptotic results for the empirical moment generating function and on finite-sample results for the supremum of a standardized version of the residuals obtained after fitting the gamma distribution by a mixed-moments procedure (Koutrouvelis and Canavos, 1999). This graphical procedure is equivalent to a 5% goodness-of-fit test.

Section 2 introduces the notation and lists the steps for constructing the exploratory cumulant plot for the G3 distribution. The proposed plot with a simultaneous 95% confidence band is presented in Section 3 and is applied to a set of real data. Section 4 contains the results of simulation experiments on the size and the power of the resulting goodness-of-fit test for the G3 model against lognormal and inverse Gaussian alternatives.

2. EXPLORATORY CUMULANT PLOT FOR THE G3 DISTRIBUTION

The three-parameter gamma distribution, G3($\alpha, \beta, \gamma$), has shape parameter $\alpha$ ($\alpha > 0$), scale parameter $\beta$ ($\beta > 0$) and location parameter $\gamma$ ($-\infty < \gamma < \infty$). Its moment generating and cumulant generating functions are given by

$$\psi(t) = \exp(\gamma t)\,(1 - \beta t)^{-\alpha} \quad \text{and} \quad K(t) = \ln \psi(t) = \gamma t - \alpha \ln(1 - \beta t) \quad \text{for } t < 1/\beta,$$

respectively. We consider the standardized version of the cumulant generating function (SCGF)

$$\eta(t) = \frac{K(t/\beta)}{t/\beta} = \gamma - \alpha\beta\,\frac{\ln(1 - t)}{t}, \quad t < 1,$$

and its sample counterpart, the empirical SCGF (ESCGF), defined by

$$\eta_{nj}(t) = \frac{k_n(t/\hat\beta_j)}{t/\hat\beta_j} = \frac{\ln \psi_n(t/\hat\beta_j)}{t/\hat\beta_j}. \qquad (1)$$

In (1), $\psi_n(t)$ denotes the empirical moment generating function (EMGF)

$$\psi_n(t) = \frac{1}{n} \sum_{i=1}^{n} \exp(t x_i)$$

and $\hat\beta_j$ the estimate of $\beta$ corresponding to the $j$th iteration ($j \ge 0$) of the mixed-moments (MXM) method of estimation, which starts with the conditional moments (CM) estimates of $\alpha$ and $\beta$ and uses equispaced points $t_k = -k\tau$ ($k = 1, 2, \dots, r$), where the spacing $\tau$ depends on the CM estimate of $\alpha$ (Koutrouvelis & Canavos, 1999).

The ESCGF possesses the following important properties for each $j \ge 0$:

- It is equivariant under location-scale transformations of the data.
- $\eta_{nj}(t) \to \bar{x}$ as $t \to 0$.
- Under the G3($\alpha, \beta, \gamma$) distribution and for $a \le t \le b < 0$, where $a$, $b$ are negative numbers with $a < b$, it follows that

$$\sqrt{n}\,\Big[\eta_{nj}(t) - \gamma + \alpha\hat\beta_j\,\frac{\ln(1 - t)}{t}\Big] \to Z(t) \quad (\text{as } n \to \infty),$$

  where $Z(t)$ is a zero-mean Gaussian process with covariance kernel

$$C(s, t) = \frac{\beta^2}{st} \left\{ \left[ \frac{(1 - s)(1 - t)}{1 - (s + t)} \right]^{\alpha} - 1 \right\}.$$
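The definitions of this section can be checked numerically. The following is a minimal sketch, not the author's implementation: the function names are hypothetical, the points are taken as negative values t_k = -kτ, and the scale estimate is simply passed in as `beta_hat` instead of being produced by the MXM iterations, which are not reproduced here.

```python
import numpy as np

def log_emgf(x, u):
    """ln psi_n(u) for each u, where psi_n(u) = mean(exp(u * x_i))."""
    a = np.outer(u, x)                      # a[k, i] = u_k * x_i
    m = a.max(axis=1, keepdims=True)        # shift to keep exp() well scaled
    return (m + np.log(np.exp(a - m).mean(axis=1, keepdims=True))).ravel()

def escgf(x, t, beta_hat):
    """Empirical SCGF: ln psi_n(t / beta_hat) / (t / beta_hat)."""
    u = np.asarray(t, dtype=float) / beta_hat
    return log_emgf(np.asarray(x, dtype=float), u) / u

def scgf_gamma(t, alpha, beta, gamma):
    """Theoretical SCGF of G3: gamma - alpha * beta * ln(1 - t) / t, t < 1."""
    t = np.asarray(t, dtype=float)
    return gamma - alpha * beta * np.log1p(-t) / t

def cov_kernel(s, t, alpha, beta):
    """Covariance kernel C(s, t) of the limiting Gaussian process Z."""
    ratio = (1 - s) * (1 - t) / (1 - (s + t))
    return beta**2 / (s * t) * (ratio**alpha - 1)

# Illustrative data (assumed values, not taken from the paper).
rng = np.random.default_rng(0)
alpha, beta, gamma = 2.0, 1.5, 3.0
x = gamma + rng.gamma(shape=alpha, scale=beta, size=5000)
t = -0.5 * np.arange(1, 6)                  # t_k = -k * tau with tau = 0.5

print(np.round(escgf(x, t, beta), 3))       # empirical SCGF at t_1..t_5
print(np.round(scgf_gamma(t, alpha, beta, gamma), 3))  # theoretical SCGF
```

With the true scale supplied, the two printed rows should be close for a sample of this size, and evaluating `escgf` at a point very near zero should return approximately the sample mean, in line with the second property above.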

Let $\hat\beta$ be the final MXM estimate of $\beta$. According to the last property, the vector $\varepsilon_n = (\varepsilon_n(t_1), \varepsilon_n(t_2), \dots, \varepsilon_n(t_r))'$ of errors in the regression model

$$\eta_n(t_k) = \gamma - \alpha\hat\beta\,\frac{\ln(1 - t_k)}{t_k} + \varepsilon_n(t_k), \quad k = 1, 2, \dots, r, \qquad (2)$$

has an asymptotic normal distribution with mean 0 and variance-covariance matrix with elements $\sigma_{kl} = (1/n)\,C(t_k, t_l)$. Koutrouvelis (2006) utilized the above properties and proposed an exploratory graphical procedure for testing G3($\alpha, \beta, \gamma$), which consists of the following steps:

- Apply ordinary least squares (OLS) in model (2) to obtain the estimates $\hat\alpha$, $\hat\gamma$.
- Plot the values $\eta_n(t_k)$ against $\lambda_k = -\hat\beta \ln(1 - t_k)/t_k$ for $k = 1, 2, \dots, r$ and draw the line $\hat\eta(t) = \hat\gamma - \hat\alpha\hat\beta \ln(1 - t)/t$.
- Draw a line through the points $L(t_k) = \hat\eta(t_k) - 1.96\,\hat s_k$ and a line through the points $U(t_k) = \hat\eta(t_k) + 1.96\,\hat s_k$, where $\hat s_k$ is the estimated asymptotic standard deviation of the $k$th residual $e_n(t_k)$ after fitting the regression model. Letting $X$ be the design matrix, $H = X(X'X)^{-1}X'$ the hat matrix, and $\Sigma$ the asymptotic variance-covariance matrix of the error terms in model (2), we find $\hat s_k = [\,k\text{th diagonal element of } (I - \hat H)\hat\Sigma(I - \hat H)\,]^{1/2}$, where $\hat H$ and $\hat\Sigma$ are the estimates of $H$ and $\Sigma$ obtained by replacing the parameters $\alpha$, $\beta$ by their MXM estimates.
- Judge the appropriateness of G3 according to whether $L(t_k) \le \eta_n(t_k) \le U(t_k)$ for every $k = 1, 2, \dots, r$.

Note that the band $(L(t), U(t))$ is not a simultaneous 95% confidence band; it consists of individual 95% confidence intervals.

3. CUMULANT PLOT FOR THE G3 MODEL WITH A SIMULTANEOUS 95% CONFIDENCE BAND

In order to furnish the cumulant plot with simultaneous confidence bands, we need the distribution of $\sup\{\,|e_n(t_k)|/\hat s_k ;\ 1 \le k \le r\,\}$. Under the assumption of a G3 population, this distribution depends on the sample size $n$, the true value of the shape parameter $\alpha$, and the number and location of the points $t_k$.
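The OLS fit of model (2), the pointwise band and the statistic sup |e_n(t_k)| / ŝ_k can be sketched as follows. This is an illustration under stated assumptions rather than the procedure of the paper: the function names are hypothetical, the scale estimate `beta_hat` is supplied by the caller (the MXM method is not implemented), the points are taken as t_k = -kτ, and the OLS slope is used as the shape estimate inside the covariance matrix.

```python
import numpy as np

def escgf(x, t, beta_hat):
    """Empirical SCGF at the points t, computed via a stable log-mean-exp."""
    u = t / beta_hat
    a = np.outer(u, x)
    m = a.max(axis=1)
    return (m + np.log(np.exp(a - m[:, None]).mean(axis=1))) / u

def cov_kernel(s, t, alpha, beta):
    """Covariance kernel C(s, t) of the limiting Gaussian process."""
    ratio = (1 - s) * (1 - t) / (1 - (s + t))
    return beta**2 / (s * t) * (ratio**alpha - 1)

def cumulant_band(x, t, beta_hat, z=1.96):
    """OLS fit of eta_n(t_k) on lambda_k, with a pointwise band of half-width z * s_k."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=float)
    n, r = x.size, t.size
    eta = escgf(x, t, beta_hat)
    lam = -beta_hat * np.log1p(-t) / t               # regressor lambda_k
    X = np.column_stack([np.ones(r), lam])           # design matrix
    gamma_hat, alpha_hat = np.linalg.lstsq(X, eta, rcond=None)[0]
    H = X @ np.linalg.solve(X.T @ X, X.T)            # hat matrix
    M = np.eye(r) - H
    # Assumption: the OLS slope stands in for the MXM shape estimate in Sigma.
    Sigma = cov_kernel(t[:, None], t[None, :], alpha_hat, beta_hat) / n
    s = np.sqrt(np.diag(M @ Sigma @ M))              # std. dev. of each residual
    fitted = H @ eta
    resid = eta - fitted
    return {
        "alpha_hat": alpha_hat, "gamma_hat": gamma_hat,
        "lam": lam, "eta": eta, "fitted": fitted,
        "lower": fitted - z * s, "upper": fitted + z * s,
        "s": s, "sup_stat": np.max(np.abs(resid) / s),
    }

# Illustrative data (assumed values): G3(alpha=1, beta=3, gamma=2), n = 400.
rng = np.random.default_rng(1)
x = 2.0 + rng.gamma(shape=1.0, scale=3.0, size=400)
t = -0.5 * np.arange(1, 21)                          # r = 20 points, tau = 0.5
out = cumulant_band(x, t, beta_hat=3.0)
print(out["alpha_hat"], out["gamma_hat"], out["sup_stat"])
```

Plotting `eta` against `lam` together with `fitted`, `lower` and `upper` gives the exploratory plot; replacing `z` by a simulated or fitted 0.95 quantile of `sup_stat` would turn the pointwise band into a simultaneous one.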
The distributions for $r = 20$ equispaced points $t_k = -k\tau$ were investigated through simulation experiments with 50 000 samples of sizes n = 20, 25, 30, 40, 50, 100, 200, 300, 500, 1000 drawn from the G3 model with $\beta = 1$, $\gamma = 0$ and various values of $\alpha$ in the interval [0.05, 10.0], and with 5 values of the spacing $\tau$ between the points $t_k$ ($\tau$ = 4, 2, 1, 0.50, 0.25).
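A reduced Monte Carlo experiment of this kind can be sketched as below. It is only a rough illustration: the true values of α and β are used in place of the MXM estimates (a simplifying assumption), the number of replications is far smaller than in the paper, and all function names are hypothetical. The resulting quantiles are therefore not expected to reproduce the fitted models reported for the paper's simulations.

```python
import numpy as np

def sup_statistic(x, t, alpha, beta):
    """sup_k |e_n(t_k)| / s_k, with known alpha and beta in the covariance matrix."""
    n, r = x.size, t.size
    u = t / beta
    a = np.outer(u, x)
    m = a.max(axis=1)
    eta = (m + np.log(np.exp(a - m[:, None]).mean(axis=1))) / u   # ESCGF
    lam = -beta * np.log1p(-t) / t
    X = np.column_stack([np.ones(r), lam])
    M = np.eye(r) - X @ np.linalg.solve(X.T @ X, X.T)            # I - H
    S, T = t[:, None], t[None, :]
    Sigma = beta**2 / (S * T) * (((1 - S) * (1 - T) / (1 - S - T))**alpha - 1) / n
    s = np.sqrt(np.diag(M @ Sigma @ M))
    return np.max(np.abs(M @ eta) / s)                            # OLS residuals / s_k

def simulated_quantile(n, alpha, tau, r=20, reps=500, level=0.95, seed=0):
    """Monte Carlo quantile of the sup statistic under G3(alpha, beta=1, gamma=0)."""
    rng = np.random.default_rng(seed)
    t = -tau * np.arange(1, r + 1)                                # t_k = -k * tau
    stats = [sup_statistic(rng.gamma(shape=alpha, scale=1.0, size=n), t, alpha, 1.0)
             for _ in range(reps)]
    return float(np.quantile(stats, level))

q = simulated_quantile(n=50, alpha=1.0, tau=0.5, reps=200, seed=1)
print(q)
```

Repeating the call over a grid of n, α and τ and regressing the resulting quantiles on functions of n and α would mimic the fitting step described next.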

For each $\tau$, stepwise regression was used to fit empirical models for the 0.95 and 0.99 quantiles of the distribution of $\sup\{\,|e_n(t_k)|/\hat s_k ;\ 1 \le k \le 20\,\}$ to the simulation results obtained for the various combinations of the user parameters $n$ and $\alpha$. We report here only the results for the 0.95 quantile. The fitted models found for $q_{0.95}(\tau, \alpha)$ and the corresponding coefficients of determination ($R^2$) are

q 0. 95 0. 604 ( 4, α ) =. 454 0. 0055 nα ( R ) = 99. 0%), n n + 0. 00071 + 0. 180α α 1. 186
6. 11. 54 0. 064 q0. 95 (, α) =. 169 + + ( R = 99. 39%), nα n n α
5. 55 1. 081 1. 45 0. 073 q0. 95 ( 1, α) =. 161 + + ( R = 99. 0%), n nα n α
6. 10 q0. 95 ( 0. 5, α ) =. 306 0. 1407α + 0. 176α ( R = 98. 87%), n
q 5. 99 0. 94 ( 0. 5, α) =. 171 + + + 0. 305α + 0. 084 α n 4. 88 0. 95 n nα 0. 033 n α 1. 79 0. 015α 0. 00013n ( R = 99. 18%). α

It should be noted that the fitted quantile regression models eliminate the need to use tables in order to find critical test values.

The simultaneous 95% band for testing the appropriateness of G3($\alpha, \beta, \gamma$) is found by first choosing a value $\tau(\hat\alpha_0)$ among the 5 values of the spacing considered, according to the initial estimate $\hat\alpha_0$ of the shape parameter. Specifically, the procedure uses $\tau = 4$ if $\hat\alpha_0 \le 0.20$, $\tau = 2$ if $0.20 < \hat\alpha_0 \le 0.35$, $\tau = 1$ if $0.35 < \hat\alpha_0 \le 0.75$, $\tau = 0.5$ if $0.75 < \hat\alpha_0 \le 1.5$ and $\tau = 0.25$ if $\hat\alpha_0 > 1.5$. Then the band is given by

$$(L_S(t_k), U_S(t_k)) = \hat\eta(t_k) \pm \hat q_{0.95}(\tau(\hat\alpha_0), \hat\alpha)\,\hat s_k,$$

where $\hat\eta(t)$ and $\hat s_k$ were given in Section 2, and $\hat\alpha$ is the final MXM estimate of $\alpha$. If $\eta_n(t_k)$ falls outside the band $(L_S(t_k), U_S(t_k))$ for at least one $k$ ($k = 1, 2, \dots, 20$), the hypothesis of the G3 model can be rejected at the 5% level of significance. This plotting procedure is equivalent to a 5% goodness-of-fit test for the G3($\alpha, \beta, \gamma$) distribution. Guidance for choosing the value $\tau(\hat\alpha_0)$ was given by the

performance of the test in a series of initial simulation experiments, with the 5% nominal level.

As an example, a data set containing 46 active repair times (in h) for an airborne communication transceiver (Von Alven, 1964) was analysed. The G3 cumulant plot obtained by the previous procedure is given in Figure 1. It can be seen that the data points (dots) fall inside the 95% simultaneous confidence band. Therefore, the three-parameter gamma distribution cannot be rejected at the 5% level of significance.

[Figure 1. G3 Cumulant Plot - Active Repair Times (n=46). Plotted variables: HTAN, HTAHUT, HTAL, HTAU. Axes: X (horizontal), Y-Data (vertical).]

A clearer picture is obtained by plotting the curves of the values of half the bandwidth and of the absolute residuals after applying OLS in model (2) against t. This plot is shown in Figure 2.

[Figure 2. ABS(RES) & PREC OF 95% SIMULT BANDS - ACTIVE REPAIR TIMES (n=46). Plotted variables: BNDWIDTH/2, ABS RES. Axes: T (horizontal), Y-Data (vertical).]

The probability plot of these data with bands containing individual 95% confidence intervals (obtained from Minitab, Release 14) is shown in Figure 3. In

contrast to the plots of Figures 1 and 2, this plot indicates that the G3 model may not be appropriate. However, the plots are not comparable, since the error rate of 5% in the plot of Figure 3 corresponds to each individual plotting position. Simultaneous bands in probability plots for the G3 distribution are not available, but it is obvious that they should contain the individual 95% confidence intervals. Therefore, if simultaneous 95% confidence bands for probability plots were available and drawn in Figure 3, all points could lie within the bands, as in Figure 1.

[Figure 3. Probability Plot of Active Repair Times (n=46), 3-Parameter Gamma - 95% CI. Axes: X-DATA - Threshold (horizontal), Percent (vertical). Legend: Shape 0.7086, Scale 4.810, Thresh 0.198, N 46, AD 0.68, P-Value *.]

4. SIMULATION RESULTS FOR THE GOODNESS-OF-FIT TEST

To judge the performance of the goodness-of-fit test implied by the plotting procedure of Section 3, we ran simulation experiments with 1000 samples drawn from the G3($\alpha$, $\beta = 1$, $\gamma = 0$) distribution and from three-parameter lognormal (LN3) and inverse Gaussian (IG3) distributions with scale parameter 1 and location parameter 0. Table 1 shows the results for various sample sizes $n$ and values of the skewness coefficient $\alpha_3$ of the parent population used in the simulations. The actual level of the test is close to the nominal level for G3 distributions having $\alpha_3 > 1$ (equivalently $\alpha < 4$) and sample sizes $n \ge 25$. The test cannot differentiate between the G3 and the other skewed distributions when the sample is small and the actual distribution has a small degree of skewness. This is to be expected because the distributions considered differ most in their tails, with the differences getting larger for larger degrees of skewness.
As the sample size or the skewness of the parent population gets larger, the power against the lognormal and inverse Gaussian alternatives increases towards 1, indicating that the proposed test is consistent against these alternatives. The proposed test can be easily adapted to handle testing for the two-parameter gamma distribution, i.e. the G3 with location parameter known to be 0.
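The link between the skewness coefficient and the distribution parameters used above can be made explicit. For the gamma family the skewness is 2/√α, which is why a skewness above 1 is equivalent to α < 4. The sketch below (hypothetical function names) inverts this relation and, as an assumption about parametrization that the text does not spell out, also inverts the standard skewness formulas of the lognormal distribution with log-scale standard deviation σ and of the inverse Gaussian distribution with mean μ and shape λ.

```python
import math

def gamma_shape_from_skewness(a3):
    """Gamma skewness is 2 / sqrt(alpha), hence alpha = 4 / a3**2."""
    return 4.0 / a3**2

def lognormal_sigma_from_skewness(a3):
    """Solve (w + 2) * sqrt(w - 1) = a3 for w = exp(sigma**2) by bisection."""
    def excess(w):
        return (w + 2.0) * math.sqrt(w - 1.0) - a3

    lo, hi = 1.0, 2.0
    while excess(hi) < 0.0:          # expand the bracket until it contains the root
        hi *= 2.0
    for _ in range(200):             # the left-hand side is increasing in w
        mid = 0.5 * (lo + hi)
        if excess(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(math.log(0.5 * (lo + hi)))

def inverse_gaussian_mu_over_lambda(a3):
    """Inverse Gaussian skewness is 3 * sqrt(mu / lambda)."""
    return (a3 / 3.0) ** 2

for a3 in (1.00, 1.41, 2.00, 2.83, 4.00):
    print(a3,
          round(gamma_shape_from_skewness(a3), 3),
          round(lognormal_sigma_from_skewness(a3), 4),
          round(inverse_gaussian_mu_over_lambda(a3), 4))
```

The gamma column shows that the skewness values 1.00, 1.41, 2.00, 2.83 and 4.00 correspond approximately to α = 4, 2, 1, 0.5 and 0.25.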

Table 1. Simulation results of the 5% test for the G3(α, β, γ) distribution

α3      n = 25    n = 50    n = 100   n = 500   n = 1000

1000 samples from G3(α, β = 1, γ = 0)
1.00    0.03      0.01      0.03      0.034     0.047
1.41    0.038     0.049     0.06      0.046     0.040
2.00    0.074     0.051     0.053     0.057     0.043
2.83    0.069     0.058     0.051     0.044     0.05
4.00    0.06      0.058     0.054     0.056     0.055
6.32    0.054     0.056     0.048     0.048     0.057

1000 samples from IG3(α, β = 1, γ = 0)
1.00    0.09      0.03      0.07      0.07      0.030
1.41    0.07      0.05      0.075     0.97      0.538
2.00    0.057     0.10      0.14      0.76      0.936
2.83    0.13      0.5       0.405     0.963     0.999
4.00    0.9       0.37      0.65      1.000     1.000
6.32    0.301     0.508     0.779     1.000     1.000

1000 samples from LN3(ζ = 0, τ, γ = 0)
1.00    0.00      0.04      0.03      0.05      0.05
1.41    0.09      0.053     0.074     0.190     0.371
2.00    0.06      0.106     0.184     0.705     0.933
2.83    0.099     0.187     0.339     0.909     0.998
4.00    0.151     0.60      0.486     0.984     1.000
6.32    0.06      0.383     0.610     0.999     1.000

SUMMARY

The families of the two-parameter and three-parameter gamma distributions are widely used in reliability and lifetime analysis, statistical hydrology, strength of materials, meteorology and other scientific fields where there is a need to describe data with positive skewness. In this paper, cumulant plots are examined for exploring the appropriateness of the three-parameter gamma distribution. One of the plots is furnished with a band of 95% simultaneous confidence level, which is based on the asymptotic behaviour of the empirical moment generating function and on finite-sample results. Simulated samples from the gamma, lognormal and inverse Gaussian distributions are used to establish the properties of this plot. In addition, the plots are applied to real data and the proposed plots are compared with probability plots.

REFERENCES

Chambers J.M., Cleveland W.S., Kleiner B. and Tukey P.A. (1983). Graphical Methods for Data Analysis, Duxbury Press, Belmont.

Koutrouvelis I.A. (1994). Graphical methods for the analysis of univariate data. Proceedings of the 7th Panhellenic Statistics Conference (Nicosia), pp. 150-157 (in Greek).

Koutrouvelis I.A. (2006). Plots for selecting an appropriate distribution model. Proceedings of the 19th Panhellenic Statistics Conference (Kastoria), in press (in Greek).

Koutrouvelis I.A. and Canavos G.C. (1999). Estimation in the Pearson type 3 distribution. Water Resources Research, 35, pp. 2693-2704.

MINITAB, Release 14 (2003). Minitab, Inc.

Von Alven W.H. (1964). Reliability Engineering by ARINC. Prentice-Hall, Inc., Englewood Cliffs.