LIMITED DEPENDENT VARIABLES - BASIC

[1] Binary choice models

Motivation: Dependent variable (y_t) is a yes/no variable (e.g., unionism, migration, labor force participation, or death).

(1) Linear Model (Somewhat Defective)

Digression on the Bernoulli distribution: Y is a random variable with pdf f(y) = p^y (1-p)^(1-y), where p = Pr(Y=1) and 1-p = Pr(Y=0).
E(Y) = Σ_y y f(y) = 1·p + 0·(1-p) = p;
var(Y) = Σ_y y² f(y) - [E(Y)]² = p - p² = p(1-p).
End of Digression

Linear Model: y_t = x_t'β + ε_t, where y_t = 1 if yes and y_t = 0 if no.
Assume that E(ε_t|x_t) = 0.
→ E(y_t|x_t) = x_t'β ≡ p_t = Pr(y_t = 1|x_t).
→ ∂p_t/∂x_tj = β_j: So, the coefficients measure the effects of the x_j on p.
Problems in the linear model:

1) The ε_t are nonnormal and heteroskedastic.
Note that y_t = 1 or 0:
→ ε_t = 1 - x_t'β with prob p_t = x_t'β;
   ε_t = -x_t'β with prob 1 - p_t = 1 - x_t'β.
E(ε_t|x_t) = (1 - x_t'β)(x_t'β) + (-x_t'β)(1 - x_t'β) = 0.
var(ε_t|x_t) = E(ε_t²|x_t) = (1 - x_t'β)²(x_t'β) + (x_t'β)²(1 - x_t'β) = x_t'β(1 - x_t'β).
→ Not constant over t.
→ OLS is unbiased but not efficient. GLS using σ̂_t² = x_t'β̂(1 - x_t'β̂) is more efficient than OLS.

2) Suppose that we wish to predict p_o = Pr(y_o = 1|x_o) at x_o. The natural predictor of p_o is x_o'β̂, where β̂ is OLS or GLS. But x_o'β̂ could be outside of the range (0,1).
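A minimal simulation sketch of both problems, in Python (numpy) rather than the GAUSS the notes reference; the data and all names here are invented for illustration:

import numpy as np

rng = np.random.default_rng(0)
T = 500
x = np.column_stack([np.ones(T), rng.normal(size=T)])   # constant + one regressor
beta_true = np.array([0.5, 0.8])
y = (x @ beta_true + rng.normal(size=T) > 0).astype(float)  # binary outcome

# OLS (linear probability model)
b_ols = np.linalg.solve(x.T @ x, x.T @ y)
p_hat = x @ b_ols
print("share of fitted probabilities outside (0,1):",
      np.mean((p_hat <= 0) | (p_hat >= 1)))

# Feasible GLS: weight by 1/sqrt(sigma_t^2), sigma_t^2 = p_hat(1 - p_hat)
ok = (p_hat > 0) & (p_hat < 1)            # weights undefined otherwise
w = 1.0 / np.sqrt(p_hat[ok] * (1 - p_hat[ok]))
xw, yw = x[ok] * w[:, None], y[ok] * w
b_gls = np.linalg.solve(xw.T @ xw, xw.T @ yw)
print("OLS:", b_ols, " FGLS:", b_gls)

Dropping the observations with fitted probabilities outside (0,1) is one common workaround; it also illustrates problem 2) directly.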
(2) Probit Model

Model: y_t* = x_t'β + ε_t, t = 1, ..., T, where y_t* is an unobservable latent variable (e.g., level of utility);
y_t = 1 if y_t* > 0; = 0 otherwise;
E(ε_t|x_t) = 0; ε_t ~ N(0,1) conditional on x_t;
E(ε_t|x_1, ..., x_t, ε_1, ..., ε_{t-1}) = 0; the x_t are ergodic and stationary.

Digression on the normal pdf and cdf:
X ~ N(μ,σ²): f(x) = [1/(√(2π)σ)] exp[-(x - μ)²/(2σ²)], -∞ < x < ∞.
Z ~ N(0,1): φ(z) = (1/√(2π)) exp(-z²/2); Φ(z) = Pr(Z < z) = ∫_{-∞}^z φ(v)dv.
In GAUSS, φ(z) = pdfn(z) and Φ(z) = cdfn(z).
Some useful facts: dΦ(z)/dz = φ(z); dφ(z)/dz = -zφ(z); Φ(-z) = 1 - Φ(z); φ(z) = φ(-z).
End of digression

Return to the Probit model.
PDF of y_t: Conditional on x_t,
Pr(y_t = 1) = Pr(y_t* > 0) = Pr(x_t'β + ε_t > 0) = Pr(x_t'β > -ε_t) = Pr(-ε_t < x_t'β) = Φ(x_t'β).
This guarantees that p_t ≡ Pr(y_t = 1) is in the range (0,1).
→ f(y_t|x_t) = [Φ(x_t'β)]^{y_t} [1 - Φ(x_t'β)]^{1-y_t}.

Short Digression:
y_t* = x_t'β + ε_t, t = 1, ..., T, where the -ε_t are iid U(0,1). Then, Pr(y_t = 1|x_t) = x_t'β (linear). (Heckman and Snyder, Rand, 1997)
End of Digression

Log-likelihood function of the Probit model:
L_T(β) = Π_{t=1}^T f(y_t|x_t)
→ l_T(β) = Σ_t ln f(y_t|x_t) = Σ_t {y_t ln Φ(x_t'β) + (1 - y_t) ln[1 - Φ(x_t'β)]}.

Some useful facts:
E(y_t|x_t) = Φ(x_t'β);
∂Φ(x_t'β)/∂β = φ(x_t'β)x_t;
∂²Φ(x_t'β)/∂β∂β' = -(x_t'β)φ(x_t'β)x_t x_t'.
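A direct transcription of this log-likelihood (a hypothetical Python/scipy sketch; the function name is mine, not from the notes):

import numpy as np
from scipy.stats import norm

def probit_loglik(beta, y, x):
    """l_T(beta) = sum_t [ y_t ln Phi(x_t'beta) + (1-y_t) ln(1-Phi(x_t'beta)) ]."""
    xb = x @ beta
    # logcdf is numerically safer than log(cdf) in the tails; 1-Phi(a) = Phi(-a)
    return np.sum(y * norm.logcdf(xb) + (1 - y) * norm.logcdf(-xb))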
lt ( β ) β = = ln Φ( x ln(1 Φ( x ) Σ y + (1 y) β β φ( x φ( x Σ y x + (1 y ) Φ( x 1 Φ( x x = Σ ( y Φ( x ) φ( x Φ( x (1 Φ( x ) x Numerical Propery of he MLE of β ( ˆ β ) l ( ˆ T β ) = β Σ ( y ( ˆ)) ( ˆ Φ x β φ x x Φ( x ˆ (1 Φ( x ˆ ) = 0 k 1 H T ( ˆ ˆ lt β ) ( β ) = β β should be negaive definie [See Judge, e al for he exac form of H T ] l T ( is globally concave wih respec o β; ha is, H T ( is negaive definie [Amemiya (1985, Advanced Economerics)] Use [ H ( ˆ β )] T 1 as Cov( ˆ β ) Limied_Basic-5
How to find the MLE (see Greene Ch. 5 or Hamilton Ch. 5):

1. Newton-Raphson algorithm:
STEP 1: Choose an initial θ̂_0. Then compute
(*) θ̂_1 = θ̂_0 - [H_T(θ̂_0)]^{-1} s_T(θ̂_0).
STEP 2: Using θ̂_1, compute θ̂_2 by (*).
STEP 3: Continue until θ̂_{q+1} ≈ θ̂_q.
Note: The N-R method is the best if l_T(θ) is globally concave (i.e., the Hessian matrix is negative definite for any θ). N-R may not work if l_T(θ) is not globally concave.

2. BHHH [Berndt, Hall, Hall, Hausman]:
l_T(θ) = Σ_t ln[f_t(θ)].
Define: g_t(θ) = ∂ln[f_t(θ)]/∂θ [p×1] (so s_T(θ) = Σ_t g_t(θ));
B_T(θ) = Σ_t g_t(θ)g_t(θ)' [cross product of first derivatives].
Theorem: Under suitable regularity conditions, plim_{T→∞} (1/T)B_T(θ_o) = -lim_{T→∞} E[(1/T)H_T(θ_o)].
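A sketch of Newton-Raphson applied to probit (illustrative Python, not from the notes); it uses the closed-form probit Hessian H_T(β) = -Σ_t ξ_t(ξ_t + x_t'β)x_t x_t', with ξ_t the generalized residual defined on the next page, which I take to be the exact form referenced to Judge et al.:

import numpy as np
from scipy.stats import norm

def probit_score_hessian(beta, y, x):
    """Score s_T(beta) and Hessian H_T(beta) of the probit log-likelihood."""
    xb = x @ beta
    phi, Phi = norm.pdf(xb), norm.cdf(xb)
    xi = (y - Phi) * phi / (Phi * (1 - Phi))       # generalized residual xi_t
    s = x.T @ xi                                   # score, k x 1
    H = -(x * (xi * (xi + xb))[:, None]).T @ x     # Hessian (negative definite)
    return s, H

def probit_newton_raphson(y, x, tol=1e-8, max_iter=100):
    """theta_1 = theta_0 - H^{-1} s, iterated until the step is negligible."""
    beta = np.zeros(x.shape[1])                    # initial guess
    for _ in range(max_iter):
        s, H = probit_score_hessian(beta, y, x)
        step = np.linalg.solve(H, s)
        beta = beta - step
        if np.max(np.abs(step)) < tol:
            break
    return beta

Because l_T(β) is globally concave for probit, these iterations converge from essentially any starting value.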
Implication: B_T(θ) ≈ -H_T(θ) as T → ∞.
→ Cov(θ̂) can be estimated by [B_T(θ̂)]^{-1} or [-H_T(θ̂)]^{-1}.
The BHHH algorithm uses θ̂_1 = θ̂_0 + λ[B_T(θ̂_0)]^{-1} s_T(θ̂_0), where λ is called the step length.
When BHHH is used, there is no need to compute second derivatives.
Other available algorithms: BFGS, BFGS-SC, DFP.

BHHH for Probit: Can show g_t(β) = ξ_t x_t, where
ξ_t = (y_t - Φ_t)φ_t / [Φ_t(1 - Φ_t)]; φ_t = φ(x_t'β); Φ_t = Φ(x_t'β).
→ B_T(β̂) = Σ_t ĝ_t ĝ_t' = Σ_t ξ̂_t² x_t x_t'.
→ [B_T(β̂)]^{-1} is Cov(β̂) by BHHH.

Interpretation of β:
1) β_j shows the direction of influence of x_tj on Pr(y_t|x_t) = Φ(x_t'β). β_j > 0 means that Pr(y_t = 1) increases with x_tj.
2) Rate of change: ∂Pr(y_t = 1|x_t)/∂x_tj = ∂Φ(x_t'β)/∂x_tj = φ(x_t'β)β_j.
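The BHHH covariance estimator for probit as a sketch (illustrative Python; the function name is mine):

import numpy as np
from scipy.stats import norm

def probit_bhhh_cov(beta_hat, y, x):
    """[B_T(beta_hat)]^{-1}, with B_T = sum_t xi_t^2 x_t x_t' (outer product of scores)."""
    xb = x @ beta_hat
    phi, Phi = norm.pdf(xb), norm.cdf(xb)
    xi = (y - Phi) * phi / (Phi * (1 - Phi))
    B = (x * (xi**2)[:, None]).T @ x
    return np.linalg.inv(B)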
Estimation of probabilities and rates of change:

Estimation of p = Pr(y_t = 1|x̄) at the mean of x:
Use p̂ = Φ(x̄'β̂);
var(p̂) = [φ(x̄'β̂)]² x̄'Ω̂x̄, where Ω̂ = Cov(β̂) [by the delta method].

Estimation of rates of change:
Use p̂_j = ∂Φ(x̄'β̂)/∂x_j = φ(x̄'β̂)β̂_j;
var(p̂_j) = [∂p_j(β̂)/∂β]'Ω̂[∂p_j(β̂)/∂β] [by the delta method].
Note that ∂p_j(β)/∂β = -(x̄'β)φ(x̄'β)β_j x̄ + φ(x̄'β)J_j, where J_j = k×1 vector of zeros except that the jth element = 1.

Note on normalization:
Model: y_t* = x_t'β + ε_t, -ε_t ~ N(0,σ²); y_t = 1 iff y_t* > 0.
p_t = Pr(y_t = 1|x_t) = Pr(y_t* > 0|x_t) = Pr(x_t'β + ε_t > 0|x_t) = Pr(-ε_t < x_t'β|x_t)
    = Pr(-ε_t/σ < x_t'(β/σ)|x_t) = Φ[x_t'(β/σ)].
→ Can estimate β/σ, but not β and σ separately.
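A delta-method sketch for both p̂ and the rates of change at the sample mean (hypothetical Python; `cov_hat` stands for whatever estimate of Cov(β̂) is used, e.g., the BHHH one above):

import numpy as np
from scipy.stats import norm

def probit_p_and_me_at_mean(beta_hat, cov_hat, x):
    """p_hat = Phi(xbar'b) and me_j = phi(xbar'b) b_j, with delta-method variances."""
    xbar = x.mean(axis=0)
    a = xbar @ beta_hat
    p_hat = norm.cdf(a)
    var_p = norm.pdf(a)**2 * (xbar @ cov_hat @ xbar)

    k = len(beta_hat)
    me = norm.pdf(a) * beta_hat                     # rates of change, k-vector
    # row j of G: -(xbar'b)*phi(xbar'b)*b_j*xbar' + phi(xbar'b)*J_j'
    G = -a * norm.pdf(a) * np.outer(beta_hat, xbar) + norm.pdf(a) * np.eye(k)
    var_me = np.diag(G @ cov_hat @ G.T)
    return p_hat, var_p, me, var_me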
Testing hypotheses:

1. Wald test: H_o: w(β) = 0.
W_T = w(β̂)'[W(β̂)Ω̂W(β̂)']^{-1}w(β̂) →d χ²(df = # of restrictions),
where β̂ = probit MLE and W(β) = ∂w(β)/∂β'.

2. LR test: Easy for equality or zero restrictions (i.e., H_o: β_2 = β_3, or H_o: β_2 = β_3 = 0).

EX 1: Suppose you wish to test H_o: β_4 = β_5 = 0.
STEP 1: Do probit without restriction and get l_T,UR = ln(L_T,UR).
STEP 2: Do probit with the restrictions (probit without x_4 and x_5) and get l_T,R = ln(L_T,R).
STEP 3: LR_T = 2[l_T,UR - l_T,R] →d χ²(df = 2).

EX 2: Suppose you wish to test H_o: β_2 = ... = β_k = 0 (overall significance test).
Let n = Σ_t y_t. → l_T = n ln(n/T) + (T - n) ln[(T - n)/T].
LR_T = 2[l_T,UR - l_T] →d χ²(k - 1).
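EX 1 as a sketch (Python/scipy; `probit_loglik` is the earlier hypothetical sketch, and the other names are mine):

import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

def probit_mle(y, x):
    k = x.shape[1]
    res = minimize(lambda b: -probit_loglik(b, y, x), np.zeros(k), method="BFGS")
    return res.x, -res.fun                          # (beta_hat, maximized log-lik)

def lr_test_zero(y, x, drop):
    """H_o: coefficients on the columns listed in `drop` are zero."""
    _, l_ur = probit_mle(y, x)                      # STEP 1: unrestricted probit
    keep = [j for j in range(x.shape[1]) if j not in drop]
    _, l_r = probit_mle(y, x[:, keep])              # STEP 2: probit without x_4, x_5
    lr = 2 * (l_ur - l_r)                           # STEP 3
    return lr, chi2.sf(lr, df=len(drop))            # statistic and p-value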
Pseudo-R² (McFadden, 1974):
ρ² = 1 - l_T,UR/l_T, where l_T is the constant-only value from EX 2 above.
0 ≤ ρ² ≤ 1. If Φ(x_t'β̂) = 1 whenever y_t = 1, and Φ(x_t'β̂) = 0 whenever y_t = 0, then ρ² = 1. If 0 < ρ² < 1, it has no clear meaning.

(3) Logit Models

Model: y_t* = x_t'β + ε_t, ε_t ~ logistic with g(ε) = e^ε/(1 + e^ε)² and G(ε) = e^ε/(1 + e^ε).
Use Pr(y_t = 1|x_t) = G(x_t'β) (instead of Φ(x_t'β)).
The logit MLE β̂_logit maximizes ln(L) = Σ_t {y_t ln G(x_t'β) + (1 - y_t) ln[1 - G(x_t'β)]}.
Use [-H_T(β̂_logit)]^{-1} or [B_T(β̂_logit)]^{-1} as Cov(β̂_logit).
∂p/∂x_tj = g(x_t'β)β_j.
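The pseudo-R² in code form (a small Python sketch; `l_ur` is the maximized unrestricted log-likelihood from the earlier sketch):

import numpy as np

def mcfadden_r2(l_ur, y):
    """rho^2 = 1 - l_UR / l_0, with l_0 = n ln(n/T) + (T-n) ln((T-n)/T)."""
    T, n = len(y), y.sum()
    l0 = n * np.log(n / T) + (T - n) * np.log((T - n) / T)
    return 1 - l_ur / l0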
Facts:
The logistic distribution is quite similar to the standard normal distribution, except that the logistic distribution has thicker tails (similar to t(7)).
If the data contain few obs with y_t = 1 or y_t = 0, then probit and logit may be quite different. Other than that, probit and logit yield very similar predictions. In particular, the marginal effects are quite similar.
Roughly, β̂_logit ≈ 1.6 β̂_probit.
[2] Censoring vs. Truncation (Greene, Ch. 20)

(1) Classical distinction: Consider shots at a target.
Truncation: cases where you have data on the holes only.
Censoring: cases where you also know how many shots missed.

(2) Censoring
y_t* ~ pdf f(y_t*). Observe y_t = y_t* if A_t < y_t* < B_t; = A_t if y_t* ≤ A_t; = B_t if y_t* ≥ B_t.
(For obs with y_t = A_t or y_t = B_t, y_t* is unknown.)
Log-likelihood function: Let OB = {t: y_t* observed} and NOB = {t: y_t* unobserved}.
l_T = Σ_OB ln[f(y_t|OB)Pr(OB)] + Σ_NOB ln[Pr(NOB)].
Note: f(y_t|OB)Pr(OB) = f(y_t|A_t < y_t* < B_t)Pr(A_t < y_t* < B_t)
    = [f(y_t)/Pr(A_t < y_t* < B_t)]Pr(A_t < y_t* < B_t) = f(y_t).
→ l_T = Σ_{A_t<y_t<B_t} ln f(y_t) + Σ_{y_t=A_t} ln Pr(y_t* ≤ A_t) + Σ_{y_t=B_t} ln Pr(y_t* ≥ B_t).
(3) Truncation
Observe y_t = y_t* iff A_t < y_t* < B_t.
Log-likelihood function: pdf of y_t: g(y_t) = f(y_t|A_t < y_t* < B_t) = f(y_t)/Pr(A_t < y_t* < B_t).
→ l_T = Σ_t {ln f(y_t) - ln[Pr(A_t < y_t* < B_t)]}.

(4) Tobit (a censored model)
1) Latent model: y_t* = x_t'β + ε_t, ε_t ~ N(0,σ²) conditional on x_t [so y_t* ~ N(x_t'β, σ²)].
2) Three possible cases:
A. Observe y_t = y_t* if y_t* > 0; = 0 otherwise → y_t = max(0, y_t*).
B. Observe y_t = y_t* if y_t* < 0; = 0 otherwise → y_t = min(0, y_t*).
C. Observe y_t = y_t* if y_t* < L_t; = L_t otherwise.
3) Log-likelihood for case A:
Pr(y_t = 0|x_t) = Pr(x_t'β + ε_t ≤ 0) = Pr(ε_t ≤ -x_t'β) = Pr(ε_t/σ ≤ -x_t'(β/σ)) = Φ[-x_t'(β/σ)] = 1 - Φ[x_t'(β/σ)].
f(y_t) = [1/(√(2π)σ)] exp[-(y_t - x_t'β)²/(2σ²)] for y_t > 0.
Therefore,
l_T(β,σ) = Σ_{y_t>0} ln f(y_t) + Σ_{y_t=0} ln{1 - Φ(x_t'β/σ)}
         = Σ_{y_t>0} {-(1/2)ln(2π) - ln(σ) - (y_t - x_t'β)²/(2σ²)} + Σ_{y_t=0} ln{1 - Φ(x_t'β/σ)}.
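This log-likelihood as a sketch (hypothetical Python; parameterizing σ through its log is my choice, to keep σ > 0 during unconstrained optimization):

import numpy as np
from scipy.stats import norm

def tobit_loglik(params, y, x):
    """Tobit (case A, censored at 0). params = (beta, log_sigma)."""
    beta, sigma = params[:-1], np.exp(params[-1])
    xb = x @ beta
    pos = y > 0
    ll_pos = norm.logpdf(y[pos], loc=xb[pos], scale=sigma)   # density part
    ll_zero = norm.logcdf(-xb[~pos] / sigma)                 # Pr(y* <= 0) = 1 - Phi(xb/sigma)
    return ll_pos.sum() + ll_zero.sum()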
5) Interpretation:
(i) E(y_t*|x_t) = E[latent variable (e.g., desired consumption)|x_t] = x_t'β
→ β_j = ∂E(y_t*|x_t)/∂x_tj.
(ii) E(y_t|x_t) = E[observed variable (e.g., actual expenditure)]
= Pr(y_t* ≥ 0)E(y_t|y_t* ≥ 0) + Pr(y_t* < 0)E(y_t|y_t* < 0)
= Pr(y_t* ≥ 0)E(y_t*|y_t* ≥ 0) + Pr(y_t* < 0)E(0|y_t* < 0)
= Φ(x_t'β/σ)E(x_t'β + ε_t|ε_t ≥ -x_t'β)
= Φ(x_t'β/σ)[x_t'β + σλ(x_t'β/σ)]
[where λ(x_t'β/σ) = φ(x_t'β/σ)/Φ(x_t'β/σ) (inverse Mills ratio)]
= Φ(x_t'β/σ)x_t'β + σφ(x_t'β/σ).

Short Digression: Suppose that ε ~ N(0,σ²). Then, E(ε|ε > h) = σφ(h/σ)/[1 - Φ(h/σ)].
End of Digression

Note: Conditional on x_t,
∂E(y_t)/∂x_tj = φ(x_t'β/σ)(β_j/σ)x_t'β + Φ(x_t'β/σ)β_j - (x_t'β/σ)φ(x_t'β/σ)β_j = Φ(x_t'β/σ)β_j.

6) Estimation of E(y_t*|x) and E(y_t|x):
Let g_1(β) = x̄'β.
Estimated E(y_t*) at the sample mean = g_1(β̂);
se = √(G_1'Ω̂G_1), where Ω̂ = Cov(β̂) and G_1 = ∂g_1(β̂)/∂β = x̄.

Let g_2(β,σ) = Φ(x̄'β/σ)x̄'β + σφ(x̄'β/σ).
Estimated E(y_t) at the sample mean = g_2(β̂,σ̂);
G_2(β,σ) = ∂g_2/∂(β',σ)' = [Φ(x̄'β/σ)x̄', φ(x̄'β/σ)]';
se = √(G_2'Ω̂G_2), where Ω̂ = Cov(β̂,σ̂).
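Both predictions with their delta-method standard errors (illustrative Python; it assumes the parameter ordering (β, σ) in `cov_hat`, the estimated Cov(β̂,σ̂)):

import numpy as np
from scipy.stats import norm

def tobit_predictions_at_mean(beta_hat, sigma_hat, cov_hat, x):
    """E(y*) = xbar'b and E(y) = Phi(z)xbar'b + sigma*phi(z), z = xbar'b/sigma."""
    xbar = x.mean(axis=0)
    z = (xbar @ beta_hat) / sigma_hat

    # E(y* | x = xbar): gradient is (xbar, 0) since sigma does not enter g_1
    ey_star = xbar @ beta_hat
    G1 = np.concatenate([xbar, [0.0]])
    se1 = np.sqrt(G1 @ cov_hat @ G1)

    # E(y | x = xbar): gradients Phi(z)*xbar (wrt beta) and phi(z) (wrt sigma)
    ey = norm.cdf(z) * (xbar @ beta_hat) + sigma_hat * norm.pdf(z)
    G2 = np.concatenate([norm.cdf(z) * xbar, [norm.pdf(z)]])
    se2 = np.sqrt(G2 @ cov_hat @ G2)
    return (ey_star, se1), (ey, se2)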
(5) Truncation (Maddala, Ch. 6)

1) Example 1: Earnings function from a sample of poor people (Hausman and Wise, Econometrica, 1979):
y_t* = x_t'β + ε_t, ε_t ~ N(0,σ²) conditional on x_t.
Observe y_t = y_t* iff y_t* < L_t (L_t = 1.5 × the poverty line, which depends on family size).
Log-likelihood function: pdf of y_t: g(y_t|x_t) = f(y_t|y_t* ≤ L_t, x_t) = f(y_t|x_t)/Pr(y_t* ≤ L_t|x_t).
Pr(y_t* ≤ L_t|x_t) = Pr(ε_t/σ ≤ (L_t - x_t'β)/σ|x_t) = Φ[(L_t - x_t'β)/σ].
→ l_T = Σ_t {ln f(y_t|x_t) - ln Φ[(L_t - x_t'β)/σ]}, where f is the normal density function.
E(y_t|x_t) = E(y_t*|y_t* < L_t) = E(x_t'β + ε_t|x_t'β + ε_t < L_t) = x_t'β + E(ε_t|ε_t < L_t - x_t'β)
           = x_t'β - E(ε_t|ε_t > -(L_t - x_t'β)) = x_t'β - σλ[(L_t - x_t'β)/σ],
where λ(·) = φ(·)/Φ(·).
2) Example 2: Observe y_t = y_t* iff y_t* ≥ L_t.
f(y_t|y_t* ≥ L_t, x_t) = f(y_t|x_t)/Pr(y_t* ≥ L_t|x_t);
Pr(y_t* ≥ L_t|x_t) = 1 - Φ[(L_t - x_t'β)/σ].
→ l_T = Σ_t {ln f(y_t|x_t) - ln[1 - Φ((L_t - x_t'β)/σ)]}.
E(y_t|x_t) = E(y_t*|y_t* ≥ L_t, x_t) = x_t'β + σλ[(x_t'β - L_t)/σ].

3) Link between truncation and tobit:
Suppose L_t = 0 for all t in Example 2. →
1 - Φ[(L_t - x_t'β)/σ] = 1 - Φ(-x_t'β/σ) = Φ(x_t'β/σ).
Then, the log-likelihood function becomes:
l_T = Σ_t {-(1/2)ln(2π) - ln(σ) - (y_t - x_t'β)²/(2σ²) - ln Φ(x_t'β/σ)}.
Consider tobit. Choose the observations with y_t > 0 and do truncation MLE. This is the case where we observe y_t = y_t* iff y_t* > L_t = 0. The truncation MLE using the truncated data is consistent even if it is inefficient. If the estimation results from the truncation and tobit MLEs are quite different, it means that the tobit model is not correctly specified.
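A sketch of the Example 2 (truncated-from-below) log-likelihood (hypothetical Python; the default L = 0 anticipates the tobit link just discussed):

import numpy as np
from scipy.stats import norm

def truncated_loglik(params, y, x, L=0.0):
    """Truncated sample (y = y* observed iff y* >= L):
    l_T = sum_t [ ln f(y_t) - ln(1 - Phi((L - x_t'beta)/sigma)) ]."""
    beta, sigma = params[:-1], np.exp(params[-1])
    xb = x @ beta
    return np.sum(norm.logpdf(y, loc=xb, scale=sigma)
                  - norm.logsf((L - xb) / sigma))   # logsf = log(1 - cdf)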
(6) Two-part Model
Cragg (Econometrica, 1971), Lin and Schmidt (Review of Economics and Statistics (RESTAT), 1984)

1) Model: y_t = x_t'β + ε_t, where
g(ε_t|x_t, z_t, v_t) = [1/(√(2π)σ)] exp[-ε_t²/(2σ²)] / Φ(x_t'β/σ), with ε_t > -x_t'β;
h_t* = z_t'γ + v_t with v_t ~ N(0,1); h_t = 1 iff h_t* > 0; = 0 otherwise.

3) Example: y_t: desired spending on clothing; h_t: timing to buy.
4) Log-likelihood function:
l_T = Σ_{h_t=1} ln[g(y_t|h_t* > 0)Pr(h_t* > 0)] + Σ_{h_t=0} ln[Pr(h_t* < 0)].
Note: g(y_t|h_t* > 0) = g(y_t), because ε_t and v_t are assumed independent; Pr(h_t* > 0) = Φ(z_t'γ).
g(y_t|x_t) = [1/(√(2π)σ)] exp[-(y_t - x_t'β)²/(2σ²)] / Φ(x_t'β/σ).
→ l_T = Σ_{h_t=1} {-(1/2)ln(2π) - ln(σ) - (y_t - x_t'β)²/(2σ²) - ln Φ(x_t'β/σ)}
      + Σ_{h_t=1} ln Φ(z_t'γ) + Σ_{h_t=0} ln[1 - Φ(z_t'γ)]
= [trunc for y_t > 0] + [probit for all obs].
→ Estimate (β,σ) by trunc and γ by probit; l_Cragg = l_trunc + l_probit.
Note: Let z_t = x_t. If γ = β/σ, Cragg becomes tobit!!!

5) LR test for the tobit specification:
STEP 1: Do tobit and get l_tobit.
STEP 2: Do trunc using the observations with y_t > 0 and get l_trunc.
STEP 3: Do probit using all observations and get l_probit.
STEP 4: l_cragg = l_trunc + l_probit.
STEP 5: LR = 2[l_cragg - l_tobit] →d χ²(k).
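The five steps as one sketch (Python/scipy; it reuses the earlier hypothetical `probit_loglik`, `tobit_loglik`, and `truncated_loglik` sketches):

import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

def cragg_vs_tobit_lr(y, x):
    k = x.shape[1]
    p0 = np.zeros(k + 1)                           # (beta, log_sigma) start values
    l_tobit = -minimize(lambda p: -tobit_loglik(p, y, x),
                        p0, method="BFGS").fun     # STEP 1
    pos = y > 0
    l_trunc = -minimize(lambda p: -truncated_loglik(p, y[pos], x[pos]),
                        p0, method="BFGS").fun     # STEP 2
    d = pos.astype(float)                          # indicator y_t > 0; z_t = x_t here
    l_probit = -minimize(lambda b: -probit_loglik(b, d, x),
                         np.zeros(k), method="BFGS").fun  # STEP 3
    lr = 2 * (l_trunc + l_probit - l_tobit)        # STEPS 4-5: l_cragg = l_trunc + l_probit
    return lr, chi2.sf(lr, df=k)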
[3] Selection Model
Heckman, Econometrica, 1979

Motivation:
Model of interest: y_1t = x_1t'β_1 + ε_1t.
Observe y_1t (and/or x_1t) only under a certain condition (the "selection rule").
Example: Observe a woman's market wage if she works.

Complete Model:
y_1t = x_1t'β_1 + ε_1t;
y_2t* = x_2t'β_2 + ε_2t; y_2t = 1 if y_2t* > 0; = 0 if y_2t* ≤ 0.
We observe y_1t iff y_2t = 1 (x_2t must be observable for all t).

Assumptions: Conditional on (x_1t, x_2t),
(ε_1t, ε_2t)' ~ N( (0, 0)', [σ_1², σ_12; σ_12, σ_2²] ),
with σ_2² normalized to 1 (the probit normalization used below).
Theorem: Suppose
(h_1, h_2)' ~ N( (0, 0)', [σ_1², σ_12; σ_12, 1] ).
Then, E(h_1|h_2 > -a) = σ_12 φ(a)/Φ(a).

Facts: Conditional on (x_1t, x_2t),
E(ε_1t|y_2t* > 0) = E(ε_1t|ε_2t > -x_2t'β_2) = σ_12 λ(x_2t'β_2),
where λ(x_2t'β_2) = φ(x_2t'β_2)/Φ(x_2t'β_2) ≡ λ_t [inverse Mills ratio].
E(y_1t|y_2t* > 0) = x_1t'β_1 + E(ε_1t|ε_2t > -x_2t'β_2) = x_1t'β_1 + σ_12 λ(x_2t'β_2).
→ y_1t = x_1t'β_1 + σ_12 λ_t + v_t,
where E(v_t|ε_2t > -x_2t'β_2) = 0; var(v_t|ε_2t > -x_2t'β_2) ≡ ξ_t²;
ξ_t² = σ_1² - σ_12²[(x_2t'β_2)λ_t + λ_t²].

Two-Step Estimation:
STEP 1: Do probit for all t, and get β̂_2 and λ̂_t = φ(x_2t'β̂_2)/Φ(x_2t'β̂_2).
STEP 2: Do OLS on y_1t = x_1t'β_1 + σ_12 λ̂_t + η_t, and get β̂_1 and σ̂_12.
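The two steps as a sketch (illustrative Python; `probit_mle` is the earlier hypothetical sketch, and `d` denotes the selection indicator y_2t):

import numpy as np
from scipy.stats import norm

def heckman_two_step(y1, x1, d, x2):
    """STEP 1: probit of d on x2 -> beta2_hat, lambda_hat.
    STEP 2: OLS of y1 on (x1, lambda_hat) over the selected sample."""
    b2_hat, _ = probit_mle(d, x2)
    xb2 = x2 @ b2_hat
    lam = norm.pdf(xb2) / norm.cdf(xb2)             # inverse Mills ratio
    sel = d == 1
    Z = np.column_stack([x1[sel], lam[sel]])
    gamma = np.linalg.solve(Z.T @ Z, Z.T @ y1[sel])
    beta1_hat, sigma12_hat = gamma[:-1], gamma[-1]
    return beta1_hat, sigma12_hat, b2_hat, lam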
Facts on the Two-Step Estimator:
Consistent.
The t-test for H_o: σ_12 = 0 (no selection) in STEP 2 is the LM test (Melino, Review of Economic Studies, 1982). But all other t-tests are wrong!!! s²(X'X)^{-1} is inconsistent. So, one has to compute the corrected covariance matrix. [See Heckman (1979, Econometrica), Greene (1981, Econometrica).] Sometimes, the corrected covariance matrix is not computable (Greene, Econometrica, 1981).

Covariance Matrix of the Two-Step Estimator:
Let Ω̂ = Cov(β̂_2).
y_1t = x_1t'β_1 + σ_12 λ_t + v_t = x_1t'β_1 + σ_12 λ̂_t + [σ_12(λ_t - λ̂_t) + v_t].

Short Digression: By a Taylor expansion around the true value of β_2,
λ(x_2t'β̂_2) ≈ λ(x_2t'β_2) + [∂λ(x_2t'β_2)/∂β_2'](β̂_2 - β_2).
End of Digression
→ y_1t = (x_1t', λ̂_t)(β_1', σ_12)' + [h_t'(β_2 - β̂_2) + v_t] = z_t'γ + [h_t'(β_2 - β̂_2) + v_t],
where h_t = σ_12[∂λ_t/∂β_2] = -σ_12[(x_2t'β_2)λ_t + λ_t²]x_2t.

In matrix notation, y_1 = Zγ + [H(β_2 - β̂_2) + v].
γ̂_TS = (Z'Z)^{-1}Z'y_1 = (Z'Z)^{-1}Z'(Zγ + H(β_2 - β̂_2) + v)
     = γ + (Z'Z)^{-1}Z'H(β_2 - β̂_2) + (Z'Z)^{-1}Z'v.
Can show that (β̂_2 - β_2) and v are uncorrelated. Then, intuitively,
Cov(γ̂_TS) = Cov[(Z'Z)^{-1}Z'H(β̂_2 - β_2)] + Cov[(Z'Z)^{-1}Z'v]
= (Z'Z)^{-1}Z'H Cov(β̂_2) H'Z(Z'Z)^{-1} + (Z'Z)^{-1}Z'Cov(v)Z(Z'Z)^{-1}
= (Z'Z)^{-1}Z'HΩH'Z(Z'Z)^{-1} + (Z'Z)^{-1}Z'ΠZ(Z'Z)^{-1},
where Π = diag(π_1, ..., π_T) with π_t = ξ_t².
≈ (Z'Z)^{-1}Z'ĤΩ̂Ĥ'Z(Z'Z)^{-1} + (Z'Z)^{-1}Z'Π̂Z(Z'Z)^{-1},
where Π̂ = diag(v̂_1², ..., v̂_T²) and Ĥ is the estimated H.
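A sketch of this corrected covariance matrix (hypothetical Python; it builds Π̂ = diag(v̂_t²) and the rows of Ĥ from the expressions above, with `omega_hat` the probit Cov(β̂_2)):

import numpy as np
from scipy.stats import norm

def heckman_two_step_cov(y1, x1, d, x2, beta1_hat, sigma12_hat, b2_hat, omega_hat):
    """(Z'Z)^{-1} Z'(H Omega H' + Pi) Z (Z'Z)^{-1} over the selected sample."""
    sel = d == 1
    xb2 = x2[sel] @ b2_hat
    lam = norm.pdf(xb2) / norm.cdf(xb2)
    Z = np.column_stack([x1[sel], lam])
    v_hat = y1[sel] - Z @ np.concatenate([beta1_hat, [sigma12_hat]])
    # rows of H: -sigma12 * [(x2'b2)*lam + lam^2] * x2'
    H = -sigma12_hat * ((xb2 * lam + lam**2)[:, None] * x2[sel])
    ZZ_inv = np.linalg.inv(Z.T @ Z)
    middle = Z.T @ (H @ omega_hat @ H.T + np.diag(v_hat**2)) @ Z
    return ZZ_inv @ middle @ ZZ_inv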
MLE (which is more efficient than the two-step estimator):
Conditional on (x_1t, x_2t),
Pr(y_1t is not observed) = Pr(y_2t = 0) = Pr(y_2t* ≤ 0) = 1 - Φ(x_2t'β_2);
Pr(y_1t is observed) = Pr(y_2t* > 0) = Φ(x_2t'β_2);
f(y_1t|y_1t is observed) = f(y_1t|y_2t* > 0)
= [1/(√(2π)σ_1)] exp[-(y_1t - x_1t'β_1)²/(2σ_1²)]
  × Φ( [x_2t'β_2 + (σ_12/σ_1²)(y_1t - x_1t'β_1)] / √(1 - σ_12²/σ_1²) ) / Φ(x_2t'β_2).

l_T = Σ_{y_1t observed} ln[f(y_1t|y_1t is observed)Pr(y_1t is observed)]
    + Σ_{y_1t not observed} ln[Pr(y_1t is not observed)].
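The full-information log-likelihood as a sketch (illustrative Python; the tanh and log reparameterizations are my choice, keeping the implied correlation in (-1,1) and σ_1 > 0, and σ_12 = ρσ_1 under the var(ε_2t) = 1 normalization):

import numpy as np
from scipy.stats import norm

def heckman_mle_loglik(params, y1, x1, d, x2):
    """Selection-model log-likelihood. params = (beta1, beta2, log_sigma1, atanh_rho)."""
    k1, k2 = x1.shape[1], x2.shape[1]
    beta1, beta2 = params[:k1], params[k1:k1 + k2]
    sigma1 = np.exp(params[k1 + k2])
    rho = np.tanh(params[k1 + k2 + 1])
    xb1, xb2 = x1 @ beta1, x2 @ beta2
    sel = d == 1
    u = (y1[sel] - xb1[sel]) / sigma1               # standardized residual
    # observed part: ln f(y1|observed) + ln Phi(x2'b2); the Phi(x2'b2) terms cancel
    ll_obs = (norm.logpdf(u) - np.log(sigma1)
              + norm.logcdf((xb2[sel] + rho * u) / np.sqrt(1 - rho**2)))
    ll_not = norm.logcdf(-xb2[~sel])                # ln[1 - Phi(x2'b2)]
    return ll_obs.sum() + ll_not.sum()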