A Note on Deriving the Pareto/NBD Model and Related Expressions

Peter S. Fader
www.petefader.com

Bruce G. S. Hardie
www.brucehardie.com

November 2005

© 2005 Peter S. Fader and Bruce G. S. Hardie. Document source: <http://brucehardie.com/notes/009/>.

1 Introduction

The Pareto/NBD model was developed by Schmittlein et al. (1987), hereafter SMC, to describe repeat-buying behavior in a noncontractual setting. They derive expressions for, amongst other things, (i) the probability that a customer with a given transaction history is still "alive", and (ii) the expected number of future transactions for a randomly-chosen customer, conditional on his transaction history.

Many readers of SMC find the derivations presented in the paper to be rather daunting. The objective of this note is to guide the reader through the derivation of the key results and to present some new related results. In many cases, our approach to deriving a specific expression differs from that used by SMC; our reason for taking an alternative derivation route is that we feel it is simpler to follow.

In Section 2 we outline the assumptions of the Pareto/NBD model and derive a key intermediate result. In Sections 3–6, we derive expressions for the model likelihood function (something not presented in SMC), the mean and variance, the probability that a customer is alive, and the conditional expectation.

But before we start, let us introduce the Gaussian hypergeometric function [1], which is the power series of the form

$$ {}_2F_1(a,b;c;z) = \sum_{j=0}^{\infty} \frac{(a)_j (b)_j}{(c)_j} \frac{z^j}{j!}, \qquad c \neq 0, -1, -2, \ldots, $$

where $(a)_j$ is Pochhammer's symbol, which denotes the ascending factorial $a(a+1)\cdots(a+j-1)$. The series converges for $|z| < 1$ and is divergent for $|z| > 1$; if $|z| = 1$, the series converges for $c - a - b > 0$.

[1] The standard reference is the Handbook of Mathematical Functions, edited by Abramowitz and Stegun (1972); the interested reader is directed to this book for further information on this function. Additional information can be found in Gradshteyn and Ryzhik (1994) and Andrews, Askey, and Roy (1999). An excellent online reference is the Wolfram functions site (http://functions.wolfram.com/).

Since an ascending factorial can be represented as a ratio of two gamma functions,

$$ (a)_j = \frac{\Gamma(a+j)}{\Gamma(a)}, $$

we can write the Gaussian hypergeometric function as

$$ {}_2F_1(a,b;c;z) = \frac{\Gamma(c)}{\Gamma(a)\Gamma(b)} \sum_{j=0}^{\infty} \frac{\Gamma(a+j)\Gamma(b+j)}{\Gamma(c+j)} \frac{z^j}{j!}. $$

Looking at this, it should be clear that the function is symmetric in the upper parameters $a$ and $b$, i.e., ${}_2F_1(a,b;c;z) = {}_2F_1(b,a;c;z)$. The reader should keep this in mind when working through the derivations in this note.

Euler's integral representation of the Gaussian hypergeometric function is

$$ {}_2F_1(a,b;c;z) = \frac{1}{B(b,c-b)} \int_0^1 t^{b-1}(1-t)^{c-b-1}(1-zt)^{-a}\,dt, \qquad c > b, \tag{1} $$

where $B(\cdot,\cdot)$ is the beta function. While the symmetry of the Gaussian hypergeometric function in the parameters $a$ and $b$ is not obvious in this integral, be assured that it does hold.

2 Model Assumptions

The Pareto/NBD model is based on the following assumptions:

i. Customers go through two stages in their "lifetime" with a specific firm: they are "alive" for some period of time, then become permanently inactive.

ii. While alive, the number of transactions made by a customer follows a Poisson process with transaction rate $\lambda$; therefore the probability of observing $x$ transactions in the time interval $(0,t]$ is given by

$$ P(X(t)=x \mid \lambda) = \frac{(\lambda t)^x e^{-\lambda t}}{x!}, \qquad x = 0, 1, 2, \ldots $$

This is equivalent to assuming that the time between transactions is distributed exponential with transaction rate $\lambda$,

$$ f(t_j - t_{j-1} \mid \lambda) = \lambda e^{-\lambda(t_j - t_{j-1})}, \qquad t_j > t_{j-1} > 0, $$

where $t_j$ is the time of the $j$th purchase.

iii. A customer's unobserved "lifetime" of length $\tau$ (after which he is viewed as being inactive) is exponentially distributed with dropout rate $\mu$:

$$ f(\tau \mid \mu) = \mu e^{-\mu\tau}. $$

iv. Heterogeneity in transaction rates across customers follows a gamma distribution with shape parameter $r$ and scale parameter $\alpha$:

$$ g(\lambda \mid r,\alpha) = \frac{\alpha^r \lambda^{r-1} e^{-\lambda\alpha}}{\Gamma(r)}. \tag{2} $$

v. Heterogeneity in dropout rates across customers follows a gamma distribution with shape parameter $s$ and scale parameter $\beta$:

$$ g(\mu \mid s,\beta) = \frac{\beta^s \mu^{s-1} e^{-\mu\beta}}{\Gamma(s)}. \tag{3} $$

vi. The transaction rate $\lambda$ and the dropout rate $\mu$ vary independently across customers.
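The six assumptions above describe a data-generating process, so they can be sketched as a small simulation. The following is a minimal illustration using only the Python standard library, not something taken from the note: the function names `simulate_customer` and `simulate_cohort`, the fixed observation length `T`, and the parameter values in the usage line are all assumptions made for the example. Note that `random.gammavariate` is parameterized by shape and scale, whereas in (2) and (3) the parameters $\alpha$ and $\beta$ multiply the variable inside the exponent, so the sketch passes their reciprocals.

```python
import random


def simulate_customer(r, alpha, s, beta, T, rng):
    """Draw one customer's (x, t_x, T) summary under assumptions (i)-(vi).

    Hypothetical helper for illustration only.
    """
    # gammavariate(shape, scale): alpha and beta act as rates in (2) and (3),
    # so the scale argument is the reciprocal.
    lam = rng.gammavariate(r, 1.0 / alpha)  # transaction rate, assumption (iv)
    mu = rng.gammavariate(s, 1.0 / beta)    # dropout rate, assumption (v)
    tau = rng.expovariate(mu)               # unobserved lifetime, assumption (iii)
    horizon = min(tau, T)                   # purchasing stops at dropout or at T
    x, t_x, clock = 0, 0.0, 0.0
    while True:
        clock += rng.expovariate(lam)       # exponential gap, assumption (ii)
        if clock > horizon:
            break
        x, t_x = x + 1, clock
    return x, t_x, T


def simulate_cohort(r, alpha, s, beta, T, n_customers, seed=0):
    """Simulate independent customers; lam and mu are drawn independently (vi)."""
    rng = random.Random(seed)
    return [simulate_customer(r, alpha, s, beta, T, rng)
            for _ in range(n_customers)]


# Example usage with arbitrary illustrative parameter values.
cohort = simulate_cohort(r=2.0, alpha=1.0, s=3.0, beta=2.0, T=1.0, n_customers=5)
```

Each simulated customer is reduced to the triple (x, t_x, T), with t_x set to 0 when no transaction occurs; the dropout time itself is discarded, mirroring the fact that it is unobserved.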
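Assumptions (ii) and (iv) give us the NBD model for the distribution of the number of transactions while the customer is alive,

$$ P(X(t)=x \mid r,\alpha) = \int_0^\infty P(X(t)=x \mid \lambda)\, g(\lambda \mid r,\alpha)\, d\lambda = \frac{\Gamma(r+x)}{\Gamma(r)\,x!} \left(\frac{\alpha}{\alpha+t}\right)^{r} \left(\frac{t}{\alpha+t}\right)^{x}, \tag{4} $$

while assumptions (iii) and (v) give us the Pareto distribution of the second kind,

$$ f(\tau \mid s,\beta) = \int_0^\infty f(\tau \mid \mu)\, g(\mu \mid s,\beta)\, d\mu = \frac{s}{\beta}\left(\frac{\beta}{\beta+\tau}\right)^{s+1}, \quad \text{and} \tag{5} $$

$$ F(\tau \mid s,\beta) = \int_0^\infty F(\tau \mid \mu)\, g(\mu \mid s,\beta)\, d\mu = 1 - \left(\frac{\beta}{\beta+\tau}\right)^{s}. \tag{6} $$

(The NBD and Pareto labels for each of the sub-models naturally lead to the name of the integrated model.)

2.1 A Key Intermediate Result

As we proceed with the derivations, we will need to evaluate a double integral of the following form a number of times:

$$ A = \int_0^\infty\!\!\int_0^\infty \frac{\lambda^{\gamma}\mu^{\delta}}{\lambda+\mu}\, e^{-(\lambda+\mu)t}\, g(\lambda \mid r,\alpha)\, g(\mu \mid s,\beta)\, d\lambda\, d\mu. \tag{7} $$

Let us consider the transformation $p = \mu/(\lambda+\mu)$ and $z = \lambda+\mu$, with corresponding inverse transformation $\lambda = (1-p)z$ and $\mu = pz$. The Jacobian of this transformation is

$$ J = \begin{vmatrix} \partial\lambda/\partial p & \partial\lambda/\partial z \\ \partial\mu/\partial p & \partial\mu/\partial z \end{vmatrix} = \begin{vmatrix} -z & 1-p \\ z & p \end{vmatrix} = -z. $$

It follows that

$$ A = \int_0^1\!\!\int_0^\infty p^{\delta}(1-p)^{\gamma} z^{\gamma+\delta-1} e^{-zt}\, g(p,z \mid \alpha,\beta,r,s)\, dz\, dp, \tag{8} $$

where the joint distribution of $p$ and $z$,

$$ g(p,z \mid \alpha,\beta,r,s) = \frac{\alpha^r\beta^s}{\Gamma(r)\Gamma(s)}\, p^{s-1}(1-p)^{r-1} z^{r+s-1} e^{-z(\alpha-(\alpha-\beta)p)}, \tag{9} $$

is derived using the transformation technique (Casella and Berger 2002, Section 4.3, pp. 156–162; Mood et al. 1974, Section 6.2, p. 204ff). Substituting (9) in (8) gives us

$$ A = \frac{\alpha^r\beta^s}{\Gamma(r)\Gamma(s)}\, B, $$

where

$$ \begin{aligned} B &= \int_0^1\!\!\int_0^\infty p^{s+\delta-1}(1-p)^{r+\gamma-1} z^{r+s+\gamma+\delta-2} e^{-z(\alpha+t-(\alpha-\beta)p)}\, dz\, dp \\ &= \int_0^1 p^{s+\delta-1}(1-p)^{r+\gamma-1} \left\{ \int_0^\infty z^{r+s+\gamma+\delta-2} e^{-z(\alpha+t-(\alpha-\beta)p)}\, dz \right\} dp \\ &= \Gamma(r+s+\gamma+\delta-1) \int_0^1 p^{s+\delta-1}(1-p)^{r+\gamma-1} \bigl(\alpha+t-(\alpha-\beta)p\bigr)^{-(r+s+\gamma+\delta-1)}\, dp \\ &= \frac{\Gamma(r+s+\gamma+\delta-1)}{(\alpha+t)^{r+s+\gamma+\delta-1}} \int_0^1 p^{s+\delta-1}(1-p)^{r+\gamma-1} \left[1 - \frac{\alpha-\beta}{\alpha+t}\,p\right]^{-(r+s+\gamma+\delta-1)} dp, \end{aligned} $$

which, recalling Euler's integral for the Gaussian hypergeometric function, (1),

$$ = \frac{\Gamma(r+s+\gamma+\delta-1)}{(\alpha+t)^{r+s+\gamma+\delta-1}}\, \frac{\Gamma(r+\gamma)\Gamma(s+\delta)}{\Gamma(r+s+\gamma+\delta)}\, {}_2F_1\!\left(r+s+\gamma+\delta-1,\ s+\delta;\ r+s+\gamma+\delta;\ \frac{\alpha-\beta}{\alpha+t}\right), $$

and therefore

$$ A = \frac{\alpha^r\beta^s}{(\alpha+t)^{r+s+\gamma+\delta-1}}\, \frac{\Gamma(r+\gamma)\Gamma(s+\delta)}{\Gamma(r)\Gamma(s)}\, \frac{1}{r+s+\gamma+\delta-1}\, {}_2F_1\!\left(r+s+\gamma+\delta-1,\ s+\delta;\ r+s+\gamma+\delta;\ \frac{\alpha-\beta}{\alpha+t}\right). \tag{10} $$

Looking closely at (10), we see that the argument of the Gaussian hypergeometric function, $(\alpha-\beta)/(\alpha+t)$, is guaranteed to be bounded between 0 and 1 when $\alpha \geq \beta$ (since $\beta > 0$ and $t > 0$), thus ensuring convergence of the series representation of the function. However, when $\alpha < \beta$ we can be faced with the situation where $|(\alpha-\beta)/(\alpha+t)| > 1$, in which case the series is divergent.

Therefore, for the case of $\alpha \leq \beta$, let us consider the transformation $p = \lambda/(\lambda+\mu)$ and $z = \lambda+\mu$, with corresponding inverse transformation $\lambda = pz$ and $\mu = (1-p)z$. The Jacobian of this transformation is

$$ J = \begin{vmatrix} \partial\lambda/\partial p & \partial\lambda/\partial z \\ \partial\mu/\partial p & \partial\mu/\partial z \end{vmatrix} = \begin{vmatrix} z & p \\ -z & 1-p \end{vmatrix} = z. $$

It follows that

$$ A = \int_0^1\!\!\int_0^\infty p^{\gamma}(1-p)^{\delta} z^{\gamma+\delta-1} e^{-zt}\, g(p,z \mid \alpha,\beta,r,s)\, dz\, dp, $$

where the joint distribution of $p$ and $z$,

$$ g(p,z \mid \alpha,\beta,r,s) = \frac{\alpha^r\beta^s}{\Gamma(r)\Gamma(s)}\, p^{r-1}(1-p)^{s-1} z^{r+s-1} e^{-z(\beta-(\beta-\alpha)p)}, $$

is derived using the transformation technique. This gives us

$$ A = \frac{\alpha^r\beta^s}{\Gamma(r)\Gamma(s)}\, B, $$

where

$$ \begin{aligned} B &= \int_0^1\!\!\int_0^\infty p^{r+\gamma-1}(1-p)^{s+\delta-1} z^{r+s+\gamma+\delta-2} e^{-z(\beta+t-(\beta-\alpha)p)}\, dz\, dp \\ &= \int_0^1 p^{r+\gamma-1}(1-p)^{s+\delta-1} \left\{ \int_0^\infty z^{r+s+\gamma+\delta-2} e^{-z(\beta+t-(\beta-\alpha)p)}\, dz \right\} dp \\ &= \Gamma(r+s+\gamma+\delta-1) \int_0^1 p^{r+\gamma-1}(1-p)^{s+\delta-1} \bigl(\beta+t-(\beta-\alpha)p\bigr)^{-(r+s+\gamma+\delta-1)}\, dp \\ &= \frac{\Gamma(r+s+\gamma+\delta-1)}{(\beta+t)^{r+s+\gamma+\delta-1}} \int_0^1 p^{r+\gamma-1}(1-p)^{s+\delta-1} \left[1 - \frac{\beta-\alpha}{\beta+t}\,p\right]^{-(r+s+\gamma+\delta-1)} dp, \end{aligned} $$

which, recalling Euler's integral for the Gaussian hypergeometric function, (1),

$$ = \frac{\Gamma(r+s+\gamma+\delta-1)}{(\beta+t)^{r+s+\gamma+\delta-1}}\, \frac{\Gamma(r+\gamma)\Gamma(s+\delta)}{\Gamma(r+s+\gamma+\delta)}\, {}_2F_1\!\left(r+s+\gamma+\delta-1,\ r+\gamma;\ r+s+\gamma+\delta;\ \frac{\beta-\alpha}{\beta+t}\right), $$

and therefore

$$ A = \frac{\alpha^r\beta^s}{(\beta+t)^{r+s+\gamma+\delta-1}}\, \frac{\Gamma(r+\gamma)\Gamma(s+\delta)}{\Gamma(r)\Gamma(s)}\, \frac{1}{r+s+\gamma+\delta-1}\, {}_2F_1\!\left(r+s+\gamma+\delta-1,\ r+\gamma;\ r+s+\gamma+\delta;\ \frac{\beta-\alpha}{\beta+t}\right). \tag{11} $$

We see that the argument of the above Gaussian hypergeometric function is bounded between 0 and 1 when $\alpha \leq \beta$. We therefore present (10) and (11) as solutions to (7): we use (10) when $\alpha \geq \beta$ and (11) when $\alpha \leq \beta$.

3 Deriving the Model Likelihood Function

3.1 Deriving the Likelihood Function Conditional on λ and μ

Let us assume we know when each of a customer's $x$ transactions occurred during the period $(0,T]$; we denote these times by $t_1, t_2, \ldots, t_x$:

[Figure: timeline of the observation period, marking 0, $t_1$, $t_2$, ..., $t_x$, and $T$.]

There are two possible ways this pattern of transactions could arise:

i. The customer is still alive at the end of the observation period (i.e., $\tau > T$), in which case the individual-level likelihood function is simply the product of the (inter-transaction-time) exponential density functions and the associated survivor function:

$$ L(\lambda \mid t_1,\ldots,t_x,T,\tau>T) = \lambda e^{-\lambda t_1}\, \lambda e^{-\lambda(t_2-t_1)} \cdots \lambda e^{-\lambda(t_x-t_{x-1})}\, e^{-\lambda(T-t_x)} = \lambda^x e^{-\lambda T}. $$

ii. The customer became inactive at some time $\tau$ in the interval $(t_x,T]$, in which case the individual-level likelihood function is

$$ L(\lambda \mid t_1,\ldots,t_x,T,\text{inactive at } \tau \in (t_x,T]) = \lambda e^{-\lambda t_1}\, \lambda e^{-\lambda(t_2-t_1)} \cdots \lambda e^{-\lambda(t_x-t_{x-1})}\, e^{-\lambda(\tau-t_x)} = \lambda^x e^{-\lambda\tau}. $$

Note that in both cases, information on when each of the $x$ transactions occurred is not required; we can replace $t_1,\ldots,t_x,T$ with $(x,t_x,T)$ (where, by definition, $t_x = 0$ when $x = 0$). In other words, $t_x$ and $x$ are sufficient summaries of a customer's transaction history. (Using direct marketing terminology, $t_x$ is recency and $x$ is frequency.)

Removing the conditioning on $\tau$ yields the following expression for the individual-level likelihood function:

$$ \begin{aligned} L(\lambda,\mu \mid x,t_x,T) &= L(\lambda \mid x,T,\tau>T)\,P(\tau>T \mid \mu) + \int_{t_x}^{T} L(\lambda \mid x,T,\text{inactive at } \tau \in (t_x,T])\, f(\tau \mid \mu)\, d\tau && (12) \\ &= \lambda^x e^{-\lambda T} e^{-\mu T} + \lambda^x \int_{t_x}^{T} e^{-\lambda\tau} \mu e^{-\mu\tau}\, d\tau \\ &= \lambda^x e^{-(\lambda+\mu)T} + \frac{\lambda^x\mu}{\lambda+\mu}\, e^{-(\lambda+\mu)t_x} - \frac{\lambda^x\mu}{\lambda+\mu}\, e^{-(\lambda+\mu)T} && (13) \\ &= \frac{\lambda^x\mu}{\lambda+\mu}\, e^{-(\lambda+\mu)t_x} + \frac{\lambda^{x+1}}{\lambda+\mu}\, e^{-(\lambda+\mu)T}. && (14) \end{aligned} $$

This is a new result, as SMC do not present an explicit expression for the model likelihood function.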
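As a numerical sanity check on the individual-level likelihood, the sketch below evaluates (13) and (14) directly and compares them with (12), where the integral over $\tau$ is approximated by composite Simpson's rule. This is only an illustration under assumed names (`likelihood_eq13`, `likelihood_eq14`, `likelihood_eq12_numeric`) and arbitrary input values; it is not part of the original note.

```python
import math


def likelihood_eq13(lam, mu, x, t_x, T):
    """Individual-level likelihood in the three-term form (13)."""
    rate = lam + mu
    return (lam ** x * math.exp(-rate * T)
            + lam ** x * mu / rate * math.exp(-rate * t_x)
            - lam ** x * mu / rate * math.exp(-rate * T))


def likelihood_eq14(lam, mu, x, t_x, T):
    """Individual-level likelihood in the two-term form (14)."""
    rate = lam + mu
    return (lam ** x * mu / rate * math.exp(-rate * t_x)
            + lam ** (x + 1) / rate * math.exp(-rate * T))


def likelihood_eq12_numeric(lam, mu, x, t_x, T, n_steps=2000):
    """Evaluate (12), integrating over tau in (t_x, T] with Simpson's rule.

    n_steps must be even.
    """
    def integrand(tau):
        # lambda^x e^{-lambda tau} times the exponential lifetime density.
        return lam ** x * math.exp(-lam * tau) * mu * math.exp(-mu * tau)

    h = (T - t_x) / n_steps
    total = integrand(t_x) + integrand(T)
    for k in range(1, n_steps):
        total += (4 if k % 2 else 2) * integrand(t_x + k * h)
    integral = total * h / 3.0
    # First term of (12): alive at T, with P(tau > T | mu) = e^{-mu T}.
    return lam ** x * math.exp(-(lam + mu) * T) + integral


# Example usage with arbitrary illustrative values.
values = [f(0.3, 0.05, 3, 20.0, 30.0)
          for f in (likelihood_eq12_numeric, likelihood_eq13, likelihood_eq14)]
```

The three numbers in `values` should agree to within the accuracy of the quadrature, which is one way to convince oneself that (13) and (14) are rearrangements of the same quantity.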
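3.2 Removing the Conditioning on λ and μ

We remove the conditioning on $\lambda$ and $\mu$ by taking the expectation of $L(\lambda,\mu \mid x,t_x,T)$ over the distributions of $\lambda$ and $\mu$:

$$ L(r,\alpha,s,\beta \mid x,t_x,T) = \int_0^\infty\!\!\int_0^\infty L(\lambda,\mu \mid x,t_x,T)\, g(\lambda \mid r,\alpha)\, g(\mu \mid s,\beta)\, d\lambda\, d\mu. \tag{15} $$

Substituting (13) in (15) gives us

$$ L(r,\alpha,s,\beta \mid x,t_x,T) = A_1 + A_2 - A_3, \tag{16} $$

where

$$ A_1 = \int_0^\infty\!\!\int_0^\infty \lambda^x e^{-(\lambda+\mu)T}\, g(\lambda \mid r,\alpha)\, g(\mu \mid s,\beta)\, d\lambda\, d\mu, \tag{17} $$

$$ A_2 = \int_0^\infty\!\!\int_0^\infty \frac{\lambda^x\mu\, e^{-(\lambda+\mu)t_x}}{\lambda+\mu}\, g(\lambda \mid r,\alpha)\, g(\mu \mid s,\beta)\, d\lambda\, d\mu, $$

and

$$ A_3 = \int_0^\infty\!\!\int_0^\infty \frac{\lambda^x\mu\, e^{-(\lambda+\mu)T}}{\lambda+\mu}\, g(\lambda \mid r,\alpha)\, g(\mu \mid s,\beta)\, d\lambda\, d\mu. $$

Let us first consider $A_1$: substituting (2) and (3) in (17), we have

$$ A_1 = \int_0^\infty\!\!\int_0^\infty \lambda^x e^{-(\lambda+\mu)T}\, \frac{\alpha^r\lambda^{r-1}e^{-\lambda\alpha}}{\Gamma(r)}\, \frac{\beta^s\mu^{s-1}e^{-\mu\beta}}{\Gamma(s)}\, d\lambda\, d\mu = \frac{\Gamma(r+x)\,\alpha^r\beta^s}{\Gamma(r)\,(\alpha+T)^{r+x}(\beta+T)^{s}}. $$

Looking closely at the expressions for $A_2$ and $A_3$, we see that they have the same form as that given in (7) with $\gamma = x$ and $\delta = 1$, and $t = t_x$ and $t = T$ respectively. Recalling the solutions given in (10) and (11), it follows that for $\alpha \geq \beta$,

$$ A_2 = \frac{\alpha^r\beta^s}{(\alpha+t_x)^{r+s+x}}\, \frac{\Gamma(r+x)}{\Gamma(r)}\, \frac{s}{r+s+x}\, {}_2F_1\!\left(r+s+x,\ s+1;\ r+s+x+1;\ \frac{\alpha-\beta}{\alpha+t_x}\right), $$

$$ A_3 = \frac{\alpha^r\beta^s}{(\alpha+T)^{r+s+x}}\, \frac{\Gamma(r+x)}{\Gamma(r)}\, \frac{s}{r+s+x}\, {}_2F_1\!\left(r+s+x,\ s+1;\ r+s+x+1;\ \frac{\alpha-\beta}{\alpha+T}\right), $$

while for $\alpha \leq \beta$,

$$ A_2 = \frac{\alpha^r\beta^s}{(\beta+t_x)^{r+s+x}}\, \frac{\Gamma(r+x)}{\Gamma(r)}\, \frac{s}{r+s+x}\, {}_2F_1\!\left(r+s+x,\ r+x;\ r+s+x+1;\ \frac{\beta-\alpha}{\beta+t_x}\right), $$

$$ A_3 = \frac{\alpha^r\beta^s}{(\beta+T)^{r+s+x}}\, \frac{\Gamma(r+x)}{\Gamma(r)}\, \frac{s}{r+s+x}\, {}_2F_1\!\left(r+s+x,\ r+x;\ r+s+x+1;\ \frac{\beta-\alpha}{\beta+T}\right). $$

Substituting these expressions for $A_1$–$A_3$ into (16) and simplifying gives us the following expression for the likelihood function for a randomly-chosen individual with purchase history $(x,t_x,T)$:

$$ L(r,\alpha,s,\beta \mid x,t_x,T) = \frac{\Gamma(r+x)\,\alpha^r\beta^s}{\Gamma(r)} \left\{ \frac{1}{(\alpha+T)^{r+x}(\beta+T)^{s}} + \left(\frac{s}{r+s+x}\right) A_0 \right\}, \tag{18} $$

where for $\alpha \geq \beta$

$$ A_0 = \frac{{}_2F_1\!\left(r+s+x,\ s+1;\ r+s+x+1;\ \frac{\alpha-\beta}{\alpha+t_x}\right)}{(\alpha+t_x)^{r+s+x}} - \frac{{}_2F_1\!\left(r+s+x,\ s+1;\ r+s+x+1;\ \frac{\alpha-\beta}{\alpha+T}\right)}{(\alpha+T)^{r+s+x}}, \tag{19} $$

while for $\alpha \leq \beta$

$$ A_0 = \frac{{}_2F_1\!\left(r+s+x,\ r+x;\ r+s+x+1;\ \frac{\beta-\alpha}{\beta+t_x}\right)}{(\beta+t_x)^{r+s+x}} - \frac{{}_2F_1\!\left(r+s+x,\ r+x;\ r+s+x+1;\ \frac{\beta-\alpha}{\beta+T}\right)}{(\beta+T)^{r+s+x}}. \tag{20} $$

(This expression for the model likelihood function is that used in Fader et al. (2005b).)

The four Pareto/NBD model parameters $(r,\alpha,s,\beta)$ can be estimated via the method of maximum likelihood in the following manner. Suppose we have a sample of $N$ customers, where customer $i$ had $x_i$ transactions in the period $(0,T_i]$, with the last transaction occurring at $t_{x_i}$. The sample log-likelihood function is given by

$$ LL(r,\alpha,s,\beta) = \sum_{i=1}^{N} \ln\bigl[ L(r,\alpha,s,\beta \mid x_i,t_{x_i},T_i) \bigr]. $$

This can be maximized using standard numerical optimization routines. See Fader et al. (2005a) for details of a MATLAB-based implementation.

A variant on the above derivation follows by changing the order of integration: we first integrate (12) over the distributions of $\lambda$ and $\mu$ and then remove the conditioning on $\tau$. Any reader who has followed our working so far will realize that this gives us

$$ \begin{aligned} L(r,\alpha,s,\beta \mid x,t_x,T) &= L(r,\alpha \mid x,T)\,P(\tau>T \mid s,\beta) + \int_{t_x}^{T} L(r,\alpha \mid x,\tau)\, f(\tau \mid s,\beta)\, d\tau \\ &= \frac{\Gamma(r+x)}{\Gamma(r)} \left(\frac{\alpha}{\alpha+T}\right)^{r} \left(\frac{1}{\alpha+T}\right)^{x} \left(\frac{\beta}{\beta+T}\right)^{s} + \int_{t_x}^{T} \frac{\Gamma(r+x)}{\Gamma(r)} \left(\frac{\alpha}{\alpha+\tau}\right)^{r} \left(\frac{1}{\alpha+\tau}\right)^{x} \frac{s}{\beta} \left(\frac{\beta}{\beta+\tau}\right)^{s+1} d\tau \\ &= \frac{\Gamma(r+x)\,\alpha^r\beta^s}{\Gamma(r)} \left[ \frac{1}{(\alpha+T)^{r+x}(\beta+T)^{s}} + s\,C \right], && (21) \end{aligned} $$

where

$$ C = \int_{t_x}^{T} (\alpha+\tau)^{-(r+x)} (\beta+\tau)^{-(s+1)}\, d\tau. $$

i. For $\alpha \geq \beta$, we make the change of variable $y = \alpha+\tau$, giving us

$$ C = \int_{\alpha+t_x}^{\alpha+T} y^{-(r+x)}(\beta-\alpha+y)^{-(s+1)}\, dy = \int_{\alpha+t_x}^{\infty} y^{-(r+x)}(\beta-\alpha+y)^{-(s+1)}\, dy - \int_{\alpha+T}^{\infty} y^{-(r+x)}(\beta-\alpha+y)^{-(s+1)}\, dy. $$

Letting $y = (\alpha+t_x)/z$ in the first integral (which implies $dy = -(\alpha+t_x)z^{-2}\,dz$) and $y = (\alpha+T)/z$ in the second integral (which implies $dy = -(\alpha+T)z^{-2}\,dz$),

$$ \begin{aligned} C &= \int_0^1 \left(\frac{\alpha+t_x}{z}\right)^{-(r+x)} (\alpha+t_x)\, z^{-2} \left[\beta-\alpha+\frac{\alpha+t_x}{z}\right]^{-(s+1)} dz - \int_0^1 \left(\frac{\alpha+T}{z}\right)^{-(r+x)} (\alpha+T)\, z^{-2} \left[\beta-\alpha+\frac{\alpha+T}{z}\right]^{-(s+1)} dz \\ &= (\alpha+t_x)^{-(r+s+x)} \int_0^1 z^{r+s+x-1} \left[1-\frac{\alpha-\beta}{\alpha+t_x}\,z\right]^{-(s+1)} dz - (\alpha+T)^{-(r+s+x)} \int_0^1 z^{r+s+x-1} \left[1-\frac{\alpha-\beta}{\alpha+T}\,z\right]^{-(s+1)} dz, \end{aligned} $$

which, recalling Euler's integral for the Gaussian hypergeometric function, (1),

$$ = \frac{1}{r+s+x} \left\{ \frac{{}_2F_1\!\left(s+1,\ r+s+x;\ r+s+x+1;\ \frac{\alpha-\beta}{\alpha+t_x}\right)}{(\alpha+t_x)^{r+s+x}} - \frac{{}_2F_1\!\left(s+1,\ r+s+x;\ r+s+x+1;\ \frac{\alpha-\beta}{\alpha+T}\right)}{(\alpha+T)^{r+s+x}} \right\}. $$

Substituting this expression for $C$ in (21), and recalling the symmetry of the Gaussian hypergeometric function in its upper parameters (i.e., ${}_2F_1(a,b;c;z) = {}_2F_1(b,a;c;z)$), yields (18) and (19).

ii. For $\alpha \leq \beta$, we make the change of variable $y = \beta+\tau$, giving us

$$ C = \int_{\beta+t_x}^{\beta+T} y^{-(s+1)}(\alpha-\beta+y)^{-(r+x)}\, dy = \int_{\beta+t_x}^{\infty} y^{-(s+1)}(\alpha-\beta+y)^{-(r+x)}\, dy - \int_{\beta+T}^{\infty} y^{-(s+1)}(\alpha-\beta+y)^{-(r+x)}\, dy. $$

Letting $y = (\beta+t_x)/z$ in the first integral (which implies $dy = -(\beta+t_x)z^{-2}\,dz$) and $y = (\beta+T)/z$ in the second integral (which implies $dy = -(\beta+T)z^{-2}\,dz$),

$$ \begin{aligned} C &= \int_0^1 \left(\frac{\beta+t_x}{z}\right)^{-(s+1)} (\beta+t_x)\, z^{-2} \left[\alpha-\beta+\frac{\beta+t_x}{z}\right]^{-(r+x)} dz - \int_0^1 \left(\frac{\beta+T}{z}\right)^{-(s+1)} (\beta+T)\, z^{-2} \left[\alpha-\beta+\frac{\beta+T}{z}\right]^{-(r+x)} dz \\ &= (\beta+t_x)^{-(r+s+x)} \int_0^1 z^{r+s+x-1} \left[1-\frac{\beta-\alpha}{\beta+t_x}\,z\right]^{-(r+x)} dz - (\beta+T)^{-(r+s+x)} \int_0^1 z^{r+s+x-1} \left[1-\frac{\beta-\alpha}{\beta+T}\,z\right]^{-(r+x)} dz \\ &= \frac{1}{r+s+x} \left\{ \frac{{}_2F_1\!\left(r+x,\ r+s+x;\ r+s+x+1;\ \frac{\beta-\alpha}{\beta+t_x}\right)}{(\beta+t_x)^{r+s+x}} - \frac{{}_2F_1\!\left(r+x,\ r+s+x;\ r+s+x+1;\ \frac{\beta-\alpha}{\beta+T}\right)}{(\beta+T)^{r+s+x}} \right\}. \end{aligned} $$

Substituting this expression for $C$ in (21), and recalling the symmetry of the Gaussian hypergeometric function in its upper parameters, yields (18) and (20).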
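The closed-form likelihood (18)–(20) and its one-dimensional integral form (21) can be checked against each other numerically. The sketch below is a minimal pure-Python illustration, not the MATLAB implementation referenced above: the function names, the truncated-series evaluation of ${}_2F_1$ (valid only for $|z| < 1$), and the use of Simpson's rule for $C$ are all assumptions made for this example. A library routine such as SciPy's `scipy.special.hyp2f1` could be substituted for the hand-rolled series.

```python
import math


def hyp2f1(a, b, c, z, tol=1e-14, max_terms=100000):
    """Power-series evaluation of 2F1(a, b; c; z); sketch valid for |z| < 1."""
    if abs(z) >= 1.0:
        raise ValueError("series sketch requires |z| < 1")
    term, total = 1.0, 1.0
    for j in range(max_terms):
        # Ratio of consecutive series terms: (a+j)(b+j) z / ((c+j)(j+1)).
        term *= (a + j) * (b + j) / ((c + j) * (j + 1)) * z
        total += term
        if abs(term) < tol * abs(total):
            return total
    raise ArithmeticError("hypergeometric series did not converge")


def pareto_nbd_likelihood(r, alpha, s, beta, x, t_x, T):
    """Likelihood (18), with A_0 taken from (19) or (20)."""
    n = r + s + x
    if alpha >= beta:   # equation (19)
        base_tx, base_T, b, diff = alpha + t_x, alpha + T, s + 1.0, alpha - beta
    else:               # equation (20)
        base_tx, base_T, b, diff = beta + t_x, beta + T, r + x, beta - alpha
    a0 = (hyp2f1(n, b, n + 1.0, diff / base_tx) / base_tx ** n
          - hyp2f1(n, b, n + 1.0, diff / base_T) / base_T ** n)
    lead = math.exp(math.lgamma(r + x) - math.lgamma(r)) * alpha ** r * beta ** s
    return lead * (1.0 / ((alpha + T) ** (r + x) * (beta + T) ** s) + s / n * a0)


def log_likelihood(params, data):
    """Sample log-likelihood; data is a list of (x, t_x, T) tuples."""
    r, alpha, s, beta = params
    return sum(math.log(pareto_nbd_likelihood(r, alpha, s, beta, x, t_x, T))
               for x, t_x, T in data)


def likelihood_via_eq21(r, alpha, s, beta, x, t_x, T, n_steps=4000):
    """Likelihood (21), with C approximated by Simpson's rule (n_steps even)."""
    def integrand(tau):
        return (alpha + tau) ** (-(r + x)) * (beta + tau) ** (-(s + 1.0))

    h = (T - t_x) / n_steps
    total = integrand(t_x) + integrand(T)
    for k in range(1, n_steps):
        total += (4 if k % 2 else 2) * integrand(t_x + k * h)
    c_integral = total * h / 3.0
    lead = math.exp(math.lgamma(r + x) - math.lgamma(r)) * alpha ** r * beta ** s
    return lead * (1.0 / ((alpha + T) ** (r + x) * (beta + T) ** s) + s * c_integral)


# Example usage with arbitrary illustrative parameter values and histories.
example_ll = log_likelihood((0.55, 10.6, 0.6, 11.7),
                            [(2, 30.4, 38.9), (0, 0.0, 38.9), (7, 29.4, 38.9)])
```

In practice `log_likelihood` would be handed (with its sign flipped) to a numerical optimizer over positive values of the four parameters; that step is omitted here.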
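3.3 An Alternative Expression for the Model Likelihood Function

An alternative expression for the model likelihood function can be obtained by substituting (14), instead of (13), in (15), giving us

$$ L(r,\alpha,s,\beta \mid x,t_x,T) = A_2 + A_4, $$

where $A_2$ is defined as above and

$$ A_4 = \int_0^\infty\!\!\int_0^\infty \frac{\lambda^{x+1} e^{-(\lambda+\mu)T}}{\lambda+\mu}\, g(\lambda \mid r,\alpha)\, g(\mu \mid s,\beta)\, d\lambda\, d\mu. $$

Looking closely at this expression for $A_4$, we see that it has the same form as that given in (7) with $\gamma = x+1$, $\delta = 0$, and $t = T$. Recalling the solutions given in (10) and (11), it follows that for $\alpha \geq \beta$,

$$ A_4 = \frac{\alpha^r\beta^s}{(\alpha+T)^{r+s+x}}\, \frac{\Gamma(r+x)}{\Gamma(r)}\, \frac{r+x}{r+s+x}\, {}_2F_1\!\left(r+s+x,\ s;\ r+s+x+1;\ \frac{\alpha-\beta}{\alpha+T}\right), $$

while for $\alpha \leq \beta$,

$$ A_4 = \frac{\alpha^r\beta^s}{(\beta+T)^{r+s+x}}\, \frac{\Gamma(r+x)}{\Gamma(r)}\, \frac{r+x}{r+s+x}\, {}_2F_1\!\left(r+s+x,\ r+x+1;\ r+s+x+1;\ \frac{\beta-\alpha}{\beta+T}\right). $$

Therefore, for $\alpha \geq \beta$,

$$ L(r,\alpha,s,\beta \mid x,t_x,T) = \frac{\Gamma(r+x)\,\alpha^r\beta^s}{\Gamma(r)} \left\{ \left(\frac{s}{r+s+x}\right) \frac{{}_2F_1\!\left(r+s+x,\ s+1;\ r+s+x+1;\ \frac{\alpha-\beta}{\alpha+t_x}\right)}{(\alpha+t_x)^{r+s+x}} + \left(\frac{r+x}{r+s+x}\right) \frac{{}_2F_1\!\left(r+s+x,\ s;\ r+s+x+1;\ \frac{\alpha-\beta}{\alpha+T}\right)}{(\alpha+T)^{r+s+x}} \right\}, \tag{22} $$

while for $\alpha \leq \beta$,

$$ L(r,\alpha,s,\beta \mid x,t_x,T) = \frac{\Gamma(r+x)\,\alpha^r\beta^s}{\Gamma(r)} \left\{ \left(\frac{s}{r+s+x}\right) \frac{{}_2F_1\!\left(r+s+x,\ r+x;\ r+s+x+1;\ \frac{\beta-\alpha}{\beta+t_x}\right)}{(\beta+t_x)^{r+s+x}} + \left(\frac{r+x}{r+s+x}\right) \frac{{}_2F_1\!\left(r+s+x,\ r+x+1;\ r+s+x+1;\ \frac{\beta-\alpha}{\beta+T}\right)}{(\beta+T)^{r+s+x}} \right\}. \tag{23} $$

Are (18)–(20) equivalent to (22) and (23)? While the indirect equivalence is obvious given the equivalence of (13) and (14), the direct equivalence is not immediately obvious. Direct equivalence implies that for $\alpha \geq \beta$,

$$ \left(\frac{\alpha+T}{\beta+T}\right)^{s} = \left(\frac{s}{r+s+x}\right) {}_2F_1\!\left(r+s+x,\ s+1;\ r+s+x+1;\ \frac{\alpha-\beta}{\alpha+T}\right) + \left(\frac{r+x}{r+s+x}\right) {}_2F_1\!\left(r+s+x,\ s;\ r+s+x+1;\ \frac{\alpha-\beta}{\alpha+T}\right), \tag{24} $$

while for $\alpha \leq \beta$,

$$ \left(\frac{\beta+T}{\alpha+T}\right)^{r+x} = \left(\frac{s}{r+s+x}\right) {}_2F_1\!\left(r+s+x,\ r+x;\ r+s+x+1;\ \frac{\beta-\alpha}{\beta+T}\right) + \left(\frac{r+x}{r+s+x}\right) {}_2F_1\!\left(r+s+x,\ r+x+1;\ r+s+x+1;\ \frac{\beta-\alpha}{\beta+T}\right). \tag{25} $$

Looking at the so-called Gauss relations for contiguous functions, we have the following result (Abramowitz and Stegun 1972, equation 15.2.24):

$$ (c-b-1)\, {}_2F_1(a,b;c;z) + b\, {}_2F_1(a,b+1;c;z) - (c-1)\, {}_2F_1(a,b;c-1;z) = 0. $$

i. For the case of $\alpha \geq \beta$, let $a = r+s+x$, $b = s$, $c = r+s+x+1$, and $z = (\alpha-\beta)/(\alpha+T)$. This gives us

$$ (r+x)\, {}_2F_1(r+s+x,\ s;\ r+s+x+1;\ z) + s\, {}_2F_1(r+s+x,\ s+1;\ r+s+x+1;\ z) - (r+s+x)\, {}_2F_1(r+s+x,\ s;\ r+s+x;\ z) = 0. $$

Noting that ${}_2F_1(a,b;b;z) = (1-z)^{-a}$ and using the symmetry in the upper parameters,

$$ {}_2F_1\!\left(r+s+x,\ s;\ r+s+x;\ \frac{\alpha-\beta}{\alpha+T}\right) = \left(\frac{\alpha+T}{\beta+T}\right)^{s}, $$

and therefore

$$ \left(\frac{r+x}{r+s+x}\right) {}_2F_1\!\left(r+s+x,\ s;\ r+s+x+1;\ \frac{\alpha-\beta}{\alpha+T}\right) + \left(\frac{s}{r+s+x}\right) {}_2F_1\!\left(r+s+x,\ s+1;\ r+s+x+1;\ \frac{\alpha-\beta}{\alpha+T}\right) = \left(\frac{\alpha+T}{\beta+T}\right)^{s}, $$

which is clearly equivalent to (24).

ii. For the case of $\alpha \leq \beta$, let $a = r+s+x$, $b = r+x$, $c = r+s+x+1$, and $z = (\beta-\alpha)/(\beta+T)$. This gives us

$$ s\, {}_2F_1(r+s+x,\ r+x;\ r+s+x+1;\ z) + (r+x)\, {}_2F_1(r+s+x,\ r+x+1;\ r+s+x+1;\ z) - (r+s+x)\, {}_2F_1(r+s+x,\ r+x;\ r+s+x;\ z) = 0. $$

The result that ${}_2F_1(a,b;b;z) = (1-z)^{-a}$ implies that

$$ {}_2F_1\!\left(r+s+x,\ r+x;\ r+s+x;\ \frac{\beta-\alpha}{\beta+T}\right) = \left(\frac{\beta+T}{\alpha+T}\right)^{r+x}, $$

and therefore

$$ \left(\frac{s}{r+s+x}\right) {}_2F_1\!\left(r+s+x,\ r+x;\ r+s+x+1;\ \frac{\beta-\alpha}{\beta+T}\right) + \left(\frac{r+x}{r+s+x}\right) {}_2F_1\!\left(r+s+x,\ r+x+1;\ r+s+x+1;\ \frac{\beta-\alpha}{\beta+T}\right) = \left(\frac{\beta+T}{\alpha+T}\right)^{r+x}, $$

which is clearly equivalent to (25).

4 Mean and Variance of the Pareto/NBD Model

Given that the number of transactions follows a Poisson process while the customer is alive,

i. if $\tau$, the unobserved time at which the customer becomes inactive, is greater than $t$, the expected number of transactions is simply $\lambda t$.

ii. if $\tau \leq t$, the expected number of transactions in the interval $(0,\tau]$ is $\lambda\tau$.

Removing the conditioning on the time at which the customer becomes inactive, it follows that the expected number of transactions in the time interval $(0,t]$, conditional on $\lambda$ and $\mu$, is

$$ \begin{aligned} E[X(t) \mid \lambda,\mu] &= \lambda t\, P(\tau>t \mid \mu) + \int_0^t \lambda\tau\, f(\tau \mid \mu)\, d\tau \\ &= \lambda t e^{-\mu t} + \lambda \int_0^t \mu\tau e^{-\mu\tau}\, d\tau \\ &= \lambda t e^{-\mu t} + \frac{\lambda}{\mu} \int_0^t \mu^2 \tau e^{-\mu\tau}\, d\tau, \end{aligned} $$

which, noting that the integrand is an Erlang-2 pdf,

$$ = \lambda t e^{-\mu t} + \frac{\lambda}{\mu} \left\{ 1 - e^{-\mu t} - \mu t e^{-\mu t} \right\} = \frac{\lambda}{\mu} - \frac{\lambda}{\mu}\, e^{-\mu t}. \tag{26} $$

To arrive at an expression for $E[X(t)]$ for a randomly-chosen individual, we take the expectation of (26) over the distributions of $\lambda$ and $\mu$:

$$ \begin{aligned} E[X(t) \mid r,\alpha,s,\beta] &= \int_0^\infty\!\!\int_0^\infty E[X(t) \mid \lambda,\mu]\, g(\lambda \mid r,\alpha)\, g(\mu \mid s,\beta)\, d\lambda\, d\mu \\ &= \frac{r\beta}{\alpha(s-1)} - \frac{r\beta^{s}}{\alpha(s-1)(\beta+t)^{s-1}} \\ &= \frac{r\beta}{\alpha(s-1)} \left[ 1 - \left(\frac{\beta}{\beta+t}\right)^{s-1} \right], && (27) \end{aligned} $$

which is the expression reported in SMC, equation (17).

In order to derive the variance of the Pareto/NBD, we recall the defining relationship for the variance of a random variable:

$$ \operatorname{var}(X) = E(X^2) - E(X)^2. \tag{28} $$

Having derived an expression for $E(X)$, we now need to derive an expression for $E(X^2)$. Given that the number of transactions follows a Poisson process while the customer is alive, it follows that $E[X(t)^2 \mid \lambda] = \lambda t + (\lambda t)^2$ if $\tau > t$, and $E[X(\tau)^2 \mid \lambda] = \lambda\tau + (\lambda\tau)^2$ if $\tau \leq t$. Removing the conditioning on the time at which the customer becomes inactive, we have

$$ \begin{aligned} E[X(t)^2 \mid \lambda,\mu] &= \{\lambda t + (\lambda t)^2\}\, P(\tau>t \mid \mu) + \int_0^t \{\lambda\tau + (\lambda\tau)^2\}\, f(\tau \mid \mu)\, d\tau \\ &= E[X(t) \mid \lambda,\mu] + (\lambda t)^2 e^{-\mu t} + \lambda^2 \int_0^t \mu\tau^2 e^{-\mu\tau}\, d\tau \\ &= E[X(t) \mid \lambda,\mu] + (\lambda t)^2 e^{-\mu t} + \frac{2\lambda^2}{\mu^2} \int_0^t \frac{\mu^3\tau^2 e^{-\mu\tau}}{2}\, d\tau, \end{aligned} $$

which, noting that the integrand is an Erlang-3 pdf,

$$ \begin{aligned} &= E[X(t) \mid \lambda,\mu] + (\lambda t)^2 e^{-\mu t} + \frac{2\lambda^2}{\mu^2} \left\{ 1 - e^{-\mu t} - \mu t e^{-\mu t} - \frac{(\mu t)^2 e^{-\mu t}}{2} \right\} \\ &= \lambda \left\{ \frac{1}{\mu} - \frac{1}{\mu}\, e^{-\mu t} \right\} + 2\lambda^2 \left\{ \frac{1}{\mu^2} - \frac{1}{\mu^2}\, e^{-\mu t} - \frac{1}{\mu}\, t e^{-\mu t} \right\}. && (29) \end{aligned} $$

To arrive at an expression for $E[X(t)^2]$ for a randomly-chosen individual, we take the expectation of (29) over the distributions of $\lambda$ and $\mu$:

$$ \begin{aligned} E[X(t)^2 \mid r,\alpha,s,\beta] &= \int_0^\infty\!\!\int_0^\infty E[X(t)^2 \mid \lambda,\mu]\, g(\lambda \mid r,\alpha)\, g(\mu \mid s,\beta)\, d\lambda\, d\mu \\ &= \frac{r\beta}{\alpha(s-1)} \left[ 1 - \left(\frac{\beta}{\beta+t}\right)^{s-1} \right] + \frac{2r(r+1)\beta}{\alpha^2(s-1)} \left[ \frac{\beta}{s-2} - \frac{\beta}{s-2} \left(\frac{\beta}{\beta+t}\right)^{s-2} - t \left(\frac{\beta}{\beta+t}\right)^{s-1} \right]. && (30) \end{aligned} $$

Our expression for $\operatorname{var}[X(t) \mid r,\alpha,s,\beta]$ is obtained by substituting (27) and (30) in (28); this is equivalent to the expression reported in SMC, equation (19).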
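To make the moment formulas concrete, here is a minimal sketch that evaluates (27) and (30) and combines them through (28). The function names and the parameter values in the usage line are illustrative assumptions; the only restriction the code enforces is that the expressions as written divide by $(s-1)$ and $(s-2)$, so those values of $s$ are rejected.

```python
def pareto_nbd_mean(r, alpha, s, beta, t):
    """E[X(t)] from (27); the expression as written is undefined at s = 1."""
    if s == 1.0:
        raise ValueError("(27) divides by (s - 1)")
    return r * beta / (alpha * (s - 1.0)) * (1.0 - (beta / (beta + t)) ** (s - 1.0))


def pareto_nbd_second_moment(r, alpha, s, beta, t):
    """E[X(t)^2] from (30); the expression as written is undefined at s = 1 or 2."""
    if s == 2.0:
        raise ValueError("(30) divides by (s - 2)")
    ratio = beta / (beta + t)
    bracket = (beta / (s - 2.0)
               - beta / (s - 2.0) * ratio ** (s - 2.0)
               - t * ratio ** (s - 1.0))
    return (pareto_nbd_mean(r, alpha, s, beta, t)
            + 2.0 * r * (r + 1.0) * beta / (alpha ** 2 * (s - 1.0)) * bracket)


def pareto_nbd_variance(r, alpha, s, beta, t):
    """var[X(t)] obtained by substituting (27) and (30) in (28)."""
    mean = pareto_nbd_mean(r, alpha, s, beta, t)
    return pareto_nbd_second_moment(r, alpha, s, beta, t) - mean ** 2


# Example usage with arbitrary illustrative parameter values.
example_mean = pareto_nbd_mean(r=2.0, alpha=1.0, s=3.0, beta=2.0, t=1.0)
example_var = pareto_nbd_variance(r=2.0, alpha=1.0, s=3.0, beta=2.0, t=1.0)
```

For the values used above, working (27) and (30) by hand gives a mean of 10/9 and a second moment of 34/9, which is a convenient check on any implementation.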
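5 Derivation of P(alive | x, t_x, T)

The probability that a customer with purchase history $(x,t_x,T)$ is alive at time $T$ is the probability that the (unobserved) time at which he becomes inactive, $\tau$, occurs after $T$, $P(\tau > T)$. Referring back to our derivation of the individual-level likelihood function (i.e., (12)), the application of Bayes' theorem gives us

$$ P(\tau>T \mid \lambda,\mu,x,t_x,T) = \frac{L(\lambda \mid x,T,\tau>T)\, P(\tau>T \mid \mu)}{L(\lambda,\mu \mid x,t_x,T)} = \frac{\lambda^x e^{-(\lambda+\mu)T}}{L(\lambda,\mu \mid x,t_x,T)}. \tag{31} $$

Substituting (13) in (31), we have

$$ \begin{aligned} P(\tau>T \mid \lambda,\mu,x,t_x,T) &= \frac{\lambda^x e^{-(\lambda+\mu)T}}{\lambda^x e^{-(\lambda+\mu)T} + \frac{\mu\lambda^x}{\lambda+\mu}\, e^{-(\lambda+\mu)t_x} - \frac{\mu\lambda^x}{\lambda+\mu}\, e^{-(\lambda+\mu)T}} \\ &= \frac{\lambda^x e^{-(\lambda+\mu)T}}{\lambda^x e^{-(\lambda+\mu)T} \left\{ 1 + \frac{\mu}{\lambda+\mu} \left[ e^{(\lambda+\mu)(T-t_x)} - 1 \right] \right\}} \\ &= \frac{1}{1 + \frac{\mu}{\lambda+\mu} \left[ e^{(\lambda+\mu)(T-t_x)} - 1 \right]}, \end{aligned} $$

which is the expression reported in SMC, equation (A10).

As the transaction rate $\lambda$ and death rate $\mu$ are unobserved, we compute $P(\text{alive} \mid x,t_x,T)$ for a randomly-chosen individual by taking the expectation of (31) over the distribution of $\lambda$ and $\mu$, updated to take account of the information $(x,t_x,T)$:

$$ P(\text{alive} \mid r,\alpha,s,\beta,x,t_x,T) = \int_0^\infty\!\!\int_0^\infty P(\tau>T \mid \lambda,\mu,x,t_x,T)\, g(\lambda,\mu \mid r,\alpha,s,\beta,x,t_x,T)\, d\lambda\, d\mu. \tag{32} $$

By Bayes' theorem, the joint posterior distribution of $\lambda$ and $\mu$ is

$$ g(\lambda,\mu \mid r,\alpha,s,\beta,x,t_x,T) = \frac{L(\lambda,\mu \mid x,t_x,T)\, g(\lambda \mid r,\alpha)\, g(\mu \mid s,\beta)}{L(r,\alpha,s,\beta \mid x,t_x,T)}. \tag{33} $$

Substituting (31) and (33) in (32), we get

$$ \begin{aligned} P(\text{alive} \mid r,\alpha,s,\beta,x,t_x,T) &= \int_0^\infty\!\!\int_0^\infty \lambda^x e^{-(\lambda+\mu)T}\, g(\lambda \mid r,\alpha)\, g(\mu \mid s,\beta)\, d\lambda\, d\mu \Big/ L(r,\alpha,s,\beta \mid x,t_x,T) \\ &= \frac{\Gamma(r+x)\,\alpha^r\beta^s}{\Gamma(r)\,(\alpha+T)^{r+x}(\beta+T)^{s}} \Big/ L(r,\alpha,s,\beta \mid x,t_x,T). && (34) \end{aligned} $$

Substituting (18) in (34) gives us

$$ P(\text{alive} \mid r,\alpha,s,\beta,x,t_x,T) = \left\{ 1 + \left(\frac{s}{r+s+x}\right) (\alpha+T)^{r+x}(\beta+T)^{s}\, A_0 \right\}^{-1}, \tag{35} $$

where $A_0$ is defined in (19) and (20).

An alternative derivation follows from the derivation of the Pareto/NBD likelihood function given in (21). Applying Bayes' theorem,

$$ \begin{aligned} P(\tau>T \mid r,\alpha,s,\beta,x,t_x,T) &= \frac{L(r,\alpha \mid x,T)\, P(\tau>T \mid s,\beta)}{L(r,\alpha,s,\beta \mid x,t_x,T)} \\ &= \frac{\Gamma(r+x)}{\Gamma(r)} \left(\frac{\alpha}{\alpha+T}\right)^{r} \left(\frac{1}{\alpha+T}\right)^{x} \left(\frac{\beta}{\beta+T}\right)^{s} \Big/ L(r,\alpha,s,\beta \mid x,t_x,T), \end{aligned} $$

which is the expression given in (34).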
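The two routes to P(alive) above can be compared numerically. The following is a minimal pure-Python sketch under stated assumptions: `hyp2f1` is a truncated power series for ${}_2F_1$ (usable only for $|z| < 1$), `p_alive` evaluates (35) with $A_0$ from (19) or (20), and `p_alive_numeric` follows the alternative Bayes' theorem route, approximating the integral $C$ from (21) with Simpson's rule. All function names and the example parameter values are hypothetical.

```python
def hyp2f1(a, b, c, z, tol=1e-14, max_terms=100000):
    """Power-series evaluation of 2F1(a, b; c; z); sketch valid for |z| < 1."""
    if abs(z) >= 1.0:
        raise ValueError("series sketch requires |z| < 1")
    term, total = 1.0, 1.0
    for j in range(max_terms):
        term *= (a + j) * (b + j) / ((c + j) * (j + 1)) * z
        total += term
        if abs(term) < tol * abs(total):
            return total
    raise ArithmeticError("hypergeometric series did not converge")


def p_alive(r, alpha, s, beta, x, t_x, T):
    """P(alive | x, t_x, T) from (35), with A_0 from (19) or (20)."""
    n = r + s + x
    if alpha >= beta:   # equation (19)
        base_tx, base_T, b, diff = alpha + t_x, alpha + T, s + 1.0, alpha - beta
    else:               # equation (20)
        base_tx, base_T, b, diff = beta + t_x, beta + T, r + x, beta - alpha
    a0 = (hyp2f1(n, b, n + 1.0, diff / base_tx) / base_tx ** n
          - hyp2f1(n, b, n + 1.0, diff / base_T) / base_T ** n)
    return 1.0 / (1.0 + s / n * (alpha + T) ** (r + x) * (beta + T) ** s * a0)


def p_alive_numeric(r, alpha, s, beta, x, t_x, T, n_steps=4000):
    """Same quantity via (34) and (21); the common gamma factor cancels."""
    def integrand(tau):
        return (alpha + tau) ** (-(r + x)) * (beta + tau) ** (-(s + 1.0))

    h = (T - t_x) / n_steps            # n_steps must be even for Simpson's rule
    total = integrand(t_x) + integrand(T)
    for k in range(1, n_steps):
        total += (4 if k % 2 else 2) * integrand(t_x + k * h)
    c_integral = total * h / 3.0
    alive_term = 1.0 / ((alpha + T) ** (r + x) * (beta + T) ** s)
    return alive_term / (alive_term + s * c_integral)


# Example usage with arbitrary illustrative values.
example_p = p_alive(r=0.55, alpha=10.6, s=0.6, beta=11.7, x=2, t_x=30.4, T=38.9)
```

Two properties are easy to read off the sketch: when $t_x = T$ the two terms of $A_0$ cancel and the probability is 1, and for fixed $x$ and $T$ a more recent last purchase gives a higher probability of being alive.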
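5.1 Equivalence with SMC Expressions

i. Substituting (19) in (35), it follows that for $\alpha \geq \beta$,

$$ P(\text{alive} \mid r,\alpha,s,\beta,x,t_x,T) = \left\{ 1 + \frac{s}{r+s+x} \left[ \left(\frac{\alpha+T}{\alpha+t_x}\right)^{r+x} \left(\frac{\beta+T}{\alpha+t_x}\right)^{s} {}_2F_1\!\left(r+s+x,\ s+1;\ r+s+x+1;\ \frac{\alpha-\beta}{\alpha+t_x}\right) - \left(\frac{\beta+T}{\alpha+T}\right)^{s} {}_2F_1\!\left(r+s+x,\ s+1;\ r+s+x+1;\ \frac{\alpha-\beta}{\alpha+T}\right) \right] \right\}^{-1}, \tag{36} $$

which is the expression reported in SMC, equation (11). (Note the error in SMC, equation (A25).)

ii. Substituting (20) in (35), it follows that for $\alpha \leq \beta$,

$$ P(\text{alive} \mid r,\alpha,s,\beta,x,t_x,T) = \left\{ 1 + \frac{s}{r+s+x} \left[ \left(\frac{\alpha+T}{\beta+t_x}\right)^{r+x} \left(\frac{\beta+T}{\beta+t_x}\right)^{s} {}_2F_1\!\left(r+s+x,\ r+x;\ r+s+x+1;\ \frac{\beta-\alpha}{\beta+t_x}\right) - \left(\frac{\alpha+T}{\beta+T}\right)^{r+x} {}_2F_1\!\left(r+s+x,\ r+x;\ r+s+x+1;\ \frac{\beta-\alpha}{\beta+T}\right) \right] \right\}^{-1}, \tag{37} $$

which is the expression reported in SMC, equation (12).

iii. Noting that ${}_2F_1(a,b;c;0) = 1$ for $c > b$, (36) and (37) reduce to

$$ P(\text{alive} \mid r,\alpha,s,\beta,x,t_x,T) = \left\{ 1 + \frac{s}{r+s+x} \left[ \left(\frac{\alpha+T}{\alpha+t_x}\right)^{r+s+x} - 1 \right] \right\}^{-1} $$

when $\alpha = \beta$, which is the expression reported in SMC, equation (13).

6 Derivation of the Conditional Expectation

Let the random variable $Y(t)$ denote the number of purchases made in the period $(T,T+t]$. We are interested in computing $E(Y(t) \mid x,t_x,T)$, the expected number of purchases in the period $(T,T+t]$ for a customer with purchase history $(x,t_x,T)$; we call this the conditional expectation.

If the customer is active at $T$, it follows from our derivation of an expression for $E[X(t)]$ that

$$ E[Y(t) \mid \lambda,\mu,\text{alive at } T] = \lambda t\, P(\tau>T+t \mid \mu,\tau>T) + \int_T^{T+t} \lambda(\tau-T)\, f(\tau \mid \mu,\tau>T)\, d\tau, $$

where $\lambda(\tau-T)$ is the expected number of purchases made between $T$ and a dropout at $\tau$. Given the memoryless property of the exponential distribution associated with $\tau$, this becomes

$$ = \lambda t e^{-\mu t} + \lambda \int_0^t \mu\tau e^{-\mu\tau}\, d\tau = \frac{\lambda}{\mu} - \frac{\lambda}{\mu}\, e^{-\mu t}. \tag{38} $$

Of course we don't know whether a customer is alive at $T$; therefore

$$ E[Y(t) \mid \lambda,\mu,x,t_x,T] = E[Y(t) \mid \lambda,\mu,\text{alive at } T]\, P(\tau>T \mid \lambda,\mu,x,t_x,T). \tag{39} $$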
As the transaction rate $\lambda$ and death rate $\mu$ are unobserved, we compute $E[Y(t) \mid x,t_x,T]$ for a randomly-chosen individual by taking the expectation of (39) over the joint posterior distribution of $\lambda$ and $\mu$, (33):

$$ E[Y(t) \mid r,\alpha,s,\beta,x,t_x,T] = \int_0^\infty\!\!\int_0^\infty \Bigl\{ E[Y(t) \mid \lambda,\mu,\text{alive at } T]\, P(\tau>T \mid \lambda,\mu,x,t_x,T) \Bigr\}\, g(\lambda,\mu \mid r,\alpha,s,\beta,x,t_x,T)\, d\lambda\, d\mu. \tag{40} $$

Substituting (31), (33), and (38) in (40), and solving the associated double integral gives us

$$ E[Y(t) \mid r,\alpha,s,\beta,x,t_x,T] = \frac{\Gamma(r+x+1)\,\alpha^r\beta^s}{\Gamma(r)\,(s-1)\,(\alpha+T)^{r+x+1}} \left[ \frac{1}{(\beta+T)^{s-1}} - \frac{1}{(\beta+T+t)^{s-1}} \right] \Big/ L(r,\alpha,s,\beta \mid x,t_x,T). \tag{41} $$

Rearranging terms gives us

$$ E[Y(t) \mid r,\alpha,s,\beta,x,t_x,T] = \left\{ \frac{\Gamma(r+x)\,\alpha^r\beta^s}{\Gamma(r)\,(\alpha+T)^{r+x}(\beta+T)^{s}} \Big/ L(r,\alpha,s,\beta \mid x,t_x,T) \right\} \times \frac{(r+x)(\beta+T)}{(\alpha+T)(s-1)} \left[ 1 - \left(\frac{\beta+T}{\beta+T+t}\right)^{s-1} \right]. $$

The term in braces is our expression for $P(\text{alive} \mid x,t_x,T)$, (34), while the rest of the expression is the mean of the Pareto/NBD, (27), with updated parameters that reflect the individual's behavior up to time $T$ (assuming no death in $(0,T]$); this is the expression reported in SMC, equation (22).

References

Abramowitz, Milton and Irene A. Stegun (eds.) (1972), Handbook of Mathematical Functions, New York: Dover Publications.

Andrews, George E., Richard Askey, and Ranjan Roy (1999), Special Functions, Cambridge: Cambridge University Press.

Casella, George, and Roger L. Berger (2002), Statistical Inference, 2nd edition, Pacific Grove, CA: Duxbury.

Gradshteyn, I. S. and I. M. Ryzhik (1994), Table of Integrals, Series, and Products, 5th edition, San Diego, CA: Academic Press.

Fader, Peter S., Bruce G. S. Hardie, and Ka Lok Lee (2005a), "A Note on Implementing the Pareto/NBD Model in MATLAB." <http://brucehardie.com/notes/008/>

Fader, Peter S., Bruce G. S. Hardie, and Ka Lok Lee (2005b), "RFM and CLV: Using Iso-Value Curves for Customer Base Analysis," Journal of Marketing Research, 42 (November), 415–430.

Mood, Alexander M., Franklin A. Graybill, and Duane C. Boes (1974), Introduction to the Theory of Statistics, 3rd edition, New York: McGraw-Hill Publishing Company.

Schmittlein, David C., Donald G. Morrison, and Richard Colombo (1987), "Counting Your Customers: Who Are They and What Will They Do Next?" Management Science, 33 (January), 1–24.