Deriving an Expression for P (X(t) = x) Under the Pareto/NBD Model

Σχετικά έγγραφα
Deriving the Conditional PMF of the Pareto/NBD Model

Additional Results for the Pareto/NBD Model

Section 8.3 Trigonometric Equations

Solutions to Exercise Sheet 5

Example Sheet 3 Solutions

2 Composition. Invertible Mappings

ST5224: Advanced Statistical Theory II

Απόκριση σε Μοναδιαία Ωστική Δύναμη (Unit Impulse) Απόκριση σε Δυνάμεις Αυθαίρετα Μεταβαλλόμενες με το Χρόνο. Απόστολος Σ.

CHAPTER 25 SOLVING EQUATIONS BY ITERATIVE METHODS

Math221: HW# 1 solutions

Problem Set 3: Solutions

Solution Series 9. i=1 x i and i=1 x i.

Jesse Maassen and Mark Lundstrom Purdue University November 25, 2013

Matrices and Determinants

Homework 8 Model Solution Section

5.4 The Poisson Distribution.

The Simply Typed Lambda Calculus

derivation of the Laplacian from rectangular to spherical coordinates

Areas and Lengths in Polar Coordinates

Homework 3 Solutions

4.6 Autoregressive Moving Average Model ARMA(1,1)

Statistical Inference I Locally most powerful tests

Uniform Convergence of Fourier Series Michael Taylor

Fourier Series. MATH 211, Calculus II. J. Robert Buchanan. Spring Department of Mathematics

Phys460.nb Solution for the t-dependent Schrodinger s equation How did we find the solution? (not required)

EE512: Error Control Coding

Bayesian statistics. DS GA 1002 Probability and Statistics for Data Science.

3.4 SUM AND DIFFERENCE FORMULAS. NOTE: cos(α+β) cos α + cos β cos(α-β) cos α -cos β

C.S. 430 Assignment 6, Sample Solutions

D Alembert s Solution to the Wave Equation

Areas and Lengths in Polar Coordinates

Practice Exam 2. Conceptual Questions. 1. State a Basic identity and then verify it. (a) Identity: Solution: One identity is csc(θ) = 1

forms This gives Remark 1. How to remember the above formulas: Substituting these into the equation we obtain with

6.1. Dirac Equation. Hamiltonian. Dirac Eq.

Srednicki Chapter 55

PARTIAL NOTES for 6.1 Trigonometric Identities

Exercises to Statistics of Material Fatigue No. 5

6.3 Forecasting ARMA processes

Concrete Mathematics Exercises from 30 September 2016

SOLUTIONS TO MATH38181 EXTREME VALUES AND FINANCIAL RISK EXAM

An Inventory of Continuous Distributions

HOMEWORK#1. t E(x) = 1 λ = (b) Find the median lifetime of a randomly selected light bulb. Answer:

Second Order Partial Differential Equations

ANSWERSHEET (TOPIC = DIFFERENTIAL CALCULUS) COLLECTION #2. h 0 h h 0 h h 0 ( ) g k = g 0 + g 1 + g g 2009 =?

b. Use the parametrization from (a) to compute the area of S a as S a ds. Be sure to substitute for ds!

A Note on Deriving the Pareto/NBD Model and Related Expressions

Partial Differential Equations in Biology The boundary element method. March 26, 2013

CHAPTER 48 APPLICATIONS OF MATRICES AND DETERMINANTS

Higher Derivative Gravity Theories

DESIGN OF MACHINERY SOLUTION MANUAL h in h 4 0.

Lecture 2. Soundness and completeness of propositional logic

SOLUTIONS TO MATH38181 EXTREME VALUES AND FINANCIAL RISK EXAM

Finite Field Problems: Solutions

ΕΛΛΗΝΙΚΗ ΔΗΜΟΚΡΑΤΙΑ ΠΑΝΕΠΙΣΤΗΜΙΟ ΚΡΗΤΗΣ

Problem Set 9 Solutions. θ + 1. θ 2 + cotθ ( ) sinθ e iφ is an eigenfunction of the ˆ L 2 operator. / θ 2. φ 2. sin 2 θ φ 2. ( ) = e iφ. = e iφ cosθ.

Mock Exam 7. 1 Hong Kong Educational Publishing Company. Section A 1. Reference: HKDSE Math M Q2 (a) (1 + kx) n 1M + 1A = (1) =

Every set of first-order formulas is equivalent to an independent set

Second Order RLC Filters

( y) Partial Differential Equations

An Introduction to Signal Detection and Estimation - Second Edition Chapter II: Selected Solutions

Congruence Classes of Invertible Matrices of Order 3 over F 2

k A = [k, k]( )[a 1, a 2 ] = [ka 1,ka 2 ] 4For the division of two intervals of confidence in R +

Other Test Constructions: Likelihood Ratio & Bayes Tests

CE 530 Molecular Simulation

Reminders: linear functions

1 String with massive end-points

HOMEWORK 4 = G. In order to plot the stress versus the stretch we define a normalized stretch:

Evaluation of some non-elementary integrals of sine, cosine and exponential integrals type

Approximation of distance between locations on earth given by latitude and longitude

Section 7.6 Double and Half Angle Formulas

Probability and Random Processes (Part II)

Econ 2110: Fall 2008 Suggested Solutions to Problem Set 8 questions or comments to Dan Fetter 1

Strain gauge and rosettes

Tridiagonal matrices. Gérard MEURANT. October, 2008

SCHOOL OF MATHEMATICAL SCIENCES G11LMA Linear Mathematics Examination Solutions

Chapter 6: Systems of Linear Differential. be continuous functions on the interval

CHAPTER 101 FOURIER SERIES FOR PERIODIC FUNCTIONS OF PERIOD

CRASH COURSE IN PRECALCULUS

Figure A.2: MPC and MPCP Age Profiles (estimating ρ, ρ = 2, φ = 0.03)..

DiracDelta. Notations. Primary definition. Specific values. General characteristics. Traditional name. Traditional notation

A Two-Sided Laplace Inversion Algorithm with Computable Error Bounds and Its Applications in Financial Engineering

A Bonus-Malus System as a Markov Set-Chain. Małgorzata Niemiec Warsaw School of Economics Institute of Econometrics

Partial Trace and Partial Transpose

Appendix to On the stability of a compressible axisymmetric rotating flow in a pipe. By Z. Rusak & J. H. Lee

w o = R 1 p. (1) R = p =. = 1

Solutions to the Schrodinger equation atomic orbitals. Ψ 1 s Ψ 2 s Ψ 2 px Ψ 2 py Ψ 2 pz

Fractional Colorings and Zykov Products of graphs

Nowhere-zero flows Let be a digraph, Abelian group. A Γ-circulation in is a mapping : such that, where, and : tail in X, head in

Depth versus Rigidity in the Design of International Trade Agreements. Leslie Johns

SCITECH Volume 13, Issue 2 RESEARCH ORGANISATION Published online: March 29, 2018

Chapter 6: Systems of Linear Differential. be continuous functions on the interval

These derivations are not part of the official forthcoming version of Vasilaky and Leonard

Business English. Ενότητα # 9: Financial Planning. Ευαγγελία Κουτσογιάννη Τμήμα Διοίκησης Επιχειρήσεων

Section 9.2 Polar Equations and Graphs

Space-Time Symmetries

ΗΜΥ 220: ΣΗΜΑΤΑ ΚΑΙ ΣΥΣΤΗΜΑΤΑ Ι Ακαδημαϊκό έτος Εαρινό Εξάμηνο Κατ οίκον εργασία αρ. 2

Μηχανική Μάθηση Hypothesis Testing

Lecture 2: Dirac notation and a review of linear algebra Read Sakurai chapter 1, Baym chatper 3

A summation formula ramified with hypergeometric function and involving recurrence relation

MATH423 String Theory Solutions 4. = 0 τ = f(s). (1) dτ ds = dxµ dτ f (s) (2) dτ 2 [f (s)] 2 + dxµ. dτ f (s) (3)

Transcript:

Deriving an Expression for P Xt x Under the Pareto/NBD Model Peter S. Fader www.petefader.com Bruce G. S. Hardie www.brucehardie.com September 26 Introduction The Pareto/NBD Schmittlein et al. 987, hereafter SMC is a model for customer-base analysis in a noncontractual setting. One result presented in SMC is an expression for P Xt x, where the random variable Xt denotes the number of transactions observed in the time interval,t]. This note derives an alternative expression for this quantity, one that is simpler to evaluate. In Section 2 we review the assumptions underlying the Pareto/NBD model. In Section 3, we derive an expression for P Xt x conditional on the unobserved latent characteristics λ and μ; we remove this conditioning in Section 4. For the sake of completeness, SMC s derivation is replicated in the Appendix. 2 Model Assumptions The Pareto/NBD model is based on the following assumptions: i. Customers go through two stages in their lifetime with a specific firm: they are alive for some period of time, then become permanently inactive. ii. While alive, the number of transactions made by a customer follows a Poisson process with transaction rate λ. Denoting the number of transactions in the time interval,t]bythe random variable Xt, it follows that P Xt x λ, alive at t λtx e λt, x,, 2,... iii. A customer s unobserved lifetime of length ω after which he is viewed as being inactive is exponentially distributed with dropout rate μ: fω μ μe μω. iv. Heterogeneity in transaction rates across customers follows a gamma distribution with shape parameter r and scale parameter : gλ r, r λ r e λ. Γr c 26 Peter S. Fader and Bruce G. S. Hardie. Document source: <http://brucehardie.com/notes/2/>.

v. Heterogeneity in dropout rates across customers follows a gamma distribution with shape parameter s and scale parameter. gμ s, s μ s e μ Γs. 2 vi. The transaction rate λ and the dropout rate μ vary independently across customers. 3 P Xt x Conditional on λ and μ Suppose we know an individual s unobserved latent characteristics λ and μ. Assuming the customer was alive at time, there are two ways x purchases could have occurred in the interval,t]: i. The individual remained alive through the whole interval; this occurs with probability e μt. The probability of the individual making x purchases, given she was alive during the whole interval, is λt x e λt /. Therefore, the probability of remaining alive through the interval,t] and making x purchases is λt x e λ+μt. ii. The individual died at some point ω < t and made x purchases in the interval,ω]. The probability of this occurring is λω x e λω μe μω dω λ x ω x e λ+μω μ dω λ x μ t λ + μ x+ ω x e λ+μω λ + μ x+ dω which, noting that the integrand is an Erlang-x + pdf, λ x [ μ λ + μ x+ e λ+μt i [ λ + μt ] i Combining these two scenarios gives us the following expression for the probability of observing x purchases in the interval,t], conditional on λ and μ: P Xt x λ, μ λtx e λ+μt λ x [ μ + λ + μ x+ e λ+μt i 4 Removing the Conditioning on λ and μ i! [ λ + μt ] i i! ]. ]. 3 In reality, we never know an individual s latent characteristics; we therefore remove the conditioning on λ and μ by taking the expectation of 3 over the distributions of Λ and M: P Xt x r,, s, Substituting 3 in 4 give us P Xt x r,, s, A + A 2 P Xt x λ, μgλ r, gμ s, dλ dμ. 4 i t i i! A 3 5 2

where A A 2 A 3 i. Solving 6 is trivial: A λt x e λ+μt gλ r, gμ s, dλ dμ 6 λ x μ gλ r, gμ s, dλ dμ λ + μ x+ 7 λ x μe λ+μt gλ r, gμ s, dλ dμ 8 λ + μ x i+ Γr + x r t x s 9 Γr + t + t + t ii. To solve 7, consider the transformation Y M/Λ + M and Z Λ+M. Using the transformation technique Casella and Berger 22, Section 4.3, pp. 56 62; Mood et al. 974, Section 6.2, p. 24ff, it follows that the oint distribution of Y and Z is gy, z,, r, s r s ΓrΓs ys y r z r+s e z y. Noting that the inverse of this transformation is λ yz and μ yz, it follows that A 2 r s ΓrΓs r s ΓrΓs r s Γr + s ΓrΓs r s Br, s y y x gy, z,, r, s dy r+s y s y r+x z r+s e z y dy y s y r+x { z r+s e z y dy y s y r+x y r+s dy y s y r+x [ y ] r+s dy which, recalling Euler s integral for the Gaussian hypergeometric function, r s Br + x, s + r+s 2F r + s, s +;r + s + x +; Br, s. Looking closely at, we see that the argument of the Gaussian hypergeometric function,, is guaranteed to be bounded between and when since >, thus ensuring convergence of the series representation of the function. However, when <we can be faced with the situation where <, in which case the series is divergent. Applying the linear transformation Abramowitz and Stegun 972, equation 5.3.4 2F a, b; c; z z a 2F a, c b; c; z z, 2 gives us A 2 r s Br + x, s + r+s 2F r + s, r + x; r + s + x +; Br, s. 3 We see that the argument of the above Gaussian hypergeometric function is bounded between and when. We therefore present and 3 as solutions to 7: we use when and 3 when. 2 F a, b; c; z Bb,c b tb t c b zt a dt, c>b. 3

iii. To solve 8, we also make use of the transformation Y M/Λ + M and Z Λ+M. Given, it follows that A 3 r s ΓrΓs r s ΓrΓs r s Γr + s + i ΓrΓs r s Br, s y s y r+x z r+s+i e z+t y dy y s y r+x { Γr + s + i Γr + s z r+s+i e z+t y dy y s y r+x + t y r+s+i dy + t r+s+i y s y r+x [ +t y ] r+s+i dy which, recalling Euler s integral for the Gaussian hypergeometric function, r s Br + x, s + + t r+s+i Br, s Γr + s + i Γr + s 2 F r + s + i, s +;r + s + x +; +t. 4 Noting that the argument of the Gaussian hypergeometric function is only guaranteed to be bounded between and when, we apply the linear transformation 2, which gives us r s Br + x, s + A 3 + t r+s+i Br, s Γr + s + i Γr + s. 5 2 F r + s + i, r + x; r + s + x +; +t The argument of the above Gaussian hypergeometric function is bounded between and when. We therefore present 4 and 5 as solutions to 8, using 4 when and 5 when. Substituting 9,, 3, 4, and 5 in 5, and noting that Γr + s + i Γr + s i! ibr + s, i, yields the following expression for the distribution of the number of transactions in the interval,t] for a randomly-chosen individual under the Pareto/NBD model: where and P Xt x r,, s, B B 2 Γr + x r t Γr + t + r s Br + x, s + Br, s x + t + t { B 2F r + s, s +;r + s + x +; r+s 2F r + s, r + x; r + s + x +; r+s i s t i ibr + s, i B 2 if if 2F r + s + i, s +;r + s + x +; +t + t r+s+i if 2F r + s + i, r + x; r + s + x +; +t + t r+s+i if. 6 4

We note that this expression requires x+2 evaluations of the Gaussian hypergeometric function; in contrast, SMC s expression see the attached appendix requires 2x + evaluations of the Gaussian hypergeometric function. The equivalence of 6 and A, A3, A4 is not immediately obvious. Purely from a logical perspective, they must be equivalent. Furthermore, equivalence is observed in numerical investigations. However, we have yet to demonstrate direct equivalence of these two expressions for P Xt x r,, s,. Stay tuned. References Abramowitz, Milton and Irene A. Stegun eds. 972, Handbook of Mathematical Functions, New York: Dover Publications. Casella, George, and Roger L. Berger 22, Statistical Inference, 2nd edition, Pacific Grove, CA: Duxbury. Mood, Alexander M., Franklin A. Graybill, and Duane C. Boes 974, Introduction to the Theory of Statistics, 3rd edition, New York: McGraw-Hill Publishing Company. Schmittlein, David C., Donald G. Morrison, and Richard Colombo 987, Counting Your Customers: Who Are They and What Will They Do Next? Management Science, 33 January, 24. 5

Appendix: SMC s Derivation of P Xt x r,, s, SMC derive their expression for P Xt x by first integrating over λ and μ and then removing the conditioning on ω, which is the reverse of the approach used in Sections 3 and 4 above. This gives us Γr + x r t x s P Xt x r,, s, where C Γr + t + t {{ + NBD P Xtx + t {{ P Ω>t x Γr + x r ω Γr + ω + ω {{ NBD P Xωx Γr + x Γr + t Making the change of variable y + ω, C +t y x y r+x y + s+ dy r t + t x + t ω x + ω r+x + ω s+ dω. which, recalling the binomial theorem, 2 +t x y x y r+x y + s+ dy s s+ + ω {{ fω dω s Γr + x + s r s C Γr x +t y r+ y + s+ dy { x y r+ y + s+ dy y r+ y + s+ dy +t letting z /y in the first integral which implies dy z 2 and z + t/y in the second integral which implies dy + tz 2, x { r+s+ + t r+s+ z r+s+ z s+ z r+s+ +t z s+ which, recalling Euler s integral for the Gaussian hypergeometric function, { x 2F s +,r+ s + ; r + s + +; r + s + r+s+ 2 F s +,r+ s + ; r + s + +; +t + t r+s+. A3 A A2 2 x + y r r k x k y r k for integer r. 6

We note that the arguments of the above Gaussian hypergeometric functions are only guaranteed to be bounded between and when. We therefore revisit A2, applying the change of variable y + ω: C +t y x y s+ y + r+x dy which, recalling the binomial theorem, +t x y x y s+ y + r+x dy x +t y s++ x y + r+x dy { x y s++ x y + r+x dy y s++ x y + r+x dy +t letting z /y in the first integral which implies dy z 2 and z + t/y in the second integral which implies dy + tz 2, x { r+s+ + t r+s+ z r+s+ z r+x z r+s+ +t z r+x which, recalling Euler s integral for the Gaussian hypergeometric function, { x 2 F r + x, r + s + ; r + s + +; r + s + r+s+ 2 F r + x, r + s + ; r + s + +; +t + t r+s+. A4 We note that A3 and A4 each require 2x + evaluations of the Gaussian hypergeometric function. 7