Derivation for Input of Factor Graph Representation

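1 Sum-Product Primal

Based on the original LP formulation

$$\max_{b}\ \sum_{\alpha}\sum_{x_\alpha} b_\alpha(x_\alpha)\,\theta_\alpha(x_\alpha) + \sum_{i}\sum_{x_i} b_i(x_i)\,\theta_i(x_i) \quad \text{s.t.}\quad b_i, b_\alpha \in \Delta,\quad \forall i,\ \alpha\in N(i),\ x_i:\ \sum_{x_\alpha\setminus x_i} b_\alpha(x_\alpha) = b_i(x_i), \tag{1}$$

we define $V_k$ as the node set allocated to the $k$-th core. $F \in \{\alpha\cap V_k\}\setminus\{\{i\} : i = 1, 2, 3, \dots, n\}$ is defined as the index of a free variable set on the $k$-th core. Note that the set of $F$ does not include single variables.

Factors can share the same free variable set on the $k$-th core. By requiring the factors with the same free variable set to have the same marginal distribution on their common free variable set, we can compress these factors and dramatically reduce the number of dual variables to increase efficiency. In other words, we replace the original local marginal polytope with a hierarchical consistency polytope, where marginal consistency is jointly replaced by factor to free variable set consistency and free variable set to node consistency:

$$
\begin{aligned}
\max_{b}\quad & \sum_{k}\sum_{i\in V_k}\sum_{x_i} b_i(x_i)\,\theta_i(x_i)
+ \sum_{k}\sum_{i\in V_k}\sum_{\alpha\in N(i)}\sum_{x_\alpha} b^k_\alpha(x_\alpha)\,\hat\theta_\alpha(x_\alpha)
+ \sum_{k}\sum_{F}\sum_{\alpha\in N(F)}\sum_{x_\alpha} b^k_\alpha(x_\alpha)\,\hat\theta_\alpha(x_\alpha) \\
& + \epsilon\sum_{k}\sum_{i\in V_k} c_i H(b_i) + \epsilon\sum_{k}\sum_{i\in V_k}\sum_{\alpha\in N(i)} c_\alpha H(b^k_\alpha)
+ \epsilon\sum_{k}\sum_{F} c_F H(b_F) + \epsilon\sum_{k}\sum_{F}\sum_{\alpha\in N(F)} c_\alpha H(b^k_\alpha) \\
\text{s.t.}\quad & \sum_{x_\alpha\setminus x_F} b^k_\alpha(x_\alpha) = b_F(x_F), \qquad \forall k,\ F,\ \alpha\in N(F),\ x_F \\
& \sum_{x_\alpha\setminus x_i} b^k_\alpha(x_\alpha) = b_i(x_i), \qquad \forall k,\ i\in V_k,\ \alpha\in N(i),\ x_i \\
& \sum_{x_F\setminus x_i} b_F(x_F) = b_i(x_i), \qquad \forall k,\ F,\ i\in N(F),\ x_i \\
& b^k_\alpha(x_\alpha) = b_\alpha(x_\alpha), \qquad \forall \alpha,\ k: \alpha\cap V_k\neq\emptyset,\ x_\alpha
\end{aligned}
\tag{2}
$$

More formally, the hierarchical consistency polytope is the feasible set of the above LP with entropy barrier functions. $F$ ranges over the indices of the free variable sets on the $k$-th core. $\alpha\in N(F)$ if and only if $\alpha\cap V_k = F$. $i\in N(F)$ if and only if $i$ is a neighbor of $F$ on the $k$-th core. And $\alpha\in N(i)$ if and only if $\alpha\cap V_k = \{i\}$. The new formulation is an approximation to the LP relaxation of the original MAP problem. Note that we distinguish factors intersecting the free variables in a single variable from those intersecting them in a multivariate set, so that we obtain a clearer factor graph message passing representation after compressing factors.

As

$$\sum_{x_\alpha\setminus x_i} b_\alpha(x_\alpha) = \sum_{x_\alpha\setminus x_i} b^k_\alpha(x_\alpha) = \sum_{x_F\setminus x_i}\sum_{x_\alpha\setminus x_F} b^k_\alpha(x_\alpha) = \sum_{x_F\setminus x_i} b_F(x_F) = b_i(x_i),$$

every configuration of $b_\alpha$, $b_i$ in the hierarchical consistency polytope is in the local marginal polytope. In other words, the hierarchical consistency polytope (the new feasible set) is guaranteed to be a subset of the local marginal polytope.

On the other hand, for an arbitrary valid distribution denoted by $b$, setting $b^k_\alpha(x_\alpha) = b(x_\alpha)$, $b_F(x_F) = b(x_F)$ and $b_i(x_i) = b(x_i)$ shows that the distribution corresponds to a point in the hierarchical consistency polytope. Hence every valid probability distribution in the marginal polytope is guaranteed to be in the hierarchical consistency polytope. We therefore have a smaller search space than the local marginal polytope, while all valid distributions in the marginal polytope are still included in the new search space.

2 Sum-Product Dual

By associating dual variables $\delta_{i\to\alpha}$, $\delta_{F\to\alpha}$, $\lambda_{i\to F}$ and $\nu^k_\alpha(x_\alpha)$ with the three types of constraints in Equ. 2, we have the following claim.

Claim 1. Let $\alpha\in N(F)$ if and only if $\alpha\cap V_k = F$. The dual of the program in Equ. 2 is

$$
\begin{aligned}
\min_{\lambda,\delta,\nu}\quad & \sum_{k}\sum_{i\in V_k} \epsilon c_i \ln\sum_{x_i}\exp\frac{\theta_i(x_i) - \sum_{F\in N(i)}\lambda_{i\to F}(x_i) - \sum_{\alpha\in N(i)}\delta_{i\to\alpha}(x_i)}{\epsilon c_i} \\
& + \sum_{k}\sum_{i\in V_k}\sum_{\alpha\in N(i)} \epsilon c_\alpha \ln\sum_{x_\alpha}\exp\frac{\hat\theta_\alpha(x_i, x_{\alpha\setminus i}) + \nu^k_\alpha(x_i, x_{\alpha\setminus i}) + \delta_{i\to\alpha}(x_i)}{\epsilon c_\alpha} \\
& + \sum_{k}\sum_{F} \epsilon c_F \ln\sum_{x_F}\exp\frac{-\sum_{\alpha\in N(F)}\delta_{F\to\alpha}(x_F) + \sum_{i\in N(F)}\lambda_{i\to F}(x_i)}{\epsilon c_F} \\
& + \sum_{k}\sum_{F}\sum_{\alpha\in N(F)} \epsilon c_\alpha \ln\sum_{x_\alpha}\exp\frac{\hat\theta_\alpha(x_F, x_{\alpha\setminus F}) + \nu^k_\alpha(x_F, x_{\alpha\setminus F}) + \delta_{F\to\alpha}(x_F)}{\epsilon c_\alpha} \\
\text{s.t.}\quad & \sum_{k:\,\alpha\cap V_k\neq\emptyset} \nu^k_\alpha(x_\alpha) = 0, \qquad \forall \alpha,\ x_\alpha
\end{aligned}
\tag{3}
$$

When the consistency variables $\nu$ are fixed, the sub-program for the $k$-th core and the corresponding $\lambda$, $\delta$ update rules are shown in Claim 2.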

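Claim 2. When $\nu$ is fixed, the block coordinate descent can be achieved by solving the program

$$
\min_{\lambda}\ \sum_{i\in V_k} \epsilon\hat c_i \ln\sum_{x_i}\exp\frac{\hat\theta_i(x_i) - \sum_{F\in N(i)}\lambda_{i\to F}(x_i)}{\epsilon\hat c_i}
+ \sum_{F} \epsilon\hat c_F \ln\sum_{x_F}\exp\frac{\hat\theta_F(x_F) + \sum_{i\in N(F)}\lambda_{i\to F}(x_i)}{\epsilon\hat c_F}
\tag{4}
$$

where

$$
\gamma_{\alpha\to F}(x_F) = \epsilon c_\alpha \ln\sum_{x_\alpha\setminus x_F}\exp\frac{\hat\theta_\alpha(x_\alpha) + \nu^k_\alpha(x_\alpha)}{\epsilon c_\alpha},
\qquad
\gamma_{\alpha\to i}(x_i) = \epsilon c_\alpha \ln\sum_{x_\alpha\setminus x_i}\exp\frac{\hat\theta_\alpha(x_\alpha) + \nu^k_\alpha(x_\alpha)}{\epsilon c_\alpha},
$$

$$
\hat\theta_F(x_F) = \sum_{\alpha\in N(F)}\gamma_{\alpha\to F}(x_F),
\qquad
\hat\theta_i(x_i) = \theta_i(x_i) + \sum_{\alpha\in N(i)}\gamma_{\alpha\to i}(x_i),
$$

and

$$
\hat c_F = c_F + \sum_{\alpha\in N(F)} c_\alpha,
\qquad
\hat c_i = c_i + \sum_{\alpha\in N(i)} c_\alpha .
$$

3 Sum-Product Update Rule

From the programs in Equ. 18 and Equ. 20, the update rule is summarized in Claim 3. Note that, for simplicity, we write $\lambda$ for $\lambda_{i\to F}(x_i)$ in the following wherever no ambiguity arises.

Claim 3. The message passing rule for $\lambda$ is exactly the same as the convex BP rule: $\forall F,\ i\in N(F),\ x_i$,

$$
\lambda_{i\to F}(x_i) = \frac{\hat c_F}{\bar c_i}\Big(\hat\theta_i(x_i) + \sum_{\beta\in N(i)}\mu_{\beta\to i}(x_i)\Big) - \mu_{F\to i}(x_i)
\tag{5}
$$

where

$$
\mu_{F\to i}(x_i) = \epsilon\hat c_F \ln\sum_{x_F\setminus x_i}\exp\frac{\hat\theta_F(x_F) + \sum_{j\in N(F)\setminus i}\lambda_{j\to F}(x_j)}{\epsilon\hat c_F}
\qquad\text{and}\qquad
\bar c_i = \hat c_i + \sum_{F\in N(i)}\hat c_F .
$$

The block coordinate rule for the variables $\nu$ is

$$
\nu^k_\alpha(x_\alpha) = \frac{1}{N_P(\alpha)}\sum_{j:\,\alpha\cap V_j\neq\emptyset}\delta^j_\alpha(x_{\alpha\cap V_j}) - \delta^k_\alpha(x_{\alpha\cap V_k})
\tag{6}
$$

where $N_P(\alpha)$ is the number of sub-programs in which $\alpha$ is involved, and $\delta^k_\alpha(x_{\alpha\cap V_k})$ denotes $\delta_{i\to\alpha}(x_i)$ when $\alpha\cap V_k=\{i\}$ and $\delta_{F\to\alpha}(x_F)$ when $\alpha\cap V_k=F$. For an arbitrary configuration of $\lambda$, the variables $\delta$ can be decoded as

$$
\delta_{F\to\alpha}(x_F) = \frac{c_\alpha}{\hat c_F}\Big(\hat\theta_F(x_F) + \sum_{j\in N(F)}\lambda_{j\to F}(x_j)\Big) - \gamma_{\alpha\to F}(x_F)
\tag{7}
$$

and

$$
\delta_{i\to\alpha}(x_i) = \frac{c_\alpha}{\hat c_i}\Big(\hat\theta_i(x_i) - \sum_{F\in N(i)}\lambda_{i\to F}(x_i)\Big) - \gamma_{\alpha\to i}(x_i)
\tag{8}
$$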

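which is identical to

$$
\delta_{i\to\alpha}(x_i) = \frac{c_\alpha}{\bar c_i}\Big(\hat\theta_i(x_i) + \sum_{F\in N(i)}\mu_{F\to i}(x_i)\Big) - \gamma_{\alpha\to i}(x_i)
\tag{9}
$$

by substituting $\lambda$.

4 Max-Product Primal

The LP primal formulation, which corresponds to Max-Product message passing, is

$$
\begin{aligned}
\max_{b}\quad & \sum_{k}\sum_{i\in V_k}\sum_{x_i} b_i(x_i)\,\theta_i(x_i)
+ \sum_{k}\sum_{i\in V_k}\sum_{\alpha\in N(i)}\sum_{x_\alpha} b^k_\alpha(x_\alpha)\,\hat\theta_\alpha(x_\alpha)
+ \sum_{k}\sum_{F}\sum_{\alpha\in N(F)}\sum_{x_\alpha} b^k_\alpha(x_\alpha)\,\hat\theta_\alpha(x_\alpha) \\
\text{s.t.}\quad & \sum_{x_\alpha\setminus x_F} b^k_\alpha(x_\alpha) = b_F(x_F), \qquad \forall k,\ F,\ \alpha\in N(F),\ x_F \\
& \sum_{x_\alpha\setminus x_i} b^k_\alpha(x_\alpha) = b_i(x_i), \qquad \forall k,\ i\in V_k,\ \alpha\in N(i),\ x_i \\
& \sum_{x_F\setminus x_i} b_F(x_F) = b_i(x_i), \qquad \forall k,\ F,\ i\in N(F),\ x_i \\
& b^k_\alpha(x_\alpha) = b_\alpha(x_\alpha), \qquad \forall \alpha,\ k: \alpha\cap V_k\neq\emptyset,\ x_\alpha \\
& \sum_{x_i} b_i(x_i) = 1, \qquad \sum_{x_F} b_F(x_F) = 1, \qquad \sum_{x_\alpha} b^k_\alpha(x_\alpha) = 1, \quad \forall \alpha,\ k: \alpha\cap V_k\neq\emptyset \\
& b_i,\ b_F,\ b^k_\alpha,\ b_\alpha \ge 0
\end{aligned}
\tag{10}
$$

5 Max-Product Dual

By associating the constraints with $\delta_{i\to\alpha}$, $\delta_{F\to\alpha}$, $\lambda_{i\to F}$, $\nu^k_\alpha(x_\alpha)$, $\eta_i$, $\eta_F$ and the two kinds of $\eta^k_\alpha$ (for $\alpha\in N(i)$ and $\alpha\in N(F)$), the dual problem is shown in the following claim.

Claim 4. The dual of the Max-Product problem is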

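$$
\begin{aligned}
\min_{\lambda,\delta,\nu}\quad & \sum_{k}\sum_{i\in V_k}\max_{x_i}\Big(\theta_i(x_i) - \sum_{\alpha\in N(i)}\delta_{i\to\alpha}(x_i) - \sum_{F\in N(i)}\lambda_{i\to F}(x_i)\Big) \\
& + \sum_{k}\sum_{i\in V_k}\sum_{\alpha\in N(i)}\max_{x_\alpha}\Big(\hat\theta_\alpha(x_\alpha) + \nu^k_\alpha(x_\alpha) + \delta_{i\to\alpha}(x_i)\Big) \\
& + \sum_{k}\sum_{F}\max_{x_F}\Big(-\sum_{\alpha\in N(F)}\delta_{F\to\alpha}(x_F) + \sum_{i\in N(F)}\lambda_{i\to F}(x_i)\Big) \\
& + \sum_{k}\sum_{F}\sum_{\alpha\in N(F)}\max_{x_\alpha}\Big(\hat\theta_\alpha(x_\alpha) + \nu^k_\alpha(x_\alpha) + \delta_{F\to\alpha}(x_F)\Big) \\
\text{s.t.}\quad & \sum_{k:\,\alpha\cap V_k\neq\emptyset}\nu^k_\alpha(x_\alpha) = 0, \qquad \forall\alpha,\ x_\alpha
\end{aligned}
\tag{11}
$$

Claim 5. When $\nu$ is fixed, the objective function of Equ. 42 is lower bounded by

$$
\min_{\lambda}\ \sum_{i\in V_k}\max_{x_i}\Big(\hat\theta_i(x_i) - \sum_{F\in N(i)}\lambda_{i\to F}(x_i)\Big) + \sum_{F}\max_{x_F}\Big(\hat\theta_F(x_F) + \sum_{i\in N(F)}\lambda_{i\to F}(x_i)\Big)
\tag{12}
$$

where

$$
\gamma_{\alpha\to F}(x_F) = \max_{x_\alpha\setminus x_F}\big(\hat\theta_\alpha(x_\alpha) + \nu^k_\alpha(x_\alpha)\big),
\qquad
\gamma_{\alpha\to i}(x_i) = \max_{x_\alpha\setminus x_i}\big(\hat\theta_\alpha(x_\alpha) + \nu^k_\alpha(x_\alpha)\big),
$$

$$
\hat\theta_F(x_F) = \sum_{\alpha\in N(F)}\gamma_{\alpha\to F}(x_F),
\qquad
\hat\theta_i(x_i) = \theta_i(x_i) + \sum_{\alpha\in N(i)}\gamma_{\alpha\to i}(x_i).
$$

6 Max-Product Update Rule

The Max-Product update rule is summarized in the following. Note that, for simplicity, we write $\lambda$ for $\lambda_{i\to F}(x_i)$ wherever no ambiguity arises.

Claim 6. The message passing rule for $\lambda$ is exactly the same as the convex BP rule: $\forall F,\ i\in N(F),\ x_i$,

$$
\lambda_{i\to F}(x_i) = \frac{1}{1+|N(i)|}\Big(\hat\theta_i(x_i) + \sum_{\beta\in N(i)}\mu_{\beta\to i}(x_i)\Big) - \mu_{F\to i}(x_i)
\tag{13}
$$

where

$$
\mu_{F\to i}(x_i) = \max_{x_F\setminus x_i}\Big(\hat\theta_F(x_F) + \sum_{j\in N(F)\setminus i}\lambda_{j\to F}(x_j)\Big).
$$

The block coordinate rule for the variables $\nu$ is

$$
\nu^k_\alpha(x_\alpha) = \frac{1}{N_P(\alpha)}\sum_{j:\,\alpha\cap V_j\neq\emptyset}\delta^j_\alpha(x_{\alpha\cap V_j}) - \delta^k_\alpha(x_{\alpha\cap V_k})
\tag{14}
$$

where $N_P(\alpha)$ is the number of sub-programs in which $\alpha$ is involved. For an arbitrary configuration of $\lambda$, the variables $\delta$ can be decoded as

$$
\delta_{F\to\alpha}(x_F) = \frac{1}{1+|N(F)|}\Big(\hat\theta_F(x_F) + \sum_{j\in N(F)}\lambda_{j\to F}(x_j)\Big) - \gamma_{\alpha\to F}(x_F)
\tag{15}
$$

and

$$
\delta_{i\to\alpha}(x_i) = \frac{1}{1+|N(i)|}\Big(\hat\theta_i(x_i) - \sum_{F\in N(i)}\lambda_{i\to F}(x_i)\Big) - \gamma_{\alpha\to i}(x_i)
\tag{16}
$$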

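which is identical to

$$
\delta_{i\to\alpha}(x_i) = \frac{1}{1+|N(i)|}\Big(\hat\theta_i(x_i) + \sum_{F\in N(i)}\mu_{F\to i}(x_i)\Big) - \gamma_{\alpha\to i}(x_i)
\tag{17}
$$

by substituting $\lambda$.

7 Algorithm

7.1 Sum-Product

The inference algorithm can be expressed in the following sum-product fashion.

Algorithm 1 Inference
  Input: $\psi_i = \exp(\theta_i)$, $\hat\psi_\alpha(x_\alpha) = \exp\big(\theta_\alpha(x_\alpha)/N_P(\alpha)\big)$
  while not converged do
    for all $k$ do
      $\forall F,\ \alpha\in N(F),\ x_F$: $\sigma_{\alpha\to F}(x_F) = \Big(\sum_{x_\alpha\setminus x_F}\big(\hat\psi_\alpha(x_\alpha)\, n^k_s(x_\alpha)\big)^{1/(\epsilon c_\alpha)}\Big)^{\epsilon c_\alpha}$
      $\forall i\in V_k,\ \alpha\in N(i),\ x_i$: $\sigma_{\alpha\to i}(x_i) = \Big(\sum_{x_\alpha\setminus x_i}\big(\hat\psi_\alpha(x_\alpha)\, n^k_s(x_\alpha)\big)^{1/(\epsilon c_\alpha)}\Big)^{\epsilon c_\alpha}$
      $\forall F,\ x_F$: $\hat\psi_F(x_F) = \prod_{\alpha\in N(F)}\sigma_{\alpha\to F}(x_F)$
      $\forall i\in V_k,\ x_i$: $\hat\psi_i(x_i) = \psi_i(x_i)\prod_{\alpha\in N(i)}\sigma_{\alpha\to i}(x_i)$
    end for
    for all $k$ do
      $\eta$ = Sub-Inference($\hat\psi_i$, $\hat\psi_F$, $\sigma$)
    end for
    for all $\alpha$ do
      $n^k_s(x_\alpha) = \Big(\prod_{j:\,\alpha\cap V_j\neq\emptyset}\eta^j_\alpha(x_{\alpha\cap V_j})\Big)^{1/N_P(\alpha)} \Big/\ \eta^k_\alpha(x_{\alpha\cap V_k})$
    end for
  end while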
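The consistency step that closes Algorithm 1 is the product-domain counterpart of the $\nu$ rule in Claim 3. The following is a minimal sketch of that correspondence for a single factor and a single fixed configuration $x_\alpha$, assuming $n_s = \exp(\nu)$ and $\eta = \exp(\delta)$. The function names and the dictionary layout (core index mapped to a scalar) are illustrative assumptions, not part of the original algorithm.

```python
import math

def nu_update(delta_by_core):
    """Log-domain rule: nu^k = mean_j(delta^j) - delta^k (one fixed configuration)."""
    n_p = len(delta_by_core)  # number of sub-programs that contain the factor
    mean = sum(delta_by_core.values()) / n_p
    return {k: mean - d for k, d in delta_by_core.items()}

def ns_update(eta_by_core):
    """Product-domain rule: n_s^k = (prod_j eta^j)^(1/N_P) / eta^k."""
    n_p = len(eta_by_core)
    geometric_mean = math.prod(eta_by_core.values()) ** (1.0 / n_p)
    return {k: geometric_mean / e for k, e in eta_by_core.items()}

# Hypothetical delta values of one factor on three cores.
delta = {0: 0.4, 1: -1.1, 2: 2.5}
nu = nu_update(delta)
eta = {k: math.exp(d) for k, d in delta.items()}
n_s = ns_update(eta)
```

In this sketch the updated $\nu$ values sum to zero over the cores, which is the constraint of the dual program, and each $n_s$ value equals $\exp(\nu)$ of the same core.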

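Algorithm 2 Sub-Inference
  Input: $\sigma_{\alpha\to F}(x_F)$, $\sigma_{\alpha\to i}(x_i)$, $\hat\psi_i$, $\hat\psi_F$, $n = \exp(\lambda)$, $m = \exp(\mu)$
  for all $t \le$ number of inner iterations do
    for all $i\in V_k$ do
      $\forall F\in N(i),\ x_i$: $n_{i\to F}(x_i) = \Big(\hat\psi_i(x_i)\prod_{\beta\in N(i)} m_{\beta\to i}(x_i)\Big)^{\hat c_F/\bar c_i} \Big/\ m_{F\to i}(x_i)$
      $\forall F\in N(i),\ x_i$: $m_{F\to i}(x_i) = \Big(\sum_{x_F\setminus x_i}\big(\hat\psi_F(x_F)\prod_{j\in N(F)\setminus i} n_{j\to F}(x_j)\big)^{1/(\epsilon\hat c_F)}\Big)^{\epsilon\hat c_F}$
    end for
    for all $\alpha: \alpha\cap V_k\neq\emptyset$ do
      if $\alpha\cap V_k$ is a single node $\{i\}$ then
        $\eta_{i\to\alpha}(x_i) = \Big(\hat\psi_i(x_i)\prod_{F\in N(i)} m_{F\to i}(x_i)\Big)^{c_\alpha/\bar c_i} \Big/\ \sigma_{\alpha\to i}(x_i)$
      else
        $\eta_{F\to\alpha}(x_F) = \Big(\hat\psi_F(x_F)\prod_{j\in N(F)} n_{j\to F}(x_j)\Big)^{c_\alpha/\hat c_F} \Big/\ \sigma_{\alpha\to F}(x_F)$
      end if
    end for
  end for
  Return $\eta$

7.2 Max-Product

The inference algorithm can be expressed in the following max-product fashion.

Algorithm 3 Inference
  Input: $\theta_i$, $\hat\theta_\alpha(x_\alpha) = \theta_\alpha(x_\alpha)/N_P(\alpha)$
  while not converged do
    for all $k$ do
      $\forall F,\ \alpha\in N(F),\ x_F$: $\gamma_{\alpha\to F}(x_F) = \max_{x_\alpha\setminus x_F}\big(\hat\theta_\alpha(x_\alpha) + \nu^k_\alpha(x_\alpha)\big)$
      $\forall i\in V_k,\ \alpha\in N(i),\ x_i$: $\gamma_{\alpha\to i}(x_i) = \max_{x_\alpha\setminus x_i}\big(\hat\theta_\alpha(x_\alpha) + \nu^k_\alpha(x_\alpha)\big)$
      $\forall F,\ x_F$: $\hat\theta_F(x_F) = \sum_{\alpha\in N(F)}\gamma_{\alpha\to F}(x_F)$
      $\forall i\in V_k,\ x_i$: $\hat\theta_i(x_i) = \theta_i(x_i) + \sum_{\alpha\in N(i)}\gamma_{\alpha\to i}(x_i)$
    end for
    for all $k$ do
      $\delta$ = Sub-Inference($\hat\theta_i$, $\hat\theta_F$, $\gamma$)
    end for
    for all $\alpha$ do
      $\nu^k_\alpha(x_\alpha) = \frac{1}{N_P(\alpha)}\sum_{j:\,\alpha\cap V_j\neq\emptyset}\delta^j_\alpha(x_{\alpha\cap V_j}) - \delta^k_\alpha(x_{\alpha\cap V_k})$
    end for
  end while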
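The compression step of the max-product inference loop reduces each factor to a message on the variables it shares with the core. Below is a minimal sketch of that max-marginalization for a pairwise factor stored as a nested list. The table layout, the function name `gamma_max`, and the numbers are assumptions made for illustration only.

```python
def gamma_max(theta_hat, nu, keep_axis):
    """Max-marginalize theta_hat + nu over the variable that is not kept.

    theta_hat, nu: nested lists indexed as [x_a][x_b].
    keep_axis: 0 keeps x_a, 1 keeps x_b.
    """
    rows, cols = len(theta_hat), len(theta_hat[0])
    score = [[theta_hat[a][b] + nu[a][b] for b in range(cols)] for a in range(rows)]
    if keep_axis == 0:
        return [max(score[a]) for a in range(rows)]
    return [max(score[a][b] for a in range(rows)) for b in range(cols)]

# Hypothetical pairwise factor: potentials already divided by N_P, plus nu.
theta_hat = [[1.0, 0.0], [0.5, 2.0]]
nu = [[0.0, 0.2], [-0.5, 0.0]]
gamma_first = gamma_max(theta_hat, nu, 0)   # message on the first variable
gamma_second = gamma_max(theta_hat, nu, 1)  # message on the second variable
```

Summing such messages over all factors that share the same node or free variable set gives the compressed potentials $\hat\theta_i$ and $\hat\theta_F$ used by the sub-inference.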

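Algorithm 4 Sub-Inference
  Input: $\gamma_{\alpha\to F}$, $\gamma_{\alpha\to i}$, $\hat\theta_i$, $\hat\theta_F$
  for all $t \le$ number of inner iterations do
    for all $i\in V_k$ do
      $\forall F\in N(i),\ x_i$: $\lambda_{i\to F}(x_i) = \frac{1}{1+|N(i)|}\Big(\hat\theta_i(x_i) + \sum_{\beta\in N(i)}\mu_{\beta\to i}(x_i)\Big) - \mu_{F\to i}(x_i)$
      $\forall F\in N(i),\ x_i$: $\mu_{F\to i}(x_i) = \max_{x_F\setminus x_i}\Big(\hat\theta_F(x_F) + \sum_{j\in N(F)\setminus i}\lambda_{j\to F}(x_j)\Big)$
    end for
    for all $\alpha: \alpha\cap V_k\neq\emptyset$ do
      if $\alpha\cap V_k$ is a single node $\{i\}$ then
        $\delta_{i\to\alpha}(x_i) = \frac{1}{1+|N(i)|}\Big(\hat\theta_i(x_i) + \sum_{F\in N(i)}\mu_{F\to i}(x_i)\Big) - \gamma_{\alpha\to i}(x_i)$
      else
        $\delta_{F\to\alpha}(x_F) = \frac{1}{1+|N(F)|}\Big(\hat\theta_F(x_F) + \sum_{j\in N(F)}\lambda_{j\to F}(x_j)\Big) - \gamma_{\alpha\to F}(x_F)$
      end if
    end for
  end for
  Return $\delta$

8 Appendix

8.1 Proof of the Claims

8.1.1 Proof of Claim 1

Claim. Let $\alpha\in N(F)$ if and only if $\alpha\cap V_k = F$. The dual of the program in Equ. 2 is

$$
\begin{aligned}
\min_{\lambda,\delta,\nu}\quad & \sum_{k}\sum_{i\in V_k} \epsilon c_i \ln\sum_{x_i}\exp\frac{\theta_i(x_i) - \sum_{F\in N(i)}\lambda_{i\to F}(x_i) - \sum_{\alpha\in N(i)}\delta_{i\to\alpha}(x_i)}{\epsilon c_i} \\
& + \sum_{k}\sum_{i\in V_k}\sum_{\alpha\in N(i)} \epsilon c_\alpha \ln\sum_{x_\alpha}\exp\frac{\hat\theta_\alpha(x_i, x_{\alpha\setminus i}) + \nu^k_\alpha(x_i, x_{\alpha\setminus i}) + \delta_{i\to\alpha}(x_i)}{\epsilon c_\alpha} \\
& + \sum_{k}\sum_{F} \epsilon c_F \ln\sum_{x_F}\exp\frac{-\sum_{\alpha\in N(F)}\delta_{F\to\alpha}(x_F) + \sum_{i\in N(F)}\lambda_{i\to F}(x_i)}{\epsilon c_F} \\
& + \sum_{k}\sum_{F}\sum_{\alpha\in N(F)} \epsilon c_\alpha \ln\sum_{x_\alpha}\exp\frac{\hat\theta_\alpha(x_F, x_{\alpha\setminus F}) + \nu^k_\alpha(x_F, x_{\alpha\setminus F}) + \delta_{F\to\alpha}(x_F)}{\epsilon c_\alpha} \\
\text{s.t.}\quad & \sum_{k:\,\alpha\cap V_k\neq\emptyset} \nu^k_\alpha(x_\alpha) = 0, \qquad \forall \alpha,\ x_\alpha
\end{aligned}
\tag{18}
$$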

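Proof. The Lagrangian of the primal problem is

$$
\begin{aligned}
L ={}& \sum_{k}\sum_{i\in V_k}\sum_{x_i} b_i(x_i)\theta_i(x_i)
+ \sum_{k}\sum_{i\in V_k}\sum_{\alpha\in N(i)}\sum_{x_\alpha} b^k_\alpha(x_\alpha)\hat\theta_\alpha(x_\alpha)
+ \sum_{k}\sum_{F}\sum_{\alpha\in N(F)}\sum_{x_\alpha} b^k_\alpha(x_\alpha)\hat\theta_\alpha(x_\alpha) \\
& + \epsilon\sum_{k}\sum_{i\in V_k} c_i H(b_i) + \epsilon\sum_{k}\sum_{i\in V_k}\sum_{\alpha\in N(i)} c_\alpha H(b^k_\alpha)
+ \epsilon\sum_{k}\sum_{F} c_F H(b_F) + \epsilon\sum_{k}\sum_{F}\sum_{\alpha\in N(F)} c_\alpha H(b^k_\alpha) \\
& + \sum_{k}\sum_{i\in V_k}\sum_{\alpha\in N(i)}\sum_{x_i}\delta_{i\to\alpha}(x_i)\Big(\sum_{x_\alpha\setminus x_i} b^k_\alpha(x_\alpha) - b_i(x_i)\Big) \\
& + \sum_{k}\sum_{F}\sum_{\alpha\in N(F)}\sum_{x_F}\delta_{F\to\alpha}(x_F)\Big(\sum_{x_\alpha\setminus x_F} b^k_\alpha(x_\alpha) - b_F(x_F)\Big) \\
& + \sum_{k}\sum_{F}\sum_{i\in N(F)}\sum_{x_i}\lambda_{i\to F}(x_i)\Big(\sum_{x_F\setminus x_i} b_F(x_F) - b_i(x_i)\Big) \\
& + \sum_{k}\sum_{F}\sum_{\alpha\in N(F)}\sum_{x_\alpha}\nu^k_\alpha(x_\alpha)\big(b^k_\alpha(x_\alpha) - b_\alpha(x_\alpha)\big)
+ \sum_{k}\sum_{i\in V_k}\sum_{\alpha\in N(i)}\sum_{x_\alpha}\nu^k_\alpha(x_\alpha)\big(b^k_\alpha(x_\alpha) - b_\alpha(x_\alpha)\big) \\
={}& \sum_{k}\sum_{i\in V_k}\Big[\sum_{x_i} b_i(x_i)\Big(\theta_i(x_i) - \sum_{F\in N(i)}\lambda_{i\to F}(x_i) - \sum_{\alpha\in N(i)}\delta_{i\to\alpha}(x_i)\Big) + \epsilon c_i H(b_i)\Big] \\
& + \sum_{k}\sum_{i\in V_k}\sum_{\alpha\in N(i)}\Big[\sum_{x_\alpha} b^k_\alpha(x_\alpha)\Big(\hat\theta_\alpha(x_i, x_{\alpha\setminus i}) + \nu^k_\alpha(x_i, x_{\alpha\setminus i}) + \delta_{i\to\alpha}(x_i)\Big) + \epsilon c_\alpha H(b^k_\alpha)\Big] \\
& + \sum_{k}\sum_{F}\Big[\sum_{x_F} b_F(x_F)\Big(-\sum_{\alpha\in N(F)}\delta_{F\to\alpha}(x_F) + \sum_{i\in N(F)}\lambda_{i\to F}(x_i)\Big) + \epsilon c_F H(b_F)\Big] \\
& + \sum_{k}\sum_{F}\sum_{\alpha\in N(F)}\Big[\sum_{x_\alpha} b^k_\alpha(x_\alpha)\Big(\hat\theta_\alpha(x_F, x_{\alpha\setminus F}) + \nu^k_\alpha(x_F, x_{\alpha\setminus F}) + \delta_{F\to\alpha}(x_F)\Big) + \epsilon c_\alpha H(b^k_\alpha)\Big] \\
& - \sum_{\alpha}\sum_{x_\alpha} b_\alpha(x_\alpha)\sum_{k:\,\alpha\cap V_k\neq\emptyset}\nu^k_\alpha(x_\alpha)
\end{aligned}
\tag{19}
$$

We maximize $L$ over $b$ analytically, using the fact that the log-sum-exp function is the conjugate function of the entropy under the simplex constraints. In addition, to prevent

$$\sup_{b_\alpha}\ -\sum_{x_\alpha} b_\alpha(x_\alpha)\sum_{k:\,\alpha\cap V_k\neq\emptyset}\nu^k_\alpha(x_\alpha)$$

from becoming unbounded, the last term in the Lagrangian gives the additional constraints $\sum_{k:\,\alpha\cap V_k\neq\emptyset}\nu^k_\alpha(x_\alpha) = 0$, which results in the dual program in Claim 1.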
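The conjugacy fact used in this proof can be checked numerically for a single belief: maximizing $\sum_x b(x)\theta(x) + \epsilon c\,H(b)$ over the simplex gives $\epsilon c \ln\sum_x \exp(\theta(x)/(\epsilon c))$, attained at the softmax distribution. The sketch below is illustrative only; the function names, the scalar `eps_c` (standing for the product $\epsilon c$), and the numbers are assumptions.

```python
import math

def entropy(b):
    """Shannon entropy of a discrete distribution (0 * log 0 treated as 0)."""
    return -sum(p * math.log(p) for p in b if p > 0.0)

def smoothed_objective(b, theta, eps_c):
    """Per-belief primal term: <b, theta> + eps_c * H(b)."""
    return sum(p * t for p, t in zip(b, theta)) + eps_c * entropy(b)

def log_sum_exp(theta, eps_c):
    """Per-belief dual term: eps_c * ln sum_x exp(theta(x) / eps_c), computed stably."""
    m = max(theta)
    return m + eps_c * math.log(sum(math.exp((t - m) / eps_c) for t in theta))

def softmax(theta, eps_c):
    """Maximizer of the smoothed objective over the simplex."""
    m = max(theta)
    weights = [math.exp((t - m) / eps_c) for t in theta]
    z = sum(weights)
    return [w / z for w in weights]

theta = [0.3, -1.2, 2.0]   # hypothetical potentials
eps_c = 0.5                # hypothetical value of epsilon * c
b_star = softmax(theta, eps_c)
primal_value = smoothed_objective(b_star, theta, eps_c)
dual_value = log_sum_exp(theta, eps_c)
uniform_value = smoothed_objective([1.0 / 3.0] * 3, theta, eps_c)
```

Here `primal_value` and `dual_value` agree up to floating-point error, while any other distribution, such as the uniform one, gives a value that is no larger.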

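8.1.2 Proof of Claim 2

Claim. When $\nu$ is fixed, the block coordinate descent can be achieved by solving the program

$$
\min_{\lambda}\ \sum_{i\in V_k} \epsilon\hat c_i \ln\sum_{x_i}\exp\frac{\hat\theta_i(x_i) - \sum_{F\in N(i)}\lambda_{i\to F}(x_i)}{\epsilon\hat c_i}
+ \sum_{F} \epsilon\hat c_F \ln\sum_{x_F}\exp\frac{\hat\theta_F(x_F) + \sum_{i\in N(F)}\lambda_{i\to F}(x_i)}{\epsilon\hat c_F}
\tag{20}
$$

where

$$
\gamma_{\alpha\to F}(x_F) = \epsilon c_\alpha \ln\sum_{x_\alpha\setminus x_F}\exp\frac{\hat\theta_\alpha(x_\alpha) + \nu^k_\alpha(x_\alpha)}{\epsilon c_\alpha},
\qquad
\gamma_{\alpha\to i}(x_i) = \epsilon c_\alpha \ln\sum_{x_\alpha\setminus x_i}\exp\frac{\hat\theta_\alpha(x_\alpha) + \nu^k_\alpha(x_\alpha)}{\epsilon c_\alpha},
$$

$$
\hat\theta_F(x_F) = \sum_{\alpha\in N(F)}\gamma_{\alpha\to F}(x_F),
\qquad
\hat\theta_i(x_i) = \theta_i(x_i) + \sum_{\alpha\in N(i)}\gamma_{\alpha\to i}(x_i),
$$

and

$$
\hat c_F = c_F + \sum_{\alpha\in N(F)} c_\alpha,
\qquad
\hat c_i = c_i + \sum_{\alpha\in N(i)} c_\alpha .
$$

Proof. Derivation for free variable set terms: By extracting all the terms involving $x_F$ from Equ. 18, we have

$$
\begin{aligned}
L_F ={}& \epsilon c_F \ln\sum_{x_F}\exp\frac{-\sum_{\alpha\in N(F)}\delta_{F\to\alpha}(x_F) + \sum_{i\in N(F)}\lambda_{i\to F}(x_i)}{\epsilon c_F}
+ \sum_{\alpha\in N(F)}\epsilon c_\alpha \ln\sum_{x_\alpha}\exp\frac{\hat\theta_\alpha(x_F, x_{\alpha\setminus F}) + \nu^k_\alpha(x_F, x_{\alpha\setminus F}) + \delta_{F\to\alpha}(x_F)}{\epsilon c_\alpha} \\
={}& \epsilon c_F \ln\sum_{x_F}\exp\frac{-\sum_{\alpha\in N(F)}\delta_{F\to\alpha}(x_F) + \sum_{i\in N(F)}\lambda_{i\to F}(x_i)}{\epsilon c_F}
+ \sum_{\alpha\in N(F)}\epsilon c_\alpha \ln\sum_{x_F}\exp\frac{\gamma_{\alpha\to F}(x_F) + \delta_{F\to\alpha}(x_F)}{\epsilon c_\alpha}
\end{aligned}
\tag{21}
$$

Setting the derivative with respect to $\delta_{F\to\alpha}(x_F)$ to 0, we have

$$
\frac{\exp\frac{\gamma_{\alpha\to F}(x_F) + \delta_{F\to\alpha}(x_F)}{\epsilon c_\alpha}}{\sum_{x_F}\exp\frac{\gamma_{\alpha\to F}(x_F) + \delta_{F\to\alpha}(x_F)}{\epsilon c_\alpha}}
=
\frac{\exp\frac{-\sum_{\beta\in N(F)}\delta_{F\to\beta}(x_F) + \sum_{i\in N(F)}\lambda_{i\to F}(x_i)}{\epsilon c_F}}{\sum_{x_F}\exp\frac{-\sum_{\beta\in N(F)}\delta_{F\to\beta}(x_F) + \sum_{i\in N(F)}\lambda_{i\to F}(x_i)}{\epsilon c_F}}
\tag{22}
$$

Introducing a degree of freedom in normalization, we have

$$
\gamma_{\alpha\to F}(x_F) + \delta_{F\to\alpha}(x_F) = \frac{c_\alpha}{c_F}\Big(-\sum_{\beta\in N(F)}\delta_{F\to\beta}(x_F) + \sum_{i\in N(F)}\lambda_{i\to F}(x_i)\Big)
\tag{23}
$$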

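Summing over $\alpha\in N(F)$ gives

$$
\sum_{\alpha\in N(F)}\delta_{F\to\alpha}(x_F) = \frac{\sum_{\alpha\in N(F)} c_\alpha}{\hat c_F}\sum_{i\in N(F)}\lambda_{i\to F}(x_i) - \frac{c_F}{\hat c_F}\sum_{\alpha\in N(F)}\gamma_{\alpha\to F}(x_F)
\tag{24}
$$

Substituting it back into Equ. 23, we have

$$
\frac{\gamma_{\alpha\to F}(x_F) + \delta_{F\to\alpha}(x_F)}{c_\alpha}
= \frac{-\sum_{\beta\in N(F)}\delta_{F\to\beta}(x_F) + \sum_{i\in N(F)}\lambda_{i\to F}(x_i)}{c_F}
= \frac{\sum_{i\in N(F)}\lambda_{i\to F}(x_i) + \sum_{\beta\in N(F)}\gamma_{\beta\to F}(x_F)}{\hat c_F}
\tag{25}
$$

with which we compress $L_F$ in Equ. 21 into a single term

$$
L_F = \epsilon\hat c_F \ln\sum_{x_F}\exp\frac{\hat\theta_F(x_F) + \sum_{i\in N(F)}\lambda_{i\to F}(x_i)}{\epsilon\hat c_F}
\tag{26}
$$

Derivation for node terms: The derivation is similar to the above process, now for the terms that are compressed into a node. We include it here for clearer future reference.

$$
\begin{aligned}
L_i ={}& \epsilon c_i \ln\sum_{x_i}\exp\frac{\theta_i(x_i) - \sum_{F\in N(i)}\lambda_{i\to F}(x_i) - \sum_{\alpha\in N(i)}\delta_{i\to\alpha}(x_i)}{\epsilon c_i}
+ \sum_{\alpha\in N(i)}\epsilon c_\alpha \ln\sum_{x_\alpha}\exp\frac{\hat\theta_\alpha(x_i, x_{\alpha\setminus i}) + \nu^k_\alpha(x_i, x_{\alpha\setminus i}) + \delta_{i\to\alpha}(x_i)}{\epsilon c_\alpha} \\
={}& \epsilon c_i \ln\sum_{x_i}\exp\frac{\theta_i(x_i) - \sum_{F\in N(i)}\lambda_{i\to F}(x_i) - \sum_{\alpha\in N(i)}\delta_{i\to\alpha}(x_i)}{\epsilon c_i}
+ \sum_{\alpha\in N(i)}\epsilon c_\alpha \ln\sum_{x_i}\exp\frac{\gamma_{\alpha\to i}(x_i) + \delta_{i\to\alpha}(x_i)}{\epsilon c_\alpha}
\end{aligned}
\tag{27}
$$

Setting the derivative with respect to $\delta_{i\to\alpha}(x_i)$ to 0, we have

$$
\frac{\exp\frac{\gamma_{\alpha\to i}(x_i) + \delta_{i\to\alpha}(x_i)}{\epsilon c_\alpha}}{\sum_{x_i}\exp\frac{\gamma_{\alpha\to i}(x_i) + \delta_{i\to\alpha}(x_i)}{\epsilon c_\alpha}}
=
\frac{\exp\frac{\theta_i(x_i) - \sum_{\beta\in N(i)}\delta_{i\to\beta}(x_i) - \sum_{F\in N(i)}\lambda_{i\to F}(x_i)}{\epsilon c_i}}{\sum_{x_i}\exp\frac{\theta_i(x_i) - \sum_{\beta\in N(i)}\delta_{i\to\beta}(x_i) - \sum_{F\in N(i)}\lambda_{i\to F}(x_i)}{\epsilon c_i}}
\tag{28}
$$

Introducing a degree of freedom in normalization, we have

$$
\gamma_{\alpha\to i}(x_i) + \delta_{i\to\alpha}(x_i) = \frac{c_\alpha}{c_i}\Big(\theta_i(x_i) - \sum_{\beta\in N(i)}\delta_{i\to\beta}(x_i) - \sum_{F\in N(i)}\lambda_{i\to F}(x_i)\Big)
\tag{29}
$$

Summing over $\alpha\in N(i)$ gives

$$
\sum_{\alpha\in N(i)}\delta_{i\to\alpha}(x_i) = \frac{\sum_{\alpha\in N(i)} c_\alpha}{\hat c_i}\Big(\theta_i(x_i) - \sum_{F\in N(i)}\lambda_{i\to F}(x_i)\Big) - \frac{c_i}{\hat c_i}\sum_{\alpha\in N(i)}\gamma_{\alpha\to i}(x_i)
\tag{30}
$$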

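Substituting it back into Equ. 29, we have

$$
\frac{\gamma_{\alpha\to i}(x_i) + \delta_{i\to\alpha}(x_i)}{c_\alpha}
= \frac{\theta_i(x_i) - \sum_{\beta\in N(i)}\delta_{i\to\beta}(x_i) - \sum_{F\in N(i)}\lambda_{i\to F}(x_i)}{c_i}
= \frac{\theta_i(x_i) + \sum_{\beta\in N(i)}\gamma_{\beta\to i}(x_i) - \sum_{F\in N(i)}\lambda_{i\to F}(x_i)}{\hat c_i}
\tag{31}
$$

with which we compress $L_i$ in Equ. 27 into a single term

$$
L_i = \epsilon\hat c_i \ln\sum_{x_i}\exp\frac{\hat\theta_i(x_i) - \sum_{F\in N(i)}\lambda_{i\to F}(x_i)}{\epsilon\hat c_i}
\tag{32}
$$

Combining Equ. 26 and Equ. 32 gives the expression in Equ. 20.

8.1.3 Proof of Claim 3

Claim. The message passing rule for $\lambda$ is exactly the same as the convex BP rule: $\forall F,\ i\in N(F),\ x_i$,

$$
\lambda_{i\to F}(x_i) = \frac{\hat c_F}{\bar c_i}\Big(\hat\theta_i(x_i) + \sum_{\beta\in N(i)}\mu_{\beta\to i}(x_i)\Big) - \mu_{F\to i}(x_i)
\tag{33}
$$

where

$$
\mu_{F\to i}(x_i) = \epsilon\hat c_F \ln\sum_{x_F\setminus x_i}\exp\frac{\hat\theta_F(x_F) + \sum_{j\in N(F)\setminus i}\lambda_{j\to F}(x_j)}{\epsilon\hat c_F}
\qquad\text{and}\qquad
\bar c_i = \hat c_i + \sum_{F\in N(i)}\hat c_F .
$$

The block coordinate rule for the variables $\nu$ is

$$
\nu^k_\alpha(x_\alpha) = \frac{1}{N_P(\alpha)}\sum_{j:\,\alpha\cap V_j\neq\emptyset}\delta^j_\alpha(x_{\alpha\cap V_j}) - \delta^k_\alpha(x_{\alpha\cap V_k})
\tag{34}
$$

where $N_P(\alpha)$ is the number of sub-programs in which $\alpha$ is involved, and $\delta^k_\alpha(x_{\alpha\cap V_k})$ denotes $\delta_{i\to\alpha}(x_i)$ when $\alpha\cap V_k=\{i\}$ and $\delta_{F\to\alpha}(x_F)$ when $\alpha\cap V_k=F$. For an arbitrary configuration of $\lambda$, the variables $\delta$ can be decoded as

$$
\delta_{F\to\alpha}(x_F) = \frac{c_\alpha}{\hat c_F}\Big(\hat\theta_F(x_F) + \sum_{j\in N(F)}\lambda_{j\to F}(x_j)\Big) - \gamma_{\alpha\to F}(x_F)
\tag{35}
$$

and

$$
\delta_{i\to\alpha}(x_i) = \frac{c_\alpha}{\hat c_i}\Big(\hat\theta_i(x_i) - \sum_{F\in N(i)}\lambda_{i\to F}(x_i)\Big) - \gamma_{\alpha\to i}(x_i)
\tag{36}
$$

which is identical to

$$
\delta_{i\to\alpha}(x_i) = \frac{c_\alpha}{\bar c_i}\Big(\hat\theta_i(x_i) + \sum_{F\in N(i)}\mu_{F\to i}(x_i)\Big) - \gamma_{\alpha\to i}(x_i)
\tag{37}
$$

by substituting $\lambda$.

Proof. For the $\lambda$ variables, the rule is exactly the same as convex BP.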

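In terms of $\nu$, from the program in Claim 1, we take the Lagrangian of the objective function

$$
\hat L = \sum_{k:\,\alpha\cap V_k\neq\emptyset}\epsilon c_\alpha \ln\sum_{x_\alpha}\exp\frac{\hat\theta_\alpha(x_\alpha) + \nu^k_\alpha(x_\alpha) + \delta^k_\alpha(x_{\alpha\cap V_k})}{\epsilon c_\alpha}
+ \sum_{x_\alpha}\eta_\alpha(x_\alpha)\sum_{k:\,\alpha\cap V_k\neq\emptyset}\nu^k_\alpha(x_\alpha)
\tag{38}
$$

Setting the derivative with respect to $\nu^k_\alpha(x_\alpha)$ to zero gives the following equation

$$
\frac{\exp\frac{\hat\theta_\alpha(x_\alpha) + \nu^k_\alpha(x_\alpha) + \delta^k_\alpha(x_{\alpha\cap V_k})}{\epsilon c_\alpha}}{\sum_{x_\alpha}\exp\frac{\hat\theta_\alpha(x_\alpha) + \nu^k_\alpha(x_\alpha) + \delta^k_\alpha(x_{\alpha\cap V_k})}{\epsilon c_\alpha}} = -\eta_\alpha(x_\alpha)
\tag{39}
$$

Any set of $\nu^k_\alpha(x_\alpha)$ satisfying the above equation is guaranteed to be optimal, and every optimum is guaranteed to satisfy $\sum_{k:\,\alpha\cap V_k\neq\emptyset}\nu^k_\alpha(x_\alpha) = 0$. We can therefore add an arbitrary constant to $\nu^k_\alpha$ for each $k: \alpha\cap V_k\neq\emptyset$, as long as Equ. 39 is satisfied. The effect of the additional constants accumulates to 0 over $k$, so the resulting optimal value of $\hat L$ is unchanged, i.e. we can simply take

$$
\hat\theta_\alpha(x_\alpha) + \nu^k_\alpha(x_\alpha) + \delta^k_\alpha(x_{\alpha\cap V_k}) = \beta_\alpha(x_\alpha)
\tag{40}
$$

Summing over $k: \alpha\cap V_k\neq\emptyset$ eliminates $\nu^k_\alpha(x_\alpha)$, which gives

$$
\theta_\alpha(x_\alpha) + \sum_{k:\,\alpha\cap V_k\neq\emptyset}\delta^k_\alpha(x_{\alpha\cap V_k}) = N_P(\alpha)\,\beta_\alpha(x_\alpha)
\tag{41}
$$

Substituting it back into Equ. 40 gives the rule in Claim 3. Note that the explicit expressions of $\delta$ in Equ. 35 and Equ. 36 can be directly derived from Equ. 25 and Equ. 31.

8.1.4 Proof of Claim 4

Claim. The dual of the Max-Product problem is

$$
\begin{aligned}
\min_{\lambda,\delta,\nu}\quad & \sum_{k}\sum_{i\in V_k}\max_{x_i}\Big(\theta_i(x_i) - \sum_{\alpha\in N(i)}\delta_{i\to\alpha}(x_i) - \sum_{F\in N(i)}\lambda_{i\to F}(x_i)\Big) \\
& + \sum_{k}\sum_{i\in V_k}\sum_{\alpha\in N(i)}\max_{x_\alpha}\Big(\hat\theta_\alpha(x_\alpha) + \nu^k_\alpha(x_\alpha) + \delta_{i\to\alpha}(x_i)\Big) \\
& + \sum_{k}\sum_{F}\max_{x_F}\Big(-\sum_{\alpha\in N(F)}\delta_{F\to\alpha}(x_F) + \sum_{i\in N(F)}\lambda_{i\to F}(x_i)\Big) \\
& + \sum_{k}\sum_{F}\sum_{\alpha\in N(F)}\max_{x_\alpha}\Big(\hat\theta_\alpha(x_\alpha) + \nu^k_\alpha(x_\alpha) + \delta_{F\to\alpha}(x_F)\Big) \\
\text{s.t.}\quad & \sum_{k:\,\alpha\cap V_k\neq\emptyset}\nu^k_\alpha(x_\alpha) = 0, \qquad \forall\alpha,\ x_\alpha
\end{aligned}
\tag{42}
$$

Proof. With the standard LP primal-dual transformation

$$
\max_{x}\ c^T x \quad \text{s.t.}\ Ax = b,\ x \ge 0
\qquad = \qquad
\min_{y}\ b^T y \quad \text{s.t.}\ A^T y \ge c
\tag{43}
$$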

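we have the following dual formulation

$$
\begin{aligned}
\min\quad & \sum_{k}\sum_{i\in V_k}\eta_i + \sum_{k}\sum_{F}\eta_F + \sum_{k}\sum_{i\in V_k}\sum_{\alpha\in N(i)}\eta^k_\alpha + \sum_{k}\sum_{F}\sum_{\alpha\in N(F)}\eta^k_\alpha \\
\text{s.t.}\quad & \eta_i + \sum_{\alpha\in N(i)}\delta_{i\to\alpha}(x_i) + \sum_{F\in N(i)}\lambda_{i\to F}(x_i) \ge \theta_i(x_i), \qquad \forall k,\ i\in V_k,\ x_i \\
& \eta_F + \sum_{\alpha\in N(F)}\delta_{F\to\alpha}(x_F) - \sum_{i\in N(F)}\lambda_{i\to F}(x_i) \ge 0, \qquad \forall k,\ F,\ x_F \\
& \eta^k_\alpha - \delta_{i\to\alpha}(x_i) - \nu^k_\alpha(x_\alpha) \ge \hat\theta_\alpha(x_\alpha), \qquad \forall k,\ i\in V_k,\ \alpha\in N(i),\ x_\alpha \\
& \eta^k_\alpha - \delta_{F\to\alpha}(x_F) - \nu^k_\alpha(x_\alpha) \ge \hat\theta_\alpha(x_\alpha), \qquad \forall k,\ F,\ \alpha\in N(F),\ x_\alpha \\
& \sum_{k:\,\alpha\cap V_k\neq\emptyset}\nu^k_\alpha(x_\alpha) \ge 0, \qquad \forall\alpha,\ x_\alpha
\end{aligned}
\tag{44}
$$

Note that we reverse the signs of $\delta$, $\lambda$ and $\nu$. By substituting $\eta$ back into the dual objective function, we obtain the expression in Equ. 42. Note that when $\sum_{k:\,\alpha\cap V_k\neq\emptyset}\nu^k_\alpha(x_\alpha) > 0$, we can always construct $\bar\nu$ satisfying $\sum_{k:\,\alpha\cap V_k\neq\emptyset}\bar\nu^k_\alpha(x_\alpha) = 0$ with which the objective value is not increased. Thus we can replace the last inequality with an equality.

8.1.5 Proof of Claim 5

Claim. When $\nu$ is fixed, the objective function of Equ. 42 is lower bounded by

$$
\min_{\lambda}\ \sum_{i\in V_k}\max_{x_i}\Big(\hat\theta_i(x_i) - \sum_{F\in N(i)}\lambda_{i\to F}(x_i)\Big) + \sum_{F}\max_{x_F}\Big(\hat\theta_F(x_F) + \sum_{i\in N(F)}\lambda_{i\to F}(x_i)\Big)
\tag{45}
$$

where

$$
\gamma_{\alpha\to F}(x_F) = \max_{x_\alpha\setminus x_F}\big(\hat\theta_\alpha(x_\alpha) + \nu^k_\alpha(x_\alpha)\big),
\qquad
\gamma_{\alpha\to i}(x_i) = \max_{x_\alpha\setminus x_i}\big(\hat\theta_\alpha(x_\alpha) + \nu^k_\alpha(x_\alpha)\big),
$$

$$
\hat\theta_F(x_F) = \sum_{\alpha\in N(F)}\gamma_{\alpha\to F}(x_F),
\qquad
\hat\theta_i(x_i) = \theta_i(x_i) + \sum_{\alpha\in N(i)}\gamma_{\alpha\to i}(x_i).
$$

Proof. Derivation for node terms:

$$
\begin{aligned}
L_i ={}& \max_{x_i}\Big(\theta_i(x_i) - \sum_{F\in N(i)}\lambda_{i\to F}(x_i) - \sum_{\alpha\in N(i)}\delta_{i\to\alpha}(x_i)\Big)
+ \sum_{\alpha\in N(i)}\max_{x_\alpha}\Big(\hat\theta_\alpha(x_\alpha) + \nu^k_\alpha(x_\alpha) + \delta_{i\to\alpha}(x_i)\Big) \\
={}& \max_{x_i}\Big(\theta_i(x_i) - \sum_{F\in N(i)}\lambda_{i\to F}(x_i) - \sum_{\alpha\in N(i)}\delta_{i\to\alpha}(x_i)\Big)
+ \sum_{\alpha\in N(i)}\max_{x_i}\Big(\gamma_{\alpha\to i}(x_i) + \delta_{i\to\alpha}(x_i)\Big) \\
\ge{}& \max_{x_i}\Big(\theta_i(x_i) - \sum_{F\in N(i)}\lambda_{i\to F}(x_i) + \sum_{\alpha\in N(i)}\gamma_{\alpha\to i}(x_i)\Big) \\
={}& \max_{x_i}\Big(\hat\theta_i(x_i) - \sum_{F\in N(i)}\lambda_{i\to F}(x_i)\Big)
\end{aligned}
\tag{46}
$$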
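The node-term bound can be illustrated numerically: for any $\delta$ the sum of maxima is at least the compressed term, and the decoding rule of Claim 6 for $\delta_{i\to\alpha}$ makes the bound tight. The sketch below works on one node with a few states. It assumes that $|N(i)|$ in the decoding rule counts the factors $\alpha\in N(i)$; the function names and the numbers are hypothetical.

```python
def node_terms(theta, lam_sum, gammas, deltas):
    """max_x(theta - lam_sum - sum_alpha delta) + sum_alpha max_x(gamma + delta)."""
    states = range(len(theta))
    first = max(theta[x] - lam_sum[x] - sum(d[x] for d in deltas) for x in states)
    rest = sum(max(g[x] + d[x] for x in states) for g, d in zip(gammas, deltas))
    return first + rest

def lower_bound(theta, lam_sum, gammas):
    """max_x(theta_hat - lam_sum), with theta_hat = theta + sum_alpha gamma."""
    return max(theta[x] - lam_sum[x] + sum(g[x] for g in gammas)
               for x in range(len(theta)))

def decode_delta(theta, lam_sum, gammas):
    """delta_alpha = (theta_hat - lam_sum) / (1 + number of factors) - gamma_alpha."""
    n = len(gammas)
    states = range(len(theta))
    theta_hat = [theta[x] + sum(g[x] for g in gammas) for x in states]
    return [[(theta_hat[x] - lam_sum[x]) / (1 + n) - g[x] for x in states]
            for g in gammas]

theta = [0.5, -0.2, 1.0]                       # node potential
lam_sum = [0.1, 0.3, -0.4]                     # sum over F of lambda at each state
gammas = [[0.2, 0.0, -0.1], [1.0, 0.4, 0.3]]   # one gamma vector per factor
zero_deltas = [[0.0] * 3 for _ in gammas]

bound = lower_bound(theta, lam_sum, gammas)
value_zero = node_terms(theta, lam_sum, gammas, zero_deltas)
value_decoded = node_terms(theta, lam_sum, gammas, decode_delta(theta, lam_sum, gammas))
```

With the decoded $\delta$, every one of the $1+|N(i)|$ maximized terms becomes the same function of $x_i$ scaled by $1/(1+|N(i)|)$, which is why the maxima add up exactly to the bound.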

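Derivation for free variable set terms:

$$
\begin{aligned}
L_F ={}& \max_{x_F}\Big(-\sum_{\alpha\in N(F)}\delta_{F\to\alpha}(x_F) + \sum_{i\in N(F)}\lambda_{i\to F}(x_i)\Big)
+ \sum_{\alpha\in N(F)}\max_{x_\alpha}\Big(\hat\theta_\alpha(x_\alpha) + \nu^k_\alpha(x_\alpha) + \delta_{F\to\alpha}(x_F)\Big) \\
={}& \max_{x_F}\Big(-\sum_{\alpha\in N(F)}\delta_{F\to\alpha}(x_F) + \sum_{i\in N(F)}\lambda_{i\to F}(x_i)\Big)
+ \sum_{\alpha\in N(F)}\max_{x_F}\Big(\gamma_{\alpha\to F}(x_F) + \delta_{F\to\alpha}(x_F)\Big) \\
\ge{}& \max_{x_F}\Big(\sum_{\alpha\in N(F)}\gamma_{\alpha\to F}(x_F) + \sum_{i\in N(F)}\lambda_{i\to F}(x_i)\Big) \\
={}& \max_{x_F}\Big(\hat\theta_F(x_F) + \sum_{i\in N(F)}\lambda_{i\to F}(x_i)\Big)
\end{aligned}
\tag{47}
$$

8.1.6 Proof of Claim 6

Claim. The message passing rule for $\lambda$ is exactly the same as the convex BP rule: $\forall F,\ i\in N(F),\ x_i$,

$$
\lambda_{i\to F}(x_i) = \frac{1}{1+|N(i)|}\Big(\hat\theta_i(x_i) + \sum_{\beta\in N(i)}\mu_{\beta\to i}(x_i)\Big) - \mu_{F\to i}(x_i)
\tag{48}
$$

where

$$
\mu_{F\to i}(x_i) = \max_{x_F\setminus x_i}\Big(\hat\theta_F(x_F) + \sum_{j\in N(F)\setminus i}\lambda_{j\to F}(x_j)\Big).
$$

The block coordinate rule for the variables $\nu$ is

$$
\nu^k_\alpha(x_\alpha) = \frac{1}{N_P(\alpha)}\sum_{j:\,\alpha\cap V_j\neq\emptyset}\delta^j_\alpha(x_{\alpha\cap V_j}) - \delta^k_\alpha(x_{\alpha\cap V_k})
\tag{49}
$$

where $N_P(\alpha)$ is the number of sub-programs in which $\alpha$ is involved. For an arbitrary configuration of $\lambda$, the variables $\delta$ can be decoded as

$$
\delta_{F\to\alpha}(x_F) = \frac{1}{1+|N(F)|}\Big(\hat\theta_F(x_F) + \sum_{j\in N(F)}\lambda_{j\to F}(x_j)\Big) - \gamma_{\alpha\to F}(x_F)
\tag{50}
$$

and

$$
\delta_{i\to\alpha}(x_i) = \frac{1}{1+|N(i)|}\Big(\hat\theta_i(x_i) - \sum_{F\in N(i)}\lambda_{i\to F}(x_i)\Big) - \gamma_{\alpha\to i}(x_i)
\tag{51}
$$

Proof. From Equ. 45, the sum of terms involving node $i$ is

$$
\begin{aligned}
& \max_{x_i}\Big(\hat\theta_i(x_i) - \sum_{F\in N(i)}\lambda_{i\to F}(x_i)\Big) + \sum_{F\in N(i)}\max_{x_F}\Big(\hat\theta_F(x_F) + \sum_{j\in N(F)}\lambda_{j\to F}(x_j)\Big) \\
={}& \max_{x_i}\Big(\hat\theta_i(x_i) - \sum_{F\in N(i)}\lambda_{i\to F}(x_i)\Big) + \sum_{F\in N(i)}\max_{x_i}\Big(\max_{x_F\setminus x_i}\Big(\hat\theta_F(x_F) + \sum_{j\in N(F)\setminus i}\lambda_{j\to F}(x_j)\Big) + \lambda_{i\to F}(x_i)\Big) \\
\ge{}& \max_{x_i}\Big(\hat\theta_i(x_i) + \sum_{\beta\in N(i)}\mu_{\beta\to i}(x_i)\Big)
\end{aligned}
\tag{52}
$$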

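It is a lower bound achievable with Equ. 48, which gives the block coordinate descent rule over $\lambda$.

From Equ. 42, the sum of terms involving $\nu_\alpha(x_\alpha)$ satisfies

$$
\sum_{k:\,\alpha\cap V_k\neq\emptyset}\max_{x_\alpha}\Big(\hat\theta_\alpha(x_\alpha) + \nu^k_\alpha(x_\alpha) + \delta^k_\alpha(x_{\alpha\cap V_k})\Big)
\ \ge\ \max_{x_\alpha}\Big(\theta_\alpha(x_\alpha) + \sum_{k:\,\alpha\cap V_k\neq\emptyset}\delta^k_\alpha(x_{\alpha\cap V_k})\Big)
\tag{53}
$$

The lower bound can be achieved by setting

$$
\hat\theta_\alpha(x_\alpha) + \nu^k_\alpha(x_\alpha) + \delta^k_\alpha(x_{\alpha\cap V_k}) = \frac{1}{N_P(\alpha)}\Big(\theta_\alpha(x_\alpha) + \sum_{j:\,\alpha\cap V_j\neq\emptyset}\delta^j_\alpha(x_{\alpha\cap V_j})\Big),
$$

which is equivalent to the rule in Equ. 49.

By evaluating $\delta$ with Equ. 50 and Equ. 51, the lower bound in Equ. 45 can be achieved, which gives the block coordinate descent rule for $\delta$.

8.2 Another Expression of Sum-Product

The algorithm can be expressed equivalently as follows. In the implementation, we use the computation procedure shown in this expression.

Algorithm 5 Inference
  Input: $\theta_i$, $\hat\theta_\alpha(x_\alpha) = \theta_\alpha(x_\alpha)/N_P(\alpha)$
  while not converged do
    for all $k$ do
      $\forall F,\ \alpha\in N(F),\ x_F$: $\gamma_{\alpha\to F}(x_F) = \epsilon c_\alpha \ln\sum_{x_\alpha\setminus x_F}\exp\frac{\hat\theta_\alpha(x_\alpha) + \nu^k_\alpha(x_\alpha)}{\epsilon c_\alpha}$
      $\forall i\in V_k,\ \alpha\in N(i),\ x_i$: $\gamma_{\alpha\to i}(x_i) = \epsilon c_\alpha \ln\sum_{x_\alpha\setminus x_i}\exp\frac{\hat\theta_\alpha(x_\alpha) + \nu^k_\alpha(x_\alpha)}{\epsilon c_\alpha}$
      $\forall F,\ x_F$: $\hat\theta_F(x_F) = \sum_{\alpha\in N(F)}\gamma_{\alpha\to F}(x_F)$
      $\forall i\in V_k,\ x_i$: $\hat\theta_i(x_i) = \theta_i(x_i) + \sum_{\alpha\in N(i)}\gamma_{\alpha\to i}(x_i)$
    end for
    for all $k$ do
      $\delta$ = Sub-Inference($\hat\theta_i$, $\hat\theta_F$, $\gamma$)
    end for
    for all $\alpha$ do
      $\nu^k_\alpha(x_\alpha) = \frac{1}{N_P(\alpha)}\sum_{j:\,\alpha\cap V_j\neq\emptyset}\delta^j_\alpha(x_{\alpha\cap V_j}) - \delta^k_\alpha(x_{\alpha\cap V_k})$
    end for
  end while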
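The log-domain $\gamma$ step of this inference loop is a smoothed maximum. The sketch below evaluates it for a short list of scores at several temperatures, to show how the sum-product message approaches the max-product message as $\epsilon c_\alpha$ shrinks. The helper name `gamma_soft`, the scalar `eps_c` (standing for $\epsilon c_\alpha$), and the scores are assumptions for illustration.

```python
import math

def gamma_soft(scores, eps_c):
    """eps_c * ln sum_s exp(s / eps_c), shifted by the maximum for stability."""
    m = max(scores)
    return m + eps_c * math.log(sum(math.exp((s - m) / eps_c) for s in scores))

# Hypothetical values of theta_hat + nu over the configurations being summed out.
scores = [1.0, 0.2, -0.7]
temperatures = (1.0, 0.1, 0.01)
values = [gamma_soft(scores, t) for t in temperatures]
```

Each value lies between $\max_s s$ and $\max_s s + \epsilon c_\alpha \ln n$, where $n$ is the number of summed configurations, so the gap to the max-product message vanishes with the temperature.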

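Algorithm 6 Sub-Inference
  Input: $\gamma_{\alpha\to F}(x_F)$, $\gamma_{\alpha\to i}(x_i)$, $\hat\theta_i$, $\hat\theta_F$
  for all $t \le$ number of inner iterations do
    for all $i\in V_k$ do
      $\forall F\in N(i),\ x_i$: $\lambda_{i\to F}(x_i) = \frac{\hat c_F}{\bar c_i}\Big(\hat\theta_i(x_i) + \sum_{\beta\in N(i)}\mu_{\beta\to i}(x_i)\Big) - \mu_{F\to i}(x_i)$
      $\forall F\in N(i),\ x_i$: $\mu_{F\to i}(x_i) = \epsilon\hat c_F \ln\sum_{x_F\setminus x_i}\exp\frac{\hat\theta_F(x_F) + \sum_{j\in N(F)\setminus i}\lambda_{j\to F}(x_j)}{\epsilon\hat c_F}$
    end for
    for all $\alpha: \alpha\cap V_k\neq\emptyset$ do
      if $\alpha\cap V_k$ is a single node $\{i\}$ then
        $\delta_{i\to\alpha}(x_i) = \frac{c_\alpha}{\bar c_i}\Big(\hat\theta_i(x_i) + \sum_{F\in N(i)}\mu_{F\to i}(x_i)\Big) - \gamma_{\alpha\to i}(x_i)$
      else
        $\delta_{F\to\alpha}(x_F) = \frac{c_\alpha}{\hat c_F}\Big(\hat\theta_F(x_F) + \sum_{j\in N(F)}\lambda_{j\to F}(x_j)\Big) - \gamma_{\alpha\to F}(x_F)$
      end if
    end for
  end for
  Return $\delta$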
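The node branch of the sub-inference decodes $\delta$ from the $\mu$ messages, while Claim 3 also states an equivalent form in terms of $\lambda$. The following sketch checks that equivalence at one node and one fixed state $x_i$, using the convex BP rule for $\lambda$ with $\bar c_i = \hat c_i + \sum_{F\in N(i)}\hat c_F$. All names, the dictionary layout keyed by free variable set, and the numbers are hypothetical.

```python
def lambda_messages(theta_hat_i, mu, c_hat_F, c_hat_i):
    """lambda_{i->F} = (c_hat_F / c_bar) * (theta_hat_i + sum_beta mu_beta) - mu_F."""
    c_bar = c_hat_i + sum(c_hat_F.values())
    total = theta_hat_i + sum(mu.values())
    lam = {F: c_hat_F[F] / c_bar * total - mu[F] for F in mu}
    return lam, c_bar

def delta_from_lambda(theta_hat_i, lam, gamma, c_alpha, c_hat_i):
    """Decoding via lambda: (c_alpha / c_hat_i) * (theta_hat_i - sum_F lambda) - gamma."""
    return c_alpha / c_hat_i * (theta_hat_i - sum(lam.values())) - gamma

def delta_from_mu(theta_hat_i, mu, gamma, c_alpha, c_bar):
    """Decoding via mu: (c_alpha / c_bar) * (theta_hat_i + sum_F mu) - gamma."""
    return c_alpha / c_bar * (theta_hat_i + sum(mu.values())) - gamma

# Hypothetical values at one node and one state.
theta_hat_i = 0.7
mu = {"F1": 0.3, "F2": -0.4}
c_hat_F = {"F1": 2.0, "F2": 3.0}
c_hat_i, c_alpha, gamma = 1.5, 0.5, 0.25

lam, c_bar = lambda_messages(theta_hat_i, mu, c_hat_F, c_hat_i)
d_lambda = delta_from_lambda(theta_hat_i, lam, gamma, c_alpha, c_hat_i)
d_mu = delta_from_mu(theta_hat_i, mu, gamma, c_alpha, c_bar)
```

The two decodings agree because substituting the $\lambda$ rule gives $\hat\theta_i - \sum_F \lambda_{i\to F} = (\hat c_i/\bar c_i)\big(\hat\theta_i + \sum_F \mu_{F\to i}\big)$.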