212 11th International Conference on Machine Learning and Applications C b G E P fi d P P I f Id fy F M d D d W, M O h, E Z,T L C f C S, U v y f M, C G b, FL 33146, USA E : d.w 1@. d, h @.. d D f C S d E, U v y f N D, N D, I 46556, USA E : z @ d. d S h f C S, F d I U v y, M, FL 33199, USA E : @.fi. d Abstract Identifying functional modules from protein-protein interaction networks is an important and challenging task. This paper presents a new approach called PPIBM which is designed to integrate gene expression data analysis and clustering of protein-protein interactions. The proposed approach relies on a Bayesian model which uses as its base protein-protein interactions given as part of input. The proposed method is evaluated with standard measures and its performance is compared with the state-of-the-art network analysis methods. Experimental results on both real-world data and synthetic data demonstrate the effectiveness of the proposed approach. I. INTRODUCTION I b y f h h y d y h zy, y, d f f. Wh h, h f d (PPI). I PPI d. F, h f h y h h, wh h y h y h ; d h f y q f w h h v. S h b y v w d w k f (PPI w k) wh h h w k d d v d w h d fi d ( y, b y,.) h d h. I b f w k, d h, h PPI w k f h d f d d f y d b w b f f h. Th f PPI w k f, wh h f h f h k h, w b d f y b w k, h f wh h f f. A j b b y d fy h f b w k d v f. A h f PPI w k h wh h d d v d d d h. U h h b w k b b f d h b f h w k by v y d. V h h d h v b d yz h f h PPI w k. F, d h q h v b f d h h f d d b PPI w k. S h h q d k, d b w d hb h d h (,.., [11], [1], [16]). A, h h h d h v b d d yz d [5], [22], [1]. M y, b, wh h b d ff h, h b d v h v y q y [32]. A h h h h d k fi d d h f z f h, h f y f f h f d fy h f PPI b w k, d h, h h f v. T fi h h w d h b f d y b w k d fi, w h h d h h fi f w h w h h h k y h v d fi d h v b v v j y w h h h. I h, w B y f w k f k h d dy h w h v h f d d fi. II. RELATED WORK A. Clustering Methods for PPI Networks A d fy f d f PPI w k b y, h h h b d d f y. 
F, h b h w h wh PPI w k v w d h, f d f d q d h h f q fi d h h b d f f d d fi [24]. Th f h h b d f h [21], [16], [9]. A v dy 978--7695-4913-2/12 $26. 212 IEEE DOI 119/ICMLA.212.28
f v h d f w k, d h d h b v, h b d [4]. A h h fi h v b d h, h b f d fy f d h f h f w. (1) Th d q y f y. D ff h h h h h d h y w hyb d y d y h d d v d ff PPI d h v h [28]. (2) PPI w k y f w d f v y h h d ( d h b ), d h d h v v y f w w h h h [2]. Th h k d ffi fi d f d d h d h. (3) S d d b f f f v. Th h h d f,..,. T v h b, w d b d h, d by w k h f d, d h d b fi d PPI w k. B. Combining Content and Relation Information Th b f b f d f (.., k f ) h b w d y d d h d d f v [31]. E y h k h b f w : (1) F I [7]. Th h h f d d d fi d f. A w h h h h, fi d fi h v d ff, d f y d d h q y f f. (2) K I [15]. I h h, h d k h f d d y d h f y,.., h K v. A d wb k f h h h d f y h b w h w f d h by h h y h h b w h. T dd h h y h h v, j b dd h (PHITS d f z.) h v b d y [13]. Th h by fi k w d b dd v j f z f b h h d f d h f fi h f d. III. METHODOLOGY A. Method Overview F 1 h w v v w f h d h d f d fy f d,..,. G v PPI w k d d f h h w k, w b h wh h d by f h PPI w k d f h fi. Th w h w f f B y d by h PPI h h b d. F. 1. Ov v w f h d h d. Th b h b f z d z b d b v d v EM h d b d h b q. P f h f h h, wh h h b b y h h b h d d. O h h h v d, w by h h w h h h h b h b b y. B. PPI-based Bayesian Models (PPIBM) 1) Model Formulation: W S d h f f d G d h f f. F h g G, w d d, n p(s1 n θ g)= p(s i θ g), i=1 wh θ g d h d f g, ds1 n h w d q f {S i S} n i=1,.. h fi f h. W f h h h d f b d. H w b d d h b d. O b fi f h h w b h PPI w k h y y. Th w h v p(s i θ g)= p(s i T i = t)p(t i = t g i = q)p(g i = q θ g), t T g G wh T h f f. H w U pt f h b b y f h p v t,.., p(g i = p T i = t) =U pt, dw {θ g } f h b b y f h f t v g. H w h h f b d, {B pg }, v,.. p(g i = p Θ g )=B pg, d w b B pg by h PPI w k. 115
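The mixture in the model formulation above sums over latent modules t and over genes q reached through the PPI-derived table B. A minimal sketch of that double sum as a chain of matrix products follows. The array names (`p_s_given_t`, `p_t_given_q`), their orientation (every column is a probability distribution), and the random toy values are assumptions made for illustration; in the paper B is derived from the PPI network, whereas here it is random. This is not the authors' implementation.

```python
import random

def normalize_columns(m):
    """Scale each column of a nested-list matrix so that it sums to 1."""
    col_sums = [sum(row[c] for row in m) for c in range(len(m[0]))]
    return [[row[c] / col_sums[c] for c in range(len(row))] for row in m]

def matmul(a, b):
    """Plain nested-list matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def random_table(rng, rows, cols):
    """Random column-stochastic table (toy stand-in for a learned one)."""
    return normalize_columns(
        [[rng.random() + 0.1 for _ in range(cols)] for _ in range(rows)])

def mixture(p_s_given_t, p_t_given_q, B):
    """P[s][g] = sum_t sum_q p(s | t) * p(t | q) * B[q][g]."""
    return matmul(matmul(p_s_given_t, p_t_given_q), B)

rng = random.Random(0)
n_outcomes, n_modules, n_genes = 4, 2, 3                # hypothetical sizes
p_s_given_t = random_table(rng, n_outcomes, n_modules)  # p(S_i = s | T_i = t)
p_t_given_q = random_table(rng, n_modules, n_genes)     # p(T_i = t | g_i = q)
B = random_table(rng, n_genes, n_genes)                 # p(g_i = q | theta_g)

P = mixture(p_s_given_t, p_t_given_q, B)
column_sums = [sum(row[g] for row in P) for g in range(n_genes)]
print(column_sums)  # each entry should be 1 up to rounding
```

Because every factor is column-stochastic, each column of `P` is again a distribution over outcomes, which is a quick sanity check on the orientation chosen for the three tables.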
2) Parameter Estimation: F d fy f d ( ) w d h b h d b f h (.. Θ pt ) h w b U pt. Th w h B y f w k U pt by z Θ pt. T d, w D h f Θ s Dir(α), wh v α hy. W LDA [3], wh h v B y z v b d f h d k h d. H w w h v b d : f(u, V; Y) = g ( B (α + γ s, ) B (α) vkpg B gvu vk φ vk;gs ) Ygsφ vk;gs, wh h d f V V = {V R G K + }, φ vk;sg = B gv U vk V sk /[B UV ] gs, γ sk = gv Y gsφ vk;gs.th v B y d, wh h h LDA h [3] y, h v b d f h U. Th h q E M z (EM) d fi d h v b d. Th d EM w d w h z d. T v d h EM, w d y z h v b d b h d. H w h f D h dj h h d v v d v v D h d b [3]. Th d : [ ] Y U B Ṽ Ũ (2) B [ ŨṼ ] Y V (B U) Ṽ (3) B UṼ Th w d h v d v f z (NMF) [19], f h D h dj. Th, by v y y Eq. (2) d Eq. (3), w z h v b d, wh h f(u, V; Y). T v fy h f h d, w h v h f w. Proposition 1: F y Ũ d Ṽ, f w b U d V by h f w q d Eq. (2) d (3), w h v f(ũ, Ṽ; Y) f(u, V; Y). Th fi d f f. T v P 1, w d d fi h f w y f. g(u, φ, φ; Y) = D ln B(α)+ ( ) Uvk Y gsφ vk;gs ln vkgs φ vk;gs + [ Ψ(α k + γ sk ) Ψ( ] (α k + γ sk )) (γ sk γ sk ) sk k + g ln B(α + γ g ). Th d h f v b w. (1) (4) Lemma 2: If ( ) U vk = φ vk;gs Y gs / φ vk;g sy g s. (5) v gs h f y Ũ, gs h(u, φ; Y) h(ũ, φ; Y). Proof: T fi d h z U f h(u, φ; Y) w h h, v U vk =1, w d fi h L L w h v L(U) =lnh(u, φ; Y)+ k λ k ( v U vk 1) L = Y gs φ vk;gs /U vk λ k =, U vk gs U vk = 1 Y gs φ vk;gs. λ k gs N z U vk w h v h Eq. (5). Lemma 3: F γ = {γ k : γ k > } d γ = { γ k : γ k > }, w h v ln B(γ) ln B( γ)+ [ Ψ( γ k ) Ψ( ] γ k ) (γ k γ k ) k k Proof: A f Ψ 1 (x) d f x>, w h v 2 ln B(x, y) x 2 =Ψ 1 (x) Ψ 1 (x + y), h ln B(x, y) v w h x. S B(γ) =B(γ 1, K k=2 γ k)b(γ 2,,γ k ), ln B(γ) v w h w h γ 1. B f y f B, ln B(γ) v f w h y γ k. Th, w h v h q y. Lemma 4: If φ b d h f w d γ sk = vg Y gs φvk;gs, V sk =exp(ψ(α k + γ sk ))/ k exp(ψ(α k + γ sk )), (6) φ vk;gs = B gvu vk V sk /[BUV ] gs h h(u, φ; Y) h(u, φ; Y) = f(u, V; Y). Proof: W h v g(u, φ, φ; Y) = lnh(u, φ; Y) By L 3, w h v g(u, φ, φ; Y) ln h(u, φ; Y). 
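To find the maximizer $\phi$ of $g(U, \phi, \tilde{\phi}; Y)$ with the constraint $\sum_{vk} \phi_{vk;gs} = 1$, we define the Lagrangian

$$L(\phi) = g(U, \phi, \tilde{\phi}; Y) + \sum_{gs} \lambda_{gs} \Big( \sum_{vk} \phi_{vk;gs} - 1 \Big),$$

$$\frac{\partial L}{\partial \phi_{vk;gs}} = Y_{gs} \ln\Big( \frac{B_{gv} U_{vk}}{\phi_{vk;gs}} \Big) - Y_{gs} + Y_{gs} \Big[ \Psi(\alpha_k + \tilde{\gamma}_{sk}) - \Psi\Big( \sum_k (\alpha_k + \tilde{\gamma}_{sk}) \Big) \Big] + \lambda_{gs} = 0,$$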
w h v d φ vk;gs = B gvu vk exp{ψ(α k + γ sk ) Ψ( k (α k + γ sk ))} exp{(λ gs Y gs)/y gs} B gvu vk V sk h(u, φ; Y) =exp(g(u, φ, φ; Y)) exp(g(u, φ, φ; Y)) h(u, φ; Y) = f(u, V; Y). N w w v P 1. Proof: W w h f(ũ, Ṽ; Y). U d U by Eq. (5) L 2, w b h(u,ϕ(ũ, Ṽ); Y) h(ũ,ϕ(ũ, Ṽ); Y) = f(ũ, Ṽ; Y). U d V by Eq. (6) L 4, w b Th w h v exp(g(u,ϕ(u, V),ϕ(Ũ, Ṽ); Y)) exp(g(u,ϕ(ũ, Ṽ),ϕ(Ũ, Ṽ), Y)) = h(u,φ(ũ, Ṽ); Y). f(u, V; Y) =h(u, ϕ(u, V); Y) exp(g(u,ϕ(u, V),ϕ(Ũ, Ṽ); Y)). Th, by v y y Eq. (2) d Eq. (3), w z h v b d, wh h h v f f(u; Y). 3) Algorithm Procedure: W h w h d f h d A h 1. Algorithm 1 I v A h Input: Y : B : K : # f f d ( ) Output: U : V : y 1: R d y z U d V, d z h 2: repeat 3: U d U Eq. (2); 4: U d V Eq. (3); 5: C f Eq. (1); 6: until f v. H, Eq. (2) d Eq. (3) d h E d h M, v y, f h v B y h. IV. EXPERIMENTS W h y f h d f d fy f d :, PPI w k, d by b d PPI y. W dy h fl f d ff PPI w k q. A. The Data Set W w d d w d d. 1) The Real Data: Th y y d d h y d d. Th y y d d d y [23]. Th d d by C. [6] dy h w k. Th w k d by h b b d h h h k: h ://www... d / b /h v h/ C N w k/m C /. W h d f w : W fi v y h h v h 5% f v. Th 44 (C Cy ) y w h 2725 y. W h y dj y f, wh h w P ffi v f h. Th y d b d f BIOGRID d b (h :// h b d. /) [25]. 2) Synthetic Data: O d d h PPI d d. Th h d d b d [17] d f d. S f y, w 3 fi, G i, 1 i 3, wh h fi G i v w h 5, d 5 y. Th dj b b f d, ( h b, N, 5 ). E h d d by d d d. L f d d H v h [17] f d h b b d by f S V D (SVD) f h d. Th z f 5 d (.., h b f h d ) 9, 45, 27, 12, d 18, v y. I h, h PPI b d h d d b d [18]. B. Evaluation Methods T h q y, w y d z d f (). A y h h b w h d h d h, wh h h f h h d b w f d. N z d M I f (N MI) [26] h f f h d by w d v b d d y b. C. Clustering using Gene Expression Profiles of the Real Data Set I h f, w d b d h fi y. W h f w h w d y d h d. (1)K : w h d d K. (2)N v M F z (NMF): w v f z [2] b h. (3)Sy N v M F z (SNMF): w y v f z [29], wh h f z h d d v y f y v. 1) Experimental Results I: F 2 h w h y d d. F h, w b v h SNMF f K d NMF. 117
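The two evaluation measures named in this section, accuracy and normalized mutual information (NMI), can be sketched as follows. The accuracy function assumes the common clustering definition (best one-to-one mapping between clusters and classes, found here by brute force), and the NMI function assumes the geometric-mean normalization of [26]. The function names and the toy labels are hypothetical and only illustrate the computation.

```python
from collections import Counter
from itertools import permutations
from math import log, sqrt

def clustering_accuracy(truth, predicted):
    """Best fraction of matches over one-to-one cluster-to-class mappings.

    Brute force over permutations, so only suitable for a handful of clusters.
    """
    clusters = sorted(set(predicted))
    classes = sorted(set(truth))
    # Pad with None so that surplus clusters can stay unmatched.
    classes += [None] * max(0, len(clusters) - len(classes))
    best = 0
    for assigned in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, assigned))
        hits = sum(1 for t, p in zip(truth, predicted) if mapping[p] == t)
        best = max(best, hits)
    return best / len(truth)

def entropy(counts, n):
    """Shannon entropy (natural log) of a label distribution."""
    return -sum((c / n) * log(c / n) for c in counts.values())

def nmi(labels_a, labels_b):
    """Mutual information divided by sqrt(H(a) * H(b))."""
    n = len(labels_a)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mi = sum((c / n) * log(c * n / (count_a[a] * count_b[b]))
             for (a, b), c in joint.items())
    h_a, h_b = entropy(count_a, n), entropy(count_b, n)
    if h_a == 0 or h_b == 0:
        return 0.0  # convention chosen here for a single-cluster labeling
    return mi / sqrt(h_a * h_b)

truth = [0, 0, 1, 1, 2, 2]
predicted = [1, 1, 0, 0, 2, 2]  # same partition, permuted cluster ids
print(clustering_accuracy(truth, predicted), nmi(truth, predicted))
```

Both measures ignore the arbitrary numbering of clusters, which is why the permuted labeling above scores as a perfect match.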
Fig. 2. R d. Panels (a) Accuracy and (b); legend: K-means, NMF, SNMF.

Fig. 3. R PPI w k. Panels (a) Accuracy and (b); legend: RBR, Direct, Metis, Spectral.

Fig. 4. R f b d PPI w k. Panels (a) Accuracy and (b); legend: PLSI+PHITS, CM, PPIBM.

I b w d h h h y, d h SNMF h d d b d w h y d [29]. Th h h h y wh h k w f. Th y d f v.

D. Clustering using the PPI graph of the Real Data Set

1) Implemented Baselines II: W v h h d h h d M (RBR, D, M ) 1, d h h [27] d h PPI h.

2) Experimental Results II: F 3 h w h y d h. By h f h h d y d d h f h h PPI w k, w b v h h b. Th d h dv d h b d h d k f h h d b f.

1 Th f w b d w d d f h ://.d.. d / kh /v w / /.

E. Combining Gene Expressions and PPI Networks of the Real Data Set

1) Implemented Baselines III: W w b b d wh h b d ff f. (1) PLSI+PHITS [12] b b d wh h w h d f P b b L S I d (PLSI) d P b b Hy k I d d T S h (PHITS), d PPI f f. (2) C d (CM) [14] h d d h f f v w k d. H w h PPI w k.

2) Experimental Results III: F 4 h w h y d by b h d y. F h, w h v h f w b v. (1) I, h d b h h fi d h f h h d y f. Th d h h w y f f y h h d h d y b h v h f. (2) O d B y d h v h h h y d b h d k b f h h dd h h h b h d h PLSI+PHITS.

F. PPI Network Quality Analysis using the Synthetic Data Set

I h f, w h w h PPI w k q y ff h h y h d wh h h PPI w k w h d ff v. W h f SNMF h d, S h h f PPI w k d, d PPIBM b h d d h h PPI w k. W h SNMF d S b h y f h h d h b d h S IV C d S IV D, v y. F IV F d F IV F h w h.

Fig. 5. A y y h d. H SNMF d h f SNMF h d ; S d PPIBM d h h d f d d h h d b 1, 2, d 3 d h v f, 1 b h h h, d 3 b h w. Legend: SNMF, Spectral 1, Spectral 2, Spectral 3, PPIBM 1, PPIBM 2, PPIBM 3.

F h, w h v h f w b v. (1) Wh h PPI w k h, h h b d h v d f y. Th b h f d f h k d f w k y y fl h h f h d h y d h d h. (2) Wh h h PPI w k d d, h h b d h f
h h d y h d wh h w h b v h h w d d. (3) I, h b h w k q y, h b h h b d h d h d d. (4) O d PPI b d B y d z h fl f h d h w k d h w f y, h w y f h h h d. (5) Wh h q fy f h w k h h d v,.., wh h h w k d d v, h f f h d v y v h v.

Fig. 6. y h d. Legend: SNMF, Spectral 1, Spectral 2, Spectral 3, PPIBM 1, PPIBM 2, PPIBM 3.

V. CONCLUSION

I h, w PPI b d B y d h w f f h fi d h. Th fl b w h w z d v h f d d fi. Th h f h h d f d fy f d, d h d h f f d d.

ACKNOWLEDGEMENTS

Th w k y d by NSF DBI 8523.

REFERENCES

[1] V. A, S. M, d I. M. I v y f d. Bioinformatics, 21, 25. [2] A b L z B b d Z N O v. N w k b y: d d h f z. Nat Rev Genet, 5(2):11 113, 24. [3] D. M. B, A. Y. N, d M. I. J d. L d h. I Advances in Neural Information Processing Systems 14, 61 68, 22. [4] S. B h d v J. H d. Ev f h f w k. BMC Bioinformatics, 7:488, 26. [5] C. B, F. Ch v, D. M, J. W j k, A. G h, d B. J q. F fi f f h d f f f C w k. Genome Biol, 5, 23. [6] M. R. C, B. Zh, Z. F, P. S. M h, S. H v h, d S. F. N. G v y, f, d q v : d f d y w k. BMC Genomics, 7, 26. [7] S Ch k b, By D, d P I dyk. E h d hy z hy k. SIGMOD Rec., 27:37 318, J 1998. [8] D v d C h d Th H f. Th k b b d f d d hy v y. I Advances in neural information processing systems, 2. [9] Ch D, X f H, R h d F. M z, d S h R. H b k. A fi d f d f d w k. Proteins: Structure, Function, and Bioinformatics, 57(1):99 18, 24. [1] R. D, F. D db d, d C.M. S d. Th f d b w v b f w k. BMC Bioinform, 6, 25. [11] G.D.B d d C.W. H. A d h d f fi d w k. BMC Bioinform, 4, 23. [12] L G, N F d, D h K, d B j T k. L b b d f k. J.Mach.Learn.Res., 3:679 77, 23. [13] Zh G, Zh f Zh, Sh h Zh, Y Ch, d Y h G. K w d d v y f w k. ICDM 29, 8 85. [14] Zh G, Sh h Zh, Y Ch, Zh f Zh, d Y h G. A d f k d d. I Proceedings of SIGIR 29. [15] Th J h, Th J h d D, N C, d N d. Rhb. A. Uk. C k f hy. I In Proceedings of ICML 21, pages = 25 257,. [16] A.D. K, N.P z j, d I. J. P d v b d. Bioinformatics, 2, 24. [17] P L f d d S v H v h. E w k f dy h h b w d. BMC Systems Biology, 1:54, 27. [18] P L f d, R L, M h C.
O dh, d S v H v h. I My N w k M d P v d d R d b? PLoS Comput Biol, 7, 211. [19] D D. L d H. S b S. A h f v f z. I Advances in Neural Information Processing Systems 13, 21. [2] D D. L d H. S b S. A h f v f z. I Advances in Neural Information Processing Systems, 556C562, 21. [21] J. B. P L, A. J. E h, d C. A. O z. D f f d f w k. Proteins, 54(1):49 57, 24. [22] A.W. R v d T. G k. M d z f w k. Proc. Natl Acad. Sci., 1, 23. [23] P. T. S, G. Sh k, M. Q. Zh, V. R. Iy, K. A d, M. B. E, P. O. B w, D. B, d B. F h. C h v d fi f y d f h y S h y v by y hyb d z. 9:3273 3297, 1998. [24] V. S d L. A. M y. P d f d w k. Proc Natl Acad Sci U S A, 1(21):12123 12128, 23. [25] Ch S k, B bby J B k z, T R y, L B h, A h B k z, d M k Ty. B d: y f d. Nucleic Acids Research, 34:D535 D539. [26] A d S h, J yd Gh h, d C C d. C b k w d f w k f b. Journal of Machine Learning Research, 3:583 617, 22. [27] U k v L b. A. Statistics and Computing, 17, 27. [28] C. v M, R. K, B. S, M. C, S. G. O v, S. F d, d P. B k. C v f d f. Nature, 417(6887):399 43, M y 22. [29] D d W, T L, Sh h Zh, d Ch H. Q. D. M d z v v y d y f z. I SIGIR, 37 314, 28. [3] D d W, Sh h Zh, T L, d Y h G. M d z b d d. I A- CL/AFNLP, 297 3, 29. [31] Y Y, S S y, d R y d Gh. A dy f h hy z. Journal of Intelligent Information Systems, 18:219 241, 22. [32] Y Zh, E Z, T L, d G N h. W h d f d fy f d w k. I Proceedings of the 29 International Conference on Machine Learning and Applications, ICMLA 9, 539 544, 29. [33] Sh h Zh, K Y, Y Ch, d Y h G. C b d k f fi f z. I Proceedings of the 3th annual international ACM SIGIR conference on Research and development in information retrieval, SIGIR 7, 487 494, 27. 119