Artificial Intelligence. 8. Inductive Logic Programming

Artificial Intelligence Artificial Intelligence 8. Inductive Logic Programming Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) Institute of Economics and Information Systems & Institute of Computer Science University of Hildesheim http://www.ismll.uni-hildesheim.de Course on Artificial Intelligence, summer term 2007 1/11 Artificial Intelligence 1. Inductive Logic Programming 2. FOIL 3. Inverse Resolution Course on Artificial Intelligence, summer term 2007 1/11

Artificial Intelligence / 1. Inductive Logic Programming Inductive Logic Programming (ILP) Given some positive examples for a target predicate P, say daughter(mary, ann) daughter(eve, tom) and some negative examples daughter(tom, ann) daughter(eve, ann) as well as some descriptive predicates Q of the entities envolved female(ann) female(eve) parent(ann, mary) parent(tom, eve) find a hypothesis definition / rule of P in terms of Q that 1. covers all the positive examples, 2. does not cover any negative example, and 3. is sufficient general. Course on Artificial Intelligence, summer term 2007 1/11 Artificial Intelligence / 1. Inductive Logic Programming Trivial Solutions daughter(x, Y ) covers all positive examples, but unfortunately also all negative examples. false covers no negative example, but unfortunately also no positive example. (X = mary Y = ann) (X = eve Y = tom) daughter(x, Y ) covers all positive examples, covers no negative example, but unfortunately does not generalize (new examples will fail). Course on Artificial Intelligence, summer term 2007 2/11

Artificial Intelligence / 1. Inductive Logic Programming Two principal approaches: top-down: generalization of decision trees (FOIL). inverse deduction (inverse resolution). Course on Artificial Intelligence, summer term 2007 3/11 Artificial Intelligence 1. Inductive Logic Programming 2. FOIL 3. Inverse Resolution Course on Artificial Intelligence, summer term 2007 4/11

FOIL First Order Inductive Learner (FOIL; Quinlan 1990). Idea: iteratively build rules that cover some positive examples, % U U «ŒŽ ƒ ˆ<S Œ@ kœ< 4 s b tš < IR} t kœžœ@ k Š ¹ŝ < Ž i ±Ÿ ŒsŸ O ŒŽ kœ ŒŽ s ŒŽ Lˆ@ k Š,H ŒŽ ±Ÿ ~% Œ> ª ŝ k, Hƒ ŝ k qŝ Œ/ ˆ-,HŒ)ŝ Œ/ ŒŽ ŒŽ k /Œ,H k R qŝ E ªiöŝ k, t k,9 ŒŽ < %MŸ'&Ÿ'&Ÿ4} g Ž k g s b kœžœ ƒ ŝ k Ë ŒŽ Ž kœ R k ŽŒŽ k k s O ŝ Œ0/ ˆ-,H ŒŽ Ž b J s ŒŽ k H Ž kš < ŝ R J s ŒŽ k O J R k ŽŒŽ k k Ÿ ŒŽ ˆ «Œ)Œ0/ ˆ-,H ŒŽ.ŝ kœ) s ŒŽ ŽŒŽ k ŝ k «ŒŽ > ª Ÿ'~% Œ<, ˆq ƒjœg ŒŽ ŒŽ ˆ ŒŽ ) A kœ R k ŽŒŽ k k ƒ ŝ kœž ) A Œg Ž kœž IR} U k /ŝ k kš,h M \Ÿ O J s ŒŽ k L Ž kš < ) $> ª ƒ ŝ k qŝ Œ ˆ-,HŒ4ŝ O Œ Æ¾ºs¼<Á,»³ ; s 2L k, %MŸ'&ŸË R k ŽŒŽ k k A s ' Œ@ Š ŽŒŽ J s ŒŽ k g ˆ t?,t s ' kœž Š ŽŒŽ ŒŽ k k kš Ïƒ k Œ 8,HŒŽ ŒŽ ŒŽ < %MŸ'&Ÿ'& ŝ ŒŽ k Ž k ƒjœž / A k,hœ,h kœ> Œ< Æŝ \ ) ŒŽ < BMŸ % ŝ BMŸ ž Ÿ ~% Œ/ Æ¾ºs¼<Á,»³ ) s Œ > ª ŝ k, 2L k, ž Ÿ š ŝ k kš,hœž but no negative ones. Once a rule has been found, remove the positive examples covered and proceed. to build a rule: add literals to the body until no negative example is covered if literals introduce new variables, extend example tuples by all possible constants. Course on Artificial Intelligence, summer term 2007 4/11 Œ> «ŒŽ kœ R k ŽŒŽ k kœž U kœ< % s Ï ŝ Œ0/ ˆ-,H ŒŽ U Œ> ŝ Š ŝ Œƒ ŝ s i tš < IR} t kœžœ> k ˆ-, Ž ŝš kœž Ž M ŒŒ0/ ŒŽ k ŝ \ƒ ŝ Æ k Š / O R ŒŽ Œ L iŝ L UŒŽ Eŝ L ŒË kœqŝ k Æ FOIL / Algorithm ŒŽŠ k Ž ŝ ; Œ@ ŒŽŠ k Ž LŠ kœž (1/2) ŝ > Œ Ž k ŒŽ k ˆMŸ 7#9 J"= : H J 8 I 9 7#9 J"= `, K `,.0/12/34 (;E=?U @ >( * `!, ) &% 24.% &24 4 $SV 7Q N( *, @ `, Z R? <(;? #$E=< @N?? =A?U b N@, 7 7 N N@ - N X b `, 56#$E= =M;: #$! 6(;E=? 7W 'D? # N@, N X?U @$ ` 240 ( + @ ` `, YZIV4 B\[?- (;<7 @$(;?U 6=< Ä 7), ^ Y_4E1Y_4ba F, [Lavrac/Dzeroski 1994] Course on Artificial Intelligence, summer term 2007 5/11

U S FOIL / Algorithm (2/2) % U U 7#9 J"= P4 H J 8F;@ I #7: # 7#9 J"= M <( )?U @ `, M(*?? 6( * `, `, G B\[ /?!"# $&% c 7 " 6?U (' 7 7Q N "<7N'!*) 78(;?U* (; S `!*) ',?U# N X <( )?U @ S 6 N 6';:< b' > *! 6 Ä - Ä 'D+', @ ` S-, (;? #$, /<I % G B\[ / ^ Y_4E1Y_4 *, Ž «ŒŽ k ; Œ kœ< s @ Š ŒŽ }Ÿ ~% Œ ŒŽ ŝ @ )ŒŽ ŒŽ ;ŝ ˆ, s Œ t?, / @SG1U G l l l4g > k@>+ 4ŸA,HŒ s U Œ «sŝ k R ŝƒ ŒŽ @SG1U G l l l4g 4ƒJŒŽ A Œ m«sŝ k ŝƒ ŒŽ gŝ kœqŝ M Ž ŽŠ k k / 3} 5/.cSG l l l4g-.10(2 34 O Œ% k,hœoŝ kœ Œ< 1/4cSG l l l4g-415 #6 ±Ÿ ŒsŸ IR k Š ŽŒŽ ƒ Œ ŒŽ ŝ ŸE~% Œ kœ< 1 s m Š ŒŽ SE Ž «ŒŽ kœž ƒ H Ž ŝš kœ.3 S O Œ4 kœ< > s k Š.&h : 496 R} Š ŒŽ Æŝ ˆ % s D7.cSG l l l4g.10(2 3 G84cSG l l l4g-415 #6dI FOIL g t 4 O Æ Œ ƒj M / Example SGbU+G l l l4g SG% kš ŒsŸ/ k Š Ž )ˆ/ Œ< G ŝ / kœ< Sgŝ Ëˆ/ kœ< Ë s 1Œ0/ ŒŽ k Ë s 1 Œ Š ŒŽ ˆ R ˆ» qŝ 4ƒJŒbŒ0/M kœž k kœž g Œ' kœž ˆ ŝ ŝ ŒŽƒ ˆ1 ŒŽ R,H s ƒ ˆq @ ˆ SU U Œ ˆ Š ŝ k s % U Ï O. Œg kœž U ˆ Ž k kœž k J @ H ŒŽ ŝ }Ÿ 1N$NO6!L3h ~Rŝƒ Œž Ÿ šg Š ˆ ŒŽ U Œ> ca+6&3 S1 E:* 580[M 6 N ª G k JŒŽ Ž ŝ ˆ ŝ k, A Œ ¹ˆ-, R ; kœž!s =? N!G c ˆ g k ƒ Œ,T t k, ŒŽ < š Ÿ ž Ÿ.~% ŒH kœqŝ k Æ Æŝ g O S C& 3 S šhl ** Ž ŝš kœ 3 S :* 580[M 6 B6HG >'= 6 N G 7ŸL~% Œ@ ŝ \ qŝ Ï ŝ H kœ<!so Ž Æŝ >ŝ S C& ŝ Œ0/ ˆ-,H ŒŽ Ž >'=WG c U s m O Æ S c6 =?*h ŝ kœo J k «Œ 6 S C& Eŝ U 4 ŒŽ ˆ «Œ S 6 B6HG c &,Ÿ 2U k A ŒŽ ŝ 47 S S c6 =?*h *Ml Bsž 6 > «ŒŽ g k kœë Œ@ Œ< G R C& qŝ \ ŝ @ kœ< 1N$NO6!L3h ca+6&3u E:* 580[M 0U s R Š ŒŽ O Ž «ŒŽ kœž ;ƒ?3u O Æ/ Ž Æŝ % 6 N G c6 =?*h 6 U J k «Œ 0U U =? N!G c & Lŝ Œ@ ŒŽ ˆ «ŒË Š Œ U š,ÿ U 2L. Œ@ kœž Ž C& 3Ü *Ml'/-B ŒŽ ŝ bu ga1 NO6! 6 B6HG >'= 2G %. ŒËƒJ M ) s b Ž ŝš kœ 3U k Š ŽŒŽ Ž U š ŝš kœ 3 > O ˆ qŝ ŝ kœ< 6 B6HG c 1 Ž Æŝ J k «ŒO Š ŒŽ ŽŸ ~%Š Ž ˆ Ž k ŒŽ bu a1 NO6! 2G 47 bu šhl š+1 C& 1N$NO6!L3h ca+6&3 E:* 580[M 6 N G c6 =?*h 6 G7a1 NO6! 2G =? N!G c C& 3 *Ml ** 6 B6HG >'= [Lavrac/Dzeroski 1994] Course on Artificial Intelligence, summer term 2007 6/11 * ~Rŝƒ Œgž Ÿ š > ª ŝ ŽŒ t 1 Œ ¹ˆ-,H H kœž ˆ k ƒ Œ,/Ÿ Ž ŝš kœ#3 O Æ/ Ž «ŒŽ k H ŒŽ ˆ «Œ> Š ŒŽ 1 % ŒŽ ŒŽ ˆ ŒŽ \Ÿ } ˆ/ J k «ŒH ŒŽ ŝ E O Œ< Œ0/M ŒŽ ŝ OŠ ŝ S ŒŽ «sŝ k ŝƒ ŒŽ g ŝ ŒŽ / Œ.ƒJ M s % Œ Ž ŝš kœs E Œ k Œ ŝ k g s 1 Œ Š ŒŽ 4 Œ [Lavrac/Dzeroski 1994] Course on Artificial Intelligence, summer term 2007 7/11

Ÿ S ž ž FOIL / Example % U U 1N$NO6!L3h ca+6&3 S1 E:* 580[M 6 N G!S =? N!G c S & 6 B6HG >'= S & >'=WG c S ]a1 NO6! 2G 6 B6HG c C& 1N$NO6!L3h ca+6&3u E:* 580[M 6 N G a1 NO6! 2G 0U =? N!G cegt=? N U =? N!G ceg >'= 6 B6HG >'=WG6 B6 6 B6HG >'=WG 4< >'=WG cegt=? N U >'=WG ceg >'= 6 B6HG cegt=? N 6 B6HG ceg >'= ~Rŝƒ Œ ž Ÿ'& ~% ŒAŒ Š ŒŽ O > ª Ÿ ŒŽ < H s Œ< «sŝ k ŝƒ ŒŽ Ë Œ ŽŠ k kœž ŝ ; kœ< s [Lavrac/Dzeroski 1994] ŒŽ S } S ŝ S ŸH UŒ<«ŒŽ Ž R U Œ< «sŝ k ŝƒ ŒŽ ŝ kœ. k Š ŽŒŽ \ Ë qŝ ŝ JŒŽ ˆ S I ŝ &) & S I Ÿ } s > Œ) J k «ŒA Š ŒŽ >ŝ kœ) kœž kœž kœž ŒŽ ƒ Œ).,H kœa ŠIR ŒŽ A S, L Œ M t?, Literal ˆ ŝ ŒŽ Selection ƒ kœž ŒŽ < ŒŽ ŝ 4ˆ-,H Š 47 3 SG3 3 3 S,Ÿ ~% Œ/ ŒŽ ŝ O Œ/ ŒŽ H ŝ kœž ŒŽ < ŒŽ ˆ HŒqŝ Æ ŒŽ \Ÿ 3 SG3 Ë Æŝ t ' UŒŽ ŒŽ M t?, ˆ @ ŝ \ ±Ÿ ŒsŸ ŒË»³ "q¾á,ãhâs ¹ ±¾³ Âs»³ 3 3 S!U¼< Ç ¼ ƒ. Œ ¹ŝ < \ ŒbŒ0/ ˆ-,H Œs Š ˆ ŒŽ > g~rŝƒ Œbž Ÿ š! 3 S h >+8OU U U U šhl **M 3Ü h >+8OU U U S *Ml'/-BM mŝ 3 h >+8OU U U *Ml **MŸ 2L ` S n U & \ UŒ ˆq«Œ 47 S & 3 S i I(c i ) := log 3Ü & *Ml ž& *Ml Bsž ŝ / k P,H ŝ k 47 bu & *Ml'/-B 2 nšhl š+1mÿ i + n i ~% Œ. ŒŽŠ k.ŝ O g t 4Œ('H Ž ŒŽ @ kš R O Æ kœž k J k ƒ ŒH t Œ/Œ('H Ž ŒŽ < s > ª ŸbªÏŒ< LƒJŒ/ˆ J k «Œ ŒŽ ŝ U O 47 3 3 S,Ÿ, ˆq Ž Æŝ Œ< «sŝ k ŝƒ ŒŽ ŝ ŒŽ @ kœ R ŝ ŽŒ,HŒŽ O «sŝ k ŝƒ ŒŽ > qŝ Œ<«ŒŽ Ž kœqŝ kœ Ÿ / kœž «ŒŽ Ž kš Æ kœž ŝ ŽŒ,HŒŽ qŝ ˆ ƒjœž k Š ŽŒ Â kœ< S> Ž Æŝ A Š ŒŽ Ž Course on Artificial Intelligence, summer term 2007 8/11 Let n i be the number of positive examples in step i, n i be the number of negative examples in step i. information: If the new literal does not introduce new variables, n i+1 n i and n i+1 n i. But if new variables are introduced, this may not hold anymore. Denote by n i at least one tuple in E i+1. weighted information gain: the number of positive tuples in E i represented by WIG(L i, c i ) := WIG(c i+1, c i ) := n i (I(c i ) I(c i+1 )) Select the literal with the highest weighted information gain. Course on Artificial Intelligence, summer term 2007 9/11

Artificial Intelligence 1. Inductive Logic Programming 2. FOIL 3. Inverse Resolution Course on Artificial Intelligence, summer term 2007 10/11 Artificial Intelligence / 3. Inverse Resolution Resolution: Given clauses C 1 and C 2, infer resolvent C. C 1 := C 1 {R}, C 2 := C 2 { R }, Rθ = R θ C := C 1θ C 2θ Inverse resolution: Given resolvent C and clause C 1, infer clause C 2. C 1 := {R} C 2 := { R } C, R θ = Rθ, C θ = Cθ Parent(x,z) > Parent(z,y) > Grandparent(x,y) Parent(George,Elizabeth) {x/george, z/elizabeth} Parent(Elizabeth,y) > Grandparent(George,y) Parent(Elizabeth,Anne) {y/anne} Grandparent(George,Anne) Grandparent(George,Anne) Course on Artificial Intelligence, summer term 2007 10/11

Artificial Intelligence / 3. Inverse Resolution Inverse resolution is a search, as there may be many pairs of clauses leading to resolvent C: Parent(Elizabeth, Anne) Grandparent(George, Anne) Parent(z, Anne) Grandparent(George, Anne) Parent(z, y) Grandparent(George, y)... Many techniques available for narrowing search space: eliminate redundancies, e.g., by generating only the most specific hypothesis. restrict proof strategy, e.g., to linear proofs. restrict representation language, e.g., to Horn clauses. use different inference method, e.g., model checking or ground propositional clauses. Course on Artificial Intelligence, summer term 2007 11/11