Hebbian Learning
Donald Hebb (1904–1985)

Hebbian Learning

"When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased." (Donald Hebb, 1949)

"Fire together, wire together."

Simple model of Hebbian learning:

[Diagram: inputs x_1, x_2, ..., x_N with weights w_1, ..., w_N converging on a single output unit y.]

\Delta w_i = \eta \, \frac{1}{M} \sum_\mu y^\mu x_i^\mu   (1)
           = \eta \, \langle y^\mu x_i^\mu \rangle_\mu   (2)
\Delta w   = \eta \, \langle y^\mu x^\mu \rangle_\mu .   (3)

In the following we will drop the index \mu for convenience.
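
To make equations (1)–(3) concrete, here is a minimal numerical sketch of one averaged Hebbian step for a single unit, assuming for concreteness that the unit is linear (y = w^T x, as introduced on the next slide). The data set, its dimensions, and the learning rate are illustrative assumptions, not values from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data set: M samples of dimension N (assumed values).
M, N = 500, 2
X = rng.normal(size=(M, N)) * np.array([2.0, 0.5])   # anisotropic data cloud

eta = 0.1                        # learning rate (assumed)
w = rng.normal(size=N)           # initial weight vector

# One averaged Hebbian step: dw = eta * <y^mu x^mu>_mu  (eqs. 1-3).
y = X @ w                        # outputs y^mu for all samples
dw = eta * (y[:, None] * X).mean(axis=0)
w = w + dw
print("weight update:", dw)
print("new weight:   ", w)
```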

Linear Model Neuron

A linear model neuron is described by

y = \sum_{i=1}^{N} w_i x_i = w^T x .   (4)

This corresponds to a projection of the data onto the axis given by w, scaled with |w|.

[Figure: a data cloud in the (x_1, x_2)-plane with the weight vector w, and the projected output values y along a one-dimensional axis.]

Hebbian Learning with a Linear Model Neuron

\Delta w \overset{(3)}{=} \eta \, y x   (Hebbian learning)
y \overset{(4)}{=} w^T x   (linear neuron)

\Delta w \overset{(3)}{=} \eta \, y x   (5)
         \overset{(4)}{=} \eta \, (w^T x) x   (6)
         = \eta \, x (x^T w)   (7)
         = \eta \, x x^T w   (8)

\Rightarrow \; w^{t+1} = w^t + \eta \, x x^T w^t   (9)
                       = \left( I + \eta \, x x^T \right) w^t   (10)

Hebbian learning with a linear model neuron can be interpreted as an iterated multiplication of the weight vector w with the matrix (I + \eta \, x x^T).

The sign of the input vectors x is irrelevant for the learning.
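
The iterated-multiplication view of equations (9)–(10) is easy to check numerically. The sketch below uses an assumed two-dimensional input distribution and learning rate; it also makes the problem noted on the next slide visible, namely that the weight norm grows without bound.

```python
import numpy as np

rng = np.random.default_rng(1)

N, eta = 2, 0.05                 # dimension and learning rate (assumed)
w = rng.normal(size=N)

# Online Hebbian learning with a linear unit: w <- (I + eta * x x^T) w  (eqs. 9-10).
for t in range(1000):
    x = rng.normal(size=N) * np.array([2.0, 0.5])   # illustrative inputs
    w = (np.eye(N) + eta * np.outer(x, x)) @ w
    # equivalently: w = w + eta * (w @ x) * x

print("final |w| =", np.linalg.norm(w))  # grows without bound -> motivates normalization
```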

Illustration of Hebbian Learning

\Delta w \overset{(6)}{=} \eta \, (w^T x) x   (Hebbian learning for a linear unit)

[Figure: a data cloud in the (x_1, x_2)-plane; starting from w^1, successive updates w^2, w^3, ... turn the weight vector towards the direction of largest variance and make it longer, approaching w^*.]

Problem: The weights grow without bound.

Explicit Normalization of the Weight Vector

First apply the learning rule,

\tilde{w}_i^{t+1} = w_i^t + \eta f_i   with, e.g., f_i = y x_i .   (11)

Then normalize the weight vector to length one,

w_i^{t+1} = \frac{\tilde{w}_i^{t+1}}{\sqrt{\sum_j \left( \tilde{w}_j^{t+1} \right)^2}} .   (12)

We can verify that

\sum_i \left( w_i^{t+1} \right)^2 \overset{(12)}{=} \sum_i \frac{\left( \tilde{w}_i^{t+1} \right)^2}{\left( \sqrt{\sum_j \left( \tilde{w}_j^{t+1} \right)^2} \right)^2} = 1 .   (13)

Problem: Explicit normalization is expensive and biologically implausible.
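
Before moving to the implicit version, here is a small sketch of the explicit scheme of equations (11)–(12): a Hebbian step followed by division by the norm. Inputs and learning rate are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

N, eta = 2, 0.05                         # dimension and learning rate (assumed)
w = rng.normal(size=N)
w /= np.linalg.norm(w)                   # start normalized

for t in range(1000):
    x = rng.normal(size=N) * np.array([2.0, 0.5])   # illustrative inputs
    y = w @ x
    w = w + eta * y * x                  # Hebbian step (eq. 11 with f_i = y x_i)
    w = w / np.linalg.norm(w)            # explicit normalization (eq. 12)

print("|w| =", np.linalg.norm(w))        # stays at length one
print("w   =", w)
```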

Implicit Normalization of the Weight Vector

Explicit normalization yielded

w_i^{t+1}(\eta) \overset{(12)}{=} \frac{w_i^t + \eta f_i}{\sqrt{\sum_j \left( w_j^t + \eta f_j \right)^2}} .

To obtain a simpler and biologically more plausible normalization rule, we compute the Taylor expansion of w_i^{t+1} in \eta at \eta = 0, under the assumption that w^t is normalized, i.e. \sum_j (w_j^t)^2 = 1:

w_i^{t+1}(\eta) \approx w_i^{t+1}(\eta' = 0) + \eta \left. \frac{\partial w_i^{t+1}(\eta')}{\partial \eta'} \right|_{\eta' = 0}   (14)

= w_i^{t+1}(\eta' = 0) + \eta \left[ \frac{f_i}{\sqrt{\sum_j \left( w_j^t + \eta' f_j \right)^2}} - \frac{\left( w_i^t + \eta' f_i \right) \sum_j f_j \left( w_j^t + \eta' f_j \right)}{\left( \sum_j \left( w_j^t + \eta' f_j \right)^2 \right)^{3/2}} \right]_{\eta' = 0}   (15)

= w_i^t + \eta \left( f_i - w_i^t \sum_j f_j w_j^t \right) .   (16)

Implicit Normalization of the Weight Vector

Taylor expansion of the explicit normalization rule resulted in

w_i^{t+1} \overset{(16)}{\approx} w_i^t + \eta \left( f_i - w_i^t \sum_j f_j w_j^t \right) .

For Hebbian learning, i.e. with f_i = y x_i and y = \sum_j x_j w_j, this becomes

\Delta w_i \overset{(16)}{=} \eta \left( f_i - w_i \sum_j f_j w_j \right)   (17)
           = \eta \Big( y x_i - w_i \, y \underbrace{\sum_j x_j w_j}_{= y} \Big)   (18)
           = \eta \left( y x_i - y^2 w_i \right) \quad \forall i   (19)

\Leftrightarrow \; \Delta w = \eta \left( y x - y^2 w \right) .   (Oja's rule)   (20)

This rule is simpler, e.g. \Delta w_i does not depend on w_j anymore.
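
A minimal sketch of Oja's rule (eq. 20) applied online, with an assumed input distribution and learning rate. The weight vector settles at length one and aligns with the direction of largest variance, which is what the stability analysis below will predict.

```python
import numpy as np

rng = np.random.default_rng(3)

N, eta = 2, 0.05                         # dimension and learning rate (assumed)
w = rng.normal(size=N)

for t in range(5000):
    x = rng.normal(size=N) * np.array([2.0, 0.5])   # illustrative inputs
    y = w @ x
    w = w + eta * (y * x - y**2 * w)     # Oja's rule (eq. 20)

print("|w| =", np.linalg.norm(w))        # close to 1
print("w   =", w)                        # close to (+/-1, 0), the direction of largest variance
```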

Oja's Rule

Implicit normalization yielded

\Delta w \overset{(20)}{=} \eta \left( y x - y^2 w \right) .   (Oja's rule)

For small y we have

\Delta w \approx \eta \, y x ,   (21)

which corresponds to Hebb's rule. For large y we have

\Delta w \approx -\eta \, y^2 w ,   (22)

which results in a shortening of the weight vector.

The second term only affects the length but not the direction of the weight vector. It therefore limits the length of the weight vector without affecting the qualitative behavior of the learning rule.

Question: What will be the final length of the weight vector?

Question: How will the weight vector develop under Oja's rule in general?

Linear Stability Analysis

y \overset{(4)}{=} w^T x   (linear unit)
\Delta w \overset{(20)}{=} \eta \left( y x - y^2 w \right)   (Oja's rule)

1. What is the mean dynamics of the weight vectors?   \frac{1}{\eta} \langle \Delta w \rangle_\mu = \; ?
2. What are the stationary weight vectors of the dynamics?   0 \overset{!}{=} \langle \Delta w \rangle_\mu \;\Rightarrow\; w = \; ?
3. Which weight vectors are stable?   w = \; ? \text{ stable?}

What is the mean dynamics of the weight vectors?

y \overset{(4)}{=} w^T x   (linear unit)
(\Delta w)^\mu \overset{(20)}{=} \eta \left( y^\mu x^\mu - (y^\mu)^2 w \right)   (Oja's rule)

In order to proceed analytically we have to average over all training patterns \mu and get

\langle (\Delta w)^\mu \rangle_\mu = \frac{1}{M} \sum_{\mu=1}^{M} (\Delta w)^\mu   (23)
\overset{(20)}{=} \frac{1}{M} \sum_{\mu=1}^{M} \eta \left( y^\mu x^\mu - (y^\mu)^2 w \right)   (24)
\Rightarrow \; \langle \Delta w \rangle_\mu = \eta \left\langle y x - y^2 w \right\rangle_\mu .   (25)

For small learning rates \eta this is a good approximation to the real training procedure (apart from a factor of M).

What is the mean dynamics of the weight vectors?

y \overset{(4)}{=} w^T x ,   (linear unit)
\langle \Delta w \rangle_\mu \overset{(25)}{=} \eta \left\langle y x - y^2 w \right\rangle_\mu .   (Oja's rule, averaged)

\frac{1}{\eta} \langle \Delta w \rangle_\mu \overset{(25)}{=} \left\langle y x - y^2 w \right\rangle_\mu   (26)
\overset{(4)}{=} \left\langle (w^T x) x - (w^T x)(w^T x) w \right\rangle_\mu   (27)
= \left\langle x (x^T w) - (w^T x)(x^T w) w \right\rangle_\mu   (28)
= \left\langle (x x^T) w - \left( w^T (x x^T) w \right) w \right\rangle_\mu   (29)
= \langle x x^T \rangle_\mu \, w - \left( w^T \langle x x^T \rangle_\mu \, w \right) w   (30)

\left[ \; C := \langle x x^T \rangle_\mu = C^T \; \right]   (31)

\overset{(31)}{=} C w - \left( w^T C w \right) w .   (32)
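
The equivalence of the sample-averaged update (25) and the matrix form (32) can be checked directly. The data below is an assumed stand-in; the comparison itself is exact algebra.

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative data: M samples in N dimensions (assumed).
M, N = 2000, 2
X = rng.normal(size=(M, N)) * np.array([2.0, 0.5])

w = rng.normal(size=N)

# Averaged Oja update in sample form (eq. 25) ...
y = X @ w
dw_samples = (y[:, None] * X - (y**2)[:, None] * w).mean(axis=0)

# ... and in matrix form (eqs. 31-32) with C = <x x^T>_mu.
C = X.T @ X / M
dw_matrix = C @ w - (w @ C @ w) * w

print(np.allclose(dw_samples, dw_matrix))   # True: both forms agree
```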

Interim Summary

y \overset{(4)}{=} w^T x   (linear unit)
\Delta w \overset{(20)}{=} \eta \left( y x - y^2 w \right)   (Oja's rule)

1. What is the mean dynamics of the weight vectors?
   \frac{1}{\eta} \langle \Delta w \rangle_\mu \overset{(32)}{=} C w - \left( w^T C w \right) w   with C \overset{(31)}{:=} \langle x x^T \rangle_\mu
2. What are the stationary weight vectors of the dynamics?   0 \overset{!}{=} \langle \Delta w \rangle_\mu \;\Rightarrow\; w = \; ?
3. Which weight vectors are stable?   w = \; ? \text{ stable?}

What are the stationary weight vectors?

C \overset{(31)}{=} \langle x x^T \rangle_\mu = C^T ,   (symmetric matrix C)
\frac{1}{\eta} \langle \Delta w \rangle_\mu \overset{(32)}{=} C w - \left( w^T C w \right) w .   (weight dynamics)

For stationary weight vectors we have:

0 \overset{!}{=} \langle \Delta w \rangle_\mu   (33)
\overset{(32)}{\Leftrightarrow} \; C w = \left( w^T C w \right) w   (34)
\left[ \; \lambda := w^T C w \; \right]   (35)
\overset{(35)}{\Leftrightarrow} \; C w = \lambda w   (36)

Stationary weight vectors w are eigenvectors of the matrix C.

\lambda \overset{(35)}{=} w^T C w \overset{(36)}{=} w^T \lambda w = \lambda \, w^T w = \lambda \, |w|^2   (37)
\Rightarrow \; 1 = |w|^2   (38)

Stationary weight vectors w are normalized to 1.
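
A quick numerical confirmation: every eigenvector of C, taken at unit norm, makes the averaged update (32) vanish. C is estimated from an assumed data set here.

```python
import numpy as np

rng = np.random.default_rng(5)

# Illustrative data and its second-moment matrix C = <x x^T>_mu (assumed).
X = rng.normal(size=(2000, 3)) * np.array([2.0, 1.0, 0.5])
C = X.T @ X / len(X)

lam, vecs = np.linalg.eigh(C)            # eigenvalues ascending, eigenvectors in columns

for a in range(3):
    c = vecs[:, a]
    dw = C @ c - (c @ C @ c) * c         # averaged update (eq. 32) at w = c^alpha
    print(a, np.allclose(dw, 0.0), np.linalg.norm(c))   # stationary, and of norm 1
```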

Interim Summary

y \overset{(4)}{=} w^T x   (linear unit)
\Delta w \overset{(20)}{=} \eta \left( y x - y^2 w \right)   (Oja's rule)

1. What is the mean dynamics of the weight vectors?
   \frac{1}{\eta} \langle \Delta w \rangle_\mu \overset{(32)}{=} C w - \left( w^T C w \right) w   with C \overset{(31)}{:=} \langle x x^T \rangle_\mu
2. What are the stationary weight vectors of the dynamics?
   0 \overset{!}{=} \langle \Delta w \rangle_\mu \;\Rightarrow\; w = c^\alpha   with C c^\alpha \overset{(36)}{=} \lambda_\alpha c^\alpha   (eigenvectors of C)
   and 1 \overset{(38)}{=} |c^\alpha|^2   (with norm 1)
3. Which weight vectors are stable?   w = \; ? \text{ stable?}

Reminder: Eigenvalues and Eigenvectors

A a = \lambda a   (eigenvalue equation)   (39)

Solutions of the eigenvalue equation for a given quadratic N x N matrix A are called eigenvectors a and eigenvalues \lambda. For symmetric A, i.e. A^T = A, the eigenvalues \lambda are real, and eigenvectors to different eigenvalues are orthogonal, i.e. \lambda_\alpha \neq \lambda_\beta \Rightarrow a^\alpha \perp a^\beta. A symmetric matrix A always has a complete set of orthonormal eigenvectors a^\alpha, \alpha = 1, ..., N (orthonormal basis), i.e.

A a^\alpha = \lambda_\alpha a^\alpha ,   (right-eigenvectors)   (40)
a^{\alpha T} A = (A a^\alpha)^T = a^{\alpha T} \lambda_\alpha ,   (left-eigenvectors)   (41)
|a^\alpha|^2 = a^{\alpha T} a^\alpha = 1 ,   (with norm 1)   (42)
a^{\alpha T} a^\beta = 0 \quad \forall \alpha \neq \beta ,   (orthogonal)   (43)
\forall v: \; v = \sum_{\alpha=1}^{N} v_\alpha a^\alpha   with v_\alpha = a^{\alpha T} v ,   (complete)   (44)
\Rightarrow \; \forall v: \; v = \sum_{\alpha=1}^{N} a^\alpha a^{\alpha T} v \;\Rightarrow\; 1 = \sum_{\alpha=1}^{N} a^\alpha a^{\alpha T} ,   (45)
|v|^2 = \sum_i v_i^2 \overset{(42, 43, 44)}{=} \sum_\alpha v_\alpha^2 .   (46)

Reminder: Eigenvalues and Eigenvectors

A^T = A ,   (symmetric)   (47)
A a^\alpha = \lambda_\alpha a^\alpha ,   (eigenvectors)   (48)
a^{\alpha T} a^\beta = \delta_{\alpha\beta} \quad \forall \alpha, \beta .   (orthonormal)   (49)

[Figure: in the (x_1, x_2)-plane, a vector x is decomposed along the orthonormal eigenvectors a^\alpha and a^\beta; applying A scales the two components by \lambda_\alpha and \lambda_\beta, giving A x.]
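
These properties are easy to verify numerically for any symmetric matrix; the matrix below is an arbitrary illustrative choice.

```python
import numpy as np

# A small symmetric matrix (illustrative values).
A = np.array([[2.0, 0.6, 0.0],
              [0.6, 1.0, 0.3],
              [0.0, 0.3, 0.5]])

lam, a = np.linalg.eigh(A)                    # eigenvalues (ascending), eigenvectors as columns

print(lam)                                    # real eigenvalues
print(np.allclose(A @ a, a * lam))            # A a^alpha = lambda_alpha a^alpha   (eq. 48)
print(np.allclose(a.T @ a, np.eye(3)))        # a^{alpha T} a^beta = delta_{alpha beta}   (eq. 49)
print(np.allclose(sum(np.outer(a[:, k], a[:, k]) for k in range(3)), np.eye(3)))   # completeness (eq. 45)
```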

Which weight vectors are stable?

\frac{1}{\eta} \langle \Delta w \rangle_\mu \overset{(32)}{=} C w - \left( w^T C w \right) w   (50)

Ansatz:  w := c^\alpha + \epsilon   (with small \epsilon)   (51)

[Figure: the weight vector w = c^\alpha + \epsilon as a small perturbation \epsilon around the eigenvector c^\alpha, with a second eigenvector c^\beta.]

How does \epsilon develop under the dynamics?

\frac{1}{\eta} \langle \Delta \epsilon \rangle_\mu \overset{(51)}{=} \frac{1}{\eta} \langle \Delta w \rangle_\mu   (52)
\overset{(32)}{=} C w - \left( w^T C w \right) w   (53)
\overset{(51)}{=} C (c^\alpha + \epsilon) - \left( (c^\alpha + \epsilon)^T C (c^\alpha + \epsilon) \right) (c^\alpha + \epsilon)   (54)
\approx C c^\alpha + C \epsilon - (c^{\alpha T} C c^\alpha) c^\alpha - (c^{\alpha T} C \epsilon) c^\alpha - (\epsilon^T C c^\alpha) c^\alpha - (c^{\alpha T} C c^\alpha) \epsilon   (terms of order \epsilon^2 dropped)   (55)
\overset{(36, 41)}{=} \lambda_\alpha c^\alpha + C \epsilon - (c^{\alpha T} \lambda_\alpha c^\alpha) c^\alpha - (c^{\alpha T} \lambda_\alpha \epsilon) c^\alpha - (\epsilon^T \lambda_\alpha c^\alpha) c^\alpha - (c^{\alpha T} \lambda_\alpha c^\alpha) \epsilon   (56)
\overset{(38)}{=} \lambda_\alpha c^\alpha + C \epsilon - \lambda_\alpha c^\alpha - \lambda_\alpha (c^{\alpha T} \epsilon) c^\alpha - \lambda_\alpha (\epsilon^T c^\alpha) c^\alpha - \lambda_\alpha \epsilon   (57)
= C \epsilon - 2 \lambda_\alpha (c^{\alpha T} \epsilon) c^\alpha - \lambda_\alpha \epsilon .   (58)

Which weight vectors are stable?

w \overset{(51)}{:=} c^\alpha + \epsilon ,
\frac{1}{\eta} \langle \Delta \epsilon \rangle_\mu \overset{(58)}{\approx} C \epsilon - 2 \lambda_\alpha (c^{\alpha T} \epsilon) c^\alpha - \lambda_\alpha \epsilon .

For simplicity consider the change of the perturbation along the eigenvector c^\beta.

[Figure: the perturbation \epsilon around c^\alpha and its projection c^{\beta T} \epsilon onto the direction c^\beta.]

\frac{1}{\eta} \langle c^{\beta T} \Delta \epsilon \rangle_\mu \overset{(58)}{\approx} c^{\beta T} C \epsilon - 2 \lambda_\alpha (c^{\alpha T} \epsilon)(c^{\beta T} c^\alpha) - \lambda_\alpha (c^{\beta T} \epsilon)   (59)
\overset{(36, 41)}{=} \lambda_\beta (c^{\beta T} \epsilon) - 2 \lambda_\alpha (c^{\alpha T} \epsilon)(c^{\beta T} c^\alpha) - \lambda_\alpha (c^{\beta T} \epsilon)   (60)
\overset{(42, 43)}{=} \lambda_\beta (c^{\beta T} \epsilon) - 2 \lambda_\alpha (c^{\alpha T} \epsilon) \delta_{\beta\alpha} - \lambda_\alpha (c^{\beta T} \epsilon)   (61)
= \lambda_\beta (c^{\beta T} \epsilon) - 2 \lambda_\alpha (c^{\beta T} \epsilon) \delta_{\beta\alpha} - \lambda_\alpha (c^{\beta T} \epsilon)   (62)
= \left( \lambda_\beta - 2 \lambda_\alpha \delta_{\beta\alpha} - \lambda_\alpha \right) (c^{\beta T} \epsilon)   (63)
= \begin{cases} (-2 \lambda_\beta)(c^{\beta T} \epsilon) & \text{if } \beta = \alpha \\ (\lambda_\beta - \lambda_\alpha)(c^{\beta T} \epsilon) & \text{if } \beta \neq \alpha \end{cases}   (64)

Which weight vectors are stable?

w \overset{(51)}{:=} c^\alpha + \epsilon ,
\frac{1}{\eta} \langle c^{\beta T} \Delta \epsilon \rangle_\mu \overset{(64)}{\approx} \begin{cases} (-2 \lambda_\beta)(c^{\beta T} \epsilon) & \text{if } \beta = \alpha \\ (\lambda_\beta - \lambda_\alpha)(c^{\beta T} \epsilon) & \text{if } \beta \neq \alpha . \end{cases}

With

c^{\beta T} \langle \Delta \epsilon \rangle_\mu = \langle c^{\beta T} \Delta \epsilon \rangle_\mu   (65)
= \langle c^{\beta T} (\epsilon^{n+1} - \epsilon^n) \rangle_\mu   (66)
= \langle (c^{\beta T} \epsilon)^{n+1} - (c^{\beta T} \epsilon)^n \rangle_\mu   (67)
= \langle \Delta (c^{\beta T} \epsilon) \rangle_\mu ,   (68)

s_\beta := c^{\beta T} \epsilon ,   (perturbation along c^\beta)   (69)

\kappa_{\alpha\beta} := \begin{cases} \eta (-2 \lambda_\beta) & \text{if } \beta = \alpha \\ \eta (\lambda_\beta - \lambda_\alpha) & \text{if } \beta \neq \alpha \end{cases}   (70)

we get

(64) \overset{(68, 69, 70)}{\Rightarrow} \langle \Delta s_\beta \rangle_\mu \approx \kappa_{\alpha\beta} \, s_\beta .   (71)

Which weight vectors are stable?

w \overset{(51)}{:=} c^\alpha + \epsilon ,
\langle \Delta s_\beta \rangle_\mu \overset{(71)}{\approx} \kappa_{\alpha\beta} s_\beta   with \kappa_{\alpha\beta} \overset{(70)}{:=} \begin{cases} \eta (-2 \lambda_\beta) & \text{if } \beta = \alpha \\ \eta (\lambda_\beta - \lambda_\alpha) & \text{if } \beta \neq \alpha . \end{cases}

Case 1: \beta \neq \alpha, \lambda_\beta < \lambda_\alpha \;\Rightarrow\; \kappa_{\alpha\beta} = \eta (\lambda_\beta - \lambda_\alpha) < 0.
The perturbation along c^\beta decays.

[Figure: the component s_\beta of the perturbation \epsilon in the c^\beta-direction shrinks.]

Which weight vectors are stable?

w \overset{(51)}{:=} c^\alpha + \epsilon ,
\langle \Delta s_\beta \rangle_\mu \overset{(71)}{\approx} \kappa_{\alpha\beta} s_\beta   with \kappa_{\alpha\beta} \overset{(70)}{:=} \begin{cases} \eta (-2 \lambda_\beta) & \text{if } \beta = \alpha \\ \eta (\lambda_\beta - \lambda_\alpha) & \text{if } \beta \neq \alpha . \end{cases}

Case 2: \beta \neq \alpha, \lambda_\beta = \lambda_\alpha \;\Rightarrow\; \kappa_{\alpha\beta} = \eta (\lambda_\beta - \lambda_\alpha) = 0.
The perturbation along c^\beta persists.

[Figure: the component s_\beta of the perturbation \epsilon in the c^\beta-direction remains unchanged.]

Which weight vectors are stable?

w \overset{(51)}{:=} c^\alpha + \epsilon ,
\langle \Delta s_\beta \rangle_\mu \overset{(71)}{\approx} \kappa_{\alpha\beta} s_\beta   with \kappa_{\alpha\beta} \overset{(70)}{:=} \begin{cases} \eta (-2 \lambda_\beta) & \text{if } \beta = \alpha \\ \eta (\lambda_\beta - \lambda_\alpha) & \text{if } \beta \neq \alpha . \end{cases}

Case 3: \beta \neq \alpha, \lambda_\beta > \lambda_\alpha \;\Rightarrow\; \kappa_{\alpha\beta} = \eta (\lambda_\beta - \lambda_\alpha) > 0.
The perturbation along c^\beta grows.

[Figure: the component s_\beta of the perturbation \epsilon in the c^\beta-direction grows.]

Which weight vectors are stable?

w \overset{(51)}{:=} c^\alpha + \epsilon ,
\langle \Delta s_\beta \rangle_\mu \overset{(71)}{\approx} \kappa_{\alpha\beta} s_\beta   with \kappa_{\alpha\beta} \overset{(70)}{:=} \begin{cases} \eta (-2 \lambda_\beta) & \text{if } \beta = \alpha \\ \eta (\lambda_\beta - \lambda_\alpha) & \text{if } \beta \neq \alpha . \end{cases}

Case 4: \beta = \alpha \;\Rightarrow\; \kappa_{\alpha\beta} = \eta (-2 \lambda_\beta) < 0.
The perturbation along c^\alpha always decays (if \lambda_\beta > 0).

[Figure: the component s_\alpha of the perturbation \epsilon in the c^\alpha-direction shrinks.]

Which weight vectors are stable?

w \overset{(51)}{:=} c^\alpha + \epsilon ,
\langle \Delta s_\beta \rangle_\mu \overset{(71)}{\approx} \kappa_{\alpha\beta} s_\beta   with \kappa_{\alpha\beta} \overset{(70)}{:=} \begin{cases} \eta (-2 \lambda_\beta) & \text{if } \beta = \alpha \\ \eta (\lambda_\beta - \lambda_\alpha) & \text{if } \beta \neq \alpha . \end{cases}

Cases 3 and 4 combined: The weight vector turns away from c^\alpha into the c^\beta-direction.

[Figure: for \lambda_\beta > \lambda_\alpha the component along c^\alpha decays while the component along c^\beta grows, so the weight vector rotates from c^\alpha towards c^\beta.]

Which weight vectors are stable?

w \overset{(51)}{:=} c^\alpha + \epsilon ,
\langle \Delta s_\beta \rangle_\mu \overset{(71)}{\approx} \kappa_{\alpha\beta} s_\beta   with \kappa_{\alpha\beta} \overset{(70)}{:=} \begin{cases} \eta (-2 \lambda_\beta) & \text{if } \beta = \alpha \\ \eta (\lambda_\beta - \lambda_\alpha) & \text{if } \beta \neq \alpha . \end{cases}

Case 1: \beta \neq \alpha, \lambda_\beta < \lambda_\alpha: the perturbation along c^\beta decays.
Case 2: \beta \neq \alpha, \lambda_\beta = \lambda_\alpha: the perturbation along c^\beta persists.
Case 3: \beta \neq \alpha, \lambda_\beta > \lambda_\alpha: the perturbation along c^\beta grows.
Case 4: \beta = \alpha: the perturbation along c^\alpha always decays.

The weight vector w = c^\alpha is stable only if the perturbations in all directions decay, i.e.

w = c^\alpha \text{ stable} \;\Leftrightarrow\; \lambda_\beta < \lambda_\alpha \quad \forall \beta \neq \alpha .   (72)

Only the eigenvector c^1 with the largest eigenvalue \lambda_1 is a stable weight vector under Oja's rule.

What happens if the two largest eigenvalues are equal?
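
A quick numerical reading of criterion (72), using an assumed data set and learning rate: for every candidate fixed point c^\alpha we list the coefficients \kappa_{\alpha\beta} from equation (70); only the eigenvector with the largest eigenvalue has all of them negative.

```python
import numpy as np

rng = np.random.default_rng(6)

# Illustrative data with distinct eigenvalues of C (assumed).
X = rng.normal(size=(5000, 3)) * np.array([2.0, 1.0, 0.5])
C = X.T @ X / len(X)
lam = np.linalg.eigvalsh(C)[::-1]        # eigenvalues in descending order: lambda_1 > lambda_2 > ...

eta = 0.05                               # learning rate (assumed)
N = len(lam)

# kappa_{alpha beta} from eq. (70); the fixed point c^alpha is stable if all entries are negative.
for a in range(N):
    kappa = np.array([eta * (-2 * lam[b]) if b == a else eta * (lam[b] - lam[a])
                      for b in range(N)])
    print(f"alpha = {a + 1}:", np.round(kappa, 3), "-> stable" if np.all(kappa < 0) else "-> unstable")
```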

Summary

y \overset{(4)}{=} w^T x   (linear unit)
\Delta w \overset{(20)}{=} \eta \left( y x - y^2 w \right)   (Oja's rule)

1. What is the mean dynamics of the weight vectors?
   \frac{1}{\eta} \langle \Delta w \rangle_\mu \overset{(32)}{=} C w - \left( w^T C w \right) w   with C \overset{(31)}{:=} \langle x x^T \rangle_\mu
2. What are the stationary weight vectors of the dynamics?
   0 \overset{!}{=} \langle \Delta w \rangle_\mu \;\Rightarrow\; w = c^\alpha   with C c^\alpha \overset{(36)}{=} \lambda_\alpha c^\alpha   (eigenvectors of C)
   and 1 \overset{(38)}{=} |c^\alpha|^2   (with norm 1)
3. Which weight vectors are stable?
   w = c^\alpha \text{ stable} \overset{(72)}{\Leftrightarrow} \lambda_\beta < \lambda_\alpha \quad \forall \beta \neq \alpha
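
Putting the whole analysis together, a simulation of Oja's rule should end up at a unit-norm vector aligned with the eigenvector of C that has the largest eigenvalue. The data set, rotation, and learning rate below are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)

# Illustrative zero-mean data with a dominant direction, rotated off the axes (assumed).
M, N = 10000, 3
X = rng.normal(size=(M, N)) * np.array([2.0, 1.0, 0.5])
R, _ = np.linalg.qr(rng.normal(size=(N, N)))     # random rotation
X = X @ R.T

eta = 0.01                                       # learning rate (assumed)
w = rng.normal(size=N)
w /= np.linalg.norm(w)

for x in X:                                      # online Oja learning (eq. 20)
    y = w @ x
    w += eta * (y * x - y**2 * w)

C = X.T @ X / M
lam, vecs = np.linalg.eigh(C)
c1 = vecs[:, -1]                                 # eigenvector with the largest eigenvalue

print("|w| =", np.linalg.norm(w))                            # should be close to 1
print("|cos(w, c1)| =", abs(w @ c1) / np.linalg.norm(w))     # should be close to 1
```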

Reminder: Principal Components

[Figure: a data cloud with its first and second principal components c^1 and c^2.]

Principal components are eigenvectors of the covariance matrix and point in the direction of maximal variance within the space orthogonal to the earlier principal components.

Learning Several Principal Components

y \overset{(4)}{=} w^T x   (linear unit)
\Delta w \overset{(20)}{=} \eta \left( y x - y^2 w \right)   (Oja's rule)

[Figure: a single Oja unit with inputs x_1, x_2, ..., x_N and output y, extended to a network with output units y_1, y_2, y_3 that receive lateral connections from earlier output units.]

y_j = w_j^T x + \sum_{k=1}^{j-1} v_{jk} y_k
\Delta w_j = \eta \left( y_j x - y_j^2 w_j \right)
\Delta v_{jk} = -\epsilon \, y_j y_k

Asymmetric inhibitory lateral connections decorrelate later from earlier output units. The units learn the principal components in order of decreasing eigenvalue.

(Rubner & Tavan, 1989, Europhys. Letters 10:693–698; reviewed in Becker & Plumbley, 1996, J. Appl. Intelligence 6:185–205)
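
A minimal sketch of this kind of network, assuming the update rules as reconstructed above (Oja's rule per unit plus anti-Hebbian lateral weights from earlier to later units). The data, learning rates, and training schedule are illustrative assumptions, not details from the Rubner & Tavan paper.

```python
import numpy as np

rng = np.random.default_rng(8)

# Illustrative zero-mean data (assumed): M samples, N inputs, K output units.
M, N, K = 20000, 4, 2
X = rng.normal(size=(M, N)) * np.array([2.0, 1.5, 1.0, 0.5])

eta, eps = 0.01, 0.01                    # learning rates (assumed)
W = rng.normal(size=(K, N)) * 0.1        # feedforward weights w_j
V = np.zeros((K, K))                     # lateral weights v_jk, only k < j is used

for x in X:
    y = np.zeros(K)
    for j in range(K):
        y[j] = W[j] @ x + V[j, :j] @ y[:j]           # y_j = w_j^T x + sum_{k<j} v_jk y_k
        W[j] += eta * (y[j] * x - y[j]**2 * W[j])    # Oja's rule per unit
        V[j, :j] += -eps * y[j] * y[:j]              # anti-Hebbian lateral update

# Compare each unit with the corresponding eigenvector of C (eigenvalue-descending order).
C = X.T @ X / M
vecs = np.linalg.eigh(C)[1][:, ::-1]
for j in range(K):
    print(j + 1, abs(W[j] @ vecs[:, j]) / np.linalg.norm(W[j]))   # each should approach 1
```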

Learning a Principal Subspace

y \overset{(4)}{=} w^T x   (linear unit)
\Delta w \overset{(20)}{=} \eta \left( y x - y^2 w \right)   (Oja's rule)

[Figure: a network with inputs x_1, x_2, ..., x_N and output units y_1, y_2, y_3 connected by symmetric lateral weights.]

y_j = w_j^T x + \sum_{k=1, k \neq j}^{N} v_{jk} y_k
\Delta w_j = \eta \left( y_j x - y_j^2 w_j \right)
\Delta v_{jk} = -\epsilon \, y_j y_k

Symmetric inhibitory lateral connections mutually decorrelate the output units. The units learn the principal subspace but not particular principal components.

(Földiák, 1989, Proc. IJCNN '89, pp. 401–405; reviewed in Becker & Plumbley, 1996, J. Appl. Intelligence 6:185–205)

The Principal Components of Natural Images

15 natural images of size 256 × 256 pixels. 20,000 random samples of size 64 × 64 pixels. For each pixel the mean gray value over the 20,000 samples was removed. The samples were windowed with a Gaussian with std. dev. 10 pixels. Sanger's rule was applied to the samples.

(Hancock, Baddeley, & Smith, 1992, Network 3(1):61–70)
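
The sampling pipeline can be sketched as follows. Since the original 15 natural images are not available here, a smoothed random image stands in for them, far fewer samples are drawn, and the components are computed by a plain SVD rather than by Sanger's online rule; all of these are assumptions of the sketch, not details of the original study.

```python
import numpy as np

rng = np.random.default_rng(9)

# Stand-in for the natural images: one synthetic 256 x 256 image with spatial correlations.
img = rng.normal(size=(256, 256))
k = np.exp(-0.5 * (np.arange(-10, 11) / 4.0) ** 2)
img = np.apply_along_axis(lambda m: np.convolve(m, k, mode="same"), 0, img)
img = np.apply_along_axis(lambda m: np.convolve(m, k, mode="same"), 1, img)

P, n_samples = 64, 2000                   # patch size; fewer samples than the 20,000 in the paper
g = np.exp(-((np.arange(P) - P / 2) ** 2) / (2 * 10.0 ** 2))
window = np.outer(g, g)                   # Gaussian window, std. dev. 10 pixels

patches = []
for _ in range(n_samples):
    r, c = rng.integers(0, 256 - P, size=2)
    patches.append((img[r:r + P, c:c + P] * window).ravel())
X = np.array(patches)
X -= X.mean(axis=0)                       # remove the per-pixel mean gray value

# Principal components via SVD (the paper obtained them online with Sanger's rule).
U, s, Vt = np.linalg.svd(X, full_matrices=False)
components = Vt[:8].reshape(8, P, P)      # the first 8 principal components as 64 x 64 filters
print(components.shape)
```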

The Principal Components of Natural Images

The first principal components resemble simple-cell receptive fields in the primary visual cortex; the later ones do not.

(Hancock, Baddeley, & Smith, 1992, Network 3(1):61–70)

The Principal Components of Natural Images

The principal components look different if the samples are taken from text.

(Hancock, Baddeley, & Smith, 1992, Network 3(1):61–70)