Hebbian Learning
Donald Hebb (1904–1985)

Hebbian Learning

"When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased." (Donald Hebb, 1949)

"Fire together, wire together."

Simple model of Hebbian learning:

[Diagram: inputs x_1, x_2, ..., x_N with weights w_1, ..., w_N converging on a single output unit y.]

\Delta w_i = \eta \, \frac{1}{M} \sum_\mu y^\mu x_i^\mu   (1)
           = \eta \, \langle y^\mu x_i^\mu \rangle_\mu   (2)
\Delta w   = \eta \, \langle y^\mu x^\mu \rangle_\mu .   (3)

In the following we will drop the index \mu for convenience.
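
To make equations (1)–(3) concrete, here is a minimal numerical sketch of one averaged Hebbian step for a single unit, assuming for concreteness that the unit is linear (y = w^T x, as introduced on the next slide). The data set, its dimensions, and the learning rate are illustrative assumptions, not values from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data set: M samples of dimension N (assumed values).
M, N = 500, 2
X = rng.normal(size=(M, N)) * np.array([2.0, 0.5])   # anisotropic data cloud

eta = 0.1                        # learning rate (assumed)
w = rng.normal(size=N)           # initial weight vector

# One averaged Hebbian step: dw = eta * <y^mu x^mu>_mu  (eqs. 1-3).
y = X @ w                        # outputs y^mu for all samples
dw = eta * (y[:, None] * X).mean(axis=0)
w = w + dw
print("weight update:", dw)
print("new weight:   ", w)
```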

Linear Model Neuron

A linear model neuron is described by

y = \sum_{i=1}^{N} w_i x_i = w^T x .   (4)

This corresponds to a projection of the data onto the axis given by w, scaled with |w|.

[Figure: a data cloud in the (x_1, x_2)-plane with the weight vector w, and the projected output values y along a one-dimensional axis.]

Hebbian Learning with a Linear Model Neuron

\Delta w \overset{(3)}{=} \eta \, y x   (Hebbian learning)
y \overset{(4)}{=} w^T x   (linear neuron)

\Delta w \overset{(3)}{=} \eta \, y x   (5)
         \overset{(4)}{=} \eta \, (w^T x) x   (6)
         = \eta \, x (x^T w)   (7)
         = \eta \, x x^T w   (8)

\Rightarrow \; w^{t+1} = w^t + \eta \, x x^T w^t   (9)
                       = \left( I + \eta \, x x^T \right) w^t   (10)

Hebbian learning with a linear model neuron can be interpreted as an iterated multiplication of the weight vector w with the matrix (I + \eta \, x x^T).

The sign of the input vectors x is irrelevant for the learning.
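
The iterated-multiplication view of equations (9)–(10) is easy to check numerically. The sketch below uses an assumed two-dimensional input distribution and learning rate; it also makes the problem noted on the next slide visible, namely that the weight norm grows without bound.

```python
import numpy as np

rng = np.random.default_rng(1)

N, eta = 2, 0.05                 # dimension and learning rate (assumed)
w = rng.normal(size=N)

# Online Hebbian learning with a linear unit: w <- (I + eta * x x^T) w  (eqs. 9-10).
for t in range(1000):
    x = rng.normal(size=N) * np.array([2.0, 0.5])   # illustrative inputs
    w = (np.eye(N) + eta * np.outer(x, x)) @ w
    # equivalently: w = w + eta * (w @ x) * x

print("final |w| =", np.linalg.norm(w))  # grows without bound -> motivates normalization
```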

Illustration of Hebbian Learning

\Delta w \overset{(6)}{=} \eta \, (w^T x) x   (Hebbian learning for a linear unit)

[Figure: a data cloud in the (x_1, x_2)-plane; starting from w^1, successive updates w^2, w^3, ... turn the weight vector towards the direction of largest variance and make it longer, approaching w^*.]

Problem: The weights grow without bound.

Explicit Normalization of the Weight Vector

First apply the learning rule,

\tilde{w}_i^{t+1} = w_i^t + \eta f_i   with, e.g., f_i = y x_i .   (11)

Then normalize the weight vector to length one,

w_i^{t+1} = \frac{\tilde{w}_i^{t+1}}{\sqrt{\sum_j \left( \tilde{w}_j^{t+1} \right)^2}} .   (12)

We can verify that

\sum_i \left( w_i^{t+1} \right)^2 \overset{(12)}{=} \sum_i \frac{\left( \tilde{w}_i^{t+1} \right)^2}{\left( \sqrt{\sum_j \left( \tilde{w}_j^{t+1} \right)^2} \right)^2} = 1 .   (13)

Problem: Explicit normalization is expensive and biologically implausible.
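
Before moving to the implicit version, here is a small sketch of the explicit scheme of equations (11)–(12): a Hebbian step followed by division by the norm. Inputs and learning rate are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

N, eta = 2, 0.05                         # dimension and learning rate (assumed)
w = rng.normal(size=N)
w /= np.linalg.norm(w)                   # start normalized

for t in range(1000):
    x = rng.normal(size=N) * np.array([2.0, 0.5])   # illustrative inputs
    y = w @ x
    w = w + eta * y * x                  # Hebbian step (eq. 11 with f_i = y x_i)
    w = w / np.linalg.norm(w)            # explicit normalization (eq. 12)

print("|w| =", np.linalg.norm(w))        # stays at length one
print("w   =", w)
```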

Implicit Normalization of the Weight Vector

Explicit normalization yielded

w_i^{t+1}(\eta) \overset{(12)}{=} \frac{w_i^t + \eta f_i}{\sqrt{\sum_j \left( w_j^t + \eta f_j \right)^2}} .

To obtain a simpler and biologically more plausible normalization rule, we compute the Taylor expansion of w_i^{t+1} in \eta at \eta = 0, under the assumption that w^t is normalized, i.e. \sum_j (w_j^t)^2 = 1:

w_i^{t+1}(\eta) \approx w_i^{t+1}(\eta' = 0) + \eta \left. \frac{\partial w_i^{t+1}(\eta')}{\partial \eta'} \right|_{\eta' = 0}   (14)

= w_i^{t+1}(\eta' = 0) + \eta \left[ \frac{f_i}{\sqrt{\sum_j \left( w_j^t + \eta' f_j \right)^2}} - \frac{\left( w_i^t + \eta' f_i \right) \sum_j f_j \left( w_j^t + \eta' f_j \right)}{\left( \sum_j \left( w_j^t + \eta' f_j \right)^2 \right)^{3/2}} \right]_{\eta' = 0}   (15)

= w_i^t + \eta \left( f_i - w_i^t \sum_j f_j w_j^t \right) .   (16)

Implicit Normalization of the Weight Vector

Taylor expansion of the explicit normalization rule resulted in

w_i^{t+1} \overset{(16)}{\approx} w_i^t + \eta \left( f_i - w_i^t \sum_j f_j w_j^t \right) .

For Hebbian learning, i.e. with f_i = y x_i and y = \sum_j x_j w_j, this becomes

\Delta w_i \overset{(16)}{=} \eta \left( f_i - w_i \sum_j f_j w_j \right)   (17)
           = \eta \Big( y x_i - w_i \, y \underbrace{\sum_j x_j w_j}_{= y} \Big)   (18)
           = \eta \left( y x_i - y^2 w_i \right) \quad \forall i   (19)

\Leftrightarrow \; \Delta w = \eta \left( y x - y^2 w \right) .   (Oja's rule)   (20)

This rule is simpler, e.g. \Delta w_i does not depend on w_j anymore.
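
A minimal sketch of Oja's rule (eq. 20) applied online, with an assumed input distribution and learning rate. The weight vector settles at length one and aligns with the direction of largest variance, which is what the stability analysis below will predict.

```python
import numpy as np

rng = np.random.default_rng(3)

N, eta = 2, 0.05                         # dimension and learning rate (assumed)
w = rng.normal(size=N)

for t in range(5000):
    x = rng.normal(size=N) * np.array([2.0, 0.5])   # illustrative inputs
    y = w @ x
    w = w + eta * (y * x - y**2 * w)     # Oja's rule (eq. 20)

print("|w| =", np.linalg.norm(w))        # close to 1
print("w   =", w)                        # close to (+/-1, 0), the direction of largest variance
```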

Oja's Rule

Implicit normalization yielded

\Delta w \overset{(20)}{=} \eta \left( y x - y^2 w \right) .   (Oja's rule)

For small y we have

\Delta w \approx \eta \, y x ,   (21)

which corresponds to Hebb's rule. For large y we have

\Delta w \approx -\eta \, y^2 w ,   (22)

which results in a shortening of the weight vector.

The second term only affects the length but not the direction of the weight vector. It therefore limits the length of the weight vector without affecting the qualitative behavior of the learning rule.

Question: What will be the final length of the weight vector?

Question: How will the weight vector develop under Oja's rule in general?

Linear Stability Analysis

y \overset{(4)}{=} w^T x   (linear unit)
\Delta w \overset{(20)}{=} \eta \left( y x - y^2 w \right)   (Oja's rule)

1. What is the mean dynamics of the weight vectors?   \frac{1}{\eta} \langle \Delta w \rangle_\mu = \; ?
2. What are the stationary weight vectors of the dynamics?   0 \overset{!}{=} \langle \Delta w \rangle_\mu \;\Rightarrow\; w = \; ?
3. Which weight vectors are stable?   w = \; ? \text{ stable?}

What is the mean dynamics of the weight vectors?

y \overset{(4)}{=} w^T x   (linear unit)
(\Delta w)^\mu \overset{(20)}{=} \eta \left( y^\mu x^\mu - (y^\mu)^2 w \right)   (Oja's rule)

In order to proceed analytically we have to average over all training patterns \mu and get

\langle (\Delta w)^\mu \rangle_\mu = \frac{1}{M} \sum_{\mu=1}^{M} (\Delta w)^\mu   (23)
\overset{(20)}{=} \frac{1}{M} \sum_{\mu=1}^{M} \eta \left( y^\mu x^\mu - (y^\mu)^2 w \right)   (24)
\Rightarrow \; \langle \Delta w \rangle_\mu = \eta \left\langle y x - y^2 w \right\rangle_\mu .   (25)

For small learning rates \eta this is a good approximation to the real training procedure (apart from a factor of M).

What is the mean dynamics of the weight vectors?

y \overset{(4)}{=} w^T x ,   (linear unit)
\langle \Delta w \rangle_\mu \overset{(25)}{=} \eta \left\langle y x - y^2 w \right\rangle_\mu .   (Oja's rule, averaged)

\frac{1}{\eta} \langle \Delta w \rangle_\mu \overset{(25)}{=} \left\langle y x - y^2 w \right\rangle_\mu   (26)
\overset{(4)}{=} \left\langle (w^T x) x - (w^T x)(w^T x) w \right\rangle_\mu   (27)
= \left\langle x (x^T w) - (w^T x)(x^T w) w \right\rangle_\mu   (28)
= \left\langle (x x^T) w - \left( w^T (x x^T) w \right) w \right\rangle_\mu   (29)
= \langle x x^T \rangle_\mu \, w - \left( w^T \langle x x^T \rangle_\mu \, w \right) w   (30)

\left[ \; C := \langle x x^T \rangle_\mu = C^T \; \right]   (31)

\overset{(31)}{=} C w - \left( w^T C w \right) w .   (32)
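
The equivalence of the sample-averaged update (25) and the matrix form (32) can be checked directly. The data below is an assumed stand-in; the comparison itself is exact algebra.

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative data: M samples in N dimensions (assumed).
M, N = 2000, 2
X = rng.normal(size=(M, N)) * np.array([2.0, 0.5])

w = rng.normal(size=N)

# Averaged Oja update in sample form (eq. 25) ...
y = X @ w
dw_samples = (y[:, None] * X - (y**2)[:, None] * w).mean(axis=0)

# ... and in matrix form (eqs. 31-32) with C = <x x^T>_mu.
C = X.T @ X / M
dw_matrix = C @ w - (w @ C @ w) * w

print(np.allclose(dw_samples, dw_matrix))   # True: both forms agree
```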

Interim Summary

y \overset{(4)}{=} w^T x   (linear unit)
\Delta w \overset{(20)}{=} \eta \left( y x - y^2 w \right)   (Oja's rule)

1. What is the mean dynamics of the weight vectors?
   \frac{1}{\eta} \langle \Delta w \rangle_\mu \overset{(32)}{=} C w - \left( w^T C w \right) w   with C \overset{(31)}{:=} \langle x x^T \rangle_\mu
2. What are the stationary weight vectors of the dynamics?   0 \overset{!}{=} \langle \Delta w \rangle_\mu \;\Rightarrow\; w = \; ?
3. Which weight vectors are stable?   w = \; ? \text{ stable?}

What are the stationary weight vectors?

C \overset{(31)}{=} \langle x x^T \rangle_\mu = C^T ,   (symmetric matrix C)
\frac{1}{\eta} \langle \Delta w \rangle_\mu \overset{(32)}{=} C w - \left( w^T C w \right) w .   (weight dynamics)

For stationary weight vectors we have:

0 \overset{!}{=} \langle \Delta w \rangle_\mu   (33)
\overset{(32)}{\Leftrightarrow} \; C w = \left( w^T C w \right) w   (34)
\left[ \; \lambda := w^T C w \; \right]   (35)
\overset{(35)}{\Leftrightarrow} \; C w = \lambda w   (36)

Stationary weight vectors w are eigenvectors of the matrix C.

\lambda \overset{(35)}{=} w^T C w \overset{(36)}{=} w^T \lambda w = \lambda \, w^T w = \lambda \, |w|^2   (37)
\Rightarrow \; 1 = |w|^2   (38)

Stationary weight vectors w are normalized to 1.
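
A quick numerical confirmation: every eigenvector of C, taken at unit norm, makes the averaged update (32) vanish. C is estimated from an assumed data set here.

```python
import numpy as np

rng = np.random.default_rng(5)

# Illustrative data and its second-moment matrix C = <x x^T>_mu (assumed).
X = rng.normal(size=(2000, 3)) * np.array([2.0, 1.0, 0.5])
C = X.T @ X / len(X)

lam, vecs = np.linalg.eigh(C)            # eigenvalues ascending, eigenvectors in columns

for a in range(3):
    c = vecs[:, a]
    dw = C @ c - (c @ C @ c) * c         # averaged update (eq. 32) at w = c^alpha
    print(a, np.allclose(dw, 0.0), np.linalg.norm(c))   # stationary, and of norm 1
```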

Interim Summary

y \overset{(4)}{=} w^T x   (linear unit)
\Delta w \overset{(20)}{=} \eta \left( y x - y^2 w \right)   (Oja's rule)

1. What is the mean dynamics of the weight vectors?
   \frac{1}{\eta} \langle \Delta w \rangle_\mu \overset{(32)}{=} C w - \left( w^T C w \right) w   with C \overset{(31)}{:=} \langle x x^T \rangle_\mu
2. What are the stationary weight vectors of the dynamics?
   0 \overset{!}{=} \langle \Delta w \rangle_\mu \;\Rightarrow\; w = c^\alpha   with C c^\alpha \overset{(36)}{=} \lambda_\alpha c^\alpha   (eigenvectors of C)
   and 1 \overset{(38)}{=} |c^\alpha|^2   (with norm 1)
3. Which weight vectors are stable?   w = \; ? \text{ stable?}

Reminder: Eigenvalues and Eigenvectors

A a = \lambda a   (eigenvalue equation)   (39)

Solutions of the eigenvalue equation for a given quadratic N x N matrix A are called eigenvectors a and eigenvalues \lambda. For symmetric A, i.e. A^T = A, the eigenvalues \lambda are real, and eigenvectors to different eigenvalues are orthogonal, i.e. \lambda_\alpha \neq \lambda_\beta \Rightarrow a^\alpha \perp a^\beta. A symmetric matrix A always has a complete set of orthonormal eigenvectors a^\alpha, \alpha = 1, ..., N (orthonormal basis), i.e.

A a^\alpha = \lambda_\alpha a^\alpha ,   (right-eigenvectors)   (40)
a^{\alpha T} A = (A a^\alpha)^T = a^{\alpha T} \lambda_\alpha ,   (left-eigenvectors)   (41)
|a^\alpha|^2 = a^{\alpha T} a^\alpha = 1 ,   (with norm 1)   (42)
a^{\alpha T} a^\beta = 0 \quad \forall \alpha \neq \beta ,   (orthogonal)   (43)
\forall v: \; v = \sum_{\alpha=1}^{N} v_\alpha a^\alpha   with v_\alpha = a^{\alpha T} v ,   (complete)   (44)
\Rightarrow \; \forall v: \; v = \sum_{\alpha=1}^{N} a^\alpha a^{\alpha T} v \;\Rightarrow\; 1 = \sum_{\alpha=1}^{N} a^\alpha a^{\alpha T} ,   (45)
|v|^2 = \sum_i v_i^2 \overset{(42, 43, 44)}{=} \sum_\alpha v_\alpha^2 .   (46)

Reminder: Eigenvalues and Eigenvectors

A^T = A ,   (symmetric)   (47)
A a^\alpha = \lambda_\alpha a^\alpha ,   (eigenvectors)   (48)
a^{\alpha T} a^\beta = \delta_{\alpha\beta} \quad \forall \alpha, \beta .   (orthonormal)   (49)

[Figure: in the (x_1, x_2)-plane, a vector x is decomposed along the orthonormal eigenvectors a^\alpha and a^\beta; applying A scales the two components by \lambda_\alpha and \lambda_\beta, giving A x.]
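
These properties are easy to verify numerically for any symmetric matrix; the matrix below is an arbitrary illustrative choice.

```python
import numpy as np

# A small symmetric matrix (illustrative values).
A = np.array([[2.0, 0.6, 0.0],
              [0.6, 1.0, 0.3],
              [0.0, 0.3, 0.5]])

lam, a = np.linalg.eigh(A)                    # eigenvalues (ascending), eigenvectors as columns

print(lam)                                    # real eigenvalues
print(np.allclose(A @ a, a * lam))            # A a^alpha = lambda_alpha a^alpha   (eq. 48)
print(np.allclose(a.T @ a, np.eye(3)))        # a^{alpha T} a^beta = delta_{alpha beta}   (eq. 49)
print(np.allclose(sum(np.outer(a[:, k], a[:, k]) for k in range(3)), np.eye(3)))   # completeness (eq. 45)
```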

Which weight vectors are stable?

\frac{1}{\eta} \langle \Delta w \rangle_\mu \overset{(32)}{=} C w - \left( w^T C w \right) w   (50)

Ansatz:  w := c^\alpha + \epsilon   (with small \epsilon)   (51)

[Figure: the weight vector w = c^\alpha + \epsilon as a small perturbation \epsilon around the eigenvector c^\alpha, with a second eigenvector c^\beta.]

How does \epsilon develop under the dynamics?

\frac{1}{\eta} \langle \Delta \epsilon \rangle_\mu \overset{(51)}{=} \frac{1}{\eta} \langle \Delta w \rangle_\mu   (52)
\overset{(32)}{=} C w - \left( w^T C w \right) w   (53)
\overset{(51)}{=} C (c^\alpha + \epsilon) - \left( (c^\alpha + \epsilon)^T C (c^\alpha + \epsilon) \right) (c^\alpha + \epsilon)   (54)
\approx C c^\alpha + C \epsilon - (c^{\alpha T} C c^\alpha) c^\alpha - (c^{\alpha T} C \epsilon) c^\alpha - (\epsilon^T C c^\alpha) c^\alpha - (c^{\alpha T} C c^\alpha) \epsilon   (terms of order \epsilon^2 dropped)   (55)
\overset{(36, 41)}{=} \lambda_\alpha c^\alpha + C \epsilon - (c^{\alpha T} \lambda_\alpha c^\alpha) c^\alpha - (c^{\alpha T} \lambda_\alpha \epsilon) c^\alpha - (\epsilon^T \lambda_\alpha c^\alpha) c^\alpha - (c^{\alpha T} \lambda_\alpha c^\alpha) \epsilon   (56)
\overset{(38)}{=} \lambda_\alpha c^\alpha + C \epsilon - \lambda_\alpha c^\alpha - \lambda_\alpha (c^{\alpha T} \epsilon) c^\alpha - \lambda_\alpha (\epsilon^T c^\alpha) c^\alpha - \lambda_\alpha \epsilon   (57)
= C \epsilon - 2 \lambda_\alpha (c^{\alpha T} \epsilon) c^\alpha - \lambda_\alpha \epsilon .   (58)

Which weight vectors are stable?

w \overset{(51)}{:=} c^\alpha + \epsilon ,
\frac{1}{\eta} \langle \Delta \epsilon \rangle_\mu \overset{(58)}{\approx} C \epsilon - 2 \lambda_\alpha (c^{\alpha T} \epsilon) c^\alpha - \lambda_\alpha \epsilon .

For simplicity consider the change of the perturbation along the eigenvector c^\beta.

[Figure: the perturbation \epsilon around c^\alpha and its projection c^{\beta T} \epsilon onto the direction c^\beta.]

\frac{1}{\eta} \langle c^{\beta T} \Delta \epsilon \rangle_\mu \overset{(58)}{\approx} c^{\beta T} C \epsilon - 2 \lambda_\alpha (c^{\alpha T} \epsilon)(c^{\beta T} c^\alpha) - \lambda_\alpha (c^{\beta T} \epsilon)   (59)
\overset{(36, 41)}{=} \lambda_\beta (c^{\beta T} \epsilon) - 2 \lambda_\alpha (c^{\alpha T} \epsilon)(c^{\beta T} c^\alpha) - \lambda_\alpha (c^{\beta T} \epsilon)   (60)
\overset{(42, 43)}{=} \lambda_\beta (c^{\beta T} \epsilon) - 2 \lambda_\alpha (c^{\alpha T} \epsilon) \delta_{\beta\alpha} - \lambda_\alpha (c^{\beta T} \epsilon)   (61)
= \lambda_\beta (c^{\beta T} \epsilon) - 2 \lambda_\alpha (c^{\beta T} \epsilon) \delta_{\beta\alpha} - \lambda_\alpha (c^{\beta T} \epsilon)   (62)
= \left( \lambda_\beta - 2 \lambda_\alpha \delta_{\beta\alpha} - \lambda_\alpha \right) (c^{\beta T} \epsilon)   (63)
= \begin{cases} (-2 \lambda_\beta)(c^{\beta T} \epsilon) & \text{if } \beta = \alpha \\ (\lambda_\beta - \lambda_\alpha)(c^{\beta T} \epsilon) & \text{if } \beta \neq \alpha \end{cases}   (64)

Which weight vectors are stable?

w \overset{(51)}{:=} c^\alpha + \epsilon ,
\frac{1}{\eta} \langle c^{\beta T} \Delta \epsilon \rangle_\mu \overset{(64)}{\approx} \begin{cases} (-2 \lambda_\beta)(c^{\beta T} \epsilon) & \text{if } \beta = \alpha \\ (\lambda_\beta - \lambda_\alpha)(c^{\beta T} \epsilon) & \text{if } \beta \neq \alpha . \end{cases}

With

c^{\beta T} \langle \Delta \epsilon \rangle_\mu = \langle c^{\beta T} \Delta \epsilon \rangle_\mu   (65)
= \langle c^{\beta T} (\epsilon^{n+1} - \epsilon^n) \rangle_\mu   (66)
= \langle (c^{\beta T} \epsilon)^{n+1} - (c^{\beta T} \epsilon)^n \rangle_\mu   (67)
= \langle \Delta (c^{\beta T} \epsilon) \rangle_\mu ,   (68)

s_\beta := c^{\beta T} \epsilon ,   (perturbation along c^\beta)   (69)

\kappa_{\alpha\beta} := \begin{cases} \eta (-2 \lambda_\beta) & \text{if } \beta = \alpha \\ \eta (\lambda_\beta - \lambda_\alpha) & \text{if } \beta \neq \alpha \end{cases}   (70)

we get

(64) \overset{(68, 69, 70)}{\Rightarrow} \langle \Delta s_\beta \rangle_\mu \approx \kappa_{\alpha\beta} \, s_\beta .   (71)

Which weight vectors are stable?

w \overset{(51)}{:=} c^\alpha + \epsilon ,
\langle \Delta s_\beta \rangle_\mu \overset{(71)}{\approx} \kappa_{\alpha\beta} s_\beta   with \kappa_{\alpha\beta} \overset{(70)}{:=} \begin{cases} \eta (-2 \lambda_\beta) & \text{if } \beta = \alpha \\ \eta (\lambda_\beta - \lambda_\alpha) & \text{if } \beta \neq \alpha . \end{cases}

Case 1: \beta \neq \alpha, \lambda_\beta < \lambda_\alpha \;\Rightarrow\; \kappa_{\alpha\beta} = \eta (\lambda_\beta - \lambda_\alpha) < 0.
The perturbation along c^\beta decays.

[Figure: the component s_\beta of the perturbation \epsilon in the c^\beta-direction shrinks.]

Which weight vectors are stable?

w \overset{(51)}{:=} c^\alpha + \epsilon ,
\langle \Delta s_\beta \rangle_\mu \overset{(71)}{\approx} \kappa_{\alpha\beta} s_\beta   with \kappa_{\alpha\beta} \overset{(70)}{:=} \begin{cases} \eta (-2 \lambda_\beta) & \text{if } \beta = \alpha \\ \eta (\lambda_\beta - \lambda_\alpha) & \text{if } \beta \neq \alpha . \end{cases}

Case 2: \beta \neq \alpha, \lambda_\beta = \lambda_\alpha \;\Rightarrow\; \kappa_{\alpha\beta} = \eta (\lambda_\beta - \lambda_\alpha) = 0.
The perturbation along c^\beta persists.

[Figure: the component s_\beta of the perturbation \epsilon in the c^\beta-direction remains unchanged.]

Which weight vectors are stable?

w \overset{(51)}{:=} c^\alpha + \epsilon ,
\langle \Delta s_\beta \rangle_\mu \overset{(71)}{\approx} \kappa_{\alpha\beta} s_\beta   with \kappa_{\alpha\beta} \overset{(70)}{:=} \begin{cases} \eta (-2 \lambda_\beta) & \text{if } \beta = \alpha \\ \eta (\lambda_\beta - \lambda_\alpha) & \text{if } \beta \neq \alpha . \end{cases}

Case 3: \beta \neq \alpha, \lambda_\beta > \lambda_\alpha \;\Rightarrow\; \kappa_{\alpha\beta} = \eta (\lambda_\beta - \lambda_\alpha) > 0.
The perturbation along c^\beta grows.

[Figure: the component s_\beta of the perturbation \epsilon in the c^\beta-direction grows.]

Which weight vectors are stable?

w \overset{(51)}{:=} c^\alpha + \epsilon ,
\langle \Delta s_\beta \rangle_\mu \overset{(71)}{\approx} \kappa_{\alpha\beta} s_\beta   with \kappa_{\alpha\beta} \overset{(70)}{:=} \begin{cases} \eta (-2 \lambda_\beta) & \text{if } \beta = \alpha \\ \eta (\lambda_\beta - \lambda_\alpha) & \text{if } \beta \neq \alpha . \end{cases}

Case 4: \beta = \alpha \;\Rightarrow\; \kappa_{\alpha\beta} = \eta (-2 \lambda_\beta) < 0.
The perturbation along c^\alpha always decays (if \lambda_\beta > 0).

[Figure: the component s_\alpha of the perturbation \epsilon in the c^\alpha-direction shrinks.]

Which weight vectors are stable?

w \overset{(51)}{:=} c^\alpha + \epsilon ,
\langle \Delta s_\beta \rangle_\mu \overset{(71)}{\approx} \kappa_{\alpha\beta} s_\beta   with \kappa_{\alpha\beta} \overset{(70)}{:=} \begin{cases} \eta (-2 \lambda_\beta) & \text{if } \beta = \alpha \\ \eta (\lambda_\beta - \lambda_\alpha) & \text{if } \beta \neq \alpha . \end{cases}

Cases 3 and 4 combined: The weight vector turns away from c^\alpha into the c^\beta-direction.

[Figure: for \lambda_\beta > \lambda_\alpha the component along c^\alpha decays while the component along c^\beta grows, so the weight vector rotates from c^\alpha towards c^\beta.]

Which weight vectors are stable?

w \overset{(51)}{:=} c^\alpha + \epsilon ,
\langle \Delta s_\beta \rangle_\mu \overset{(71)}{\approx} \kappa_{\alpha\beta} s_\beta   with \kappa_{\alpha\beta} \overset{(70)}{:=} \begin{cases} \eta (-2 \lambda_\beta) & \text{if } \beta = \alpha \\ \eta (\lambda_\beta - \lambda_\alpha) & \text{if } \beta \neq \alpha . \end{cases}

Case 1: \beta \neq \alpha, \lambda_\beta < \lambda_\alpha: the perturbation along c^\beta decays.
Case 2: \beta \neq \alpha, \lambda_\beta = \lambda_\alpha: the perturbation along c^\beta persists.
Case 3: \beta \neq \alpha, \lambda_\beta > \lambda_\alpha: the perturbation along c^\beta grows.
Case 4: \beta = \alpha: the perturbation along c^\alpha always decays.

The weight vector w = c^\alpha is stable only if the perturbations in all directions decay, i.e.

w = c^\alpha \text{ stable} \;\Leftrightarrow\; \lambda_\beta < \lambda_\alpha \quad \forall \beta \neq \alpha .   (72)

Only the eigenvector c^1 with the largest eigenvalue \lambda_1 is a stable weight vector under Oja's rule.

What happens if the two largest eigenvalues are equal?
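
A quick numerical reading of criterion (72), using an assumed data set and learning rate: for every candidate fixed point c^\alpha we list the coefficients \kappa_{\alpha\beta} from equation (70); only the eigenvector with the largest eigenvalue has all of them negative.

```python
import numpy as np

rng = np.random.default_rng(6)

# Illustrative data with distinct eigenvalues of C (assumed).
X = rng.normal(size=(5000, 3)) * np.array([2.0, 1.0, 0.5])
C = X.T @ X / len(X)
lam = np.linalg.eigvalsh(C)[::-1]        # eigenvalues in descending order: lambda_1 > lambda_2 > ...

eta = 0.05                               # learning rate (assumed)
N = len(lam)

# kappa_{alpha beta} from eq. (70); the fixed point c^alpha is stable if all entries are negative.
for a in range(N):
    kappa = np.array([eta * (-2 * lam[b]) if b == a else eta * (lam[b] - lam[a])
                      for b in range(N)])
    print(f"alpha = {a + 1}:", np.round(kappa, 3), "-> stable" if np.all(kappa < 0) else "-> unstable")
```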

Summary

y \overset{(4)}{=} w^T x   (linear unit)
\Delta w \overset{(20)}{=} \eta \left( y x - y^2 w \right)   (Oja's rule)

1. What is the mean dynamics of the weight vectors?
   \frac{1}{\eta} \langle \Delta w \rangle_\mu \overset{(32)}{=} C w - \left( w^T C w \right) w   with C \overset{(31)}{:=} \langle x x^T \rangle_\mu
2. What are the stationary weight vectors of the dynamics?
   0 \overset{!}{=} \langle \Delta w \rangle_\mu \;\Rightarrow\; w = c^\alpha   with C c^\alpha \overset{(36)}{=} \lambda_\alpha c^\alpha   (eigenvectors of C)
   and 1 \overset{(38)}{=} |c^\alpha|^2   (with norm 1)
3. Which weight vectors are stable?
   w = c^\alpha \text{ stable} \overset{(72)}{\Leftrightarrow} \lambda_\beta < \lambda_\alpha \quad \forall \beta \neq \alpha
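
Putting the whole analysis together, a simulation of Oja's rule should end up at a unit-norm vector aligned with the eigenvector of C that has the largest eigenvalue. The data set, rotation, and learning rate below are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)

# Illustrative zero-mean data with a dominant direction, rotated off the axes (assumed).
M, N = 10000, 3
X = rng.normal(size=(M, N)) * np.array([2.0, 1.0, 0.5])
R, _ = np.linalg.qr(rng.normal(size=(N, N)))     # random rotation
X = X @ R.T

eta = 0.01                                       # learning rate (assumed)
w = rng.normal(size=N)
w /= np.linalg.norm(w)

for x in X:                                      # online Oja learning (eq. 20)
    y = w @ x
    w += eta * (y * x - y**2 * w)

C = X.T @ X / M
lam, vecs = np.linalg.eigh(C)
c1 = vecs[:, -1]                                 # eigenvector with the largest eigenvalue

print("|w| =", np.linalg.norm(w))                            # should be close to 1
print("|cos(w, c1)| =", abs(w @ c1) / np.linalg.norm(w))     # should be close to 1
```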

Reminder: Principal Components

[Figure: a data cloud with its first and second principal components c^1 and c^2.]

Principal components are eigenvectors of the covariance matrix and point in the direction of maximal variance within the space orthogonal to the earlier principal components.

Learning Several Principal Components

y \overset{(4)}{=} w^T x   (linear unit)
\Delta w \overset{(20)}{=} \eta \left( y x - y^2 w \right)   (Oja's rule)

[Figure: a single Oja unit with inputs x_1, x_2, ..., x_N and output y, extended to a network with output units y_1, y_2, y_3 that receive lateral connections from earlier output units.]

y_j = w_j^T x + \sum_{k=1}^{j-1} v_{jk} y_k
\Delta w_j = \eta \left( y_j x - y_j^2 w_j \right)
\Delta v_{jk} = -\epsilon \, y_j y_k

Asymmetric inhibitory lateral connections decorrelate later from earlier output units. The units learn the principal components in order of decreasing eigenvalue.

(Rubner & Tavan, 1989, Europhys. Letters 10:693–698; reviewed in Becker & Plumbley, 1996, J. Appl. Intelligence 6:185–205)
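
A minimal sketch of this kind of network, assuming the update rules as reconstructed above (Oja's rule per unit plus anti-Hebbian lateral weights from earlier to later units). The data, learning rates, and training schedule are illustrative assumptions, not details from the Rubner & Tavan paper.

```python
import numpy as np

rng = np.random.default_rng(8)

# Illustrative zero-mean data (assumed): M samples, N inputs, K output units.
M, N, K = 20000, 4, 2
X = rng.normal(size=(M, N)) * np.array([2.0, 1.5, 1.0, 0.5])

eta, eps = 0.01, 0.01                    # learning rates (assumed)
W = rng.normal(size=(K, N)) * 0.1        # feedforward weights w_j
V = np.zeros((K, K))                     # lateral weights v_jk, only k < j is used

for x in X:
    y = np.zeros(K)
    for j in range(K):
        y[j] = W[j] @ x + V[j, :j] @ y[:j]           # y_j = w_j^T x + sum_{k<j} v_jk y_k
        W[j] += eta * (y[j] * x - y[j]**2 * W[j])    # Oja's rule per unit
        V[j, :j] += -eps * y[j] * y[:j]              # anti-Hebbian lateral update

# Compare each unit with the corresponding eigenvector of C (eigenvalue-descending order).
C = X.T @ X / M
vecs = np.linalg.eigh(C)[1][:, ::-1]
for j in range(K):
    print(j + 1, abs(W[j] @ vecs[:, j]) / np.linalg.norm(W[j]))   # each should approach 1
```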

Learning a Principal Subspace

y \overset{(4)}{=} w^T x   (linear unit)
\Delta w \overset{(20)}{=} \eta \left( y x - y^2 w \right)   (Oja's rule)

[Figure: a network with inputs x_1, x_2, ..., x_N and output units y_1, y_2, y_3 connected by symmetric lateral weights.]

y_j = w_j^T x + \sum_{k=1, k \neq j}^{N} v_{jk} y_k
\Delta w_j = \eta \left( y_j x - y_j^2 w_j \right)
\Delta v_{jk} = -\epsilon \, y_j y_k

Symmetric inhibitory lateral connections mutually decorrelate the output units. The units learn the principal subspace but not particular principal components.

(Földiák, 1989, Proc. IJCNN '89, pp. 401–405; reviewed in Becker & Plumbley, 1996, J. Appl. Intelligence 6:185–205)

The Principal Components of Natural Images

15 natural images of size 256 × 256 pixels. 20,000 random samples of size 64 × 64 pixels. For each pixel the mean gray value over the 20,000 samples was removed. The samples were windowed with a Gaussian with std. dev. 10 pixels. Sanger's rule was applied to the samples.

(Hancock, Baddeley, & Smith, 1992, Network 3(1):61–70)
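
The sampling pipeline can be sketched as follows. Since the original 15 natural images are not available here, a smoothed random image stands in for them, far fewer samples are drawn, and the components are computed by a plain SVD rather than by Sanger's online rule; all of these are assumptions of the sketch, not details of the original study.

```python
import numpy as np

rng = np.random.default_rng(9)

# Stand-in for the natural images: one synthetic 256 x 256 image with spatial correlations.
img = rng.normal(size=(256, 256))
k = np.exp(-0.5 * (np.arange(-10, 11) / 4.0) ** 2)
img = np.apply_along_axis(lambda m: np.convolve(m, k, mode="same"), 0, img)
img = np.apply_along_axis(lambda m: np.convolve(m, k, mode="same"), 1, img)

P, n_samples = 64, 2000                   # patch size; fewer samples than the 20,000 in the paper
g = np.exp(-((np.arange(P) - P / 2) ** 2) / (2 * 10.0 ** 2))
window = np.outer(g, g)                   # Gaussian window, std. dev. 10 pixels

patches = []
for _ in range(n_samples):
    r, c = rng.integers(0, 256 - P, size=2)
    patches.append((img[r:r + P, c:c + P] * window).ravel())
X = np.array(patches)
X -= X.mean(axis=0)                       # remove the per-pixel mean gray value

# Principal components via SVD (the paper obtained them online with Sanger's rule).
U, s, Vt = np.linalg.svd(X, full_matrices=False)
components = Vt[:8].reshape(8, P, P)      # the first 8 principal components as 64 x 64 filters
print(components.shape)
```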

The Principal Components of Natural Images

The first principal components resemble simple-cell receptive fields in the primary visual cortex; the later ones do not.

(Hancock, Baddeley, & Smith, 1992, Network 3(1):61–70)

The Principal Components of Natural Images

The principal components look different if the samples are taken from text.

(Hancock, Baddeley, & Smith, 1992, Network 3(1):61–70)