Neural Networks. Robot Image Credit: Viktoriya Sukhanova, 123RF.com


Neural Networks
These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these slides for your own academic purposes, provided that you include proper attribution. Please send comments and corrections to Eric.
Robot Image Credit: Viktoriya Sukhanova, 123RF.com

Neural Networks
Origins: algorithms that try to mimic the brain.
40s and 50s: Hebbian learning and the Perceptron.
The Perceptrons book in 1969 and the XOR problem.
Very widely used in the 80s and early 90s; popularity diminished in the late 90s.
Recent resurgence: state-of-the-art technique for many applications.
Artificial neural networks are not nearly as complex or intricate as the actual brain structure.
Based on slide by Andrew Ng

Single Node
Bias unit: x_0 = 1
x = [x_0; x_1; x_2; x_3],   θ = [θ_0; θ_1; θ_2; θ_3]
h_θ(x) = g(θ^T x) = 1 / (1 + e^{-θ^T x})
Sigmoid (logistic) activation function: g(z) = 1 / (1 + e^{-z})
Based on slide by Andrew Ng
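The single sigmoid node above can be sketched in a few lines of NumPy; the weight vector below is a hypothetical example chosen only to illustrate the computation, not taken from the slides.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation: g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def single_node(theta, x):
    """One sigmoid unit: h(x) = g(theta^T x), with x[0] = 1 as the bias unit."""
    return sigmoid(theta @ x)

# Hypothetical weights: theta_0 is the bias weight
theta = np.array([-1.0, 2.0, 0.5, 0.5])
x = np.array([1.0, 0.3, 0.2, 0.1])   # x[0] = 1 is the bias unit
print(single_node(theta, x))         # a value in (0, 1)
```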

Neural Network
Bias units: x_0 and a_0^{(2)}
x → a^{(2)} → h_Θ(x)
Layer 1 (Input Layer), Layer 2 (Hidden Layer), Layer 3 (Output Layer)
Slide by Andrew Ng

Neural Networks Terminology
Layered feed-forward network: input units, hidden units, output units.
Neural networks are made up of nodes or units, connected by links.
Each link has an associated weight and activation level.
Each node has an input function (typically summing over weighted inputs), an activation function, and an output.
Based on slide by T. Finin, M. desJardins, L. Getoor, R. Par

Feed-Forward Process
Input layer units are set by external data, which causes their output links to be activated at the specified level.
Working forward through the network, the input function of each unit is applied to compute the input value. Usually this is just the weighted sum of the activations on the links feeding into this node.
The activation function transforms this input function into a final value. Typically this is a nonlinear function, often a sigmoid function corresponding to the "threshold" of that node.
Based on slide by T. Finin, M. desJardins, L. Getoor, R. Par

a_i^{(j)} = activation of unit i in layer j
Θ^{(j)} = weight matrix storing the parameters from layer j to layer j+1
If the network has s_j units in layer j and s_{j+1} units in layer j+1, then Θ^{(j)} has dimension s_{j+1} × (s_j + 1).
Here, Θ^{(1)} ∈ R^{3×4} and Θ^{(2)} ∈ R^{1×4}.
Slide by Andrew Ng

Vectorization
a_1^{(2)} = g(z_1^{(2)}) = g(Θ_{10}^{(1)} x_0 + Θ_{11}^{(1)} x_1 + Θ_{12}^{(1)} x_2 + Θ_{13}^{(1)} x_3)
a_2^{(2)} = g(z_2^{(2)}) = g(Θ_{20}^{(1)} x_0 + Θ_{21}^{(1)} x_1 + Θ_{22}^{(1)} x_2 + Θ_{23}^{(1)} x_3)
a_3^{(2)} = g(z_3^{(2)}) = g(Θ_{30}^{(1)} x_0 + Θ_{31}^{(1)} x_1 + Θ_{32}^{(1)} x_2 + Θ_{33}^{(1)} x_3)
h_Θ(x) = g(Θ_{10}^{(2)} a_0^{(2)} + Θ_{11}^{(2)} a_1^{(2)} + Θ_{12}^{(2)} a_2^{(2)} + Θ_{13}^{(2)} a_3^{(2)}) = g(z^{(3)})
Feed-Forward Steps:
z^{(2)} = Θ^{(1)} x
a^{(2)} = g(z^{(2)})
Add a_0^{(2)} = 1
z^{(3)} = Θ^{(2)} a^{(2)}
h_Θ(x) = a^{(3)} = g(z^{(3)})
Based on slide by Andrew Ng
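The feed-forward steps above can be written as a short vectorized sketch. The shapes match the example network (3 inputs, 3 hidden units, 1 output, so Θ^{(1)} ∈ R^{3×4} and Θ^{(2)} ∈ R^{1×4}); the random weights and input are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(Theta1, Theta2, x):
    """Feed-forward pass for one hidden layer.
    Theta1: (s2, s1+1), Theta2: (1, s2+1); x carries no bias entry."""
    a1 = np.concatenate(([1.0], x))            # add bias unit x_0 = 1
    z2 = Theta1 @ a1                           # z^(2) = Theta^(1) x
    a2 = np.concatenate(([1.0], sigmoid(z2)))  # a^(2) = g(z^(2)), add a_0^(2) = 1
    z3 = Theta2 @ a2                           # z^(3) = Theta^(2) a^(2)
    return sigmoid(z3)                         # h(x) = a^(3) = g(z^(3))

rng = np.random.default_rng(0)
Theta1 = rng.normal(size=(3, 4))               # R^{3x4}
Theta2 = rng.normal(size=(1, 4))               # R^{1x4}
h = forward(Theta1, Theta2, np.array([0.5, -1.0, 2.0]))
print(h)                                       # a single value in (0, 1)
```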

Other Network Architectures
Example: s = [3, 3, 2, 1]
Layer 1, Layer 2, Layer 3, Layer 4
L denotes the number of layers.
s ∈ N^{+L} contains the number of nodes at each layer, not counting bias units.
Typically, s_0 = d (# input features) and s_{L-1} = K (# classes).

Multiple Output Units: One-vs-Rest
Pedestrian, Car, Motorcycle, Truck
h_Θ(x) ∈ R^K
We want:
h_Θ(x) ≈ [1; 0; 0; 0] when pedestrian, [0; 1; 0; 0] when car, [0; 0; 1; 0] when motorcycle, [0; 0; 0; 1] when truck.
Slide by Andrew Ng

Multiple Output Units: One-vs-Rest
Given {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, we must convert the labels to a 1-of-K representation, e.g., y_i = [0; 0; 1; 0] when motorcycle, y_i = [0; 1; 0; 0] when car, etc.
Based on slide by Andrew Ng
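A minimal sketch of the 1-of-K conversion; the class ordering 0=pedestrian, 1=car, 2=motorcycle, 3=truck is an assumption chosen to match the slide's example.

```python
import numpy as np

def to_one_of_k(labels, K):
    """Convert integer class labels (0..K-1) to 1-of-K row vectors."""
    Y = np.zeros((len(labels), K))
    Y[np.arange(len(labels)), labels] = 1.0
    return Y

# Hypothetical class order: 0=pedestrian, 1=car, 2=motorcycle, 3=truck
Y = to_one_of_k([2, 1, 0], K=4)
print(Y)
# row 0 is [0, 0, 1, 0] ("motorcycle"), row 1 is [0, 1, 0, 0] ("car")
```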

Neural Network Classification
Given: {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}
s ∈ N^{+L} contains the number of nodes at each layer; s_0 = d (# features).
Binary classification: y = 0 or 1; 1 output unit (s_{L-1} = 1).
Multi-class classification (K classes): y ∈ R^K, e.g., pedestrian, car, motorcycle, truck; K output units (s_{L-1} = K).
Slide by Andrew Ng

Understanding Representations

Representing Boolean Functions
Simple example: AND
h_Θ(x) = g(-30 + 20 x_1 + 20 x_2), with the logistic/sigmoid function g(z).
x_1  x_2  h_Θ(x)
0    0    g(-30) ≈ 0
0    1    g(-10) ≈ 0
1    0    g(-10) ≈ 0
1    1    g(10)  ≈ 1
Based on slide and example by Andrew Ng

Representing Boolean Functions
AND: h_Θ(x) = g(-30 + 20 x_1 + 20 x_2)
OR: h_Θ(x) = g(-10 + 20 x_1 + 20 x_2)
NOT: h_Θ(x) = g(10 - 20 x_1)
(NOT x_1) AND (NOT x_2): h_Θ(x) = g(10 - 20 x_1 - 20 x_2)
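These four units can be checked numerically. A small sketch using the weights above (each unit is a single sigmoid over [1, x_1, x_2], or [1, x_1] for NOT):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def unit(theta):
    """A single sigmoid unit over inputs [1, x1, x2] (or [1, x1] for NOT)."""
    return lambda *x: sigmoid(theta @ np.concatenate(([1.0], x)))

AND = unit(np.array([-30.0, 20.0, 20.0]))
OR  = unit(np.array([-10.0, 20.0, 20.0]))
NOT = unit(np.array([ 10.0, -20.0]))
NOR = unit(np.array([ 10.0, -20.0, -20.0]))   # (NOT x1) AND (NOT x2)

print(round(AND(1, 1)), round(AND(1, 0)), round(OR(0, 1)), round(NOT(0)))
# → 1 0 1 1
```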

Combining Representations to Create Non-Linear Functions
Hidden units: AND: g(-30 + 20 x_1 + 20 x_2) and (NOT x_1) AND (NOT x_2): g(10 - 20 x_1 - 20 x_2)
Output unit: OR: g(-10 + 20 a_1 + 20 a_2)
NOT XOR: the AND unit fires in region I (x_1 = x_2 = 1), the (NOT x_1) AND (NOT x_2) unit fires in region III (x_1 = x_2 = 0), and the OR unit combines them, so h_Θ(x) ≈ 1 in regions I or III.
Based on example by Andrew Ng
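A sketch of the two-layer XNOR network built from the units above; the weight matrices are exactly the AND, (NOT x_1) AND (NOT x_2), and OR weights from the slide.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def layer(Theta, a_in):
    """One layer: prepend the bias unit, multiply by the weights, apply sigmoid."""
    return sigmoid(Theta @ np.concatenate(([1.0], a_in)))

# Layer 1: two hidden units, AND and (NOT x1) AND (NOT x2)
Theta1 = np.array([[-30.0,  20.0,  20.0],
                   [ 10.0, -20.0, -20.0]])
# Layer 2: OR of the two hidden units -> NOT XOR (XNOR)
Theta2 = np.array([[-10.0, 20.0, 20.0]])

def xnor(x1, x2):
    return layer(Theta2, layer(Theta1, np.array([x1, x2])))[0]

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, round(xnor(x1, x2)))
# XNOR is 1 exactly when x1 == x2
```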

Layering Representations
x_1 ... x_20, x_21 ... x_40, ..., x_381 ... x_400
20 × 20 pixel images: d = 400, 10 classes
Each image is "unrolled" into a vector x of pixel intensities.

Layering Representations
x_1, x_2, x_3, x_4, x_5, ..., x_d
Input Layer → Hidden Layer → Output Layer

Neural Network Learning

Perceptron Learning Rule
θ ← θ + α (y - h(x)) x
Intuitive rule:
If the output is correct, don't change the weights.
If the output is low (h(x) = 0, y = 1), increment the weights for all the inputs which are 1.
If the output is high (h(x) = 1, y = 0), decrement the weights for all inputs which are 1.
Perceptron Convergence Theorem: If there is a set of weights that is consistent with the training data (i.e., the data is linearly separable), the perceptron learning algorithm will converge [Minsky & Papert, 1969].

Batch Perceptron
Given training data {(x^(i), y^(i))}_{i=1}^n
Let θ ← [0, 0, ..., 0]
Repeat:
  Let Δ ← [0, 0, ..., 0]
  for i = 1 ... n, do
    if y^(i) x^(i) · θ ≤ 0      // prediction for the i-th instance is incorrect
      Δ ← Δ + y^(i) x^(i)
  Δ ← Δ / n                     // compute average update
  θ ← θ + α Δ
Until ||Δ||_2 < ε
Simplest case: α = 1 and don't normalize, which yields the fixed increment perceptron.
Each iteration of the outer loop is called an epoch.
Based on slide by Alan Fern
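The pseudocode above translates almost line-for-line into NumPy. This is a runnable sketch; the toy dataset and the convention of labels in {-1, +1} with a leading 1s column for the bias are assumptions for illustration.

```python
import numpy as np

def batch_perceptron(X, y, alpha=1.0, eps=1e-6, max_epochs=1000):
    """Batch perceptron. X: (n, d) with a leading 1s column for the bias;
    y: labels in {-1, +1}. Averages the updates over each epoch."""
    theta = np.zeros(X.shape[1])
    for _ in range(max_epochs):                  # each iteration is an epoch
        delta = np.zeros_like(theta)
        for xi, yi in zip(X, y):
            if yi * (xi @ theta) <= 0:           # prediction for this instance is wrong
                delta += yi * xi
        delta /= len(y)                          # average update
        theta += alpha * delta
        if np.linalg.norm(delta) < eps:          # converged: no net update
            break
    return theta

# Linearly separable toy data (hypothetical)
X = np.array([[1, 2, 1], [1, 1, 3], [1, -1, -1], [1, -2, -3]], dtype=float)
y = np.array([1, 1, -1, -1])
theta = batch_perceptron(X, y)
print(np.sign(X @ theta))   # all training points classified correctly
```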

Learning in NN: Backpropagation
Similar to the perceptron learning algorithm, we cycle through our examples:
If the output of the network is correct, no changes are made.
If there is an error, the weights are adjusted to reduce the error.
We are just performing (stochastic) gradient descent!
Based on slide by T. Finin, M. desJardins, L. Getoor, R. Par

Cost Function
Logistic regression:
J(θ) = -1/n Σ_{i=1}^n [y_i log h_θ(x_i) + (1 - y_i) log(1 - h_θ(x_i))] + λ/(2n) Σ_{j=1}^d θ_j^2
Neural network: h_Θ ∈ R^K, with (h_Θ(x))_i = the i-th output.
J(Θ) = -1/n Σ_{i=1}^n Σ_{k=1}^K [y_ik log(h_Θ(x_i))_k + (1 - y_ik) log(1 - (h_Θ(x_i))_k)] + λ/(2n) Σ_{l=1}^{L-1} Σ_{i=1}^{s_l} Σ_{j=1}^{s_{l+1}} (Θ_ji^{(l)})^2
The k-th term compares the true label y_ik with the prediction (h_Θ(x_i))_k, both when class k is the true class and when it is not.
Based on slide by Andrew Ng
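The neural-network cost above can be sketched directly from the formula. The array shapes and toy values below are assumptions for illustration; following the inner regularization sum starting at i = 1, the bias column of each weight matrix is excluded from the penalty.

```python
import numpy as np

def nn_cost(H, Y, Thetas, lam, n):
    """Regularized cross-entropy cost for a K-output network.
    H: (n, K) predictions h(x_i); Y: (n, K) 1-of-K labels;
    Thetas: list of weight matrices, bias column first (excluded from the penalty)."""
    data_term = -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / n
    reg_term = lam / (2 * n) * sum(np.sum(T[:, 1:] ** 2) for T in Thetas)
    return data_term + reg_term

# Toy example (hypothetical predictions, labels, and weights)
H = np.array([[0.9, 0.1], [0.2, 0.8]])
Y = np.array([[1.0, 0.0], [0.0, 1.0]])
Thetas = [np.array([[0.5, 1.0, -1.0]])]
print(nn_cost(H, Y, Thetas, lam=0.1, n=2))
```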

Optimizing the Neural Network
J(Θ) = -1/n Σ_{i=1}^n Σ_{k=1}^K [y_ik log(h_Θ(x_i))_k + (1 - y_ik) log(1 - (h_Θ(x_i))_k)] + λ/(2n) Σ_{l=1}^{L-1} Σ_{i=1}^{s_l} Σ_{j=1}^{s_{l+1}} (Θ_ji^{(l)})^2
Unlike before, J(Θ) is not convex, so gradient descent on a neural net yields a local optimum.
Solve via forward propagation and backpropagation:
∂J(Θ)/∂Θ_ij^{(l)} = a_j^{(l)} δ_i^{(l+1)}   (ignoring λ; i.e., if λ = 0)
Based on slide by Andrew Ng

Forward Propagation
Given one labeled training instance (x, y):
a^{(1)} = x
z^{(2)} = Θ^{(1)} a^{(1)}
a^{(2)} = g(z^{(2)})   [add a_0^{(2)}]
z^{(3)} = Θ^{(2)} a^{(2)}
a^{(3)} = g(z^{(3)})   [add a_0^{(3)}]
z^{(4)} = Θ^{(3)} a^{(3)}
a^{(4)} = h_Θ(x) = g(z^{(4)})
Based on slide by Andrew Ng

Backpropagation Intuition
δ_j^{(l)} = "error" of node j in layer l
Formally, δ_j^{(l)} = ∂ cost(x_i) / ∂ z_j^{(l)}, where cost(x_i) = y_i log h_Θ(x_i) + (1 - y_i) log(1 - h_Θ(x_i)).
Working backward through the example network (ignoring the g' factor for intuition):
δ_1^{(4)} = a_1^{(4)} - y
δ_2^{(3)} = Θ_{12}^{(3)} δ_1^{(4)}
δ_1^{(3)} = Θ_{11}^{(3)} δ_1^{(4)}
δ_2^{(2)} = Θ_{12}^{(2)} δ_1^{(3)} + Θ_{22}^{(2)} δ_2^{(3)}
Based on slides by Andrew Ng

Backpropagation: Gradient Computation
Let δ_j^{(l)} = "error" of node j in layer l. (# layers L = 4)
Backpropagation (.* denotes the element-wise product):
δ^{(4)} = a^{(4)} - y
δ^{(3)} = (Θ^{(3)})^T δ^{(4)} .* g'(z^{(3)})
δ^{(2)} = (Θ^{(2)})^T δ^{(3)} .* g'(z^{(2)})
(No δ^{(1)})
For the sigmoid: g'(z^{(3)}) = a^{(3)} .* (1 - a^{(3)}) and g'(z^{(2)}) = a^{(2)} .* (1 - a^{(2)}).
Based on slide by Andrew Ng
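The vectorized delta computation can be sketched as follows. The network shapes, random weights, and instance are illustrative assumptions; the bias row is dropped when propagating the error backward because bias units receive no error.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_deltas(Thetas, activations, y):
    """Compute errors delta^(L), ..., delta^(2) for one instance.
    activations come from the forward pass (hidden layers include the bias entry);
    uses g'(z) = a .* (1 - a) for the sigmoid."""
    deltas = [activations[-1] - y]                 # delta^(L) = a^(L) - y
    for l in range(len(Thetas) - 1, 0, -1):
        a = activations[l]
        # (Theta^(l))^T delta, drop the bias row, multiply elementwise by g'(z^(l))
        d = (Thetas[l].T @ deltas[0])[1:] * a[1:] * (1 - a[1:])
        deltas.insert(0, d)
    return deltas                                  # no delta^(1)

# Tiny 2-2-1 network (hypothetical weights and data)
rng = np.random.default_rng(1)
Theta1, Theta2 = rng.normal(size=(2, 3)), rng.normal(size=(1, 3))
x, y = np.array([0.5, -0.5]), np.array([1.0])
a1 = np.concatenate(([1.0], x))
a2 = np.concatenate(([1.0], sigmoid(Theta1 @ a1)))
a3 = sigmoid(Theta2 @ a2)
deltas = backprop_deltas([Theta1, Theta2], [a1, a2, a3], y)
print([d.shape for d in deltas])   # [(2,), (1,)]
```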

Backpropagation
Set Δ_ij^{(l)} = 0  ∀ l, i, j   (used to accumulate the gradient)
For each training instance (x_i, y_i):
  Set a^{(1)} = x_i
  Compute {a^{(2)}, ..., a^{(L)}} via forward propagation
  Compute δ^{(L)} = a^{(L)} - y_i
  Compute errors {δ^{(L-1)}, ..., δ^{(2)}}
  Compute gradients Δ_ij^{(l)} = Δ_ij^{(l)} + a_j^{(l)} δ_i^{(l+1)}
Compute the average regularized gradient:
D_ij^{(l)} = (1/n) Δ_ij^{(l)} + λ Θ_ij^{(l)}   if j ≠ 0
D_ij^{(l)} = (1/n) Δ_ij^{(l)}                  otherwise
D^{(l)} is the matrix of partial derivatives of J(Θ).
Based on slide by Andrew Ng

Training a Neural Network via Gradient Descent with Backprop
Given: training set {(x_1, y_1), ..., (x_n, y_n)}
Initialize all Θ^{(l)} randomly (NOT to 0!)
Loop   // each iteration is called an epoch
  Set Δ_ij^{(l)} = 0  ∀ l, i, j   (used to accumulate the gradient)
  For each training instance (x_i, y_i):   // backpropagation
    Set a^{(1)} = x_i
    Compute {a^{(2)}, ..., a^{(L)}} via forward propagation
    Compute δ^{(L)} = a^{(L)} - y_i
    Compute errors {δ^{(L-1)}, ..., δ^{(2)}}
    Compute gradients Δ_ij^{(l)} = Δ_ij^{(l)} + a_j^{(l)} δ_i^{(l+1)}
  Compute the average regularized gradient D_ij^{(l)} = (1/n) Δ_ij^{(l)} + λ Θ_ij^{(l)} if j ≠ 0, else (1/n) Δ_ij^{(l)}
  Update weights via a gradient step: Θ_ij^{(l)} = Θ_ij^{(l)} - α D_ij^{(l)}
Until the weights converge or the max # of epochs is reached
Based on slide by Andrew Ng
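Putting the pieces together, here is a sketch of the full training loop, specialized to one hidden layer with λ = 0 for brevity. The XOR data, layer sizes, learning rate, and epoch count are assumptions chosen only to illustrate the algorithm; note the random (not zero) initialization, as the slide requires.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, Y, hidden, alpha=2.0, epochs=5000, seed=0):
    """Gradient descent with backprop for a 1-hidden-layer sigmoid net (lambda = 0).
    X: (n, d), Y: (n, K)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    K = Y.shape[1]
    Theta1 = rng.normal(scale=0.5, size=(hidden, d + 1))   # random init, NOT zeros!
    Theta2 = rng.normal(scale=0.5, size=(K, hidden + 1))
    for _ in range(epochs):                                # each iteration is an epoch
        D1, D2 = np.zeros_like(Theta1), np.zeros_like(Theta2)
        for x, y in zip(X, Y):
            a1 = np.concatenate(([1.0], x))                # forward propagation
            a2 = np.concatenate(([1.0], sigmoid(Theta1 @ a1)))
            a3 = sigmoid(Theta2 @ a2)
            d3 = a3 - y                                    # delta^(3) = a^(3) - y
            d2 = (Theta2.T @ d3)[1:] * a2[1:] * (1 - a2[1:])
            D2 += np.outer(d3, a2)                         # accumulate gradients
            D1 += np.outer(d2, a1)
        Theta1 -= alpha * D1 / n                           # gradient step
        Theta2 -= alpha * D2 / n
    return Theta1, Theta2

def predict(Theta1, Theta2, X):
    A1 = np.hstack([np.ones((len(X), 1)), X])
    A2 = np.hstack([np.ones((len(X), 1)), sigmoid(A1 @ Theta1.T)])
    return sigmoid(A2 @ Theta2.T)

# XOR: not linearly separable, but learnable with one hidden layer
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)
T1, T2 = train(X, Y, hidden=4)
print(np.round(predict(T1, T2, X)).ravel())
```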

Several Practical Tricks
Initialization: the problem is highly non-convex, and heuristics exist to start training (at the least, randomize the initial weights).
Optimization tricks:
Momentum-based methods.
Decaying step size.
Dropout to avoid co-adaptation / overfitting.
Minibatch: use more than a single point to estimate the gradient.

Neural Networks vs Deep Learning?
DNNs are big neural networks.
Depth: often ~5 layers (but some have many more). Typically not fully connected!
Width: hidden nodes per layer in the thousands.
Parameters: millions to billions.
Algorithms / Computing:
New algorithms (pre-training, layer-wise training, dropout, etc.).
Heavy computing requirements (GPUs are essential).