Molecular Structure: matching and kinematics


Molecular Structure: matching and kinematics
Ioannis Z. Emiris
Dept. of Informatics & Telecoms, University of Athens
Algs in Struc.BioInfo 17

Outline
Structure types, amino acids, Ramachandran plot
Structure comparison
Databases, and prediction
Kinematics and rigid transforms
Motion planning: configuration space
Appendix: Ramachandran, structure matching, geometric hashing

Reading: Wikipedia for RMSD (root-mean-square deviation). Choset, Kavraki et al., Principles of Robot Motion, Chap. 3, and E.

Structure types

Structure types
Primary: the sequence, e.g. ATCCGTG, FQRRTVQILQT.
Secondary: α-helix, β-sheet.
Super-secondary: α-hairpin, β-hairpin, β-α-β.
Tertiary: atom coordinates such as (3.9 Å, 2.2 Å, -45.1 Å), ...; the overall fold. E.g. N, Hα, C, Cβ form a regular tetrahedron centered at Cα. Mirror symmetries give isomers (proteins always use one isomer).
Quaternary: several monomers (domains); van der Waals interactions.

Primary/tertiary structure of the 20 amino acids. The backbone skeleton is drawn with C at the top, Cα at the center, and N at the left. Smallest: glycine; special case: proline.

α-helix
DNA: usually B-DNA in cells, radius 10 Å; Z-DNA and A-DNA are less frequent, less standard helices.
Proteins: usually a right-handed spiral (αR); side chains point outward from the helix axis. Glycine may form a left-handed helix (αL).
Each Cα advances by 100° and 1.5 Å, i.e. 360°/100° = 3.6 residues per turn of the helix.
Rigidity by H-bonds (C=O)_i ... (H-N)_{i+4}, and by inward hydrophobic, outward hydrophilic faces.
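The two helix parameters on this slide (100° and 1.5 Å per residue) are enough to sketch ideal Cα positions. The block below is only an illustration and not part of the original slides: the function name `helix_ca_coords` is hypothetical, and the radius of 2.3 Å is an assumed value chosen for the example.

```python
import math

def helix_ca_coords(n_residues, turn_deg=100.0, rise=1.5, radius=2.3):
    """Ideal C-alpha positions on a helix along the z axis.

    turn_deg and rise are the slide's values (100 degrees, 1.5 A per residue);
    radius is an assumed value used only for illustration.
    """
    coords = []
    for i in range(n_residues):
        angle = math.radians(i * turn_deg)
        coords.append((radius * math.cos(angle),
                       radius * math.sin(angle),
                       i * rise))
    return coords

# Residues per full turn and the resulting rise per turn (pitch), in Angstrom.
residues_per_turn = 360.0 / 100.0
pitch = residues_per_turn * 1.5
```

After 18 residues the helix has made exactly 5 turns (18 × 100° = 1800°), so residue 18 sits directly above residue 0.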

β-sheet
Parallel xor antiparallel (twisted).
Composed of at least 2 (almost) coplanar strands.
φ, ψ angles differ by π.
Each Cα advances by 3.5 Å.
Rigidity due to H-bonds C=O ... H-N between neighboring strands.

Antiparallel vs Parallel

Protein tertiary structure
N, Hα, C, Cβ form a tetrahedron with center Cα.
Rotation around the single bonds N-Cα (φ) and Cα-C (ψ); one exception: proline (only ψ).
The angle ω around the peptide bond has 2 states:
  trans: ω ≈ 180° is usual;
  cis: ω ≈ 0° is rare, mostly at proline.
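The angles φ, ψ, ω are dihedral (torsion) angles defined by four consecutive backbone atoms: φ uses C(i-1), N(i), Cα(i), C(i); ψ uses N(i), Cα(i), C(i), N(i+1); ω uses Cα(i), C(i), N(i+1), Cα(i+1). A minimal sketch of how such an angle can be computed from coordinates follows; the function names are hypothetical and the atan2 formulation is one common choice, not necessarily the one used in the course.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral_deg(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the bond p1-p2.

    0 means p0 and p3 are eclipsed (cis), +/-180 means trans.
    """
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

Applying `dihedral_deg` to every residue's (φ, ψ) atoms gives the points that are plotted in the Ramachandran diagram.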

Sasisekharan-Ramakrishnan-Ramachandran diagram
Describes allowed main-chain conformations.
Horizontal axis φ, vertical axis ψ; typically ω = 180°.
Regions: parallel β (βP), twisted β (βT); right-handed α, left-handed α (αL), 3_10 and π helices.
Exceptions: Gly (no limitation), Pro (side chain bonds back to the backbone).

Ramachandran diagram: example structure types: α, β, Gly for protein 2ACY [Lesk].

Structure comparison

Measure difference of matched sets
Hypotheses: point sets of equal cardinality, given correspondence (match).
Definition. (Coordinate) Root Mean Square Deviation (c-rmsd):
$$\mathrm{RMSD} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \|x_i - y_i\|^2},$$
where $x_i, y_i \in \mathbb{R}^3$ are (Cα) atom coordinates in the SAME coordinate frame.
Lemma. c-rmsd satisfies the triangle inequality. Hence it defines a distance metric.
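The definition translates directly into code. The following is a minimal sketch in plain Python, assuming both point lists are already expressed in the same coordinate frame and matched by index; the function name `c_rmsd` is chosen here for illustration.

```python
import math

def c_rmsd(xs, ys):
    """Coordinate RMSD of two matched point lists given in the same frame."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("point sets must be non-empty and of equal size")
    total = 0.0
    for x, y in zip(xs, ys):
        # squared Euclidean distance between the i-th matched points
        total += sum((a - b) ** 2 for a, b in zip(x, y))
    return math.sqrt(total / len(xs))
```

For example, shifting every point by the vector (3, 4, 0) moves each point by distance 5, so the c-rmsd between a set and its shifted copy is 5.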

Optimal alignment of matched sets
Problem. Find the translation and rotation minimizing c-rmsd.
1. Translate to a common origin by subtracting from all $x_i$ the centroid $x_c = \frac{1}{n} \sum_{i=1}^{n} x_i$, $x_i \in \mathbb{R}^3$, and subtracting $y_c$ from all $y_i$; overall $O(n)$.
2. Rotate to optimal alignment by a $3 \times 3$ rotation matrix $Q$: $Q^T Q = I$, which should also have $\det Q = 1$. Deterministic linear algebra (SVD) algorithm [Kabsch]: $O(n)$.
Lemma. The optimal translation can be decoupled from the rotation optimization.
Proof. For any $Q$, the optimal translation brings the center of mass to the origin.

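Matrix algebra
Let $X = [x_1, \ldots, x_n]^T$, $Y = [y_1, \ldots, y_n]^T \in \mathbb{R}^{n \times 3}$; then
$$\mathrm{RMSD}(X, Y) = \frac{1}{\sqrt{n}} \|X - Y\|_F,$$
where $\|M\|_F^2 = \sum_{i,j} M_{ij}^2 = \mathrm{tr}(M^T M)$ is the Frobenius norm, and $\mathrm{tr}(A) = \sum_i A_{ii}$ is the trace of the matrix $A = [A_{ij}]$.
Recall that a rotated vector is $v^T Q$ (row convention, used here for the rows of $X$) or $Qv$ (column convention), for a column vector $v \in \mathbb{R}^3$.
Assume common centroid = 0, $X, Y \in \mathbb{R}^{n \times 3}$. The optimal alignment solves
$$\min_Q \|Y - XQ\|_F, \quad Q^T Q = I, \ \det Q = 1,$$
and the factor $1/\sqrt{n}$ of the RMSD does not affect the minimizer.
Proposition. Optimizing the rotation $Q \in \mathbb{R}^{3 \times 3}$ reduces to $\max_Q \mathrm{tr}(Q^T X^T Y)$.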
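The proposition follows by expanding the squared Frobenius norm. The short derivation below is a reconstruction that uses only the trace identity for the Frobenius norm, the invariance of the trace under transposition and cyclic permutation, and $Q Q^T = I$.

```latex
\begin{aligned}
\|Y - XQ\|_F^2
  &= \mathrm{tr}\big((Y - XQ)^T (Y - XQ)\big) \\
  &= \mathrm{tr}(Y^T Y) - \mathrm{tr}(Y^T X Q) - \mathrm{tr}(Q^T X^T Y) + \mathrm{tr}(Q^T X^T X Q) \\
  &= \mathrm{tr}(Y^T Y) - 2\,\mathrm{tr}(Q^T X^T Y) + \mathrm{tr}(X^T X Q Q^T) \\
  &= \|Y\|_F^2 + \|X\|_F^2 - 2\,\mathrm{tr}(Q^T X^T Y).
\end{aligned}
```

The first two terms do not depend on $Q$, so minimizing $\|Y - XQ\|_F$ over rotations is the same as maximizing $\mathrm{tr}(Q^T X^T Y)$.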

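Singular Value Decomposition
Recall the SVD:
$$X^T Y = U \Sigma V^T, \quad U^T U = V^T V = I, \quad \Sigma = \begin{bmatrix} \sigma_1 & 0 & 0 \\ 0 & \sigma_2 & 0 \\ 0 & 0 & \sigma_3 \end{bmatrix},$$
where $\sigma_1 \geq \sigma_2 \geq \sigma_3$; $U, V, \Sigma$ are $3 \times 3$ like $X^T Y$, and the singular values are $\sigma_i = \sqrt{e_i} \geq 0$, where $e_i$ are the eigenvalues of $(X^T Y)^T (X^T Y)$.
We wish to find $Q$ that maximizes
$$\mathrm{tr}(Q^T X^T Y) = \mathrm{tr}(Q^T U \Sigma V^T) = \mathrm{tr}(V^T Q^T U \Sigma) \leq \mathrm{tr}(\Sigma).$$
The 2nd equality holds by Lem. T (the trace is invariant under cyclic permutation); the inequality holds since $M = V^T Q^T U$ is orthonormal, so $|M_{ij}| \leq 1$ and $\mathrm{tr}(M \Sigma) = \sum_i M_{ii} \sigma_i \leq \sum_i \sigma_i$.
Thm. The maximum occurs at $M = I \Rightarrow Q = U V^T$. If $\det Q = -1$ then $Q$ is a reflection; to get a rotation, negate the third column of $U$ (the one paired with the smallest singular value $\sigma_3$) and recompute $Q$.
Overall complexity = $O(n)$.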

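Algorithm
Input: point sets $X, Y \in \mathbb{R}^{n \times 3}$ of $n$ corresponding points.
Output: minimum RMSD of the translated and rotated sets.
1. $x_c \leftarrow \sum_{i=1}^{n} x_i / n$, $y_c \leftarrow \sum_{i=1}^{n} y_i / n$.
2. $X \leftarrow \{x - x_c : x \in X\}$, $Y \leftarrow \{y - y_c : y \in Y\}$.
3. SVD: $X^T Y = U \Sigma V^T$. Optional: check $\sigma_3 > 0$, where $\Sigma = \mathrm{diag}[\sigma_1, \sigma_2, \sigma_3]$.
4. $Q \leftarrow U V^T$.
5. If $\det Q < 0$ then $Q \leftarrow [U_1, U_2, -U_3] V^T$. // $U_i$: i-th column of $U$
6. Return $\|XQ - Y\|_F / \sqrt{n}$ // or $\sqrt{\sum_{i=1}^{n} \|x_i^T Q - y_i^T\|^2 / n}$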
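The steps of the algorithm map almost line by line onto NumPy. The following is a minimal sketch, assuming NumPy is available and using the row convention $XQ$ of the slides; the function name `kabsch_rmsd` is chosen for illustration and is not from the source.

```python
import numpy as np

def kabsch_rmsd(X, Y):
    """Minimum RMSD of two matched n x 3 point sets and the optimal rotation Q.

    Rows of X are rotated as X @ Q, matching the slides' convention.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n = X.shape[0]
    Xc = X - X.mean(axis=0)            # steps 1-2: move centroids to the origin
    Yc = Y - Y.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc.T @ Yc)   # step 3: 3x3 SVD
    Q = U @ Vt                         # step 4
    if np.linalg.det(Q) < 0:           # step 5: reflection, flip last column of U
        U[:, 2] = -U[:, 2]
        Q = U @ Vt
    rmsd = np.linalg.norm(Xc @ Q - Yc) / np.sqrt(n)   # step 6: Frobenius norm
    return rmsd, Q
```

Forming the $3 \times 3$ matrix $X^T Y$ costs $O(n)$ and its SVD costs $O(1)$, which is where the overall $O(n)$ bound comes from.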

distance-rmsd
Assume that $r$ distances $d_i$, $i = 1, \ldots, r$, are known between point pairs in $X$, and between the corresponding pairs in $Y$, denoted $d'_i$, $i = 1, \ldots, r$.
Defn. For $r$ matched distances, the distance-rmsd is given by
$$\text{d-rmsd}^2 = \frac{1}{r} \sum_{i=1}^{r} (d_i - d'_i)^2, \quad r \leq \binom{n}{2}.$$
Drawback: computed in $O(r) = O(n^2)$.
Lem. d-rmsd is invariant under rigid transforms: translate, rotate, reflect.
d-rmsd is a metric in (Euclidean) $\mathbb{R}^r$ space; but then one point represents a conformation and its mirror image.
Please check [Guibas?]: c-rmsd / n ≤ d-rmsd ≤ 2 c-rmsd.

Vector of distances
Equivalent formulation: let $v(X) = (d_1, \ldots, d_r)$, $v(Y) = (d'_1, \ldots, d'_r) \in \mathbb{R}^r$ be the vectors of distances in $X$, $Y$ respectively. Their Euclidean distance is
$$\|v(X) - v(Y)\|_2 = \sqrt{r} \cdot \text{d-rmsd}(X, Y).$$
Subset of distances: use $r \leq \binom{n}{2}$ distances.
  Must correspond to the same pairs of points in all conformations.
  May choose $r$ uniformly selected pairs among the $\binom{n}{2}$.
  May choose the $r$ smallest or largest distances, in one conformation.
Alternative idea: distances from a few landmark atoms.
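A minimal sketch of d-rmsd through the distance-vector formulation follows. The function names are hypothetical, and by default all $\binom{n}{2}$ pairs are used; a caller may pass any fixed list of index pairs instead, as long as the same pairs are used for every conformation.

```python
import math
from itertools import combinations

def distance_vector(points, pairs):
    """Vector v(X): distances between the given index pairs of one conformation."""
    return [math.dist(points[i], points[j]) for i, j in pairs]

def d_rmsd(xs, ys, pairs=None):
    """Distance RMSD of two matched conformations over the same index pairs."""
    if pairs is None:
        pairs = list(combinations(range(len(xs)), 2))   # all n-choose-2 pairs
    vx = distance_vector(xs, pairs)
    vy = distance_vector(ys, pairs)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(vx, vy)) / len(pairs))
```

Because only internal distances enter, a conformation and its mirror image get d-rmsd 0, which is the drawback noted above: no alignment step is needed, but chirality is lost.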

Databases, and prediction

Databases
Protein Data Bank (PDB) (www.rcsb.org): structure information and retrieval.
A file starts with the protein name, author, maybe secondary structure. Omits H-atoms.
Example: hemoglobin, an arginine residue:
ATOM  N   ARG  16.467  -2.155  -11.004
ATOM  CA  ARG  16.174  -2.970   -9.786
ATOM  C   ARG  14.696  -3.056   -9.412
ATOM  O   ARG  14.307  -3.945   -8.624
ATOM  CB  ARG  16.892  -2.495   -8.550
Protein fold classification into hierarchies:
  SCOP (Structural Classification of Proteins), cf. next slide [Murzin et al 95, Andreeva et al 04]
  CATH (domains) (Class, Architecture, Topology, Homology) [Orengo et al 97, Pearl et al 05]
  FSSP (DALI offers structural alignment) [Holm, Sander 96]
  CE (structural alignment)
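The example above is abbreviated; actual PDB files store each ATOM record in fixed columns (atom name in columns 13-16, residue name in 18-20, and x, y, z in 31-38, 39-46, 47-54, counting from 1). The following is a minimal sketch for pulling Cα coordinates out of such records, e.g. as input for RMSD computations. The function names are hypothetical, and the sketch ignores alternate locations, chains, and models.

```python
def parse_atom_line(line):
    """Return (atom name, residue name, (x, y, z)) for an ATOM record, else None."""
    if not line.startswith("ATOM"):
        return None
    name = line[12:16].strip()         # columns 13-16
    res_name = line[17:20].strip()     # columns 18-20
    xyz = (float(line[30:38]),         # columns 31-38
           float(line[38:46]),         # columns 39-46
           float(line[46:54]))         # columns 47-54
    return name, res_name, xyz

def ca_coordinates(lines):
    """Coordinates of all C-alpha atoms, in file order."""
    coords = []
    for line in lines:
        record = parse_atom_line(line)
        if record is not None and record[0] == "CA":
            coords.append(record[2])
    return coords
```

Slicing by column rather than splitting on whitespace matters because adjacent coordinate fields may touch when values are large or negative.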

SCOP hierarchy
Lowest level: individual protein domains (from PDB).
Families of homologues: similar structure, sequence, (function) imply a common evolutionary origin.
Superfamilies: families of similar structure and function, weak evolutionary relationship.
Folds: superfamilies with a common folding topology.
Highest level, classes: α, β, α + β, α/β (α and β), and small proteins.
Homology of structures expresses common ancestry, either
  evolutionary: evolved from a structure in a common ancestor (wings of bats and arms of primates), or
  developmental: from the same tissue in embryonic development (ovaries of female and testicles of male humans).

SCOP example
1 Root: SCOP
2 Class: α/β, mainly parallel β-sheets (β-α-β units)
3 Fold: Flavodoxin-like: 3 layers, α/β/α; parallel β-sheet of 5 strands, order 21345
4 Superfamily: Flavoproteins
5 Family: Flavodoxin-related, binds FMN
6 Protein: Flavodoxin
7 Species: Clostridium beijerinckii
[Lesk, p.224]

SCOP size
In July 2001, SCOP contained 13,220 PDB entries, in 31,474 domains:

Class                    Families  Superfamilies  Folds
All-α proteins                337            224    138
All-β proteins                276            171     93
α/β proteins                  374            167     97
α + β proteins                391            263    184
Multi-domain                   35             28     28
Membrane, cell-surface         28             17     11
Small proteins                116             77     54
Total                        1557            947    605

Rigid-body kinematics: Motivation

Molecular kinematics
Given a rigid body with specific degrees of freedom (e.g. dihedral angles about covalent bonds), its kinematics describe the allowed motions under certain geometric constraints (distances, angles etc.).
Modeling of constraints as an algebraic / optimization problem.
Applications: structure determination of small (sub)molecules, dimension reduction during docking, pharmacophore matching.
There are many small molecules: out of 730,000 with rotational dof, the most frequent case (about 15%) has 4 dof, and < 10% have > 10 dof [Irwin-Shoichet 04].

Rigid transforms

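Rigid (Euclidean) transformations
Preserve distances and angles.
Translation: $d \in \mathbb{R}^3$, $x \mapsto x + d$.
Rotation: $R \in SO(3)$: $R^{-1} = R^T$, $\det R = 1$, $x \mapsto Rx$. $R^{-1}$ is the rotation by the negative angle. If $R_1$ rotates by $\theta_1$ and $R_2$ by $\theta_2$ about the same axis, then $R_1 R_2$ rotates by $\theta_1 + \theta_2$.
Reflection $R$: $\det R = -1$ (a reflection in $\mathbb{R}^2$ takes the body out of the plane).
Scaling and shearing are NOT rigid.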

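2D transforms
Rotation, scaling, shearing:
$$\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}, \quad \begin{bmatrix} s_x & 0 \\ 0 & s_y \end{bmatrix} \ (\text{typically } s_x, s_y > 0), \quad \begin{bmatrix} 1 & a \\ 0 & 1 \end{bmatrix}.$$
Homogeneous transform: translation by $d = (d_x, d_y)^T$ and rotation by $\theta$:
$$T = \begin{bmatrix} \cos\theta & -\sin\theta & d_x \\ \sin\theta & \cos\theta & d_y \\ 0 & 0 & 1 \end{bmatrix}, \quad R \in SO(2), \ R^{-1} \in SO(2), \ R^{-1} = R^T, \ \det R = 1.$$
It relates the coordinates indexed $i+1$ and $i$:
$$T \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}_{i+1} = \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}_{i}.$$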
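A homogeneous 2D transform packs a rotation by θ and a translation (dx, dy) into one 3 × 3 matrix, so that chaining transforms becomes matrix multiplication. A minimal sketch in plain Python follows; the function names and the list-of-lists matrix representation are choices made for this example.

```python
import math

def homogeneous_2d(theta, dx, dy):
    """3x3 homogeneous matrix: rotate by theta (radians), then translate by (dx, dy)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, dx],
            [s, c, dy],
            [0.0, 0.0, 1.0]]

def mat_mul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply_transform(t, point):
    """Apply a homogeneous transform to a 2D point (x, y)."""
    x, y = point
    return (t[0][0] * x + t[0][1] * y + t[0][2],
            t[1][0] * x + t[1][1] * y + t[1][2])
```

Composing two pure rotations with `mat_mul` yields the rotation by the sum of the angles, which is the 2D case of the composition rule for rotations.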

Motion planning

Introduction
Questions about motion planning for a robotic mechanism:
  How much information is needed to determine the position of every point of the robot?
  How should this information be represented?
  What are the mathematical properties of the representation?
  How do we take obstacles into account when planning motions?
[Choset, Kavraki et al. Principles of Robot Motion, Chapter 3]

Basic notions
Configuration (robot configuration, molecule conformation): a complete specification of the position (e.g. 3 coordinates) of every point of the robot.
Configuration space (C-space): the space of all possible configurations of the robot, where each configuration corresponds to one point of the space.
Degrees of freedom: the number of parameters required to specify a configuration. Equivalently, the dimension of the configuration space.
Workspace: the physical space accessible to the robot, typically 3D.
Caution: workspace ≠ configuration space.

Example 1: Disk robot
A disk robot of given radius $r$ moves in the two-dimensional plane $\mathbb{R}^2$.
Configuration: $q = (x, y)$; it suffices to specify the center of the robot, hence C-space $\cong \mathbb{R}^2$.
For each configuration we can compute the points occupied by the robot as follows:
$$R(x, y) = \{(x', y') \in \mathbb{R}^2 \mid (x - x')^2 + (y - y')^2 \leq r^2\}, \quad r = \text{radius of the robot}.$$
We can define the configuration space and the workspace. Both are subsets of $\mathbb{R}^2$, but they are different!
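The set R(x, y) and the difference between workspace and C-space can be made concrete in a few lines. The sketch below is an illustration only: the function names are hypothetical, and the rectangular room of size width × height is an assumption added for the example, not part of the slide.

```python
def disk_robot_occupies(q, point, r):
    """True if the workspace point lies in R(q) for a disk robot centered at q."""
    (x, y), (px, py) = q, point
    return (x - px) ** 2 + (y - py) ** 2 <= r ** 2

def center_is_valid(q, r, width, height):
    """True if the whole disk fits in the room [0, width] x [0, height].

    The room is the workspace; the valid centers form the smaller
    rectangle [r, width - r] x [r, height - r] in C-space.
    """
    x, y = q
    return r <= x <= width - r and r <= y <= height - r
```

A point such as (0.5, 0.5) belongs to a 4 × 3 room (workspace) but is not a valid configuration for a robot of radius 1, which illustrates why the two spaces differ even though both live in the plane.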

Example 2: Arm with two joints

Example 2: Arm with two joints
Configuration: the position of the hand is not enough (elbow up / down, i.e. $\theta_2$); the angles of both joints are needed: $q = (\theta_1, \theta_2)$.
Each joint can rotate on a unit circle $S^1$, hence the configuration space is $Q = S^1 \times S^1 = T^2$, i.e. a two-dimensional torus.
Workspace = a disk $\subset \mathbb{R}^2$ (figure on the right).
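A short forward-kinematics sketch shows why the hand position alone does not determine the configuration. The assumptions here are not from the slide: the base is fixed at the origin, both links have length 1 by default, and θ2 is measured relative to the first link; the function name is hypothetical.

```python
import math

def forward_kinematics(theta1, theta2, l1=1.0, l2=1.0):
    """Elbow and hand positions of a planar two-joint arm with base at the origin.

    theta1 is measured from the x axis, theta2 relative to the first link,
    both counterclockwise (an assumed convention).
    """
    elbow = (l1 * math.cos(theta1), l1 * math.sin(theta1))
    hand = (elbow[0] + l2 * math.cos(theta1 + theta2),
            elbow[1] + l2 * math.sin(theta1 + theta2))
    return elbow, hand
```

With equal link lengths, the configurations (θ1, θ2) and (θ1 + θ2, -θ2) place the hand at the same point with the elbow on opposite sides, so two distinct points of the torus map to one point of the workspace.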

Obstacles
C-space obstacles: configurations $q$ where the robot $R(q)$ collides with obstacle $W_i$: $O_i = \{q \in Q \mid R(q) \cap W_i \neq \emptyset\}$.
Free C-space: $Q_{free} = Q \setminus (\bigcup_i O_i)$.
Free path: a path without collisions with obstacles, which does not even include boundary points of $Q_{free}$. It is given by a parametrization $c : [0, 1] \to Q_{free}$.
Semifree path: like the free path, but it may include boundary points of $Q_{free}$: $c : [0, 1] \to \mathrm{Closure}(Q_{free})$.

Example 1 (with obstacles)
(1) Circular robot and polygonal obstacle in $\mathbb{R}^2$.
(2) The robot slides around the workspace obstacle. We check specific points.
(3) The trajectory of the center defines the C-space obstacle, where the robot = a point.
Augmented polygon = Minkowski sum of the original polygon and the disk.
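Membership in the augmented polygon can be tested without constructing it: the center q lies in the Minkowski sum of the polygon and a disk of radius r exactly when q is inside the polygon or within distance r of one of its edges. A minimal sketch under that observation follows; the function names are hypothetical and the polygon is assumed to be simple, given as a list of vertices in order.

```python
import math

def _point_segment_dist(p, a, b):
    """Euclidean distance from point p to the segment ab."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg2 = dx * dx + dy * dy
    t = 0.0 if seg2 == 0 else ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg2
    t = max(0.0, min(1.0, t))          # clamp the projection onto the segment
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))

def _point_in_polygon(p, poly):
    """Ray-casting test for a simple polygon."""
    inside = False
    n = len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        if (y1 > p[1]) != (y2 > p[1]):
            x_cross = x1 + (p[1] - y1) * (x2 - x1) / (y2 - y1)
            if p[0] < x_cross:
                inside = not inside
    return inside

def in_cspace_obstacle(q, poly, r):
    """True if a disk robot of radius r centered at q touches the polygon."""
    if _point_in_polygon(q, poly):
        return True
    n = len(poly)
    return any(_point_segment_dist(q, poly[i], poly[(i + 1) % n]) <= r
               for i in range(n))
```

Near a vertex the C-space obstacle is bounded by a circular arc rather than a sharp corner, which is why a center diagonally off a corner can be free even though it would fall inside a squarely enlarged polygon.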

Example 2 (with obstacles)
For the C-space obstacles, we consider a set of configurations and for each one compute whether it causes a collision.
The arm has 2 joints: $\theta_1 = 0$ on the x axis, $\theta_2 = 0$ on the x axis, both counterclockwise (CCW). One point is fixed (center of left fig.).
[Choset, Kavraki et al. Sec. 3.2.2]
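The sampling procedure described here can be sketched as a boolean grid over the torus. Everything specific in the block is an assumption for illustration: a single circular workspace obstacle, unit link lengths, the base fixed at the origin, θ2 measured relative to the first link, and hypothetical function names.

```python
import math

def _point_segment_dist(p, a, b):
    """Euclidean distance from point p to the segment ab."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg2 = dx * dx + dy * dy
    t = 0.0 if seg2 == 0 else ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg2
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))

def arm_collides(theta1, theta2, center, radius, l1=1.0, l2=1.0):
    """True if either link of the arm touches the circular obstacle."""
    base = (0.0, 0.0)
    elbow = (l1 * math.cos(theta1), l1 * math.sin(theta1))
    hand = (elbow[0] + l2 * math.cos(theta1 + theta2),
            elbow[1] + l2 * math.sin(theta1 + theta2))
    return (_point_segment_dist(center, base, elbow) <= radius or
            _point_segment_dist(center, elbow, hand) <= radius)

def cspace_obstacle_grid(center, radius, steps=36):
    """grid[i][j] is True if (theta1_i, theta2_j) collides; angles sample [0, 2*pi)."""
    angles = [2 * math.pi * k / steps for k in range(steps)]
    return [[arm_collides(t1, t2, center, radius) for t2 in angles]
            for t1 in angles]
```

The True cells of the grid approximate the C-space obstacle; its shape on the torus generally looks nothing like the circular obstacle in the workspace.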

Appendix

Ramachandran diagram (stats) 20-residue average except Gly / Pro

Structure matching

Rigid matching
Finding the best transform, i.e. the one yielding maximal / biologically favorable superposition.
Dependent on sequence order:
  Matching set [Taylor-Orengo 89] (dynamic programming, SSAP).
  Fragments [Vriend-Sander] follow sequence order.
  FSSP-DALI [Holm, Sander 93], CE [Bourne, Shindyalov 98].
Independent of sequence (unlabeled points, different cardinalities):
  Geometric hashing (from vision): finds translation, rotation, scaling.
  Max-clique in SSE graph (by secondary-structure elements) [Mitchel et al] [Koch et al].
Sequence independence, cons (-) and pros (+):
  - 3D task vs. an essentially linear task.
  - Simultaneous match of sequence / structure is better.
  + Finds non-sequential motifs, e.g. binding sites.
  + Works with partial / disconnected input.

Geometric Hashing: 2D preprocess
Preprocess each point set (model) in the database: for every pair of points (e.g. points #4, #1 below), define a reference frame. Compute the coordinates (x, y) of all points in this frame, and store [model, frame] in entry Hash(x, y).
Figure: storing 3 hash entries (2 shown by arrows) in 2D.

Geometric Hashing: 2D query
Online processing of the query point set (image):
I. Pick a reference frame (defined by 2 points): compute the coordinates of all query points in this frame.
II. Hash the query points: for every data point in its hash entry, cast a vote for the corresponding [model, frame].
III. [model, transform] pairs with high scores induce a potential match: optimize the transform by least squares (or RMSD on matched points).
Figure: hashed points vote for each [model, frame] pair in their hash entries (2 arrows shown).

Geometric Hashing: Complexity
Parameters. $M$ = #structures in the database (models), $n$ = #points per structure/model, $c$ = 1 + #points to define a frame: $c = 3$ in 2D, $c = 4$ in 3D.
Time complexity. Preprocess = $O(M n^c)$, online query = $O(H n^c)$, where $H$ = complexity of checking one hashtable entry.
$H = O(1)$ typically when Space = $O(M n^c)$ with good hashing; or it can be $H = O(M n^c)$ for small/unlucky tables.
[eclass/eggrafa/apallaktikh/wolfson-rigoutsos 99]
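The 2D preprocess and query steps can be sketched as follows. This is a simplified illustration, not the implementation from the cited survey: the function names are hypothetical, frames are normalized by the length of the defining pair so that coordinates are invariant under translation, rotation and scaling, and coordinates are quantized into square cells of assumed size `cell` without checking neighboring cells.

```python
from collections import defaultdict

def frame_coords(points, i, j):
    """Coordinates of all other points in the frame with origin points[i]
    and unit x-axis vector points[j] - points[i]."""
    ox, oy = points[i]
    ux, uy = points[j][0] - ox, points[j][1] - oy
    norm2 = ux * ux + uy * uy
    coords = []
    for k, (px, py) in enumerate(points):
        if k in (i, j):
            continue
        dx, dy = px - ox, py - oy
        coords.append(((dx * ux + dy * uy) / norm2,
                       (-dx * uy + dy * ux) / norm2))
    return coords

def _key(xy, cell):
    return (round(xy[0] / cell), round(xy[1] / cell))

def preprocess(models, cell=0.1):
    """Hash table: quantized frame coordinates -> list of (model, frame)."""
    table = defaultdict(list)
    for name, pts in models.items():
        for i in range(len(pts)):              # n^2 frames ...
            for j in range(len(pts)):
                if i != j:
                    for xy in frame_coords(pts, i, j):   # ... times n points
                        table[_key(xy, cell)].append((name, (i, j)))
    return table

def query(table, pts, i, j, cell=0.1):
    """Vote with one query frame (i, j); return the best ((model, frame), votes)."""
    votes = defaultdict(int)
    for xy in frame_coords(pts, i, j):
        for entry in table.get(_key(xy, cell), ()):
            votes[entry] += 1
    return max(votes.items(), key=lambda kv: kv[1]) if votes else None
```

The three nested loops in `preprocess` give the $O(M n^3)$ bound for 2D. A full query would repeat `query` over all $O(n^2)$ query frames and pass the top-scoring candidates to a least-squares or RMSD refinement.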

Geometric hashing: generalization
Idea: given two objects, each with $n$ unlabeled points:
  Each pair of almost-congruent triangles defines a 3D rigid transform (congruent/similar: invariant under translation, rotation, scaling).
  For each candidate transform, count the superposed points.
  For the best candidates, find the RMSD on matched pairs, and keep the best.
  Complexity $O(n^7)$ (if we exploit backbone geometry: $n^3$) [Wolfson slides].
Against a database:
  0. For every point (residue), define a local neighborhood.
  1. Geometric hashing gives seed matches.
  2. Cluster seed matches by merging matched points.
  3. Compare RMSDs of clusters; extend better clusters until a solution is found.
Extra: store features into [model, frame, features].

Flexible alignment
Motivation. Mutations/docking imply conformational change. Hinge and shear motion of domains [Lesk].
Existing work:
  3D curve matching [Schwartz, Sharir 87], using splines [Wolfson et al 91].
  Dock [Leach, Kutz]. FlexX (dock), FlexS (structures) use anchors [Lengauer, Lemmen, Klebe 98].
  Small-molecule database search [Rigoutsos, Platt, Califano 96].
  Pose clustering [Verbitsky, Wolfson, Nussinov 99].
  Known hinges, hashing [Fligelman, Nussinov, Wolfson 00].
  FlexProt [Shatsky, Nussinov, Wolfson 02].