Tilapia Genome Status


Tilapia Genome Status: Analysis of Broad assembly v.1
February 2011
Tom Kocher, Matt Conte, Lucile Soler
University of Maryland and CIRAD

Genome Browser
We have loaded the Broad assembly into GBrowse: Bouillabase.org or http://cichlid.umd.edu/cichlidlabs/kocherlab/genomebrowsers.html
We are adding a variety of annotations, and mappings of read data, to the browser tracks.

Assembly stats (Broad v.1)
77,754 contigs (N50 = 29,493) (>1kb?)
5,900 scaffolds (>1kb)
Scaffold length including gaps: 924,023,520 (N50 = 2,800,770)
Scaffold length excluding gaps: 816,089,150 (N50 = 2,757,744)

Good continuity: Tilapia opsin BAC compared to Broad assembly v.1

Base accuracy
Average depth of coverage is high, which should give very high accuracy.
We have not yet made detailed comparisons to gold standards (Sanger BAC ends).
Expect base accuracy >>99% (Q??)

Gaps
The current 927Mb assembly contains 112Mb of gaps (12.0%).

Gaps

Current Assembly

Fraction of total   Cumulative length (bp)   # scaffolds   Scaffold length (bp)
0.1                      92,772,591                  8          9,361,541
0.2                     185,545,182                 21          5,793,747
0.3                     278,317,774                 39          4,605,129
0.4                     371,090,365                 62          3,703,282
0.5                     463,862,956                 90          2,801,867
0.6                     556,635,547                127          2,265,546
0.7                     649,408,138                175          1,590,368
0.8                     742,180,730                250            952,115
0.90                    834,953,321                403            372,320
0.91                    844,230,580                430            330,839
0.92                    853,507,839                459            292,362
0.93                    862,785,098                495            240,771
0.94                    872,062,357                536            209,527
0.95                    881,339,616                584            168,139
0.96                    890,616,876                645            129,997
0.97                    899,894,135                730             88,114
0.98                    909,171,394                877             43,542
0.99                    918,448,653              1,518              6,459
1.00                    927,725,912              5,900              1,000
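A table of this kind can be derived from nothing more than the list of scaffold lengths. The sketch below is illustrative only: the function name and the toy lengths are assumptions, not the script used for the slides. It sorts scaffolds from longest to shortest and reports how many are needed to reach a given fraction of the total, and the length of the last one added. At a fraction of 0.5 that length is the N50.

```python
def scaffolds_to_reach(lengths, fraction):
    """Return (count, last_length): how many of the longest scaffolds are
    needed to cover `fraction` of the total length, and the length of the
    shortest scaffold in that set. With fraction=0.5, last_length is the N50."""
    ordered = sorted(lengths, reverse=True)
    target = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return count, length
    raise ValueError("no scaffold lengths given")


# Toy lengths (hypothetical, not from the tilapia assembly); total = 22.
toy_lengths = [8, 5, 4, 3, 2]
for fraction in (0.5, 0.9, 1.0):
    print(fraction, scaffolds_to_reach(toy_lengths, fraction))
```

Running the same function over the real scaffold lengths for each fraction in the first column should give the scaffold count and scaffold length columns, assuming the table was built the same way.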

Constructing a golden path with the RH map of Galibert et al.
A golden path is the ordered sequence of assembly scaffolds along each chromosome. Assembly scaffolds that cannot be placed in the golden path are lumped together in the unordered chromosome.
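One simple way to picture the ordering step is sketched below. It assumes each RH marker hit is a (linkage group, map position, scaffold) record and orders the scaffolds within a linkage group by the median map position of their markers. The function name, the record layout and the use of the median are all assumptions for illustration; the slides do not describe the actual procedure.

```python
from collections import defaultdict
from statistics import median


def build_golden_path(marker_hits, all_scaffolds):
    """marker_hits: iterable of (linkage_group, map_position, scaffold).
    Returns {linkage_group: [scaffolds in map order], "unordered": [...]}."""
    positions = defaultdict(list)  # (lg, scaffold) -> marker map positions
    for lg, pos, scaffold in marker_hits:
        positions[(lg, scaffold)].append(pos)

    per_lg = defaultdict(list)
    for (lg, scaffold), pos_list in positions.items():
        per_lg[lg].append((median(pos_list), scaffold))

    golden = {lg: [s for _, s in sorted(entries)] for lg, entries in per_lg.items()}
    placed = {s for scaffolds in golden.values() for s in scaffolds}
    # Scaffolds without any marker hit go to the unordered chromosome.
    golden["unordered"] = sorted(set(all_scaffolds) - placed)
    return golden


# Hypothetical marker hits, for illustration only.
hits = [
    ("LG1", 10.0, "scaffold_b"),
    ("LG1", 2.0, "scaffold_a"),
    ("LG1", 12.0, "scaffold_b"),
    ("LG2", 5.0, "scaffold_c"),
]
print(build_golden_path(hits, ["scaffold_a", "scaffold_b", "scaffold_c", "scaffold_d"]))
```

In this sketch a scaffold with markers on two linkage groups would be placed in both, which is itself a signal worth checking by hand.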

Golden path
Total length of genome: 927,725,912
Total length of golden path: 657,266,498
Ratio: 0.708
Total number of scaffolds: 5,899
Number of scaffolds in golden path: 236
Ratio: 0.040

Karyotype of O. niloticus
O. niloticus FISH with repeat-containing BAC (Ferreira et al. 2010). Note the high density of repeats on chr4 (LG3 in the genetic map).

Golden path
Expect average of 50Mb/chr
Most have 25-30Mb
LG3 (largest chromosome) has only 17Mb
LG7 has 53Mb?

LG        Total scaffold length (bp)   # scaffolds   Note
LG1            31,194,087                    8
LG2            25,304,446                    6
LG3            17,278,939                    9
LG4            26,483,370                    8        *RH
LG5            27,331,326                    8
LG6            27,289,678                   14
LG7            53,105,870                   14        RH
LG8-24         29,449,623                   10
LG9            19,809,448                    4        RH
LG10           10,773,098                    5
LG11           31,190,552                   13        *
LG12           34,678,406                   14        RH
LG13           31,740,381                    8        RH
LG14           30,266,167                   16        RH
LG15           26,979,052                   10
LG16-21        28,301,266                   11
LG17           23,955,958                    8
LG18           26,197,606                    8
LG19           29,056,773                   10
LG20           31,469,886                    9
LG22           20,073,157                   10        RH
LG23           18,956,114                    5
ORPHANS        56,381,295                   29
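As a consistency check, the per-linkage-group lengths can be summed and compared with the golden path total reported earlier. The sketch below copies the lengths from the slide; the variable names are illustrative. The sum only matches the reported 657,266,498 if the ORPHANS row is counted as part of the golden path.

```python
# Total scaffold length (bp) per linkage group, copied from the slide.
LG_LENGTHS = {
    "LG1": 31_194_087, "LG2": 25_304_446, "LG3": 17_278_939,
    "LG4": 26_483_370, "LG5": 27_331_326, "LG6": 27_289_678,
    "LG7": 53_105_870, "LG8-24": 29_449_623, "LG9": 19_809_448,
    "LG10": 10_773_098, "LG11": 31_190_552, "LG12": 34_678_406,
    "LG13": 31_740_381, "LG14": 30_266_167, "LG15": 26_979_052,
    "LG16-21": 28_301_266, "LG17": 23_955_958, "LG18": 26_197_606,
    "LG19": 29_056_773, "LG20": 31_469_886, "LG22": 20_073_157,
    "LG23": 18_956_114, "ORPHANS": 56_381_295,
}
GENOME_LENGTH = 927_725_912  # total assembly length from the slides

golden_path_total = sum(LG_LENGTHS.values())
golden_path_ratio = golden_path_total / GENOME_LENGTH

print(f"golden path total: {golden_path_total:,}")
print(f"fraction of assembly in golden path: {golden_path_ratio:.3f}")
```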

Golden path
About 60% of markers are found in the golden path.
This is the expected value if the golden path contains 70% of the genome and the assembly has 10% gaps.

LG        # markers   # markers matching   Ratio
LG1           45             30            0.667
LG2           45             28            0.622
LG3           35             16            0.457
LG4           54             34            0.630
LG5           54             42            0.778
LG6           60             37            0.617
LG7           79             45            0.570
LG8-24        60             36            0.600
LG9           36             19            0.528
LG10          23             16            0.696
LG11          48             30            0.625
LG12          79             51            0.646
LG13          48             27            0.563
LG14          53             35            0.660
LG15          47             36            0.766
LG16-21       54             34            0.630
LG17          50             26            0.520
LG18          54             37            0.685
LG19          53             35            0.660
LG20          55             37            0.673
LG22          43             25            0.581
LG23          36             19            0.528
ORPHANS      126             75            0.595
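The "expected value" argument can be written out as arithmetic. If a marker can only match when it lies in sequence that is both in the golden path (about 70% of the genome) and not in a gap (about 90% of the assembly), and the two are treated as independent, the expected matching fraction is 0.7 x 0.9 = 0.63. The sketch below compares that figure with the pooled ratio from the table. The independence assumption and the variable names are illustrative; the counts are copied from the slide.

```python
# (markers, markers matching) per row of the table, LG1 ... LG23, then ORPHANS.
MARKER_COUNTS = [
    (45, 30), (45, 28), (35, 16), (54, 34), (54, 42), (60, 37),
    (79, 45), (60, 36), (36, 19), (23, 16), (48, 30), (79, 51),
    (48, 27), (53, 35), (47, 36), (54, 34), (50, 26), (54, 37),
    (53, 35), (55, 37), (43, 25), (36, 19), (126, 75),
]

total_markers = sum(markers for markers, _ in MARKER_COUNTS)
total_matched = sum(matched for _, matched in MARKER_COUNTS)
observed = total_matched / total_markers

# Expected fraction, assuming placement and gaps act independently.
expected = 0.7 * 0.9

print(f"observed: {total_matched}/{total_markers} = {observed:.3f}")
print(f"expected: {expected:.2f}")
```

The pooled observed ratio is close to both the rough 60% quoted on the slide and the 0.63 expectation.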

LG1
Total scaffold length: 31,194,087 in 8 scaffolds. Markers: 45, of which 30 match (ratio 0.667).
(Figure: scaffolds labelled Scaffold_94, Scaffold_222, Scaffold_287, Scaffold_4098, Scaffold_40, Scaffold_17, Scaffold_154, Scaffold_0.)
A good example/result!

LG3
Total scaffold length: 17,278,939 in 9 scaffolds. Markers: 35, of which 16 match (ratio 0.457).
(Figure: scaffolds labelled Scaffold_414, Scaffold_56, Scaffold_75, Scaffold_88, Scaffold_357, Scaffold_116, Scaffold_297, Scaffold_341.)
Not so good: relatively little of this large chromosome is represented.

LG7
Total scaffold length: 53,105,870 in 14 scaffolds (RH). Markers: 79, of which 45 match (ratio 0.57).
(Figure: scaffolds labelled Scaffold_52, Scaffold_9, Scaffold_138, Scaffold_270, Scaffold_8, S182, S142, Scaffold_30, Scaffold_166, Scaffold_103, Scaffold_78, Scaffold_171, Scaffold_6, Scaffold_291.)
Problematic: RH map and assembly scaffolds disagree.

Fish genome assemblies

Species                  Chrom #   Genome size (Mb)   Mb unordered   % unordered
Tilapia v.1 (2011)          22          927               270            29.2
Stickleback (2006)          22          463                62            13.5
Medaka (2005)               24          869               145            16.7
Tetraodon v.8 (2007)        21          358               118            32.8
Tetraodon v.7 (2004)        21          402               185            45.9

Probable misassembly
Scaffold 21 has multiple hits to RH markers on both tilapia LG4 (stickleback chr 11) and tilapia LG11 (stickleback chr 20).

Breakpoint in Scaffold 21
(Figure annotations: "Not much support here"; "Good support for 25kb gap".)

Closer look at Scaffold 21
Not much support in 40kb data.

Detail view of Scaffold 21
Not much support in U Md 5kb data.
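A scaffold like this can be flagged automatically by counting, per scaffold, how many RH markers hit each linkage group. The sketch below is a minimal illustration under stated assumptions: the function name, the `min_hits` threshold and the hit counts in the example are hypothetical, and the slides do not say how Scaffold 21 was actually found.

```python
from collections import Counter, defaultdict


def flag_chimeric_scaffolds(marker_hits, min_hits=2):
    """marker_hits: iterable of (scaffold, linkage_group) pairs.
    Returns {scaffold: {linkage_group: hit_count}} for scaffolds that have
    at least `min_hits` markers on more than one linkage group."""
    per_scaffold = defaultdict(Counter)
    for scaffold, lg in marker_hits:
        per_scaffold[scaffold][lg] += 1

    flagged = {}
    for scaffold, counts in per_scaffold.items():
        supported = {lg: n for lg, n in counts.items() if n >= min_hits}
        if len(supported) > 1:
            flagged[scaffold] = supported
    return flagged


# Hypothetical hit counts, for illustration only.
example_hits = (
    [("scaffold_21", "LG4")] * 3
    + [("scaffold_21", "LG11")] * 2
    + [("scaffold_5", "LG1")] * 4
    + [("scaffold_5", "LG2")]  # a single stray hit is ignored
)
print(flag_chimeric_scaffolds(example_hits))
```

Requiring several hits per linkage group keeps a single mis-mapped marker from flagging an otherwise consistent scaffold.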

BAC scaffolding
Type 1: only 1 end available, or mapping to the assembly after repeat masking.
Type 2: both ends map to the same scaffold, at an appropriate distance/orientation (the vast majority).
Type 3: both ends map to the same scaffold, but at the wrong distance (almost none of these).
Type 4: the two ends map to different scaffolds. These have the potential to help link scaffolds.
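The four categories can be expressed as a small classifier over the mapped positions of the two BAC ends. This is a sketch under stated assumptions: the `EndHit` record, the span limits (`min_span`, `max_span`) and the opposite-strand test for "appropriate orientation" are all hypothetical choices, and pairs with a wrong orientation are lumped into type 3 here, which the slide does not specify.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EndHit:
    scaffold: str
    position: int
    strand: str  # "+" or "-"


def classify_bac(end_a: Optional[EndHit], end_b: Optional[EndHit],
                 min_span: int = 100_000, max_span: int = 250_000) -> int:
    """Return the BAC type (1-4) following the categories on the slide.
    The span limits are placeholders, not values from the project."""
    if end_a is None or end_b is None:
        return 1  # only one end maps
    if end_a.scaffold != end_b.scaffold:
        return 4  # ends on different scaffolds: potential link
    span = abs(end_a.position - end_b.position)
    opposite_strands = end_a.strand != end_b.strand
    if opposite_strands and min_span <= span <= max_span:
        return 2  # consistent with the assembly
    return 3      # same scaffold, but inconsistent placement


print(classify_bac(EndHit("s1", 1_000, "+"), EndHit("s1", 151_000, "-")))
```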

Scaffolding with BACs

BAC scaffolding
Approximately 1,000 type 4 BACs map within 200kb of the end of a scaffold. These are available for the next round of scaffolding, but seem to be too few to be of much help.
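The 200kb filter can be sketched as follows. The input layout and function names are assumptions, and so is the choice to require both ends to lie near a scaffold end, since the slide does not say whether one or both ends had to qualify.

```python
def near_scaffold_end(position, scaffold_length, window=200_000):
    """True if `position` lies within `window` bp of either scaffold end."""
    return position < window or scaffold_length - position < window


def linking_candidates(type4_bacs, scaffold_lengths, window=200_000):
    """type4_bacs: iterable of (name, (scaffold_a, pos_a), (scaffold_b, pos_b)).
    Keeps BACs whose two ends both sit near the end of their scaffold."""
    candidates = []
    for name, (scaf_a, pos_a), (scaf_b, pos_b) in type4_bacs:
        if (near_scaffold_end(pos_a, scaffold_lengths[scaf_a], window)
                and near_scaffold_end(pos_b, scaffold_lengths[scaf_b], window)):
            candidates.append((name, scaf_a, scaf_b))
    return candidates


# Hypothetical scaffolds and BAC ends, for illustration only.
lengths = {"s1": 1_000_000, "s2": 2_000_000}
bacs = [
    ("bac_1", ("s1", 950_000), ("s2", 50_000)),  # both ends near scaffold ends
    ("bac_2", ("s1", 500_000), ("s2", 50_000)),  # one end mid-scaffold
]
print(linking_candidates(bacs, lengths))
```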

40kb libraries
SRR071595: low complexity
SRR071611: good complexity

Question
Would additional 40kb libraries help the scaffolding effort?

Recent duplications
Recently duplicated genes may have important roles, but are poorly assembled by the WGS approach.
We have been studying the vasa gene in order to identify promoter sequences, with the aim of creating a transgenic expressing GFP in the primordial germ cells and gonad.

BAC contigs
Most vertebrates have a single copy of the vasa gene.
PCR screening of the Katagiri/UNH tilapia BAC library identified two FPC contigs of BACs containing vasa genes:
Contig992: T4-2R 71CD02*, T4-2R 52B(B02), T4-2R 53B(A09)
Contig542: T4-2R 72AB04*, T4-2R 05B(B04), T3-2R 72CG(01)
We sequenced (by 454) and assembled the two * BACs.

Two vasa BACs sequenced (by 454): ~1% sequence divergence

71H03 vs Broad v.1
The BAC sequence contigs of 71H03 are on Scaffold_19, and the organization of the contigs was confirmed. However, the sequence at the vasa gene locus of 71H03 was not completely supported by the Broad sequence.
(Figure: BAC sequence contigs, including contig1, aligned to scaffold_19, with the vasa gene locus marked.)

72C07 vs Broad v.1
The BAC sequence contigs of 72C07 are on Scaffold_11, and the organization of the contigs was confirmed. However, the sequence at the vasa gene locus of 72C07 was not completely supported by the Broad sequence.
(Figure: BAC sequence contigs, including contig1, aligned to scaffold_11, with the vasa gene locus marked.)

Scaffolds 11 & 19 are incomplete
(Figure: exon-by-exon presence (+) of the vasa gene on scaffold_11 and scaffold_19, alongside BACs 71H03 and 72C07.)
Scaffold_11 possesses exons 12-22 of the vasa gene.
Scaffold_19 possesses exons 4-13, except for 12, of the vasa gene.
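The exon bookkeeping can be checked with set arithmetic. The sketch below assumes the vasa exons are numbered 1 to 22, which is inferred from the figure axis and not stated outright. The variable names are illustrative, and scaffold_160 is left out because the slides give no exon list for it.

```python
# Assumption: vasa exons are numbered 1..22, as the figure axis suggests.
ALL_EXONS = set(range(1, 23))

scaffold_11_exons = set(range(12, 23))        # exons 12-22
scaffold_19_exons = set(range(4, 14)) - {12}  # exons 4-13, except 12

covered = scaffold_11_exons | scaffold_19_exons
missing_from_both = sorted(ALL_EXONS - covered)
present_on_both = sorted(scaffold_11_exons & scaffold_19_exons)

print("exons on scaffold_11:", len(scaffold_11_exons))
print("exons on scaffold_19:", len(scaffold_19_exons))
print("missing from both scaffolds:", missing_from_both)
print("present on both scaffolds:", present_on_both)
```

Under that numbering, neither scaffold carries a complete copy of the gene, which is the point of the slide.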

vasa cDNA: 3 vasa scaffolds!
scaffold_19, scaffold_11, scaffold_160

Three copies in Broad v.1, each assembled scaffold incomplete
(Figure labels: Original location, Scaffold 11, Scaffold 160, Scaffold 19.)
Koji Fujimura, in prep

Annotation
U Maryland Maker annotation is running.
It should be available after 2 more weeks of computation.
Expected results on next slide.

Conclusions
The assembly has high continuity and base accuracy, similar to previous Sanger-based fish genomes.
Spanned gaps represent roughly 10% of the current assembly. The reasons for the gaps are not yet known. It would be desirable to fill them.
About 25% of the assembly is not yet in a golden path. Placing the top 1,500 scaffolds (>6kb) would incorporate 99% of the assembly into the golden path.
At least one probable misassembly has been identified, and it should be scrutinized for general lessons.