
Food Technology and Biotechnology, Vol. 55 No. 2, June 2017

Short communication
https://doi.org/10.17113/ftb.55.02.17.4749

MEGGASENSE – The Metagenome/Genome Annotated Sequence Natural Language Search Engine: A Platform for the Construction of Sequence Data Warehouses

Ranko Gacesa ; SemGen Ltd., Lanište 5/D, HR-10 000 Zagreb, Croatia
Jurica Zucko, ORCID: orcid.org/0000-0001-7782-6503 ; SemGen Ltd., Lanište 5/D, HR-10 000 Zagreb, Croatia
Solveig K. Petursdottir ; Matis Ltd., Vínlandsleið 12, IS-113 Reykjavík, Iceland
Elisabet Eik Gudmundsdottir ; Matis Ltd., Vínlandsleið 12, IS-113 Reykjavík, Iceland
Olafur H. Fridjonsson ; Matis Ltd., Vínlandsleið 12, IS-113 Reykjavík, Iceland
Janko Diminic ; SemGen Ltd., Lanište 5/D, HR-10 000 Zagreb, Croatia
Paul F. Long ; Institute of Pharmaceutical Science, King’s College London, Franklin-Wilkins Building, Stamford Street, London SE1 9NH, UK
John Cullum ; Department of Genetics, University of Kaiserslautern, Postfach 3049, DE-67653 Kaiserslautern, Germany
Daslav Hranueli ; SemGen Ltd., Lanište 5/D, HR-10 000 Zagreb, Croatia
Gudmundur O. Hreggvidsson ; Matis Ltd., Vínlandsleið 12, IS-113 Reykjavík, Iceland
Antonio Starcevic ; SemGen Ltd., Lanište 5/D, HR-10 000 Zagreb, Croatia

Full text: English, pdf (461 KB), pp. 251-257
APA
Gacesa, R., Zucko, J., Petursdottir, S.K., Gudmundsdottir, E.E., Fridjonsson, O.H., Diminic, J., Long, P.F., Cullum, J., Hranueli, D., Hreggvidsson, G.O., Starcevic, A. (2017). MEGGASENSE – The Metagenome/Genome Annotated Sequence Natural Language Search Engine: A Platform for the Construction of Sequence Data Warehouses. Food Technology and Biotechnology, 55(2). doi:10.17113/ftb.55.02.17.4749


Abstract
The MEGGASENSE platform builds relational databases that contain nucleotide or protein sequences. The default functional analysis uses 14 106 hidden Markov model (HMM) profiles based on sequences available in the KEGG database. The Solr search engine supports advanced queries, which can be combined with the built-in BLAST search. These standard capabilities were used to build the SCATT database from the predicted proteome of the bacterium Streptomyces cattleya. The paper describes the implementation of a specialised metagenomic database (AMYLOMICS) for bioprospecting of carbohydrate-modifying enzymes. In addition to the standard assembly of short DNA reads, a "functional" assembly procedure was developed in which the reads are screened with HMM profiles before assembly. The AMYLOMICS database also contains additional HMM profiles for carbohydrate-modifying enzymes. The paper shows how combining HMM and BLAST analyses can identify target genes. The MEGGASENSE platform has been used to build a variety of proteome and metagenome databases.
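The idea of screening reads with HMM profiles before assembly can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `ReadHit` record layout, the function name `bin_reads_by_profile`, the E-value cut-off of 1e-5 and the example hits are all assumptions made for illustration (GH13 and GH57 are used here only as example profile labels). The sketch keeps the most significant hit per read and groups the surviving reads by profile, so that each group could, in principle, be assembled separately.

```python
from collections import defaultdict
from typing import NamedTuple


class ReadHit(NamedTuple):
    """One HMM profile hit on a single read (hypothetical record layout)."""
    read_id: str
    profile: str
    evalue: float


def bin_reads_by_profile(hits, max_evalue=1e-5):
    """Group read IDs by their most significant profile hit.

    Hits with an E-value above max_evalue (an assumed cut-off) are discarded.
    """
    best = {}
    for hit in hits:
        if hit.evalue > max_evalue:
            continue  # not significant enough to keep
        current = best.get(hit.read_id)
        if current is None or hit.evalue < current.evalue:
            best[hit.read_id] = hit  # keep the lowest E-value per read

    bins = defaultdict(list)
    for hit in best.values():
        bins[hit.profile].append(hit.read_id)
    return {profile: sorted(ids) for profile, ids in bins.items()}


# Illustrative input only; these are not results from the paper.
hits = [
    ReadHit("read1", "GH13", 1e-20),
    ReadHit("read1", "GH57", 1e-8),
    ReadHit("read2", "GH13", 1e-12),
    ReadHit("read3", "GH57", 0.5),
]
bins = bin_reads_by_profile(hits)
print(bins)  # {'GH13': ['read1', 'read2']}
```

In this toy input, `read3` is dropped because its only hit is not significant, and `read1` is assigned to the profile with the lower E-value.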

Keywords
bioprospecting; carbohydrate-modifying enzymes; DNA assembly

Hrčak ID: 183073

URI
https://hrcak.srce.hr/183073

