Food Technology and Biotechnology, Vol. 55 No. 2, June 2017

Short communication
https://doi.org/10.17113/ftb.55.02.17.4749

MEGGASENSE – The Metagenome/Genome Annotated Sequence Natural Language Search Engine: A Platform for the Construction of Sequence Data Warehouses

Ranko Gacesa ; SemGen Ltd., Lanište 5/D, HR-10 000 Zagreb, Croatia
Jurica Zucko (ORCID: orcid.org/0000-0001-7782-6503) ; SemGen Ltd., Lanište 5/D, HR-10 000 Zagreb, Croatia
Solveig K. Petursdottir ; Matis Ltd., Vínlandsleið 12, IS-113 Reykjavík, Iceland
Elisabet Eik Gudmundsdottir ; Matis Ltd., Vínlandsleið 12, IS-113 Reykjavík, Iceland
Olafur H. Fridjonsson ; Matis Ltd., Vínlandsleið 12, IS-113 Reykjavík, Iceland
Janko Diminic ; SemGen Ltd., Lanište 5/D, HR-10 000 Zagreb, Croatia
Paul F. Long ; Institute of Pharmaceutical Science, King’s College London, Franklin-Wilkins Building, Stamford Street, London SE1 9NH, UK
John Cullum ; Department of Genetics, University of Kaiserslautern, Postfach 3049, DE-67653 Kaiserslautern, Germany
Daslav Hranueli ; SemGen Ltd., Lanište 5/D, HR-10 000 Zagreb, Croatia
Gudmundur O. Hreggvidsson ; Matis Ltd., Vínlandsleið 12, IS-113 Reykjavík, Iceland
Antonio Starcevic ; SemGen Ltd., Lanište 5/D, HR-10 000 Zagreb, Croatia

Full text: English, PDF (461 KB), pp. 251-257

Citation (APA):
Gacesa, R., Zucko, J., Petursdottir, S.K., Gudmundsdottir, E.E., Fridjonsson, O.H., Diminic, J., Long, P.F., Cullum, J., Hranueli, D., Hreggvidsson, G.O., Starcevic, A. (2017). MEGGASENSE – The Metagenome/Genome Annotated Sequence Natural Language Search Engine: A Platform for the Construction of Sequence Data Warehouses. Food Technology and Biotechnology, 55(2), 251-257. doi:10.17113/ftb.55.02.17.4749

Abstract
The MEGGASENSE platform constructs relational databases of DNA or protein sequences. The default functional analysis uses 14 106 hidden Markov model (HMM) profiles based on sequences in the KEGG database. The Solr search engine allows sophisticated queries, and a BLAST search function is also incorporated. These standard capabilities were used to generate the SCATT database from the predicted proteome of Streptomyces cattleya. The implementation of a specialised metagenome database (AMYLOMICS) for bioprospecting of carbohydrate-modifying enzymes is described. In addition to standard assembly of reads, a novel ‘functional’ assembly was developed, in which reads are screened with the HMM profiles before assembly. The AMYLOMICS database incorporates additional HMM profiles for carbohydrate-modifying enzymes, and an example illustrates how combining HMM and BLAST analyses helps to identify interesting genes. A variety of proteome and metagenome databases have been generated by MEGGASENSE.
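
The ‘functional’ assembly idea, screening reads against profiles before any assembly is attempted, can be pictured with a minimal Python sketch. It is not the MEGGASENSE implementation: the function name, the hit tuple layout (read ID, profile ID, E-value), the E-value cut-off and the profile labels are all assumptions made for illustration. The sketch only shows the screening and grouping step; each resulting group would then be passed to an assembler of choice.

```python
from collections import defaultdict


def bin_reads_by_profile(reads, hits, max_evalue=1e-5):
    """Group reads by the profile they matched, before any assembly.

    reads: dict mapping read ID -> sequence
    hits:  iterable of (read_id, profile_id, evalue) tuples, e.g. parsed
           from a profile-search tool's tabular output (layout assumed)
    max_evalue: hypothetical significance cut-off for keeping a hit
    """
    bins = defaultdict(dict)
    for read_id, profile_id, evalue in hits:
        # Keep only significant hits for reads that are actually present.
        if evalue <= max_evalue and read_id in reads:
            bins[profile_id][read_id] = reads[read_id]
    return dict(bins)


# Toy data; the profile label is a hypothetical placeholder.
reads = {
    "r1": "ATGGCTAGCTAG",
    "r2": "GCTAGCTAGGAT",
    "r3": "TTTTTTTTTTTT",
}
hits = [
    ("r1", "PROFILE_A", 1e-12),
    ("r2", "PROFILE_A", 3e-8),
    ("r3", "PROFILE_A", 0.5),  # weak hit, screened out
]

bins = bin_reads_by_profile(reads, hits)
for profile_id, members in bins.items():
    print(profile_id, sorted(members))
```

Binning first means that each assembly runs on a small, functionally related subset of reads rather than on the whole metagenome. The trade-off is that reads without a significant profile hit are left out of these assemblies.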

Keywords
bioprospecting; carbohydrate-modifying enzymes; DNA assembly

Hrčak ID: 183073

URI
https://hrcak.srce.hr/183073

