
Food Technology and Biotechnology, Vol. 55 No. 2, June 2017

Short communication

MEGGASENSE – The Metagenome/Genome Annotated Sequence Natural Language Search Engine: A Platform for the Construction of Sequence Data Warehouses

Ranko Gacesa ; SemGen Ltd., Lanište 5/D, HR-10 000 Zagreb, Croatia
Jurica Zucko ; SemGen Ltd., Lanište 5/D, HR-10 000 Zagreb, Croatia
Solveig K. Petursdottir ; Matis Ltd., Vínlandsleið 12, IS-113 Reykjavík, Iceland
Elisabet Eik Gudmundsdottir ; Matis Ltd., Vínlandsleið 12, IS-113 Reykjavík, Iceland
Olafur H. Fridjonsson ; Matis Ltd., Vínlandsleið 12, IS-113 Reykjavík, Iceland
Janko Diminic ; SemGen Ltd., Lanište 5/D, HR-10 000 Zagreb, Croatia
Paul F. Long ; Institute of Pharmaceutical Science, King’s College London, Franklin-Wilkins Building, Stamford Street, London SE1 9NH, UK
John Cullum ; Department of Genetics, University of Kaiserslautern, Postfach 3049, DE-67653 Kaiserslautern, Germany
Daslav Hranueli ; SemGen Ltd., Lanište 5/D, HR-10 000 Zagreb, Croatia
Gudmundur O. Hreggvidsson ; Matis Ltd., Vínlandsleið 12, IS-113 Reykjavík, Iceland
Antonio Starcevic ; SemGen Ltd., Lanište 5/D, HR-10 000 Zagreb, Croatia

Full text: English, pdf (461 KB), pp. 251-257
Gacesa, R., Zucko, J., Petursdottir, S.K., Gudmundsdottir, E.E., Fridjonsson, O.H., Diminic, J., Long, P.F., Cullum, J., Hranueli, D., Hreggvidsson, G.O., Starcevic, A. (2017). MEGGASENSE – The Metagenome/Genome Annotated Sequence Natural Language Search Engine: A Platform for the Construction of Sequence Data Warehouses. Food Technology and Biotechnology, 55(2). doi:10.17113/ftb.


The MEGGASENSE platform constructs relational databases of DNA or protein sequences. The default functional analysis uses 14 106 hidden Markov model (HMM) profiles based on sequences in the KEGG database. The Solr search engine allows sophisticated queries, and a BLAST search function is also incorporated. These standard capabilities were used to generate the SCATT database from the predicted proteome of Streptomyces cattleya. The implementation of a specialised metagenome database (AMYLOMICS) for bioprospecting of carbohydrate-modifying enzymes is also described. In addition to standard assembly of reads, a novel ‘functional’ assembly was developed, in which reads are screened with the HMM profiles before assembly. The AMYLOMICS database incorporates additional HMM profiles for carbohydrate-modifying enzymes, and the paper illustrates how combining HMM and BLAST analyses helps identify interesting genes. A variety of proteome and metagenome databases have been generated with MEGGASENSE.
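The ordering behind the ‘functional’ assembly (screen reads against profiles first, assemble afterwards) can be illustrated with a minimal Python sketch. This is not the MEGGASENSE implementation. The names `screen_reads` and `functional_assembly`, the score threshold, the per-profile binning, and the toy scoring functions are all assumptions made for illustration. A real pipeline would presumably score reads with profile HMM software and pass each bin to an actual assembler.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

# Hypothetical stand-in for a profile HMM: any callable that maps a read
# to a numeric score. It is NOT a real HMM.
Scorer = Callable[[str], float]
Assembler = Callable[[List[str]], List[str]]


def screen_reads(reads: Iterable[str], profiles: Dict[str, Scorer],
                 threshold: float) -> Dict[str, List[str]]:
    """Bin each read under its best-scoring profile; drop reads below threshold."""
    bins: Dict[str, List[str]] = defaultdict(list)
    if not profiles:
        return {}
    for read in reads:
        scores = {name: score_fn(read) for name, score_fn in profiles.items()}
        best = max(scores, key=scores.get)
        if scores[best] >= threshold:
            bins[best].append(read)
    return dict(bins)


def functional_assembly(reads: Iterable[str], profiles: Dict[str, Scorer],
                        threshold: float,
                        assemble: Assembler) -> Dict[str, List[str]]:
    """Screen first, then run the supplied assembler separately on each bin."""
    binned = screen_reads(reads, profiles, threshold)
    return {name: assemble(bin_reads) for name, bin_reads in binned.items()}


# Toy data: motif counts play the role of profile scores (assumption).
toy_profiles = {
    "toy_profile_A": lambda read: read.count("GAT"),
    "toy_profile_B": lambda read: read.count("CCC"),
}
toy_reads = ["GATGATTT", "ACGTACGT", "CCCACCCA", "TTGATAAA"]

# The "assembler" here only sorts the reads; it is a placeholder.
result = functional_assembly(toy_reads, toy_profiles, threshold=1,
                             assemble=sorted)
```

In this sketch the read that matches no profile never reaches the assembly step, which is the point of screening before assembling: only reads relevant to the functions of interest are assembled.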

Keywords
bioprospecting; carbohydrate-modifying enzymes; DNA assembly

Hrčak ID: 183073

