Introduction
The marine red alga Pyropia yezoensis (Ueda) M.S.Hwang et H.G.Choi is found in intertidal zones, where it faces considerable variability in temperature, water content, and light intensity on a daily basis. It is an economically important seaweed because of its pleasant savory taste and widely-recognized nutritional values (Noda 1993,Wenjuan et al. 2010). Numerous mutants have been created, providing valuable resources for scientific research and for the introduction of new cultivars with higher commercial values (Niwa et al. 1993,Xing and Yusho 1997,Su-Juan et al. 2000,Yan et al. 2000,Yan et al. 2004,Li et al. 2008,Niwa et al. 2009,Niwa 2010,Zhang et al. 2011,Park and Hwang 2014,Ding et al. 2016). The novel phenotypes range from pigmentation mutations to high yield, disease resistance, or heat tolerance. However, there are few studies making an in-depth investigation of the molecular basis underlying these mutants.
Previously, weisolated P. yezoensis 500G (Py500G) which is a mutated strain generated by gamma irradiation (Lee et al. 2019). The strain exhibited enhanced growth and high-temperature tolerance when compared to the wild type. However, the mutant Py500G was not analyzed at a molecular level. One of the biggest challenges in genetic studies of P. yezoensis is the lack of a complete reference genome. It was estimated that its genome is about 260 Mbp in size (Matsuyama-Serisawa et al. 2007). Meanwhile, the current draft genome of P. yezoensis is approximately 43 Mbp and composed of 46,634 contigs (Nakamura et al. 2013). To counter this problem, genome-wide transcriptome profiling and de novo assembly has been extensively employed to investigate multiple characteristics of this species (Nikaido et al. 2000,Asamizu et al. 2003,Xu et al. 2006,Kitade et al. 2008,Liang et al. 2010,Yang et al. 2011,Sun et al. 2015).
With the aim of studying the molecular characteristics of P. yezoensis 500G, we performed transcriptome sequencing using RNA-Seq. RNA-Seq can provide a comprehensive overview of the whole genome expression profile of Py500G. Based on those data, genes that are differentially expressed can be identified to help to determine the mechanism responsible for the Py500G phenotype. The study will not only help to unravel the molecular mechanism of the high growth rate of Py500G but will also provide more comprehensive genomic data for a better understanding of the metabolic processes of P. yezoensis.
Materials and methods
Materials and cultivation condition
Whole gametophytes of Py500G, which was developed in our laboratory, and the original wild type strain PyWT from the Seaweed Research Center (National Fisheries Research and Development Institute, South Korea) were used in the study. Gametophytes were cultivated in modified Grund medium (MGM) (McLachlan 1973) under 80 µmol photons m-2 s-1 and a photoperiod of 10 L:14 D at 10 °C in a growth chamber. Algal cultures were continuously aerated with filter-sterilized air and the medium was changed on a weekly basis.
RNA extraction and sequencing
Total RNA was isolated from the gametophytes of Py500G and PyWT algae using TRIzol reagent (Invitrogen, Carlsbad, CA, USA), following the manufacturer’s instructions. The quality of the processed RNA samples was confirmed using a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA) and by agarose gel electrophoresis. Total RNA was treated with DNase I and poly-dT oligonucleotide-coated magnetic beads to elute poly-(A)+ mRNA. Purified mRNA was fragmented using a DNA fragmentation kit (Ambion, Austin, TX, USA) prior to cDNA synthesis. The cleaved mRNA fragments were primed using random-hexamer primers and reverse transcriptase (Invitrogen) was used to synthesize first-strand cDNA. Second-strand cDNA synthesis was performed using RNase H (Invitrogen) and DNA polymerase I (New England Biolabs, Ipswich, MA, USA). Subsequently, end-repair of double-stranded cDNA was performed using T4 DNA polymerase, the Klenow fragment, and T4 polynucleotide kinase (New England Biolabs). The end-repaired cDNA was ligated to the Illumina paired-end (PE) adapter oligonucleotide-mix using T4 DNA ligase (New England Biolabs) at room temperature for 15 min. Suitable fragments were then sequenced in a PE pattern on an Illumina HiSeq 2000 instrument (Illumina, Inc., San Diego, CA, USA). Sequencing data were transformed by base calling into raw reads and stored in fastq format.
Data pre-processing, de novo assembly and redundancy removal
The raw reads were deposited in NCBI Sequence Read Archive under accessions from SRR5891396 to SRR5891400. Read quality was assessed using FastQC 0.11.5 (Andrews 2010). Raw data were first pre-processed to remove adapter sequences and low-quality data using Trimmomatic 0.36 (Bolger et al. 2014). The processed, clean paired-end reads were then used for de novo assembly using the short read assembling program Trinity v2.4.0 (Grabherr et al. 2011) with the default k-mer of 25. The Trinity assembler combined reads to form longer overlapping contigs without gaps. We then used TransDecoder (Haas and Papanicolaou 2016) to identify candidate coding regions within transcript sequences and to eliminate transcripts which open reading frames (ORFs) encoded less than 100 amino acids. Finally, CD-HIT (Li and Godzik 2006,Fu et al. 2012) was used to remove redundancy from the assembly. Specifically, the ORF sequences generated by TransDecoder were clustered into groups of 90% identity using CD-HIT. Transcripts containing the longest ORFs of each cluster were kept.
Functional annotation
To understand the functions of the transcripts in the assembly, we annotated transcripts by following the Trinotate workflow (Bryant et al. 2017) with some modifications. The candidate ORF sequences were used as queries for a homology search against NCBI nr and Swiss-Prot databases (BLASTP, e-value cut-off of 1E-5). BLAST+ program (Camacho et al. 2009) was used to search for homologous sequences in Swiss-Prot and DIAMOND aligner (Buchfink et al. 2015) was used to search the NCBI nr database. Protein domains of transcripts were retrieved from the Pfam database using hmmscan (Finn et al. 2011) (e-value cut-off of 1E-5). Pathway annotation at the Kyoto Encyclopedia of Genes and Genomes (KEGG) (Kanehisa and Goto 2000) was performed using the KEGG Automatic Annotation Server (KAAS), with the Single-directional Best Hit method and the gene data set for eukaryotes. Finally, the stand-alone program, InterProScan (Zdobnov and Apweiler 2001), was used to retrieve gene ontology (GO) terms by scanning query sequences for matches against the InterPro protein signature databases.
Differential expression analysis
Bowtie2 (Langmead and Salzberg 2012) was used to map clean reads from Py500G and PyWT to the assembly. Abundance estimation and differentially expressed gene analysis were performed using RSEM (Li and Dewey 2011) and EdgeR (Robinson et al. 2010), respectively. Transcripts with more than a 2-fold difference in expression between Py500G and PyWT and with a false discovery rate (FDR) of less than 0.05 were considered differentially expressed.
RT-qPCR validation
Total RNA was extracted from Py500G and PyWT gametophyte thalli after 4 weeks of culturing. The algae were powdered in liquid nitrogen and RNA was extracted using a RNeasy Mini Kit (Qiagen, Hilden, Germany) and DNase I treatment. cDNA was synthesized using a Transcriptor First Strand cDNA Synthesis Kit (Roche, Basel, Switzerland) with oligo-dT primers.
qRT-PCR was performed using a SYBR Premix Ex Taq™ II Kit (TaKaRa, Kusatsu, Japan) in a 20 μL reaction volume, containing 10 μL of 2×SYBR Green Mastermix (TaKaRa), 2 μL of forward and reverse primers, 1 μL of cDNA, and 7 μL of ddH2O. Reactions were performed using the following program: 95 °C for 30 s, followed by 40 cycles of 95 °C for 5 s, 55 °C for 20 s, 72 °C for 20 s, and with reading fluorescence signal detection, and then 1 cycle of 95 °C for 15 s, 60 °C for 60 s, and 95 °C for 15 s. Glyceraldehyde-3-phosphate dehydrogenase (GAPDH) was used as a control RNA (Wu et al. 2013). The qPCR primers used in this study are included in On-line Suppl. Tab. 1.
Results
De novo sequencing and assembly of P. yezoensis transcriptome
The transcriptome of two P. yezoensis strains, Py500G and PyWT, were sequenced using an Illumina HiSeq 2000 Sequencing instrument. RNA sequencing of gametophyte thalli generated 56,712,121 and 110,398,775 paired-end reads for Py500G and PyWT, respectively. After preprocessing to remove low-quality reads and adapter sequences, 87.84% of the raw data remained for de novo assembly and subsequent analysis (Tab. 1).
Sample ID | Raw reads | Clean reads | GC% | |
---|---|---|---|---|
PyWT | C1 | 37,089,092 | 31,978,088 | 63 |
C2 | 34,298,948 | 29,701,219 | 63 | |
C3 | 39,010,735 | 33,847,405 | 63 | |
Py500G | M1 | 37,809,087 | 34,155,831 | 63 |
M2 | 18,903,034 | 16,885,815 | 63 |
De novo transcriptome assembly was performed using Trinity. Since this new assembly contained a large quantity of redundant sequences, which can hinder downstream comparative analysis (Ono et al. 2015), we clustered peptides translated from the Trinity transcript ORFs into groups of 90% and kept only those transcripts encoding the longest peptides in each group. As a result, we obtained 19,441 non-redundant transcripts in the assembly with an average GC content of 67.54% (Tab. 2).
Functional annotation
Functional annotation facilitates downstream analyses and accelerates the discovery of new genes. The numbers of transcripts annotated in each database are listed inTable 3. We annotated transcripts based on NCBI nr, Swiss-Prot, Pfam, KEGG, GO, and KOG databases. More than 77.5% of total transcripts were annotated in at least one database. 98.8% of annotated transcripts had homologous sequences in NCBI nr, of which 68% were from Porphyra umbilicalis Kützing (Fig. 1A). In contrast, only 46.7% of annotated transcripts had homologous sequences in the manually curated Swiss-Prot database. KEGG pathway annotation provides further understanding of the biological functions of transcripts (Fig. 1B). This analysis revealed 2190 KEGG orthology (KO) identifiers associated with 3,104 transcripts in the assembly. Finally, based on their GO annotations, transcripts were classified into Cellular Component, Molecular Functions, or Biological Process groups (Fig. 1C).
Differential expression analysis
We mapped clean reads to the assembly and then estimated gene abundance for each sample. Using the log2-transformed CPM (counts per million) data, we examined the correlation between pairs of samples by calculating Pearson’s correlation coefficients. All samples were highly correlated with each other as the Pearson correlation coefficients (r) ranged from 0.97 to 0.99 across all pair-wise comparisons (On-line Suppl. Fig. 1).
With an FDR cut-off of 0.05, a total of 454 transcripts were identified as differentially expressed transcripts between Py500G and PyWT. These consisted of 192 up-regulated and 262 down-regulated transcripts in Py500G. Out of these differentially expressed transcripts, only 199 were annotated in at least one database and not recorded as “hypothetical protein” or “unknown protein”. The list of these annotated transcripts is included in On-line Suppl. Tab. 2.
Out of 199 annotated DEGs, we identified a list of genes from groups of functions well-known for their association with growth and development (Tab. 4). These included genes involved in photosynthesis, nitrogen uptake and assimilation, and cellular homeostasis. Furthermore, based on KEGG pathway annotations and manual curation, we also demonstrated the interaction among these candidate transcripts inFig. 2.
RT-qPCR validation
To confirm the expression of transcripts identified by RNA-seq data, we randomly chose 5 differentially expressed transcripts for RT-qPCR validation. One of these transcripts contained 2 different ORFs, which were ACSF and CHLB. Of these 5 transcripts, 3 showed the same pattern of expression as in RNA-seq data (Fig. 3). The transcript containing 2 ORFs also showed consistent results between RT-qPCR and RNA-seq analyses. However, the 2 ORFs showed different levels of expression, despite belonging to the same transcript. This may be the result of different qPCR primer efficiencies.
Discussion
The aim of this study was to identify candidate genes for further study of the mechanisms underlying the high-growth phenotype of P. yezoensis 500G. We approached this by investigating the differences in gene expression between Py500G and PyWT under normal culture conditions, using RNA-seq. In total, 454 transcripts were differentially expressed between Py500G and PyWT. These transcripts belonged to various functional groups and were all potentially involved in determining the high growth rate of Py500G. Further studies focusing on these candidates, especially on the previously uncharacterized genes, may reveal novel molecular mechanisms of the valuable feature in red algae.
Three plastid transcripts involved in photosynthesis were up-regulated in Py500G. The first transcript was PSBB, which encodes one of the photosystem II components. The second transcript was ATPF1A, which encodes the CF1 alpha subunit of chloroplast ATP synthase. The final transcript contained 2 ORFs, ACSF and CHLB, both of which are involved in chlorophyll biosynthesis. The protein encoded by ACSF catalyzes the formation of the isocyclic ring of chlorophyll. Meanwhile, CHLB encodes one of the subunits of light-independent protochlorophyllide oxido reductase (DPOR), which allows the efficient synthesis of chlorophyll in the dark or under low light conditions (Shui et al. 2009). ACSF and CHLB are not only found in P. yezoensis, but are also located adjacent to each other in the plastid genomes of other Pyropia species (Wang et al. 2013,Hughey et al. 2014). Unlike the situation with LPOR, which is a light-dependent POR, few studies have been published on the regulation of DPOR in algae. However, the high levels of expression of the transcript containing ACSF and CHLB were expected to promote chlorophyll synthesis, and along with the up-regulation of PSBB and ATPF1A, subsequently increase photosynthesis in Py500G.
One transcript involved in photosynthesis was found to be down-regulated in Py500G. It was annotated as a phycobilisome linker protein. In contrast with the 2 photosynthesis transcripts mentioned above, which shared 100% sequence identity with previously reported P. yezoensis genes, this transcript showed low homology. In fact, the most homologous sequence in P. yezoensis was a plastid phycobilisome linker protein (accession: YP_536944.1) with identity as low as 28%. However, there was a transcript in the mapping reference that shared 100% identity with YP_536944.1, but this transcript was not differentially regulated.
Dissolved inorganic nitrogen in seawater includes nitrate, nitrite, and ammonium and is a key factor for the growth, development, and quality of P. yezoensis thalli (Kakinuma et al. 2017). Two proteins that play crucial roles in nitrogen uptake, ammonium transporter (AMT) and nitrate transporter (NRT) were transcriptionally up-regulated in Py500G. The amt transcript was identical with the reported PyAMT1 in P. yezoensis, which is highly regulated in response to external/internal N-status (Kakinuma et al. 2017). There were up to 4 nitrate transporter transcripts that shared 94 to 99% sequence identity with the PyNRT2 gene, which has also been previously reported in P. yezoensis (Kakinuma et al. 2008). These 4 transcripts were not full-length, were not designated as isoforms of each other by Trinity, and 3 of them overlapped. Additionally, in the same study, the authors used Southern blotting analysis to show that there is only one NRT2-encoding gene in the P. yezoensis genome. Hence, these 4 transcripts may be transcribed from one gene encoding for PyNRT2, which is responsible for high-affinity nitrate transport across the plasma membrane. Interestingly, besides the two nitrogen transporters, we also identified another up-regulated transcript that encodes a key enzyme in nitrogen assimilation. Nitrate reductase catalyzes the reduction of nitrate to nitrite, consuming NADPH as the reducing equivalent, in the rate-limiting step of nitrate assimilation in plants and algae (Sakihama et al. 2002). All these results indicated that Py500G had a higher rate of nitrogen uptake and assimilation than PyWT. Besides, there is no report of nitrate reductase in P. yezoensis, except for a study on enzyme isolation and purification (Nakamura and Ikawa 1993). The nucleotide sequence of nitrate reductase identified in this study will be valuable for further studies on nitrate uptake and assimilation in red algae.
To grow and develop, especially in a high-salt environment like seawater, it is critical for P. yezoensis to establish and maintain ion and pH homeostasis in all cellular compartments. To accomplish this, cells need to activate various types of membrane transporters, of which proton, sodium, and potassium transporters are the most important. In Py500G, 3 transcripts encoding a sodium pump (KPA), a sodium/proton antiporter (NHD), and a proton-pumping inorganic pyrophosphatase (VPPA) were found to be up-regulated. These 3 transcripts shared 99 – 100% identity with previously identified genes or gene fragments of P. yezoensis, i.e. PyKPA2 (Uji et al. 2012a), PyNhaD (Uji et al. 2012b), and a 639-bp mRNA from the VPPA gene (Pérez-Castiñeira et al. 2002). Both PyKPA2 and PyNhaD have previously been shown to be expressed in gametophyte thalli (Uji et al. 2012a,b), which is consistent with our study. As for the VPPA gene, our transcriptome data showed that it was a putative full-length transcript, with an ORF expected to encode a 1079-aa protein.
Other functional groups were also identified from the list of 199 DEGs (Tab. 5). However, transcripts belonging to these functions did not show a consistent trend of expression or were poorly characterized in the literature. It was challenging to assign to these groups their exact biological significance.
Transposable elements (TEs): Various TEs were found to be up-regulated in Py500G. Homology search showed 2 of the total 10 transcripts exhibited 37% identity to a previously characterized retrotransposon in P. yezoensis (Peddigari et al. 2008). It is unclear what the significance of the high expression of TEs in Py500G is. TEs make a great contribution to the genomes of red algae. It was estimated that 43.5% of genome of Porphyra umbilicalis consists of TEs (Lee et al. 2018). This number was even higher in Chondrus crispus where TEs accounted for up to two third of its genome (Collén et al. 2013). The presence of various TEs in the DEGs list might partly be because of the high proportion of TEs in the genome.
ABC transporters: ABC transporter families are one of the largest groups of transporter in plants and involved in various cellular processes, e.g. detoxification, pathogen response, surface lipid deposition, etc. (Kang et al. 2011). It was also shown that many of the Arabidopsis ABC transporter sequences were present in red-alga genomes (Schulz and Kolukisaoglu 2006). In Py500G, 3 sequences of ABC transporters were differentially expressed: one up-regulated transcript AB7G and 2 down-regulated AB16G and AB27G. However, little is known about the functional characteristics of these 3 transporters in marine algae.
Protein kinases: Kinases constitute a very large group of protein families and are crucial components of different biological pathways. 7 transcripts encoding for protein kinases were differentially expressed. Most of these transcripts were not found in Pyropia before, except for a type II topoisomerase that showed from 90 to 100% identity with five Pyropia algae species: P. tenera, P. onoi, P. suborbiculata, P. haitanensis and P. dentata (Shimomura et al. 2000).