Characterization of the first mitogenomes of the smallest fish in the world, Paedocypris progenetica, from peat swamp of Peninsular Malaysia, Selangor, and Perak
Abstract
The two complete mitochondrial genomes (mitogenomes) of Paedocypris progenetica, the smallest fish in the world, which belongs to the family Cyprinidae, were sequenced and assembled. The circular DNA molecules of mitogenomes P1-P. progenetica and S3-P. progenetica were 16,827 and 16,616 bp in length, respectively, and each encoded 13 protein-coding genes, 22 transfer RNA genes, two ribosomal RNA genes, and one control region. The gene arrangements of P. progenetica were identical to those of other Paedocypris species. BLAST and phylogenetic analyses revealed variations in the mitogenome sequences of the two Paedocypris samples from Perak and Selangor. The circular DNA molecule of P. progenetica showed a standard vertebrate gene arrangement and an overall nucleotide composition of A 33.0%, T 27.2%, C 23.5%, and G 15.5%. The overall AT content of this species was consistent with that of other species in other genera. The negative GC-skew and positive AT-skew of the control region in P. progenetica indicated rich genetic variability and AT nucleotide bias, respectively. The results of this study provide genomic variation information and enhance the understanding of the mitogenome of P. progenetica. They may also provide valuable data for phylogenetic analysis and population genetics.
Introduction
The Paedocypris populations are rapidly declining worldwide due to anthropogenic and environmental actions that pose a threat to their survival [1]. According to Sam et al. [2], the evolution of small sizes, or miniaturization, is extensively seen in vertebrate species and is most commonly documented in amphibians and fishes. Southeast Asia harbors highly acidic blackwater peat swamps that serve as habitats for miniature fish, which are nearly all endemic to these habitats. The features of miniature phenotypes exhibit morphological novelty and increased morphological variability and are mostly unique combinations of ancestral phenotypes that are derived through structural simplification and reduction [3]. Interestingly, the smallest fish in the world, Paedocypris progenetica, is found in Peninsular Malaysia. However, the lack of its genomic data in GenBank could hinder the extensive study of this remarkable species. The mitochondrial genome (mitogenome) contains multiple genes that are noteworthy for ecological and evolutionary studies to investigate the phylogeny and biodiversity of complex species by using high-throughput sequencing technologies [4]. Hence, this study provided the whole mitogenome of P. progenetica from Peninsular Malaysia for the first time.
Methods
DNA sampling and sequencing
The samples of P. progenetica were collected from North Peat Swamp Selangor (3.39ʹN, 101.15ʹʹE) and Pondok Tanjung Perak (5.04ʹN, 100.4ʹʹE), Peninsular Malaysia in February 2021. Genomic DNA was extracted from the tissue of P. progenetica specimens by using a ReliaPrep gDNA Tissue Miniprep system (Promega, Madison, WI, USA) and fragmented with a Bioruptor system. The remaining tissue is currently deposited at Universiti Putra Malaysia (UPM). The library was prepared by using a NEBNext Ultra II DNA Library Prep Kit for Illumina in accordance with the manufacturer’s protocol. The library was then sequenced on an Illumina NovaSeq 6000 platform (Illumina, San Diego, CA, USA) in 150-bp paired-end mode (PE150) [5].
Mitogenome assembly, annotation, and sequence analysis
Sequencing adapters, low-quality stretches, and leading/trailing Ns were trimmed from the raw reads by using AdapterRemoval v2.2.2 [6]. Forward and reverse reads were interleaved into a single file, and the assembly was carried out with two different programs, NOVOPlasty v4.2 [5] and Megahit v1.2.9 [7], both with default k-mer sizes. For the assembly using NOVOPlasty, the reference mitogenome of the closest species in GenBank was used as the seed reference. Subsequently, quality was evaluated by utilizing the PALEOMIX pipeline. Gene identification and tRNA structure prediction were performed with the Mitochondrial Genome Database of Fish (http://mitofish.aori.u-tokyo.ac.jp/). The complete mitogenomes were then annotated, and a circular mitogenome map was generated by using MitoAnnotator [8]. Additionally, the nucleotide composition of the mitogenome was determined by applying MEGA v. 7.0 [9]. Nucleotide compositional differences were determined by using the formulae AT skew = (A − T)/(A + T) and GC skew = (G − C)/(G + C) [10], where each letter represents the count of the corresponding base.
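The skew formulae above can be illustrated with a minimal Python sketch. It is not the MEGA procedure used in this study; the function name base_skews and the toy sequence are hypothetical, and the choice to ignore ambiguous characters (N, gaps, IUPAC codes) is an assumption of the sketch.

```python
from collections import Counter


def base_skews(sequence: str) -> dict:
    """Return AT content, AT skew, and GC skew for a nucleotide sequence.

    Characters other than A, T, G, and C are ignored (an assumption of
    this sketch, not a documented behaviour of any particular program).
    """
    counts = Counter(sequence.upper())
    a, t, g, c = (counts[base] for base in "ATGC")
    total = a + t + g + c
    if a + t == 0 or g + c == 0:
        raise ValueError("sequence needs at least one A/T and one G/C base")
    return {
        "at_content": (a + t) / total,
        "at_skew": (a - t) / (a + t),   # AT skew = (A - T)/(A + T)
        "gc_skew": (g - c) / (g + c),   # GC skew = (G - C)/(G + C)
    }


# Toy sequence (not real data): 4 A, 2 T, 1 G, 3 C
result = base_skews("AAAATTGCCC")
print(result)
```

A positive AT skew means that A outnumbers T on the analysed strand, and a negative GC skew means that C outnumbers G.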
Phylogenetic analysis
A phylogenetic tree of Paedocypris mitogenomes, including the sequences retrieved from GenBank, was constructed by using MEGA v. 7.0 software [10], which contains advanced methods and tools for phylogenomics and an optimized 64-bit computing system for the analysis of large datasets. The neighbor-joining method [11] was utilized for the comparative analyses of the nucleotide sequences of P. progenetica, comprising 13 protein-coding genes (PCGs), and those of putative Paedocypris species, including six sequences from the GenBank database. The 13 PCGs (without stop codons) were aligned with the vertebrate mitochondrial genetic code by using the MACSE [12] algorithm in PhyloSuite 1.2.2 [13]. The alignments of the individual genes were then concatenated, together with those of the six mitogenomes retrieved from GenBank. The multiple alignments of the concatenated nucleotide sequences of the 13 PCGs were conducted by using the MEGA v. 7.0 program [14]. Bootstrap analysis with 1,000 replicates was applied to evaluate the resulting phylogenetic tree. The trees were rooted by using the GenBank entry of the closely related Schismatorhynchos nukta (Cyprinidae) as an outgroup.
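The data preparation behind a distance-based tree can be sketched as follows. This is only an illustration under stated assumptions: the uncorrected p-distance is used because the substitution model applied in MEGA is not given here, the tree-building step itself is omitted, and all function names, taxon labels, and sequences are hypothetical toy values.

```python
import random


def concatenate(gene_alignments):
    """Join per-gene alignments (gene -> {taxon: aligned sequence}) per taxon."""
    taxa = sorted(next(iter(gene_alignments.values())))
    return {
        taxon: "".join(gene_alignments[gene][taxon] for gene in gene_alignments)
        for taxon in taxa
    }


def p_distance(seq_a, seq_b):
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def bootstrap_columns(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {taxon: "".join(seq[i] for i in columns) for taxon, seq in alignment.items()}


# Toy alignments for two genes (not real data)
genes = {
    "nad1": {"taxon_A": "ATGGCA", "taxon_B": "ATGGCT", "outgroup": "ATGACT"},
    "cox1": {"taxon_A": "GTGACC", "taxon_B": "GTGACC", "outgroup": "GTGTCC"},
}
supermatrix = concatenate(genes)
d_ab = p_distance(supermatrix["taxon_A"], supermatrix["taxon_B"])
d_ao = p_distance(supermatrix["taxon_A"], supermatrix["outgroup"])
replicate = bootstrap_columns(supermatrix, random.Random(0))
print(d_ab, d_ao, len(replicate["taxon_A"]))
```

In a full analysis, a distance matrix would be computed for each of the 1,000 resampled alignments, a neighbor-joining tree built from each, and the bootstrap support of a clade taken as the percentage of replicate trees that contain it.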
Results and Discussion
The complete mitogenome sequences of S3-P. progenetica (OK413207) and P1-P. progenetica (OK356905) were 16,874 and 16,616 bp, respectively, as shown in Table 1. The two mitogenomes were comparable to the Serranochromis robustus and Buccochromis nototaenia fish mitogenomes and comprised 13 PCGs, 22 transfer RNA genes, two ribosomal RNA genes, and one control region (Table 2), which was clearly within the reported range [15]. According to Sun et al. [14] and Mullens et al. [16], a set of 13 PCGs and two rRNAs in the mitochondrial genome is consistently used as markers to strengthen the identification of fish species or resolve high-level relationships between them. The data in Table 2 indicated that ND6 and seven tRNA genes (tRNAIle, tRNAAla, tRNAAsn, tRNACys, tRNASer, tRNAGlu, and tRNAPro) were encoded on the L-strand, whereas most of the P. progenetica mitochondrial genes were encoded on the H-strand. This finding was consistent with the result reported by Sam et al. [2] on the mitogenomes of P. micromegethes and P. carbunculus, as no significant changes were found in the conserved genes (PCGs, tRNAs, and rRNAs) between the populations of P. progenetica. However, the mitogenome lengths of P. micromegethes and P. carbunculus were clearly different from those of P. progenetica, presumably because of the variations in the control region (D-loop).
Phylogenetic relationship
The mitogenomic phylogeny analysis clustered the two mitogenomes of P. progenetica (OK356905 and OK413207) with the mitogenome of P. progenetica from Indonesia (AP011287) [17] and rooted them with the mitogenomes of other Paedocypris species [2,18] (Fig. 1) with the high support of 100% bootstrap and 1.00 posterior probability. A BLAST search against GenBank revealed that the closest match (>96% similarity) was between the newly sequenced mitogenomes of P. progenetica from Peninsular Malaysia and the mitogenome of P. progenetica (AP011287) from a peat swamp in Sumatera, Indonesia [18] (Table 1). Moreover, <96% similarity was found among Paedocypris species. The molecular evidence strongly indicated that Clade 1, which included the P. micromegethes (NC_051487.1) subclade-1, comprised a stable monophyletic group. The latest research has found identical ancestral patterns for Cirrhinus reba, which aligned in the same clade containing the same species [19].
Protein-coding genes
The prominent features of Paedocypris mitochondrial genes are listed in Table 3, which indicates that all PCGs began with the start codon ATG, except for the COI gene, which began with GTG. Seven PCGs (nad1, cox1, atp8, atp6, nad4l, nad4, and nad5) were terminated by a complete and canonical stop codon (TAA or TAG). However, the genes encoding cox2, cox3, nad2, and nad3 had a truncated stop codon. Similarly, Sam et al. [2] reported that the COI gene began with GTG and that most PCGs in the mitogenomes of P. carbunculus and P. micromegethes were terminated by the codon TAR (TAA/TAG) or an incomplete codon (TA-/T--). According to Zhong et al. [20], a truncated stop codon (T) is commonly found in the mitochondrial genes of metazoans, such as the spider Habronattus oregonensis, and does not affect mitochondrial gene transcription or translation because the complete stop codon is presumably obtained through post-transcriptional polyadenylation [21].
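The classification of start and stop signals described above can be sketched in Python. This is a simplified illustration, not the annotation logic of MitoAnnotator: the function name describe_codons and the toy sequences are hypothetical, the input is assumed to be a coding sequence given in its own reading direction, and only the stop signals named in the text (TAA, TAG, TA-, and T--) are recognised.

```python
COMPLETE_STOPS = {"TAA", "TAG"}


def describe_codons(cds: str) -> dict:
    """Report the start codon and the stop signal of a coding sequence.

    A trailing T or TA left over after whole codons is reported as a
    truncated stop (T-- or TA-), which is assumed to be completed to TAA
    by post-transcriptional polyadenylation.
    """
    cds = cds.upper()
    start = cds[:3]
    remainder = len(cds) % 3
    if remainder == 0:
        last = cds[-3:]
        stop = last if last in COMPLETE_STOPS else "none"
    else:
        tail = cds[-remainder:]
        stop = tail + "-" * (3 - remainder) if tail in ("T", "TA") else "none"
    return {"start": start, "stop": stop, "truncated": "-" in stop}


# Toy sequences (not real genes)
examples = {
    "complete": describe_codons("GTGACCTAA"),
    "one_base": describe_codons("ATGACCT"),
    "two_base": describe_codons("ATGACCTA"),
}
for name, info in examples.items():
    print(name, info)
```

The first toy sequence mimics a GTG-initiated gene with a complete TAA stop, whereas the other two end in the incomplete signals T-- and TA-.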
Gene arrangements
Both P. progenetica samples (S3 and P1) from Peninsular Malaysia aligned with 96% similarity to the P. progenetica mitogenome from Indonesia (AP011287) retrieved from GenBank. The overall nucleotide composition of P. progenetica was 33.0% A, 27.2% T, 23.5% C, and 15.5% G and showed a slightly AT-rich region (60.25%); these results were consistent with the patterns found in most fish mitogenomes [22]. The nucleotide composition of the P. progenetica mitogenome was biased toward A + T and had values similar to those of other Paedocypris species, such as P. progenetica from Banka. The PCGs had a slightly higher A + T content (61.7%) than the ribosomal RNA genes (59.8%). According to Table 3, the AT and GC skews of P. progenetica were 0.10 and −0.20, respectively. The GC skews of all genes were negative, except for those of NAD6 and tRNA, which were positive in both populations; this indicated a regular pattern of base composition in the P. progenetica mitogenome. This result agreed well with that of Sam et al. [2], who found that the AT-skew was mainly positive and the GC-skew was mostly negative in distinct gene regions of the P. micromegethes and P. carbunculus mitogenomes.
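As a consistency check, the reported skews can be recomputed from the overall nucleotide composition with the formulae given in the Methods. The percentages are taken from the text above; using them in place of raw base counts is valid because the skews depend only on ratios.

```latex
\begin{align*}
\text{AT skew} &= \frac{A - T}{A + T} = \frac{33.0 - 27.2}{33.0 + 27.2} = \frac{5.8}{60.2} \approx 0.10 \\
\text{GC skew} &= \frac{G - C}{G + C} = \frac{15.5 - 23.5}{15.5 + 23.5} = \frac{-8.0}{39.0} \approx -0.21
\end{align*}
```

The AT skew matches the reported 0.10, and the GC skew agrees with the reported −0.20 to within the rounding of the one-decimal percentages.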
Meanwhile, a vast difference in nucleotide composition in the control region (D-loop region) located between trnP (tRNAPro) and trnF (tRNAPhe) can be seen in this genus. The lengths of PCGs, tRNAs, and rRNAs were conserved, and the variations were mainly attributed to the control region. The lengths of the D-loop region in P. micromegethes and P. carbunculus were 1,590 and 1,662 bp, respectively, whereas we found a considerably shorter D-loop region of 1,209 bp (OK356905 and OK413207). These results differed because the D-loop region exhibits a rapid evolutionary rate and tends to be under weaker purifying selection than PCGs, which allows it to amass variations in length [23]. According to Li et al. [24], noncoding regions in metazoan mitogenomes frequently vary in length from species to species. However, the D-loop region of the reference species collected from Indonesia (AP011287) was not recorded in GenBank. Future studies on the noncoding region may contribute genetic data and enhance studies on the genomic data of P. progenetica.
Conclusion
The full mitogenome sequence of P. progenetica was analyzed and compared with those of other Paedocypris species in the Cyprinidae family, mainly P. micromegethes and P. carbunculus. The mitogenome of P. progenetica was shorter than those of other Paedocypris species, predominantly due to variations in the D-loop region. The comparison of the complete mitogenome data and phylogenetic relationships of Paedocypris species provides fundamental information for evolutionary biology and is particularly important for future studies that use the D-loop region and whole-genome sequences to fully resolve the relationships among Paedocypris species.
Notes
Authors’ Contribution
Conceptualization: IAA, YE, AA. Formal analysis: FMS, HMG, KM. Methodology: PNSJ, HMG. Writing - original draft: NH, PNSJ. Writing - review & editing: IAA, FMS.
Conflicts of Interest
No potential conflict of interest relevant to this article was reported.
Acknowledgements
This study was fully funded by the Ministry of Higher Education, Malaysia, under the Fundamental Research Grant Scheme (Ref. No: RACER/1/2019/STG05/UITM//5) through Universiti Teknologi Mara (UiTM) (Ref No: 600-IRMI/FRGS-RACER 5/3 (058/2019)). The authors would like to thank everyone from UPM, UiTM, UMT, Selangor Forestry Department, and local communities who were involved in sample collection. The authors would also like to thank the Fish Genetics and Breeding Laboratory, Department of Aquaculture, Faculty of Agriculture, Universiti Putra Malaysia (UPM) for providing facilities and chemicals for molecular work, as well as University Technology Malaysia (UTM) for assistance with the mitogenome analysis. Lastly, the first author would like to thank the Jabatan Perkhidmatan Awam (JPA) for providing a scholarship for her postgraduate study.