§These authors contributed equally to this work.
High-throughput transcriptome sequencing, also known as RNA sequencing (RNA-Seq), is a standard technology for measuring gene expression with unprecedented accuracy. Numerous Bioconductor packages have been developed for the statistical analysis of RNA-Seq data. However, these tools each focus on specific aspects of the data analysis pipeline and are difficult to integrate with one another because of their disparate data structures and processing methods. They also lack visualization methods for confirming the integrity of the data and of the analysis process. In this paper, we propose an R-based RNA-Seq analysis pipeline called TRAPR, an integrated tool that facilitates the statistical analysis and visualization of RNA-Seq expression data. TRAPR provides functions for data management, filtering of low-quality data, normalization, transformation, statistical analysis, data visualization, and result visualization that allow researchers to build customized analysis pipelines.
High-throughput mRNA sequencing technology has developed at great pace in recent years [
Evaluating differential expression in conditions by RNA-Seq is a multi-step process [
This study proposes TRAPR (Total RNA-Seq Analysis Package for R,
TRAPR provides two functions to import RNA-Seq experimental data and four functions to export results to files. TRAPR can read text files for expression data as well as for a list of genes. During or following analysis, users can export DEG lists or detailed tables for DEG and expression tables, which other tools can utilize.
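To make the import and export step concrete, the following is a minimal sketch in Python rather than TRAPR's own R functions, whose signatures are not given here. The file layout (a header row of sample names, then one gene per row, tab-delimited) and the names `read_expression` and `write_expression` are assumptions made for illustration only.

```python
import csv
import io

def read_expression(handle):
    """Parse a tab-delimited table: header row of sample names, one gene per row."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]              # assumption: first column holds the gene ID
    table = {}
    for row in reader:
        if not row:
            continue                  # skip blank lines
        table[row[0]] = [float(v) for v in row[1:]]
    return samples, table

def write_expression(handle, samples, table):
    """Write the table back out in the same tab-delimited layout."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene"] + list(samples))
    for gene, values in table.items():
        writer.writerow([gene] + list(values))

# Hypothetical two-sample expression table used only for demonstration.
text = "gene\tS1\tS2\nGAPDH\t120.5\t98.0\nTP53\t0\t3.2\n"
samples, table = read_expression(io.StringIO(text))

out = io.StringIO()
write_expression(out, samples, table)
round_trip = read_expression(io.StringIO(out.getvalue()))
```

Keeping the exported table in the same plain-text layout as the input is what lets downstream tools, or a later session, read the result back without conversion.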
TRAPR provides three types of data preprocessing methods: filtering, transformation, and normalization. TRAPR filtering has six filter types: sample, gene, zero value, low variance, low expression, and gene list. Unlike DNA microarrays, which have a fixed number of probes, RNA-Seq surveys a very large number of isoforms and novel transcripts mixed with noise, and therefore returns many zeros and spurious values. Genes encoding miRNAs or snoRNAs often show extremely high expression levels, even after a poly-A purification procedure. These outliers can easily be removed with the zero-value and gene filters. Low-expression and low-variance filters can improve statistical power by reducing the number of genes with non-standard distributions. Sample filtering makes it convenient to analyze different combinations of samples.
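The zero-value, low-expression, and low-variance filters described above can be sketched as follows. This is an illustrative Python sketch, not TRAPR's R implementation; the function names, the thresholds (`max_zero_fraction`, `min_mean`, `min_variance`), and the example FPKM values are all assumptions.

```python
from statistics import mean, variance

def zero_value_filter(table, max_zero_fraction=0.5):
    """Keep genes whose fraction of zero values does not exceed the threshold."""
    return {
        gene: values
        for gene, values in table.items()
        if sum(1 for x in values if x == 0) / len(values) <= max_zero_fraction
    }

def low_expression_filter(table, min_mean=1.0):
    """Keep genes whose mean expression reaches the threshold."""
    return {g: v for g, v in table.items() if mean(v) >= min_mean}

def low_variance_filter(table, min_variance=0.01):
    """Keep genes whose sample variance across samples reaches the threshold."""
    return {g: v for g, v in table.items() if variance(v) >= min_variance}

# Hypothetical FPKM values for four genes across four samples.
fpkm = {
    "geneA": [0.0, 0.0, 0.0, 2.0],    # mostly zeros
    "geneB": [0.2, 0.1, 0.3, 0.2],    # low expression
    "geneC": [5.0, 5.0, 5.0, 5.0],    # no variance
    "geneD": [4.0, 9.0, 15.0, 30.0],  # informative
}

# Filters compose by simple chaining, which mirrors building a custom pipeline.
kept = low_variance_filter(low_expression_filter(zero_value_filter(fpkm)))
```

In this toy table only `geneD` survives all three filters; each of the other genes is removed by exactly one of them.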
TRAPR provides two well-known transformation functions, log2 transformation and VSN, followed by hyperbolic arcsin, arcsin(x), and transformation [
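As a small illustration of two of the transformations named above, the sketch below applies a log2 transformation and a hyperbolic arcsine transformation to a vector of expression values. It is written in Python for illustration and is not TRAPR's R code; the pseudocount of 1 added before taking the logarithm is an assumption, and VSN is not shown because it requires fitting a model.

```python
import math

def log2_transform(values, pseudocount=1.0):
    """log2(x + pseudocount); the pseudocount keeps zero values finite."""
    return [math.log2(x + pseudocount) for x in values]

def asinh_transform(values):
    """Hyperbolic arcsine, which is defined at zero and behaves like log(2x) for large x."""
    return [math.asinh(x) for x in values]

# Hypothetical expression values spanning several orders of magnitude.
fpkm = [0.0, 1.0, 3.0, 1023.0]
logged = log2_transform(fpkm)      # [0.0, 1.0, 2.0, 10.0]
arcsinh = asinh_transform(fpkm)
```

Both transformations compress the dynamic range of the data; the hyperbolic arcsine needs no pseudocount because it is defined at zero.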
TRAPR provides many normalization methods, including upper quartile [
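Upper-quartile normalization can be sketched as below. This is a simplified Python illustration, not TRAPR's implementation: it assumes that each sample's scaling factor is the 75th percentile of that sample's non-zero counts, and that samples are rescaled so their upper quartiles all equal the mean upper quartile. The function name and the example counts are hypothetical.

```python
from statistics import mean, quantiles

def upper_quartile_normalize(counts_by_sample):
    """Rescale each sample so that all samples share the same upper quartile."""
    upper = {}
    for sample, counts in counts_by_sample.items():
        nonzero = [c for c in counts if c > 0]   # assumption: zeros are excluded
        # quantiles(..., n=4) returns [Q1, Q2, Q3]; index 2 is the upper quartile.
        upper[sample] = quantiles(nonzero, n=4, method="inclusive")[2]
    target = mean(upper.values())
    return {
        sample: [c * target / upper[sample] for c in counts]
        for sample, counts in counts_by_sample.items()
    }

# Hypothetical counts: sample S2 was sequenced twice as deeply as S1.
counts = {
    "S1": [0, 10, 20, 30, 40, 50],
    "S2": [0, 20, 40, 60, 80, 100],
}
normalized = upper_quartile_normalize(counts)
```

After scaling, the two samples in this example have identical values, because they differed only by sequencing depth.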
TRAPR has several statistical testing functions to identify differentially expressed genes (DEGs). Student's t-test and the statistical methods suggested in edgeR, baySeq, and DESeq assume a normal distribution or a Poisson distribution. Meanwhile, methods in DEGseq and NOISeq [
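For reference, the test statistic of the pooled-variance Student's t-test mentioned above can be computed for a single gene as in the following Python sketch. This is an illustration only, not TRAPR's R code: it returns the statistic and its degrees of freedom without a p-value, and the function name and example values are assumptions.

```python
import math
from statistics import mean, variance

def student_t(group_a, group_b):
    """Pooled-variance two-sample t statistic and its degrees of freedom.

    Assumes each group has at least two values and that the pooled
    variance is non-zero.
    """
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, df

# Hypothetical log-scale expression of one gene in two conditions.
control = [1.0, 2.0, 3.0]
treated = [4.0, 5.0, 6.0]
t_stat, dof = student_t(control, treated)
```

The statistic would then be compared against a t distribution with `dof` degrees of freedom, and the resulting p-values corrected for the number of genes tested.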
Previously developed packages do not provide visualization functions for the data preprocessing steps, although proper visualization is essential for evaluating the quality of the RNA-Seq data and of the preprocessing itself. TRAPR provides five flexible plotting functions: density, boxplot, MA, scatter, and mean–variance plots. Volcano plots and heatmaps are also provided to visualize the results of statistical analysis. Each visualization function has direct access to FPKM values and differential expression values.
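As an example of what one of these plots displays, the coordinates of an MA plot for two samples can be computed as in the Python sketch below. It illustrates the standard definition (M is the log2 ratio, A is the mean log2 intensity) and is not TRAPR's plotting code; the function name, the pseudocount of 1, and the example values are assumptions, and the plotting itself is omitted.

```python
import math

def ma_coordinates(x, y, pseudocount=1.0):
    """Return (M, A) per gene: M = log2(y) - log2(x), A = mean of the two log2 values."""
    m_values, a_values = [], []
    for xi, yi in zip(x, y):
        lx = math.log2(xi + pseudocount)   # pseudocount avoids log2(0)
        ly = math.log2(yi + pseudocount)
        m_values.append(ly - lx)
        a_values.append(0.5 * (lx + ly))
    return m_values, a_values

# Hypothetical expression values of three genes in two samples.
sample_x = [1.0, 3.0, 15.0]
sample_y = [3.0, 3.0, 0.0]
m, a = ma_coordinates(sample_x, sample_y)
```

Plotting M against A shows whether the log ratio is centered on zero across the expression range, which is one way to check a normalization step.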
We have developed TRAPR, an R package for RNA-Seq data analysis. TRAPR provides an entire pipeline for RNA-Seq analysis; it is not merely a combination of currently available tools, but a backbone that facilitates the proper application and coordination of these tools. For instance, upper-quartile normalization followed by zero-value filtering, VSN, and edgeR statistical testing with proper data visualization can easily be streamlined through TRAPR. Such combinations can help improve accuracy and statistical power. TRAPR provides visualization tools and file I/O functions to evaluate the quality and characteristics of the data. Because TRAPR was developed and integrated in R, it can easily be applied to data from other technologies such as Serial Analysis of Gene Expression (SAGE) and microarrays. Various filters have been integrated into the package. TRAPR can be used as a platform to interweave RNA-Seq data analysis tools and packages and to take advantage of the strengths of each.
This work was supported by a National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIP) (2012-0000994), the Korean Health Technology R&D Project, the Ministry of Health and Welfare (HI13C2164), the ICT R&D program of MSIP/IITP [B0101-15-247 for the “Development of Open ICT Healing Platform Using Personal Health Data”], and a grant (16183-MFDS541) from the Ministry of Food and Drug Safety in 2016.