Introduction
Viruses mutate faster than other microorganisms, and such mutations often lead to malignant infections in humans, animals, and plants. It is therefore useful to develop methods that rapidly identify mutant viruses on the basis of the International Committee on Taxonomy of Viruses (ICTV) taxonomy [1]. The function of a protein depends on its tertiary structure, and alterations in tertiary structure lead to changes in function. Tertiary structure is in turn determined by primary structure, that is, the sequence of amino acids. From a genetic point of view, alterations in protein tertiary structure therefore imply changes in the protein-coding sequence within the exons of the genome. Thus, genetic mutations can alter the structure and function of proteins and lead to disease. To quickly detect functional similarity when new viruses emerge, our ultimate aim is to analyze conserved domains, which can identify the protein sequences specific to each virus. As the first application of this approach, we focused on
norovirus, a positive-sense single-stranded RNA virus in the family Caliciviridae [2]. We extracted coding sequences (CDS) from viral RefSeq GenBank records and then searched the CDS protein sequences against the Conserved Domain Database (CDD) (Table 1) [3]. We thereby assigned meaningful annotations and selected specific protein sequences from the domain tables generated by running RPS-BLAST with queries derived from the complete genome of each virus in Caliciviridae.
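The domain tables mentioned above are built from RPS-BLAST hits. As a minimal sketch, assuming the search was run with BLAST+ tabular output (-outfmt 6, whose default columns are the twelve names in FIELDS), hits could be filtered by E-value as shown below. The function name parse_hits, the 0.01 cutoff, and the sample rows are illustrative assumptions, not values or code from this study.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def parse_hits(handle, max_evalue=0.01):
    """Return hits with E-value <= max_evalue as a list of dicts.

    The cutoff is an assumed example value, not one taken from the paper.
    """
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != len(FIELDS) or row[0].startswith("#"):
            continue  # skip blank, comment, or malformed lines
        hit = dict(zip(FIELDS, row))
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        if hit["evalue"] <= max_evalue:
            hits.append(hit)
    return hits


# Hypothetical rows for illustration only; identifiers and scores are made up.
SAMPLE = (
    "cds_1\tdomain_A\t45.0\t200\t110\t0\t10\t209\t1\t200\t1e-50\t180\n"
    "cds_1\tdomain_B\t22.0\t60\t47\t0\t300\t359\t5\t64\t2.5\t25.1\n"
    "cds_2\tdomain_C\t38.5\t150\t92\t0\t1\t150\t3\t152\t3e-20\t95.3\n"
)

hits = parse_hits(io.StringIO(SAMPLE))
for hit in hits:
    print(hit["qseqid"], hit["sseqid"], hit["evalue"])
```

With the default cutoff, the weak domain_B match (E-value 2.5) is dropped and the two significant matches are kept. A stricter or looser cutoff can be passed through max_evalue.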
Results and Discussion
In Caliciviridae, all genera share an RNA helicase domain (Fig. 3). RNA helicase is also common to single-stranded RNA viruses in general. RNA helicases play important roles in viral life cycles as well as in all cellular processes involving RNA. Vertebrate RNA helicases detect viral infections and trigger the innate antiviral immune response. RNA helicases have also been implicated in protozoic, bacterial, and fungal infections, as well as in neurological disorders, cancer, and aging processes. Thus, they can serve as diagnostic markers and as drug targets for antiviral and anti-cancer treatment [9]. In Norovirus, the calicivirus coat protein C-terminal domain (Calici_coat_C) and the viral polyprotein N-terminal domain (Calici_PP_N) appear, unlike in the other genera, as specific domains within Caliciviridae, whereas the calicivirus minor structural protein domain (Calici_MSP) can be used as the specific domain of Sapovirus. Calici_PP_N is found at the N-terminus of the non-structural viral polyproteins of the family Caliciviridae. Vesivirus, in contrast, can be detected by the protein of unknown function DUF743 in CDD. All viruses in Caliciviridae share the calicivirus coat protein (Calici_coat) domain. Lagovirus and Nebovirus both contain the Lagovirus protein of unknown function (DUF840). Lagovirus has only two species, one of which is rabbit hemorrhagic disease virus.
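The distinction drawn above between domains shared by every genus and domains specific to one genus can be expressed with set operations. The sketch below is a hedged illustration: the function name shared_and_specific is hypothetical, the label RNA_helicase is an assumed short name, and the table loosely follows the text rather than reproducing the actual domain table of this study.

```python
def shared_and_specific(domains_by_genus):
    """Split domains into those shared by all genera and those unique to one."""
    sets = {genus: set(domains) for genus, domains in domains_by_genus.items()}
    if not sets:
        return set(), {}
    # Domains present in every genus.
    shared = set.intersection(*sets.values())
    specific = {}
    for genus, domains in sets.items():
        # Union of the domains seen in all other genera.
        others = set()
        for other, other_domains in sets.items():
            if other != genus:
                others |= other_domains
        specific[genus] = domains - others
    return shared, specific


# Illustrative table that loosely follows the text; "RNA_helicase" is an
# assumed label, and this is not the paper's actual domain table.
CALICIVIRIDAE = {
    "Norovirus": {"RNA_helicase", "Calici_coat", "Calici_coat_C", "Calici_PP_N"},
    "Sapovirus": {"RNA_helicase", "Calici_coat", "Calici_MSP"},
    "Vesivirus": {"RNA_helicase", "Calici_coat", "DUF743"},
    "Lagovirus": {"RNA_helicase", "Calici_coat", "DUF840"},
    "Nebovirus": {"RNA_helicase", "Calici_coat", "DUF840"},
}

shared, specific = shared_and_specific(CALICIVIRIDAE)
print(sorted(shared))
for genus in CALICIVIRIDAE:
    print(genus, sorted(specific[genus]))
```

Under this strict definition, DUF840 is not unique to either Lagovirus or Nebovirus because both contain it, so a marker for those genera would have to be defined at the level of the pair.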
In Picornaviridae, which has the same genome type as Caliciviridae, RNA helicase is likewise conserved in all genera. In Enterovirus in particular, the picornavirus coat protein VP4 (Pico_P1A), the picornavirus core protein 2A (Pico_P2A), a protease that triggers polyprotein cleavage, and the picornavirus 2B protein (Pico_P2B), which enhances membrane permeability, can be considered specific domains relative to the other viruses in Picornaviridae. However, although the domains of Enterovirus show a pattern similar to those of Rabovirus, the E-values for Rabovirus are higher than those for Enterovirus, meaning that the Rabovirus matches are less significant. Hepatovirus has the hepatitis A virus viral protein VP (HAV_VP) as its specific domain. This protein is found in hepatitis A viruses and is targeted to the liver. Cardiovirus, unlike the other genera, has the viral protein VP4 subunit (VP4_2) (Fig. 4).
Overall, as a method of finding specific protein sequences for virus identification, viruses could be classified by RPS-BLAST searches against CDD using CDS sequence queries parsed from viral RefSeq GenBank data. The single-stranded RNA viruses of Caliciviridae and Picornaviridae share a conserved RNA helicase, which plays an important role in the detection of viral infections and in the innate antiviral immune response. Calici_coat_C, Calici_MSP, and the protein of unknown function DUF743 could be considered the specific protein sequences of Norovirus, Sapovirus, and Vesivirus in Caliciviridae, respectively. In Picornaviridae, which has the same genome type as Caliciviridae, Pico_P1A, Pico_P2A, and Pico_P2B could be regarded as the specific protein sequences of Enterovirus. Therefore, if the method of Fig. 1 were applied to all viruses, the specific protein domains of each virus could be determined and compared by conserved domain searching with RPS-BLAST. This would provide useful clues for finding specific protein sequences. Once the specific protein sequences are defined, they could be converted to gene sequences. These could then be used to find viral biomarkers based on the functional and structural information of protein domains, and could also serve as classification keywords.
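As a rough illustration of how specific domains might serve as classification keywords, the sketch below maps the marker domains named in this summary to their genera and counts the markers found among the domain hits of a query. The names MARKERS and candidate_genera are hypothetical, as is the assumption that a hit table reports these short domain names; this is not the procedure of Fig. 1.

```python
# Marker domains named in the summary above, mapped to the genus they indicate.
MARKERS = {
    "Calici_coat_C": "Norovirus",
    "Calici_MSP": "Sapovirus",
    "DUF743": "Vesivirus",
    "Pico_P1A": "Enterovirus",
    "Pico_P2A": "Enterovirus",
    "Pico_P2B": "Enterovirus",
}


def candidate_genera(domain_hits, markers=MARKERS):
    """Count marker domains per genus among the domain hits of one query."""
    counts = {}
    for domain in set(domain_hits):  # count each domain once
        genus = markers.get(domain)
        if genus is not None:
            counts[genus] = counts.get(genus, 0) + 1
    # Most-supported genus first; ties are broken alphabetically.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical queries; "RNA_helicase" is an assumed label for the shared domain.
print(candidate_genera(["RNA_helicase", "Calici_coat", "Calici_MSP"]))
print(candidate_genera(["Pico_P1A", "Pico_P2A", "RNA_helicase"]))
print(candidate_genera(["RNA_helicase"]))
```

A query that carries only shared domains such as RNA helicase returns an empty list, which reflects the point made above: conserved domains indicate the genome type, while the specific domains discriminate between genera.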