In developing countries, cholera remains a significant health threat wherever water purification and sewage disposal systems are inadequate.
In recent years, hundreds of bacterial genomes have become available, and their annotation is of considerable interest [
Improving functional annotation is of great importance for many follow-up studies, and here we apply computational tools for function prediction to one of the most devastating human pathogens
Six randomly selected HPs which contain standard number of amino acids sequences of
By using the Expasy's Protparam server (
Pfam (
CD-Search (
STRING (
PSORTB (
DISULFIND (
(PS)2 (pronounced PS square) was used for the prediction of the tertiary structures of HPs (
Q-SiteFinder (
We analyzed the physicochemical properties of these cholera HPs for the first time. In
The extinction coefficient of the HPs, computed by Expasy's Protparam at 280 nm rather than at 276, 278, 279, or 282 nm, ranges from 23,295 to 62,005 M⁻¹ cm⁻¹. A higher content of Cys, Trp, and Tyr gives a higher extinction coefficient. These computed extinction coefficients can be used for the quantitative study of protein-protein and protein-ligand interactions in solution. The instability index of the HPs ranges from 30.44 to 50.35. A protein with an instability index below 40 is predicted to be stable, whereas a value above 40 predicts that the protein will be unstable [
The instability index gives an approximate estimate of the stability of a protein in a test tube. By this measure, proteins gi|163644906, gi|163644912, and gi|163644916 are predicted to be stable, and the other three are predicted to be unstable.
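The stability call described above can be reproduced directly from the instability index (II) values reported in the physicochemical table. The following is a minimal sketch, not the authors' pipeline; the names `INSTABILITY_INDEX`, `STABILITY_CUTOFF`, and `classify_stability` are introduced here purely for illustration.

```python
# Instability index (II) values copied from the physicochemical table of this paper.
INSTABILITY_INDEX = {
    "gi|84095108": 49.29,
    "gi|163644906": 30.44,
    "gi|163644912": 35.51,
    "gi|163644916": 31.51,
    "gi|84468567": 50.35,
    "gi|84468557": 46.71,
}

# Cutoff stated in the text: II below 40 is predicted stable.
STABILITY_CUTOFF = 40.0


def classify_stability(instability_index: float) -> str:
    """Return 'stable' if the index is below the cutoff, otherwise 'unstable'."""
    return "stable" if instability_index < STABILITY_CUTOFF else "unstable"


# Hypothetical helper structures for illustration only.
predicted = {pid: classify_stability(ii) for pid, ii in INSTABILITY_INDEX.items()}
stable_ids = sorted(pid for pid, label in predicted.items() if label == "stable")

if __name__ == "__main__":
    for pid, label in predicted.items():
        print(f"{pid}\t{INSTABILITY_INDEX[pid]:.2f}\t{label}")
```

Applied to the six HPs, this yields the same three stable proteins named in the text.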
The aliphatic index (AI) is the relative volume of a protein occupied by aliphatic side chains (A, V, I, and L) and is regarded as a positive factor for increased thermal stability of globular proteins. The AI of the HPs ranges from 64.14 to 82.92. Proteins with a very high AI may be stable over a wide temperature range, whereas proteins with a lower AI tend to be less thermally stable and more flexible.
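For readers who want to see how an aliphatic index is obtained from a sequence, the sketch below uses the commonly cited Ikai formula, AI = X(Ala) + 2.9 X(Val) + 3.9 [X(Ile) + X(Leu)], where X is the mole percent of each residue. We assume this is the formula applied by ProtParam; the function name `aliphatic_index` and the toy sequences are hypothetical, and the actual HP sequences are not reproduced here.

```python
def aliphatic_index(sequence: str) -> float:
    """Aliphatic index from mole percentages of Ala, Val, Ile, and Leu.

    Assumes the Ikai formula with coefficients a = 2.9 (Val) and
    b = 3.9 (Ile and Leu).
    """
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("sequence must not be empty")

    def mole_percent(residue: str) -> float:
        return 100.0 * seq.count(residue) / len(seq)

    return (
        mole_percent("A")
        + 2.9 * mole_percent("V")
        + 3.9 * (mole_percent("I") + mole_percent("L"))
    )


if __name__ == "__main__":
    # Toy sequence for illustration only; not one of the HPs studied here.
    print(round(aliphatic_index("MKALVILGAAS"), 2))
```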
The grand average of hydropathy (GRAVY) of the HPs ranges from -0.633 to -0.304. A lower (more negative) GRAVY value indicates better interaction between the protein and water. The GRAVY value of a protein is calculated by summing the hydropathy values of all its amino acids and dividing by the number of residues in the sequence [
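The GRAVY calculation described above (sum of per-residue hydropathy values divided by sequence length) can be sketched as follows. We assume the Kyte-Doolittle hydropathy scale, which is the scale conventionally used for GRAVY; the function name `gravy` and the toy sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale for GRAVY).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Mean hydropathy over all residues of a protein sequence."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


if __name__ == "__main__":
    # Toy sequence for illustration only; not one of the HPs studied here.
    print(round(gravy("MKRDEQS"), 3))
```

A negative result, as obtained for all six HPs, indicates a predominantly hydrophilic sequence.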
To support functional analysis, we examined conserved domains, which are functional units within a protein that act as building blocks in molecular evolution and recombine in various arrangements to produce proteins with different functions. These data are then used for putative functional annotation of query sequences, based on matches to specific superfamilies and on the identification of proteins with similar domains. Proteins are classified into particular families based on the presence of specific domains in the sequence [
Domains and families present in the HPs were identified by a Pfam database search (
Knowing the sub-cellular localization of a protein is important for explaining its function in various cellular processes. During drug discovery, knowledge of a protein's sub-cellular localization plays a significant role in target identification.
In our study, two proteins, gi|163644906 and gi|84468567, were predicted to be cytoplasmic as their best-scoring localization site. No localization could be assigned to the remaining proteins. The SOSUI server distinguishes membrane proteins from soluble ones; no trans-membrane protein was found, and all six HPs were predicted to be soluble.
Moreover, the DISULFIND server predicted no disulfide bonds in any of these proteins, which suggests that they may be less thermally stable, since disulfide bridges play a major role in stabilizing the folding of many proteins. Disulfide bridges are therefore an important consideration in the study of the structural and functional properties of specific proteins [
Protein-protein interactions (PPIs) are important for almost all cellular functions. Proteins often interact with one another in a mutually dependent way to perform a common function; for example, translation factors interact among themselves to carry out translation. The function of a protein can therefore be predicted from its interactions with other proteins. It is very rare for a protein to carry out its function without any interaction with other biomolecules. For this reason, in the post-genomic era PPI databases have become a most important resource for exploring biological networks and pathways in cells [
PS square server (
We retrieved six HPs from the NCBI database, determined their physicochemical properties, and identified their domains and families using various bioinformatics tools and databases. Three-dimensional structures were modeled for two of the HPs, and their ligand-binding sites were identified. Conserved domains and families were found for only one HP; the analysis showed that they are involved in DNA breaking-rejoining and integrase activities. These findings may be useful for designing new drugs against this infectious bacterium.
Sequence ID | No. of AAs | MW | pI | (+)R | (–)R | EC | II | AI | GRAVY |
---|---|---|---|---|---|---|---|---|---|
gi|84095108 | 161 | 18,290.0 | 8.30 | 19 | 16 | 38,555 | 49.29 | 66.65 | –0.429 |
gi|163644906 | 284 | 31,693.3 | 8.78 | 45 | 40 | 23,295 | 30.44 | 82.92 | –0.385 |
gi|163644912 | 209 | 23,395.4 | 4.62 | 18 | 35 | 39,670 | 35.51 | 81.20 | –0.304 |
gi|163644916 | 210 | 23,085.9 | 7.65 | 26 | 25 | 32,680 | 31.51 | 64.14 | –0.633 |
gi|84468567 | 183 | 21,248.2 | 9.20 | 26 | 21 | 30,035 | 50.35 | 80.98 | –0.591 |
gi|84468557 | 208 | 24,368.6 | 9.78 | 29 | 17 | 62,005 | 46.71 | 67.50 | –0.629 |
AA, amino acid; MW, molecular weight; pI, isoelectric point; (+)R, total number of positively charged residues (Arg + Lys); (-)R, total number of negatively charged residues (Asp + Glu); EC, extinction coefficient; II, instability index; AI, aliphatic index; GRAVY, grand average hydropathy.
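The EC column above depends only on the numbers of Tyr, Trp, and Cys residues in each sequence. The sketch below illustrates the calculation under the assumption that the standard ProtParam relation at 280 nm in water is used (1490 per Tyr, 5500 per Trp, 125 per cystine, in M⁻¹ cm⁻¹). The function name `extinction_coefficient`, its `cys_paired` parameter, and the toy sequence are hypothetical.

```python
# Molar extinction contributions at 280 nm in water (M^-1 cm^-1),
# assumed to match the values used by ProtParam.
EXT_TYR = 1490
EXT_TRP = 5500
EXT_CYSTINE = 125


def extinction_coefficient(sequence: str, cys_paired: bool = True) -> int:
    """Estimate the molar extinction coefficient at 280 nm.

    If cys_paired is True, every pair of Cys residues is assumed to form
    a cystine; if False, all Cys residues are assumed to be reduced.
    """
    seq = sequence.strip().upper()
    coefficient = seq.count("Y") * EXT_TYR + seq.count("W") * EXT_TRP
    if cys_paired:
        coefficient += (seq.count("C") // 2) * EXT_CYSTINE
    return coefficient


if __name__ == "__main__":
    # Toy sequence for illustration only; not one of the HPs studied here.
    toy = "MWYCCK"
    print(extinction_coefficient(toy), extinction_coefficient(toy, cys_paired=False))
```

Reporting both the paired and the reduced estimate is useful when the disulfide state of a protein is unknown.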
Sequence ID | Domains |
---|---|
gi|84468567 | DNA_BRE_C superfamily, Topoisomer_IB_N, DUF3946 |
Superfamily | Description |
---|---|
DNA_BRE_C | DNA breaking-rejoining enzymes, C-terminal catalytic domain. The DNA breaking-rejoining enzyme superfamily includes type IB topoisomerases and tyrosine recombinases that share the same fold in their catalytic domain containing six conserved active site residues. The best-studied members of this diverse superfamily include human topoisomerase I, the bacteriophage lambda integrase, the bacteriophage P1 Cre recombinase, the yeast Flp recombinase and the bacterial XerD/C recombinases. |
DUF3946 | Protein of unknown function (DUF3946); a family of uncharacterized proteins found by clustering human gut metagenomic sequences. This family appears related to the N-terminal domain of phage integrases. |
Topoisomer_IB_N | Topoisomer_IB_N: N-terminal DNA binding fragment found in eukaryotic DNA topoisomerase (topo) IB proteins similar to the monomeric yeast and human topo I and heterodimeric topo I from
Sequence ID | Pfam-A | Pfam-B | Domains |
---|---|---|---|
gi|84095108 | PhnA Zn ribbon | Pfam-B_18384 | - |
gi|163644906 | - | - | - |
gi|163644912 | - | - | - |
gi|163644916 | LPAM 1 | Pfam-B_4989 | - |
gi|84468567 | Phage integrase | Endonuc-PvuII | - |
gi|84468557 | Rev | Pfam-B_12598 | Integrase core |
Sequence ID | Description |
---|---|
gi|84095108 | PhnA Zn ribbon |
gi|163644916 | Prokaryotic membrane lipoprotein lipid attachment site |
gi|84468567 | Phage integrase family |
gi|84468557 | Integrase core domain |
Sequence ID | Interacting proteins |
---|---|
gi|163644906 | Signal peptide peptidase SppA domain-containing protein; DSBA-like thioredoxin domain-containing protein |
gi|163644912 | Signal peptide peptidase SppA domain-containing protein; DSBA-like thioredoxin domain-containing protein |
gi|163644916 | Type IV conjugative transfer system protein TraD; Type IV conjugative transfer system protein TraI; Putative type IV conjugative transfer system coupling factor |
gi|84468567 | Ribosomal-protein-alanine acetyltransferase; Recombination factor protein RarA; ATP-dependent RNA helicase HrpA; Zinc-binding domain-containing protein; Putative ATP-dependent helicase; Dihydroxy-acid dehydratase |
gi|84468557 | ISVch4 transposase |
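The interaction table above can be treated as a small network in which HPs that share predicted partners are candidates for a common functional context. The following is a minimal sketch under that assumption; it only re-encodes the table and does not query STRING. The names `PREDICTED_PARTNERS` and `shared_partners` are hypothetical.

```python
from itertools import combinations

# Predicted interaction partners, copied from the interaction table of this paper.
PREDICTED_PARTNERS = {
    "gi|163644906": {
        "Signal peptide peptidase SppA domain-containing protein",
        "DSBA-like thioredoxin domain-containing protein",
    },
    "gi|163644912": {
        "Signal peptide peptidase SppA domain-containing protein",
        "DSBA-like thioredoxin domain-containing protein",
    },
    "gi|163644916": {
        "Type IV conjugative transfer system protein TraD",
        "Type IV conjugative transfer system protein TraI",
        "Putative type IV conjugative transfer system coupling factor",
    },
    "gi|84468567": {
        "Ribosomal-protein-alanine acetyltransferase",
        "Recombination factor protein RarA",
        "ATP-dependent RNA helicase HrpA",
        "Zinc-binding domain-containing protein",
        "Putative ATP-dependent helicase",
        "Dihydroxy-acid dehydratase",
    },
    "gi|84468557": {"ISVch4 transposase"},
}


def shared_partners(partners: dict) -> dict:
    """Map each pair of HPs to the set of partners they have in common."""
    shared = {}
    for a, b in combinations(sorted(partners), 2):
        common = partners[a] & partners[b]
        if common:
            shared[(a, b)] = common
    return shared


if __name__ == "__main__":
    for pair, common in shared_partners(PREDICTED_PARTNERS).items():
        print(pair, sorted(common))
```

In this data set only gi|163644906 and gi|163644912 share partners, and they share both of them.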
Sequence ID | Template |
---|---|
gi|84468567 | 2a3vB |
gi|84468557 | 1bcoA |
Sequence ID | Site volume | Residue |
---|---|---|
gi|84468567 | 499 | MET 1, GLU 2, CYS 3, ARG 5, LEU 6, ARG 7, GLN 9, ASP 10, ARG 19, ILE 20, TRP 21, GLN 22, GLY 23, LYS 24, GLY 26, LYS 27, TRP 65, LEU 66, PRO 67, LEU 70, TRP 83, TYR 85 |
gi|84468557 | 493 | GLY 45, ASP 46, VAL 47, ALA 60, VAL 61, VAL 62, SER 81, LEU 83, THR 84, GLY 85, ALA 87, LEU 88, SER 89, PHE 103, HIS 104, SER 105, GLN 107, THR 112, LYS 115, TYR 116, ILE 125, LYS 126, SER 128, LEU 129, ARG 132, TRP 136, ASP 137, ASN 138 |
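For downstream use, the residue lists in the binding-site table can be parsed into (residue, position) pairs, for example to count residue types in each predicted site. The sketch below is one possible way to do this and is not part of the original analysis; `SITE_RESIDUES`, `RESIDUE_PATTERN`, and `parse_residues` are hypothetical names, and the pattern is written to tolerate irregular spacing around commas.

```python
import re
from collections import Counter

# Residue lists copied from the binding-site table of this paper.
SITE_RESIDUES = {
    "gi|84468567": (
        "MET 1, GLU 2, CYS 3, ARG 5, LEU 6, ARG 7 ,GLN 9,ASP 10, ARG 19, "
        "ILE 20, TRP 21, GLN 22, GLY 23, LYS 24, GLY 26, LYS 27, TRP 65, "
        "LEU 66, PRO 67, LEU 70, TRP 83, TYR 85"
    ),
    "gi|84468557": (
        "GLY 45, ASP 46, VAL 47, ALA 60, VAL 61, VAL 62, SER 81, LEU 83, "
        "THR 84, GLY 85, ALA 87, LEU 88,SER 89, PHE 103, HIS 104, SER 105, "
        "GLN 107, THR 112, LYS 115, TYR 116, ILE 125, LYS 126, SER 128, "
        "LEU 129, ARG 132, TRP 136, ASP 137, ASN 138"
    ),
}

# Three-letter residue code followed by its sequence position.
RESIDUE_PATTERN = re.compile(r"([A-Z]{3})\s*(\d+)")


def parse_residues(text: str) -> list:
    """Return a list of (residue_name, position) tuples found in text."""
    return [(name, int(pos)) for name, pos in RESIDUE_PATTERN.findall(text)]


if __name__ == "__main__":
    for pid, text in SITE_RESIDUES.items():
        residues = parse_residues(text)
        composition = Counter(name for name, _ in residues)
        print(pid, len(residues), composition.most_common(3))
```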