

Genomics Inform > Volume 15(2); 2017 > Article
Javadi, Oloomi, and Bouzari: In Silico Signature Prediction Modeling in Cytolethal Distending Toxin-Producing Escherichia coli Strains


In this study, the genomes of cytolethal distending toxin (CDT)-producing isolates were compared with the genomes of pathogenic and commensal Escherichia coli strains. Conserved genomic signatures among different types of CDT-producing E. coli strains were assessed. These signatures could be used as biomarkers for research purposes, for clinical diagnosis by polymerase chain reaction, or in vaccine development. The cdt genes and several other genetic biomarkers were identified as signature sequences in CDT-producing strains. The identified signatures include several individual phage proteins (holins, nucleases, terminases, and transferases) and multiple members of different protein families (the lambda family, phage-integrase family, phage-tail tape measure protein family, putative membrane proteins, regulatory proteins, restriction-modification system proteins, tail fiber-assembly proteins, base plate-assembly proteins, and other prophage tail-related proteins). A sporadic phylogenic pattern was demonstrated in the CDT-producing strains. In conclusion, conserved signature proteins in a wide range of pathogenic bacterial strains can potentially be used in modern vaccine-design strategies.


The co-evolution of pathogenic bacteria and their hosts leads to the generation of functional pathogen-host interfaces. Well-adapted pathogens have evolved a variety of strategies for manipulating host cell functions to guarantee their successful colonization and survival. For instance, a group of gram-negative bacterial pathogens produces a toxin known as cytolethal distending toxin (CDT) [1]. Among the CDT producers is Escherichia coli, which is commonly found in the intestines of humans and other mammals. Most E. coli strains are harmless commensals; however, some isolates can cause severe diseases and are designated as pathogenic E. coli. Among the various pathogenic E. coli strains, some have acquired virulence determinants through the horizontal transfer of genes, such as the cdt genes encoding CDTs. CDTs were the first bacterial toxins identified that block the eukaryotic cell cycle and suppress cell proliferation, eventually resulting in cell death. The active subunits of CDTs exhibit type I deoxyribonuclease-like activity [2,3].
In this study, comparative genome analysis of CDT-producing E. coli isolates with other pathogenic and commensal strains was performed. Alignments between multiple genomes led to the identification of a set of distinct ("signature") sequence motifs. These signature sequences could be used to delineate single genomes or a specified group of associated genomes within a desired group, such as the CDT-producing E. coli (the target group in this study). Genomic signatures were conserved in the target group, whereas they were not conserved or were absent in other related or unrelated genomes (i.e., the background group). From a clinical point of view, conserved signature sequences could offer advantages in predicting and designing novel CDT inhibitors or vaccine candidates [4].
In addition, phylogenic trees can be constructed based on multiple sequence alignments. Importantly, phylogenies based on a large number of genes or on whole-genome sequences are more reliable than those based on a single gene or a few selected loci [5]. Phylogenic analysis can provide an overall classification of the target group within the background group. Alignment of whole-genome sequences yields detailed information on specific differences between genomes and, consequently, has provided new insights into phylogenetic relationships in recent years [6,7,8,9].
In this study, phylogenic relationships of CDT+ strains with other pathogenic and commensal E. coli strains were assessed, and conserved signature genomic regions in the target group (CDT producers) were annotated. This information could be used for developing molecular diagnostic assays, for designing polymerase chain reaction primers and probes, and in modern vaccine design.


CDT+ strains

Several databases were used to identify bacterial strains harboring cdt genes. Data were extracted from the following resources: National Center for Biotechnology Information (NCBI) GenBank; European Molecular Biology Laboratory (EMBL); DNA Data Bank of Japan (DDBJ); Protein Data Bank (PDB); NCBI Reference Sequence Database (RefSeq); and UniProtKB (Swiss-Prot database).

Whole-genome sequences

All genomes analyzed in this study were downloaded from the NCBI file transfer protocol (FTP) site at: ftp://ftp.

Reordering of draft genomes

Ordering and orienting contigs in draft genomes facilitates comparative genome analysis. Contig order can be predicted by comparison with a reference genome that is expected to have a conserved genome organization [10]. ProgressiveMauve (version 2.3.1) was used for ordering contigs in draft genomes. Mauve Contig Mover (MCM) offers advantages over methods that rely on matches in limited regions near the ends of contigs [11,12]. The E. coli K-12 MG1655 strain (accession No. NC_000913.3) was used as the reference genome.
The following optional MCM parameters were used in this study: default seed weight; use seed families: 15; determine locally collinear blocks (LCBs); full alignment; iterative refinement; sum-of-pairs LCB scoring; and min LCB weight: 200.
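The idea of reference-guided contig ordering can be illustrated with a minimal Python sketch. It is not the MCM algorithm, which works from whole-genome alignments and LCBs. It simply assumes that each contig has already been assigned a start coordinate and strand on the reference (the hypothetical `placements` mapping), sorts contigs by that coordinate, and reverse-complements those placed on the minus strand. The function and parameter names are illustrative only.

```python
from typing import Dict, List, Tuple

# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def order_contigs(
    contigs: Dict[str, str],
    placements: Dict[str, Tuple[int, str]],
) -> List[Tuple[str, str]]:
    """Order and orient contigs along a reference.

    contigs:    contig name -> sequence
    placements: contig name -> (reference start coordinate, strand '+' or '-')
                (hypothetical input; assumed to come from a prior alignment)
    Contigs without a placement are appended at the end, unchanged.
    """
    placed = sorted(
        (name for name in contigs if name in placements),
        key=lambda name: placements[name][0],
    )
    unplaced = [name for name in contigs if name not in placements]

    ordered = []
    for name in placed:
        seq = contigs[name]
        if placements[name][1] == "-":
            seq = reverse_complement(seq)  # flip contigs on the minus strand
        ordered.append((name, seq))
    ordered.extend((name, contigs[name]) for name in unplaced)
    return ordered
```

Keeping unplaced contigs at the end, rather than discarding them, mirrors the common practice of retaining all sequence data for downstream comparison.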

Multiple genome alignments

In this study, Gegenees software (version 2.2.1) was used for multiple-genome alignments. The software is written in Java, making it compatible with several platforms. No limitations were observed in calculation speed, memory use, or the number of genomes that could be aligned. Gegenees software is also capable of performing fragmented alignments [4]. Multiple alignments of E. coli genomes were created using a fragment size of 200 nucleotides, a step size of 100 nucleotides, and BLASTN, which was optimized for highly similar sequences.
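The 200/100 fragmentation setting can be pictured as a sliding window over the genome. The following Python sketch is only an illustration of that windowing step, not Gegenees code; the function name is hypothetical, and it assumes that a trailing piece shorter than the fragment size is dropped.

```python
from typing import Iterator, Tuple


def fragment_genome(
    sequence: str,
    fragment_size: int = 200,
    step_size: int = 100,
) -> Iterator[Tuple[int, str]]:
    """Yield (start, fragment) pairs from a sliding window over the sequence.

    With fragment_size=200 and step_size=100, consecutive fragments
    overlap by 100 nucleotides. Assumption: a trailing piece shorter
    than fragment_size is not emitted.
    """
    if fragment_size <= 0 or step_size <= 0:
        raise ValueError("fragment_size and step_size must be positive")
    last_start = len(sequence) - fragment_size
    for start in range(0, last_start + 1, step_size):
        yield start, sequence[start:start + fragment_size]


if __name__ == "__main__":
    toy_genome = "ACGT" * 125  # 500 nt toy sequence
    for start, frag in fragment_genome(toy_genome):
        print(start, len(frag))
```

Each fragment would then be compared against every other genome (here, by BLASTN), so the step size controls how finely conservation is sampled along the genome.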

Phylogenic tree construction

A phylogram was produced in SplitsTree 4, using the neighbor-joining method and a distance matrix Nexus file exported from Gegenees software [13]. E. albertii TW07627 and E. fergusonii ATCC 35469 strains were set as the out-groups.

Identifying conserved signatures

CDT-producing isolates were set as the target group, and all other strains were used as the background group by using the in-group setting tab in Gegenees software. Because of the genomic diversity in CDT-producer E. coli, we repeated this procedure with five different strains, including E. coli 53638, E. coli IHE3034, E. coli RN587/1, E. coli STEC B2F1, and E. coli STEC C165-02, which were defined as separate reference strains.
The biomarker score (max/average) setting was also used. Biomarker scores were plotted graphically and loaded into the tabular view for further data analysis. In the tabular view, 1.0 is the maximum biomarker score, and fragments with this score were considered signatures.
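To make the idea of a per-fragment biomarker score concrete, here is a hedged Python sketch. The exact formula used by Gegenees is not given in this article, so the scoring function below is an assumption chosen only to reproduce the stated behavior: it takes the average similarity across the target group and the maximum similarity across the background group, and returns 1.0 only when a fragment is fully conserved in every target genome and absent from every background genome. All names are illustrative.

```python
from statistics import mean
from typing import Sequence


def biomarker_score(
    target_scores: Sequence[float],
    background_scores: Sequence[float],
) -> float:
    """Illustrative biomarker score for one fragment.

    Inputs are normalized similarity scores in [0, 1] for the fragment
    against each target-group and background-group genome.
    ASSUMED formula (not taken from Gegenees):
        mean(target) * (1 - max(background))
    """
    if not target_scores:
        raise ValueError("at least one target-group score is required")
    target_term = mean(target_scores)                        # "average" over the target group
    background_term = max(background_scores, default=0.0)    # "max" over the background group
    return target_term * (1.0 - background_term)


def is_signature(score: float, threshold: float = 1.0, tol: float = 1e-9) -> bool:
    """A fragment counts as a signature when its score reaches the maximum of 1.0."""
    return score >= threshold - tol
```

Under this assumed formula, a single background genome that carries the fragment is enough to pull the score below 1.0, which matches the strict definition of a signature used in the text.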

Assembling signature fragments

Several overlapping fragments were obtained, based on the sequences of each reference strain. To facilitate subsequent analysis steps, the overlapping fragments were assembled using DNA Dragon software, version 1.6.0 (
The settings were designed with minimum overlaps (100 bases) along the diagonal length, a minimum %-identity of complete overlapping fragments, and 100% full-search parameters.
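Because signature fragments come from a sliding window, neighboring fragments share sequence and can be joined into longer regions. The Python sketch below illustrates that step with a simple greedy merge based on exact suffix-prefix overlaps of at least 100 bases. It is not the DNA Dragon algorithm; it assumes the fragments are supplied in genome order and on the same strand, and the function name is hypothetical.

```python
from typing import List


def merge_overlapping(fragments: List[str], min_overlap: int = 100) -> List[str]:
    """Greedily merge consecutive fragments that overlap exactly.

    Assumptions: fragments are in genome order, on the same strand, and
    overlaps are exact (100% identity). A fragment that does not overlap
    the current assembled region by at least min_overlap bases starts a
    new region.
    """
    regions: List[str] = []
    for frag in fragments:
        if not regions:
            regions.append(frag)
            continue
        current = regions[-1]
        longest = min(len(current), len(frag))
        # Try the longest possible overlap first, down to min_overlap.
        for k in range(longest, min_overlap - 1, -1):
            if current.endswith(frag[:k]):
                regions[-1] = current + frag[k:]
                break
        else:
            regions.append(frag)  # no sufficient overlap: start a new region
    return regions
```

For example, three 200-nucleotide fragments starting at positions 0, 100, and 200 of the same genome collapse into a single 400-nucleotide region, while a fragment from a distant position remains separate.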


The assembled sequences for each of the five reference strains were searched using NCBI BLASTX ( to identify putative proteins; putative conserved domains were also detected. The results were confirmed using the UniProtKB BLASTX program (



The sequences of 76 strains were downloaded from the NCBI site. Details regarding genome sizes, %GC content, the number of encoded proteins, encoded genes, genome type, pathotype, serotype, other characteristics, and accession numbers are summarized in Table 1. Most of the data presented were extracted from NCBI GenBank and UniProt, and some information was extracted from original articles [14,15]. The genomes of 24 strains were draft assemblies, and these draft genomes were reordered. Twenty-five CDT+ E. coli strains were analyzed, including E. albertii TW07627.

Phylogenic analysis

A heat-plot based on a 200/100 BLASTN fragmented alignment drawn with Gegenees software is shown in Fig. 1. A phylogenic overview is also shown in the heat-plot. A more detailed phylogram was constructed with SplitsTree 4 software, as shown in Fig. 2.
CDT-producing E. coli strains displayed a sporadic phylogenomic pattern in the heat-plot, with no consensus pattern. Six distinct genomic groups of CDT+ strains (T1 to T6 in Fig. 2) were shown in the phylogram, all of which were sporadic among the strains in Fig. 1. Although CDT-producing strains were distributed sporadically across the bacterial population in the phylogram, the strains within specific clades were related and showed some degree of similarity.

Signature sequences in the target group

In total, 1,527 fragments representing 3.0% of the E. coli 53638 genome were identified as signature sequences. These biomarkers were restricted to 21 highly significant regions, designated A to U. When E. coli IHE3034 was set as the reference strain, 220 signature sequences (0.4%) were detected, and biomarkers were identified in six regions, designated A to F. When E. coli RN587/1 was used as the reference strain, 1,512 signature fragments (2.9%) were obtained, which were restricted to 18 regions (A to R). When E. coli STEC B2F1 was set as the reference strain, 620 biomarker fragments (1.2%) were detected, and 16 biomarker regions (A to P) were recognized. Finally, when E. coli STEC C165-02 was used as the reference strain, 593 signature fragments (1.1%) were identified, which were restricted to eight regions (A to H). The signature regions for each reference strain are shown separately in Fig. 3. In addition, the biomarker designation, domain description, BLASTX results, and related putative conserved domains for each reference strain are provided in Supplementary Tables 1, 2, 3, 4, 5, 6.

Conserved signature proteins

The most common biomarker proteins were identified by comparing the BLASTX results for fragments from all reference strains (Table 2). The signature proteins identified included CDT, holin, lambda-family proteins, nuclease, phage integrase family proteins, phage tail tape measure family proteins, putative membrane proteins, regulatory proteins, restriction-modification system proteins, tail fiber assembly proteins, baseplate assembly proteins, tail fiber protein and other prophage tail-related proteins, terminases, and transferases. The nucleotide sequences of some proteins, including anti-termination proteins, prophage DNA packaging and binding proteins, transposase and DNA transposition proteins, scaffold proteins, recombination-related domains, putative phage-replication proteins, hemolysin, helicase, and the glycosyl transferase and glycohydrolase superfamilies, were detected as biomarkers in the target group, although these BLASTX results were not observed in all reference strains. Presumably, CDT-producing E. coli strains possess several hypothetical proteins whose functions are not yet defined and which might be conserved proteins. The existence of these DNA biomarker sequences in the reference strains is clear; however, the related proteins in some strains have not been determined.

Significant putative conserved domains and superfamilies

In the era of modern vaccines, finding conserved domains or epitopes has great therapeutic value. Putative conserved domains were classified as non-specific hits (NH), specific hits (SH), and multi-domains (MD), as shown in Supplementary Tables 1, 2, 3, 4, 5, 6.
The putative conserved domains and superfamilies that were associated with some signature proteins are shown below.
  • NH: PRK15251, DUF4102, CdtB, CDtoxinA, INT_P4, HP1_INT_C, Phage_integrase, INT_Lambda_C, Phage_integ_N, Methylase_S, Caudo_TAP, phage_tail_N, Tail_P2_I, gpI, phage_term_2, Terminase_3, Terminase_5, M, Phage_term_smal, COG5525, Terminase_GpA, Phage_Nu1, dexA, Phage_holin_2, DUF3751, Phage_attach, dcm, DNA_methylase, Cyt_C5_DNA_methylase, Dcm, Glycos_transf_2, and CESA_like

  • SH: INT_REC_C, PhageMin_Tail, COG4220, Phage_fiber_2, HSDR_N, Glycos_transf_2, GT_2_like_d, PRK10018, and PLN02726

  • MD: PRK09692, int, recomb_XerC, XerD, xerC, HsdS, N6_Mtase, HsdM, hsdM, rumA, P, Terminase_6, COG5484, PLN03114, COG5301, COG0610, hsdR, PRK10458, PRK10073, Glyco_tranf_2_3, WcaA, PRK10073, and PTZ00260

  • Superfamilies: RICIN superfamily, EEP superfamily, DNA_BRE_C superfamily, DUF4102 superfamily, Phage_integ_N superfamily, MCP_signal superfamily, Methylase_S superfamily, Caudo_TAP superfamily, phage_tail_N superfamily, Tail_P2_I superfamily, Terminase_3 superfamily, Terminase_5 superfamily, Phage_term_smal superfamily, Terminase_GpA superfamily, Phage_Nu1 superfamily, DnaQ-like-exo superfamily, Phage_holin_2 superfamily, DUF3751 superfamily, Phage_fiber_2 superfamily, Gifsy-2 superfamily, HSDR_N superfamily, Cyt_C5_DNA_methylase superfamily, MethyltransfD12 superfamily, and Glyco_transf_GTA type superfamily


The parallel evolution of bacterial pathogens and virulence-associated determinants encoded by horizontally transferred genetic elements has been observed in several species. E. coli is a normal member of the intestinal microflora of humans and animals; however, some E. coli strains have acquired virulence factors by gaining particular genetic loci through horizontal gene transfer, transposons, or phages. These elements frequently encode multiple factors that enable bacteria to colonize the host and initiate disease development [16]. CDTs belong to one such class of virulence-associated factors. CDT was first identified in E. coli by Johnson and Lior in 1988 [17]; since then, several studies have reported that CDTs can be produced by intestinal and extra-intestinal pathogenic bacteria [18].
In this study, the genomes of 25 CDT+ E. coli strains were acquired from several gene banks. Multiple genome comparisons were performed with 49 CDT-negative E. coli strains, including EPEC (enteropathogenic E. coli), ETEC (enterotoxigenic E. coli), STEC (Shiga toxin-producing E. coli), EAEC (enteroaggregative E. coli), EIEC (enteroinvasive E. coli), AIEC (adherent invasive E. coli), UPEC (uropathogenic E. coli), ExPEC (extraintestinal pathogenic E. coli), EHEC (enterohemorrhagic E. coli), environmental strains, and commensal strains.
Phylogenic analysis based on whole-genome information is more accurate than analysis based on one gene or a limited set of genes. In this study, CDT-producing strains did not show a phylogenomic relationship or pattern. Indeed, while they might carry the same or similar virulence gene sets, they also possess their own divergent genomic structures. This is probably because of their complex and distinct evolutionary pathways, indicating an independent acquisition of mobile genetic elements during their evolution.
The sporadic pattern in the phylogenomic dendrogram confirmed previous findings that CDT+ strains are heterogeneous. The heterogeneous nature of CDT-producing strains might arise from horizontal gene transfer through mobile genetic elements. These genetic exchanges that occur in bacteria provide genetic diversity and versatility [19].
A significant challenge in comparative genomics is the utilization of large datasets to identify specific sequence signatures that are biologically important or useful in diagnosis [4,20]. In this study, we defined CDT-producing E. coli as the target group and found conserved regions that could serve as genomic signatures for the target group. Because of the heterogeneous genomic nature of CDT+ E. coli, five reference strains were selected instead of one, including EIEC, ExPEC, EPEC, STEC B2F1, and STEC C165-02. Moreover, in the phylogenomic overview, these five reference strains were selected from different clades of the phylogenic tree, representing the T1–T6 groups.
The findings presented in this study indicate that the major conserved biomarkers beyond CDT were exonuclease, phage integrase, putative membrane, and tail-fiber proteins. Furthermore, the signature proteins of the target group indicated that phage-related proteins and virulence-associated factors could be commonly transferred by phages. Moreover, phage-related superfamilies were frequently observed among the putative conserved domains of the biomarker proteins. As a result, the cdt genes were identified as signature sequences in CDT-producing E. coli strains, and it was shown that they can be used as powerful biomarkers.
In this study, the most significant signature proteins in the five E. coli strains were identified in silico using whole-genome sequences. Conserved signature proteins were found to be present in a wide range of pathogenic bacterial strains; they could be used in future studies in a broad range of research applications and in modern vaccine-design strategies.


This work was supported financially by the Pasteur Institute of Iran. We would like to thank Editage ( for English language editing.

Supplementary materials

Supplementary data including six tables can be found with this article online at

Supplementary Table 1

Signature details based on Escherichia coli 53638 reference

Supplementary Table 2

Signature details based on Escherichia coli IHE3034 reference

Supplementary Table 3

Signature details based on Escherichia coli RN587/1 reference

Supplementary Table 4

Signature details based on Escherichia coli STEC_B2F1 reference

Supplementary Table 5

Signature details based on Escherichia coli STEC_C165_02 reference

Supplementary Table 6

Alphabetic abbreviation and description of putative conserved domains


1. Lara-Tejero M, Galán JE. Cytolethal distending toxin: limited damage as a strategy to modulate cellular functions. Trends Microbiol 2002;10:147–152. PMID: 11864825.
2. Tóth I, Nougayrède JP, Dobrindt U, Ledger TN, Boury M, Morabito S, et al. Cytolethal distending toxin type I and type IV genes are framed with lambdoid prophage genes in extra-intestinal pathogenic Escherichia coli. Infect Immun 2009;77:492–500. PMID: 18981247.
3. Lara-Tejero M, Galán JE. A bacterial toxin that controls cell cycle progression as a deoxyribonuclease I-like protein. Science 2000;290:354–357. PMID: 11030657.
4. Agren J, Sundström A, Håfström T, Segerman B. Gegenees: fragmented alignment of multiple genomes for determining phylogenomic distances and genetic signatures unique for specified target groups. PLoS One 2012;7:e39107. PMID: 22723939.
5. Rokas A, Williams BL, King N, Carroll SB. Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature 2003;425:798–804. PMID: 14574403.
6. Dubchak I, Poliakov A, Kislyuk A, Brudno M. Multiple whole-genome alignments without a reference organism. Genome Res 2009;19:682–689. PMID: 19176791.
7. Paten B, Earl D, Nguyen N, Diekhans M, Zerbino D, Haussler D. Cactus: algorithms for genome multiple sequence alignment. Genome Res 2011;21:1512–1528. PMID: 21665927.
8. Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AF, Roskin KM, et al. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res 2004;14:708–715. PMID: 15060014.
9. Rausch T, Emde AK, Weese D, Döring A, Notredame C, Reinert K. Segment-based multiple sequence alignment. Bioinformatics 2008;24:i187–i192. PMID: 18689823.
10. Rissman AI, Mau B, Biehl BS, Darling AE, Glasner JD, Perna NT. Reordering contigs of draft genomes using the Mauve aligner. Bioinformatics 2009;25:2071–2073. PMID: 19515959.
11. Darling AE, Mau B, Perna NT. progressiveMauve: multiple genome alignment with gene gain, loss and rearrangement. PLoS One 2010;5:e11147. PMID: 20593022.
12. Darling AC, Mau B, Blattner FR, Perna NT. Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Res 2004;14:1394–1403. PMID: 15231754.
13. Kloepper TH, Huson DH. Drawing explicit phylogenetic networks and their integration into SplitsTree. BMC Evol Biol 2008;8:22. PMID: 18218099.
14. Lukjancenko O, Wassenaar TM, Ussery DW. Comparison of 61 sequenced Escherichia coli genomes. Microb Ecol 2010;60:708–720. PMID: 20623278.
15. Gardner SN, Hall BG. When whole-genome alignments just won't work: kSNP v2 software for alignment-free SNP discovery and phylogenetics of hundreds of microbial genomes. PLoS One 2013;8:e81760. PMID: 24349125.
16. Asakura M, Hinenoya A, Alam MS, Shima K, Zahid SH, Shi L, et al. An inducible lambdoid prophage encoding cytolethal distending toxin (Cdt-I) and a type III effector protein in enteropathogenic Escherichia coli. Proc Natl Acad Sci U S A 2007;104:14483–14488. PMID: 17726095.
17. Johnson WM, Lior H. A new heat-labile cytolethal distending toxin (CLDT) produced by Escherichia coli isolates from clinical material. Microb Pathog 1988;4:103–113. PMID: 2849027.
18. Kim JH, Kim JC, Choo YA, Jang HC, Choi YH, Chung JK, et al. Detection of cytolethal distending toxin and other virulence characteristics of enteropathogenic Escherichia coli isolates from diarrheal patients in Republic of Korea. J Microbiol Biotechnol 2009;19:525–529. PMID: 19494702.
19. Oloomi M, Bouzari S. Molecular profile and genetic diversity of cytolethal distending toxin (CDT)-producing Escherichia coli isolates from diarrheal patients. APMIS 2008;116:125–132. PMID: 18321363.
20. Edwards DJ, Holt KE. Beginner's guide to comparative bacterial genome analysis using next-generation sequence data. Microb Inform Exp 2013;3:2. PMID: 23575213.
Fig. 1

Phylogenetic heat-plot overview of multiple-genome alignments. A heat plot based on a 200/100 BLASTN fragmented alignment was generated with Gegenees software. Six distinct genomic groups (T1–T6) recognized in cytolethal distending toxin (CDT)+ strains were observed sporadically among the strains that were studied, revealing the heterogeneous genomic nature of CDT-producing Escherichia coli.

Fig. 2

Phylogram overview. A phylogram was generated using SplitsTree 4 software, using the neighbor-joining method and a distance-matrix Nexus file exported from Gegenees software. The Escherichia albertii TW07627 and Escherichia fergusonii ATCC 35469 strains were set as out-groups. In addition, six unique groups (T1–T6) were analyzed. In the phylogenetic overview, a sporadic pattern of cytolethal distending toxin (CDT)–producing strains was observed, as were specific clades. These strains were related and their similarities were shown. CDT+ strains are shown in boxes. The Escherichia coli strains that were set as reference strains for biomarker-detection studies are indicated with red arrows.

Fig. 3

Biomarker regions. Biomarker regions were illustrated in the whole-genome sequences of five different reference strains including Escherichia coli 53638, E. coli IHE3034, E. coli RN587/1, E. coli STEC B2F1, and E. coli STEC C165-02. The biomarker score (max/average) setting was used. A score of 1.0 is the maximum biomarker score, which was considered to represent a signature sequence, as indicated in green. STEC, Shiga toxin-producing E. coli.

Table 1.

Strain characteristics

Strain | DNA length (Mb) | cdt gene | GC% | Protein count | Gene count | Genome type, No. of subsequences/contigs | Pathotype, serotype, other characteristic | Accession No.
Escherichia coli 96.0497 | 5.01426 | + | 50.80 | 4,862 | 5,026 | Draft, 13 | Host: Homo sapiens, O91:H21 | NZ_AEZQ00000000.2
Escherichia coli 3003 | 4.91733 | + | 50.7 | 4,825 | 4,982 | Draft, 8 | I.S: water, O157:H45 | NZ_AFAF00000000.2
Escherichia coli 5412 | 5.38651 | + | 50.20 | 5,670 | 5,761 | Draft, 373 | Host: Homo sapiens, SFO157 | NZ_AMUJ00000000.1
Escherichia coli 53638 | 5.37179 | + | 50.99 | 4,803 | 5,218 | Draft, 2 | EIEC, O144 | NZ_AAKB00000000.2
Escherichia coli ARS4.2123 | 4.98276 | + | 50.50 | 5,105 | 5,194 | Draft, 209 | I.S: water, O157:H16 | NZ_AMUL00000000.1
Escherichia coli DEC3F | 5.4079 | + | 50.30 | 5,541 | 5,692 | Draft, 93 | Host: Homo sapiens, SF EHEC O157:H | NZ_AIFJ00000000.1
Escherichia coli KTE11 | 4.52715 | + | 50.50 | 4,109 | 4,214 | Draft, 7 | No published information | NZ_ANSR00000000.1
Escherichia coli KTE28 | 5.0544 | + | 50.40 | 4,673 | 4,760 | Draft, 12 | No published information | NZ_ANSY00000000.1
Escherichia coli KTE47 | 4.98747 | + | 50.60 | 4,694 | 4,798 | Draft, 11 | No published information | NZ_ANUB00000000.1
Escherichia coli KTE60 | 5.07079 | + | 50.50 | 4,664 | 4,756 | Draft, 20 | No published information | NZ_ANUJ00000000.1
Escherichia coli KTE137 | 5.00154 | + | 50.50 | 4,702 | 4,789 | Draft, 99 | No published information | NZ_ANYA00000000.1
Escherichia coli KTE178 | 5.30789 | + | 50.60 | 4,973 | 5,050 | Draft, 11 | No published information | NZ_ANTB00000000.1
Escherichia coli KTE180 | 5.12548 | + | 50.60 | 4,883 | 4,966 | Draft, 112 | No published information | NZ_ANYR00000000.1
Escherichia coli KTE209 | 5.11008 | + | 50.50 | 4,702 | 4,791 | Draft, 3 | No published information | NZ_ANXD00000000.1
Escherichia coli MS 21-1 | 5.30899 | + | 50.40 | 5,744 | 5,860 | Draft, 206 | No published information | NZ_ADTR00000000.1
Escherichia coli O157:H- 493-89 | 5.05482 | + | 50.50 | 4,838 | 4,946 | Draft, 204 | Host: Homo sapiens, O157:H- | NZ_AETY00000000.1
Escherichia coli O157:H43 T22 | 4.95898 | + | 50.80 | 4,859 | 4,935 | Draft, 64 | I.S: milk from healthy cattle, O157:H43 | NZ_AHZD00000000.2
Escherichia coli RN587/1 | 5.06158 | + | 50.60 | 4,999 | 5,108 | Draft, 73 | EPEC, O157:H8 | NZ_ADUS00000000.1
Escherichia coli STEC B2F1 | 4.98941 | + | 50.90 | 4,875 | 5,006 | Draft, 37 | STEC, O91:H21 | NZ_AFDQ00000000.1
Escherichia coli STEC C165-02 | 5.00927 | + | 50.60 | 4,891 | 5,019 | Draft, 30 | STEC, O73:H16 | NZ_AFDR00000000.1
Escherichia coli TA271 | 5.07582 | + | 50.70 | 5,081 | 5,197 | Draft, 83 | Host: some mammal | NZ_ADAZ00000000.1
Escherichia coli TW06591 | 5.47546 | + | 50.30 | 5,521 | 5,650 | Draft, 45 | Host: Homo sapiens, O157:H- | NZ_AKLT00000000.1
Escherichia coli W26 | 5.11853 | + | 50.60 | 4,852 | 4,920 | Draft, 165 | Host: cow, I.S: feces | NZ_AGIA00000000.1
Escherichia albertii TW07627 | 4.74659 | + | 49.90 | 4,386 | 4,889 | Draft, 43 | Diarrheagenic | NZ_ABKX00000000.1
Escherichia coli APEC O1 | 5.49765 | + | 50.29 | 4,853 | 4,968 | Complete, 3 | ExPEC, O1:K1:H7, avian pathogenic | NC_008563.1
Escherichia coli IHE3034 | 5.10838 | + | 50.70 | 4,966 | 4,753 | Complete, 1 | ExPEC, O18:K1:H7, meningitis | NC_017628.1
Escherichia coli 042 | 5.35532 | - | 50.58 | 4,920 | 5,036 | Complete, 2 | EAEC, O44:H18 | NC_017626.1
Escherichia coli 536 | 4.93892 | - | 50.50 | 4,619 | 4,779 | Complete, 1 | UPEC, O6:K15:H31 | NC_008253.1
Escherichia coli 55989 | 5.15486 | - | 50.70 | 4,755 | 5,136 | Complete, 1 | EAEC | NC_011748.1
Escherichia coli ABU 83972 | 5.13296 | - | 50.60 | 4,795 | 4,905 | Complete, 2 | ExPEC UTI, OR:K5:H- | NC_017631.1
Escherichia coli APEC O78 | 4.79843 | - | 50.70 | 4,588 | 4,695 | Complete, 1 | ExPEC | NC_020163.1
Escherichia coli ATCC 8739 | 4.74622 | - | 50.90 | 4,199 | 4,408 | Complete, 1 | K12 derivative | NC_010468.1
Escherichia coli B REL606 | 4.62981 | - | 50.80 | 4,200 | 4,361 | Complete, 1 | Commensal, strain B | NC_012967.1
Escherichia coli BL21 DE3 | 4.55895 | - | 50.80 | 4,153 | 4,330 | Complete, 1 | Commensal, strain B | NC_012971.2
Escherichia coli BW2952 | 4.57816 | - | 50.80 | 4,079 | 4,262 | Complete, 1 | K12 derivative | NC_012759.1
Escherichia coli CFT073 | 5.23143 | - | 50.50 | 5,364 | 5,574 | Complete, 1 | ExPEC, UPEC, O6:K2:H1 | NC_004431.1
Escherichia coli DH1 | 4.63071 | - | 50.80 | 4,160 | 4,375 | Complete, 1 | K12 derivative | NC_017625.1
Escherichia coli E24377A | 5.24929 | - | 50.54 | 4,991 | 5,258 | Complete, 7 | ETEC, O139:H28 | NC_009801.1
Escherichia coli ED1a | 5.20955 | - | 50.70 | 4,911 | 5,321 | Complete, 1 | Commensal, O81 | NC_011745.1
Escherichia coli ETEC H10407 | 5.32589 | - | 50.73 | 4,872 | 5,084 | Complete, 5 | ETEC, O78:H11 | NC_017633.1
Escherichia coli HS | 4.64354 | - | 50.80 | 4,374 | 4,626 | Complete, 1 | Commensal, O9 | NC_009800.1
Escherichia coli IAI1 | 4.70056 | - | 50.80 | 4,345 | 4,629 | Complete, 1 | Commensal | NC_011741.1
Escherichia coli IAI39 | 5.13207 | - | 50.60 | 4,725 | 5,092 | Complete, 1 | ExPEC, UPEC, O7:K1 | NC_011750.1
Escherichia coli JJ1886 | 5.30828 | - | 50.77 | 5,049 | 5,213 | Complete, 6 | ExPEC, UPEC | NC_022648.1
Escherichia coli K-12 DH10B | 4.68614 | - | 50.80 | 4,124 | 4,352 | Complete, 1 | K12 derivative | NC_010473.1
Escherichia coli K-12 MG1655 | 4.64165 | - | 50.80 | 4,140 | 4,497 | Complete, 1 | Commensal, K12 | NC_000913.3
Escherichia coli K-12 W3110 | 4.64633 | - | 50.80 | 4,213 | 4,436 | Complete, 1 | Commensal, K12 | NC_007779.1
Escherichia coli KO11FL | 5.02717 | - | 50.79 | 4,705 | 4,821 | Complete, 2 | Commensal | NC_017660.1
Escherichia coli LF82 | 4.77311 | - | 50.70 | 4,376 | 4,545 | Complete, 1 | AIEC | NC_011993.1
Escherichia coli LY180 | 4.8356 | - | 50.90 | 4,463 | 4,624 | Complete, 1 | Ethanologenic E. coli | NC_022364.1
Escherichia coli NA114 | 4.97146 | - | 51.20 | 4,873 | 4,975 | Complete, 1 | ExPEC, UPEC | NC_017644.1
Escherichia coli O7:K1 CE10 | 5.37873 | - | 50.58 | 5,080 | 5,269 | Complete, 5 | ExPEC, neonatal meningitis, O7:K1 | NC_017646.1
Escherichia coli O26:H11 11368 | 5.85553 | - | 50.66 | 5,515 | 5,985 | Complete, 5 | EHEC, O26:H11 | NC_013361.1
Escherichia coli O55:H7 CB9615 | 5.45235 | - | 50.48 | 5,117 | 5,367 | Complete, 2 | EPEC, O55:H7 | NC_013941.1
Escherichia coli O83:H1 NRG 857C | 4.89488 | - | 50.71 | 4,582 | 4,690 | Complete, 2 | AIEC, O83:H1 | NC_017634.1
Escherichia coli O103:H2 12009 | 5.52486 | - | 50.68 | 5,117 | 5,541 | Complete, 2 | EHEC, O103:H2 | NC_013353.1
Escherichia coli O104:H4 2011C-3493 | 5.43741 | - | 50.63 | 5,149 | 5,269 | Complete, 4 | EAEC/STEC, O104:H4 | NC_018658.1
Escherichia coli O111:H- 11128 | 5.76608 | - | 50.42 | 5,403 | 5,931 | Complete, 6 | EHEC, O111:H | NC_013364.1
Escherichia coli O127:H6 E2348 69 | 5.06968 | - | 50.55 | 4,647 | 5,011 | Complete, 3 | EPEC, O127:H6 | NC_011601.1
Escherichia coli O157:H7 EC4115 | 5.70417 | - | 50.39 | 5,477 | 6,066 | Complete, 3 | EHEC, O157:H7 | NC_011353.1
Escherichia coli O157:H7 EDL933 | 5.6394 | - | 50.45 | 5,772 | 5,920 | Complete, 2 | EHEC, O157:H7 | NC_002655.2
Escherichia coli O157:H7 Sakai | 5.59448 | - | 50.45 | 5,292 | 5,448 | Complete, 3 | EHEC, O157:H7 | NC_002695.1
Escherichia coli O157:H7 TW14359 | 5.62274 | - | 50.46 | 5,363 | 5,586 | Complete, 2 | EHEC, O157:H7 | NC_013008.1
Escherichia coli P12b | 4.93529 | - | 50.90 | 4,379 | 4,567 | Complete, 1 | O15:H17 | NC_017663.1
Escherichia coli PMV 1 | 5.21093 | - | 50.67 | 4,979 | 5,257 | Complete, 2 | ExPEC, O18:K1 | NC_022370.1
Escherichia coli S88 | 5.16612 | - | 50.66 | 4,823 | 5,187 | Complete, 2 | ExPEC, neonatal meningitis, O45:K1:H7 | NC_011742.1
Escherichia coli SE11 | 5.15563 | - | 50.75 | 4,996 | 5,103 | Complete, 7 | Commensal, O152:H28 | NC_011415.1
Escherichia coli SE15 | 4.83968 | - | 50.71 | 4,486 | 4,592 | Complete, 2 | Commensal, O150:H5 | NC_013654.1
Escherichia coli SMS-3-5 | 5.21538 | - | 50.50 | 4,912 | 5,127 | Complete, 5 | Environmental isolate | NC_010498.1
Escherichia coli UM146 | 5.10756 | - | 50.61 | 4,783 | 4,891 | Complete, 2 | AIEC (adherent invasive) | NC_017632.1
Escherichia coli UMN026 | 5.3582 | - | 50.64 | 5,010 | 5,294 | Complete, 3 | ExPEC, UPEC, O7:K1 | NC_011751.1
Escherichia coli UMNK88 | 5.66676 | - | 50.74 | 5,607 | 5,754 | Complete, 6 | Porcine ETEC, O149 | NC_017641.1
Escherichia coli UTI89 | 5.17997 | - | 50.61 | 5,162 | 5,272 | Complete, 2 | ExPEC, UPEC, O18:K1:H7 | NC_007946.1
Escherichia coli W | 5.00886 | - | 50.78 | 4,602 | 4,876 | Complete, 3 | Commensal, ATCC 9637 | NC_017635.1
Escherichia coli Xuzhou21 | 5.51674 | - | 50.38 | 5,179 | 5,294 | Complete, 3 | EHEC, O157:H7 | NC_017906.1
Escherichia fergusonii ATCC 35469 | 4.64386 | - | 49.88 | 4,314 | 4,543 | Complete, 2 | I.S: feces, human | NC_011740.1

I.S, isolation source; EIEC, enteroinvasive E. coli; EHEC, enterohemorrhagic E. coli; EPEC, enteropathogenic E. coli; STEC, Shiga toxin-producing E. coli; ExPEC, extraintestinal pathogenic E. coli; EAEC, enteroaggregative E. coli; UPEC, uropathogenic E. coli; ETEC, enterotoxigenic E. coli; AIEC, adherent invasive E. coli.

Table 2.

Significant signature proteins in five reference Escherichia coli strains

Signature protein Reference strain
Escherichia coli 53638 Escherichia coli IHE3034 Escherichia coli RN587/1 Escherichia coli STEC_B2F1 Escherichia coli STEC_C165_02
Cytolethal distending toxin Cytolethal distending toxin A Cytolethal distending toxin, subunit C Cytolethal distending toxin A/C family protein Cytolethal distending toxin C Cytolethal distending toxin A/C family protein
Cytolethal distending toxin B Cytolethal distending toxin, subunit B Cytolethal distending toxin A/C family protein
Cytolethal distending toxin subunit C Cytolethal distending toxin, subunit A
Holin Phage holin, lambda family Holin, lambda family Holin -a Phage holin, lambda family
Nuclease Exodeoxyribonuclease 8 Exonuclease family protein Exonuclease family protein Endonuclease/Exonuclease/phosphatase family protein Restriction endonuclease family protein
Hypothetical protein ECRN5871_4153, [HNH endonuclease family protein] Type I site-specific deoxyribonuclease, HsdR family Type I site-specific deoxyribonuclease, HsdR family protein
Hypothetical protein ECSTECC16502_0280, [HNH endonuclease]
Endonuclease/Exonuclease/phosphatase family protein
Phage integrase Phage integrase Integrase/recombinase, phage integrase family Integrase Phage integrase family protein Integrase
Prophage integrase Site-specific recombinase, phage integrase family Phage integrase family protein Prophage lambda integrase Prophage CP4-57 integrase
Integrase for prophage CP-933T Prophage lambda integrase
Integrase domain protein
Putative membrane protein Putative membrane protein Hypothetical protein ECOK1_2122, [membrane protein] Outer membrane autotransporter barrel domain protein Putative membrane protein Putative membrane protein
Hypothetical protein Ec53638_1156, [membrane protein] Hypothetical protein ECOK1_2557, Hypothetical protein ECSTECB2F1_3192, [membrane protein]
OmpA-like transmembrane domain protein
Outer membrane porin protein LC
Outer membrane protein lom
Regulatory proteins Phage regulatory protein Cro Putative transcriptional regulator DicA157 Regulatory protein CII Transcriptional regulator, AraC family 4-Hydroxyphenylacetate catabolism regulatory protein HpaA
Transcriptional regulator, AlpA family Putative regulatory protein Cox Prophage CP4-57 regulatory protein family protein Prophage CP4-57 regulatory protein family protein
Putative phage regulatory protein, Rha family Transcriptional regulator, LacI family
Restriction-modification system Putative type I restriction-modification system, S subunit -a Type II restriction enzyme EcoRII Type I restriction modification DNA specificity domain protein Type I restriction-modification system specificity determinant
Type I restriction-modification system specificity subunit Modification methylase EcoRII Type III restriction enzyme, res subunit
Type I restriction-modification enzyme, R subunit Type I restriction enzyme specificity protein
Type I restriction-modification system, M subunit Type I restriction-modification system, M subunit
Tail fiber assembly family, baseplate assembly proteins, Tail fiber protein and Tail tape measure protein Tail fiber assembly protein Tail fiber protein Caudovirales tail fiber Tail fiber assembly Caudovirales tail fiber assembly family protein
Phage P2 baseplate assembly protein gpV Phage tail tape measure protein Assembly family protein Hypothetical protein ECSTECB2F1_0901, [tail fiber assembly protein, caudovirales tail fiber assembly protein] Tail fiber
Hypothetical protein ECRN5871_3504, [tail fiber assembly protein] Tail fiber domain protein
Putative tail fiber protein Baseplate assembly protein V, W Caudovirales tail fiber assembly family protein Phage tail fiber repeat family protein
Tail fiber Long tail fiber protein p37 domain protein Prophage tail fiber family protein
Phage tail tape measure protein family Tail fiber domain protein Phage tail fiber repeat family protein
Phage tail tape measure protein, TP901 family, core region Phage tail tape measure protein, lambda family
Terminase Phage terminase large subunit -a Phage small terminase subunit Phage terminase large subunit family protein Terminase small subunit
Terminase Terminase, ATPase subunit Terminase B protein domain protein
Terminase, endonuclease subunit Terminase B protein
Terminase large subunit
Terminase small subunit
Transferase Pyruvyl transferase -a Hypothetical protein Putative teichuronic acid biosynthesis glycosyltransferase tuaG Acetyltransferase family protein
Glycosyl transferase domain protein, group 2 family ECRN5871_3051, [nucleotidyl transferase, PF08843 family] Glucose-1-phosphate thymidylyltransferase Hypothetical protein ECSTECC16502_1295, [acetyltransferase]
Glycosyltransferase, sugar-binding region containing D12 class N6 adenine-specific DNA methyltransferase family protein RTX toxin acyltransferase family protein
DXD motif Hypothetical protein ECRN5871_0025, [N-acetyltransferase CN5] Acetyl-CoA acetyltransferase

a Many hypothetical proteins of unknown function are annotated in this genome, but their roles have not yet been defined.



Copyright © 2019 by Korea Genome Organization. All rights reserved.
