Yair Benita


Analysis of Protein Expression


Analysis of High-Throughput Protein Expression in Escherichia coli

Submitted for publication.

Analyzing your set of proteins

To create Table 2 with your own sequences you will need Python, Biopython and ZODB. The sequence analysis modules I wrote are available through Biopython, but using them requires programming skills. I will post my custom scripts and detailed instructions soon, although they will still be somewhat complicated for those without any programming knowledge. In parallel, I am working on an easy-to-use web interface. In the meantime, please feel free to send me your sequences in FASTA format and I will create Table 2 from your data. Make sure you include the group number in the title of each sequence.

Highly expressed genes in E. coli

This is a set of 121 proteins that are highly expressed in E. coli. These proteins were derived from 2D gel analysis, using the SWISS-2DPAGE database. I downloaded all the E. coli 2D gels in Melanie format (Melanie is a software package for 2D gel analysis) from the FTP server, which allowed me to make my own selection of spots based on the gel analysis.
This Excel file contains the data I gathered from all the Melanie-analyzed gels of E. coli. The file is sorted by %Vol, and there are 121 spots with a %Vol above 0.2 (marked with a light orange background). The DNA and protein sequences of these genes were fetched and are available in FASTA format.
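The %Vol selection described above can be sketched in a few lines of Python. This is only an illustration: it assumes the spreadsheet has been exported to CSV, and the column names `spot` and `pct_vol` as well as the three data rows are hypothetical placeholders, not values from the actual file.

```python
import csv
import io

PCT_VOL_THRESHOLD = 0.2  # cutoff stated in the text


def select_high_expression(rows, threshold=PCT_VOL_THRESHOLD):
    """Return the rows whose %Vol exceeds the threshold, highest first."""
    selected = [r for r in rows if float(r["pct_vol"]) > threshold]
    return sorted(selected, key=lambda r: float(r["pct_vol"]), reverse=True)


# Toy data standing in for a CSV export of the spreadsheet.
# Column names and values are invented for illustration only.
example_csv = """spot,pct_vol
spot_a,0.15
spot_b,0.85
spot_c,0.21
"""

rows = list(csv.DictReader(io.StringIO(example_csv)))
top = select_high_expression(rows)
print([r["spot"] for r in top])
```

Applied to the real data, a filter like this would be expected to return the 121 spots mentioned above.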

The codon adaptation index (CAI) was calculated based on these 121 highly expressed genes. The calculation was described by Sharp and Li and is performed using the codon usage module I submitted to Biopython. Here is a chart showing the difference between these CAI values and the original CAI values.
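As a rough illustration of how per-codon values of this kind can be derived, the sketch below computes the relative adaptiveness of each codon: its count in a reference set divided by the count of the most frequent synonymous codon for the same amino acid. It is a minimal pure-Python sketch, not the Biopython module itself. The function name `relative_adaptiveness` and the toy reference sequence are assumptions made for the example, and codons that never occur simply receive 0 here, whereas a real analysis may substitute a small non-zero value.

```python
from collections import Counter, defaultdict

# Standard genetic code, built from the usual TCAG ordering ("*" marks stops).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)


def relative_adaptiveness(sequences):
    """Map each sense codon to count / max count among its synonymous codons."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper().replace("U", "T")
        # Walk the sequence in frame, ignoring any trailing partial codon.
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if CODON_TO_AA.get(codon, "*") != "*":
                counts[codon] += 1

    # Group sense codons by the amino acid they encode.
    synonymous = defaultdict(list)
    for codon, aa in CODON_TO_AA.items():
        if aa != "*":
            synonymous[aa].append(codon)

    weights = {}
    for codons in synonymous.values():
        top = max(counts[c] for c in codons)
        if top == 0:
            continue  # amino acid absent from the reference set
        for c in codons:
            weights[c] = counts[c] / top
    return weights


# Toy reference set (hypothetical): codons GCG, GCG, GCA, AAA.
w = relative_adaptiveness(["GCGGCGGCAAAA"])
print(w["GCG"], w["GCA"], w["AAA"])  # 1.0 0.5 1.0
```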

These are the values per codon calculated from the 121 highly expressed E. coli genes:

Amino Acid  Codon  CAI    | Amino Acid  Codon  CAI    | Amino Acid  Codon  CAI
Ala         GCG    1.000  | Gly         GGC    1.000  | Pro         CCG    1.000
Ala         GCA    0.690  | Gly         GGT    0.994  | Pro         CCA    0.277
Ala         GCT    0.677  | Gly         GGG    0.186  | Pro         CCT    0.189
Ala         GCC    0.632  | Gly         GGA    0.113  | Pro         CCC    0.090
Arg         CGT    1.000  | His         CAC    1.000  | Ser         AGC    1.000
Arg         CGC    0.692  | His         CAT    0.735  | Ser         TCT    0.910
Arg         CGG    0.053  | Ile         ATC    1.000  | Ser         TCC    0.784
Arg         CGA    0.039  | Ile         ATT    0.714  | Ser         TCG    0.404
Arg         AGA    0.023  | Ile         ATA    0.033  | Ser         AGT    0.302
Arg         AGG    0.011  | Leu         CTG    1.000  | Ser         TCA    0.301
Asn         AAC    1.000  | Leu         CTC    0.131  | Thr         ACC    1.000
Asn         AAT    0.396  | Leu         TTG    0.120  | Thr         ACT    0.437
Asp         GAT    1.000  | Leu         CTT    0.114  | Thr         ACG    0.347
Asp         GAC    0.856  | Leu         TTA    0.107  | Thr         ACA    0.159
Cys         TGC    1.000  | Leu         CTA    0.023  | Trp         TGG    1.000
Cys         TGT    0.676  | Lys         AAA    1.000  | Tyr         TAC    1.000
Gln         CAG    1.000  | Lys         AAG    0.243  | Tyr         TAT    0.822
Gln         CAA    0.345  | Met         ATG    1.000  | Val         GTT    1.000
Glu         GAA    1.000  | Phe         TTC    1.000  | Val         GTG    0.967
Glu         GAG    0.347  | Phe         TTT    0.691  | Val         GTA    0.497
                          |                           | Val         GTC    0.466
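To show how these per-codon values can be applied, the sketch below scores a coding sequence as the geometric mean of the values of its codons, which is the usual definition of the CAI. The dictionary is transcribed from the table above; the function name `cai` is an assumption made for the example. As is common for CAI calculations, this sketch skips ATG and TGG (amino acids with a single codon) and silently ignores stop codons or anything else not in the table. It is not the Biopython implementation.

```python
import math

# Per-codon values transcribed from the table above.
CAI_WEIGHTS = {
    "GCG": 1.000, "GCA": 0.690, "GCT": 0.677, "GCC": 0.632,  # Ala
    "CGT": 1.000, "CGC": 0.692, "CGG": 0.053,                # Arg
    "CGA": 0.039, "AGA": 0.023, "AGG": 0.011,                # Arg
    "AAC": 1.000, "AAT": 0.396,                              # Asn
    "GAT": 1.000, "GAC": 0.856,                              # Asp
    "TGC": 1.000, "TGT": 0.676,                              # Cys
    "CAG": 1.000, "CAA": 0.345,                              # Gln
    "GAA": 1.000, "GAG": 0.347,                              # Glu
    "GGC": 1.000, "GGT": 0.994, "GGG": 0.186, "GGA": 0.113,  # Gly
    "CAC": 1.000, "CAT": 0.735,                              # His
    "ATC": 1.000, "ATT": 0.714, "ATA": 0.033,                # Ile
    "CTG": 1.000, "CTC": 0.131, "TTG": 0.120,                # Leu
    "CTT": 0.114, "TTA": 0.107, "CTA": 0.023,                # Leu
    "AAA": 1.000, "AAG": 0.243,                              # Lys
    "ATG": 1.000,                                            # Met
    "TTC": 1.000, "TTT": 0.691,                              # Phe
    "CCG": 1.000, "CCA": 0.277, "CCT": 0.189, "CCC": 0.090,  # Pro
    "AGC": 1.000, "TCT": 0.910, "TCC": 0.784,                # Ser
    "TCG": 0.404, "AGT": 0.302, "TCA": 0.301,                # Ser
    "ACC": 1.000, "ACT": 0.437, "ACG": 0.347, "ACA": 0.159,  # Thr
    "TGG": 1.000,                                            # Trp
    "TAC": 1.000, "TAT": 0.822,                              # Tyr
    "GTT": 1.000, "GTG": 0.967, "GTA": 0.497, "GTC": 0.466,  # Val
}


def cai(sequence, weights=CAI_WEIGHTS):
    """Geometric mean of per-codon weights over an in-frame coding sequence."""
    seq = sequence.upper().replace("U", "T")
    log_sum = 0.0
    scored = 0
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        # Skip single-codon amino acids and codons missing from the table.
        if codon in ("ATG", "TGG") or codon not in weights:
            continue
        log_sum += math.log(weights[codon])
        scored += 1
    if scored == 0:
        raise ValueError("no scorable codons in sequence")
    return math.exp(log_sum / scored)


# Two Ala codons: GCG (1.000) and GCA (0.690), so the CAI is sqrt(0.690).
print(round(cai("GCGGCA"), 3))  # 0.831
```

Summing logarithms rather than multiplying the raw values keeps the calculation numerically stable for long genes, where a direct product of many small values could underflow.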

 

Computing protein attributes

The DNA and protein sequence analysis modules were all written in Python. The modules I used for the article are more advanced than the ones I submitted to Biopython, mostly because they are embedded in the ZODB infrastructure. However, I am currently working on upgrading the Biopython module. If you need to see it sooner rather than later, please contact me.

Contact Me | ©2006 Utrecht University