TY - JOUR
T1 - Genome-wide discovery of cis-elements in promoter sequences using gene expression data
AU - Tatarinova, Tatiana
AU - Bouck, John
AU - Flavell, Richard
AU - Alexandrov, Nickolai
AU - Troukhan, Maxim
PY - 2009/4/30
Y1 - 2009/4/30
N2 - The availability of complete or nearly complete genome sequences, a large number of 5' expressed sequence tags, and significant public expression data allows for a more accurate identification of cis-elements regulating gene expression. We have implemented a global approach that takes advantage of available expression data, genomic sequences, and transcript information to predict cis-elements associated with specific expression patterns. The key components of our approach are: (1) precise identification of transcription start sites, (2) specific locations of cis-elements relative to the transcription start site, and (3) assessment of statistical significance for all sequence motifs. By applying our method to promoters of Arabidopsis thaliana and Mus musculus, we have identified motifs that affect gene expression under specific environmental conditions or in certain tissues. We also found that the presence of the TATA box is associated with increased variability of gene expression. Strong correlation between our results and experimentally determined motifs shows that the method is capable of predicting new functionally important cis-elements in promoter sequences.
AB - The availability of complete or nearly complete genome sequences, a large number of 5' expressed sequence tags, and significant public expression data allows for a more accurate identification of cis-elements regulating gene expression. We have implemented a global approach that takes advantage of available expression data, genomic sequences, and transcript information to predict cis-elements associated with specific expression patterns. The key components of our approach are: (1) precise identification of transcription start sites, (2) specific locations of cis-elements relative to the transcription start site, and (3) assessment of statistical significance for all sequence motifs. By applying our method to promoters of Arabidopsis thaliana and Mus musculus, we have identified motifs that affect gene expression under specific environmental conditions or in certain tissues. We also found that the presence of the TATA box is associated with increased variability of gene expression. Strong correlation between our results and experimentally determined motifs shows that the method is capable of predicting new functionally important cis-elements in promoter sequences.
KW - gene expression
KW - bioinformatics
KW - transcription factor binding sites
U2 - 10.1089/omi.2008.0034
DO - 10.1089/omi.2008.0034
M3 - Article
C2 - 19231992
VL - 13
SP - 139
EP - 151
JO - OMICS: A Journal of Integrative Biology
JF - OMICS: A Journal of Integrative Biology
SN - 1536-2310
IS - 2
ER -