TY - JOUR
T1 - NPEST: a nonparametric method and a database for transcription start site prediction
AU - Tatarinova, Tatiana
AU - Kryshchenko, Alona
AU - Triska, Martin
AU - Hassan, Mehedi
AU - Murphy, Denis
AU - Neely, Michael
AU - Schumitzky, Alan
PY - 2013/12
Y1 - 2013/12
N2 - In this paper we present NPEST, a novel tool for the analysis of expressed sequence tag (EST) distributions and transcription start site (TSS) prediction. The method estimates the unknown probability distribution of ESTs using a maximum likelihood (ML) approach, and the estimated distribution is then used to predict TSS positions. Accurate identification of TSSs is an important genomics task, since the position of regulatory elements relative to the TSS can have large effects on gene regulation, and the performance of promoter motif-finding methods depends on correct identification of TSSs. Our probabilistic approach extends recognition to multiple TSSs per locus, which may help to enhance the understanding of alternative splicing mechanisms. This paper presents an analysis of simulated data as well as a statistical analysis of promoter regions of the model dicot plant Arabidopsis thaliana. Using our statistical tool, we analyzed 16520 loci and developed a database of TSSs, which is now publicly available at www.glacombio.net/NPEST.
AB - In this paper we present NPEST, a novel tool for the analysis of expressed sequence tag (EST) distributions and transcription start site (TSS) prediction. The method estimates the unknown probability distribution of ESTs using a maximum likelihood (ML) approach, and the estimated distribution is then used to predict TSS positions. Accurate identification of TSSs is an important genomics task, since the position of regulatory elements relative to the TSS can have large effects on gene regulation, and the performance of promoter motif-finding methods depends on correct identification of TSSs. Our probabilistic approach extends recognition to multiple TSSs per locus, which may help to enhance the understanding of alternative splicing mechanisms. This paper presents an analysis of simulated data as well as a statistical analysis of promoter regions of the model dicot plant Arabidopsis thaliana. Using our statistical tool, we analyzed 16520 loci and developed a database of TSSs, which is now publicly available at www.glacombio.net/NPEST.
KW - transcription start site
KW - TSS
KW - nonparametric maximum likelihood
U2 - 10.1007/s40484-013-0022-2
DO - 10.1007/s40484-013-0022-2
M3 - Article
C2 - 25197613
SN - 2095-4689
VL - 1
SP - 261
EP - 271
JO - Quantitative Biology
JF - Quantitative Biology
IS - 4
ER -