README
CoreBoost is a program for predicting the location of
transcription start sites (TSSs) for human RNA polymerase II promoters. It is based on two
classifiers, one for CpG-related promoters and the
other for non-CpG-related promoters. Both classifiers
apply boosting with decision stumps and select important small-scale as well
as large-scale sequence features, such as position-specific core promoter
elements and the flexibility of promoter sequences. The current
version has more than 30% sensitivity and positive predictive value at 50 bp resolution.
The program works by sliding a window of 300 bp along the
input sequences. Because it uses large-scale features, the current version
requires users to supply 1.3 kb of flanking sequence on each side of the
segment they want to search. Only positions at least 1.3 kb from the start and
from the end of each input sequence are searched for putative TSSs. Positions whose
probability exceeds the pre-specified threshold are treated as candidates. Candidates within
500 bp of one another are clustered, and the one with the best score in each cluster
is output as the putative TSS.
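
The search range and the clustering step described above can be sketched in a
few lines of Python. This is only an illustration, not CoreBoost's own code:
the function names are hypothetical, the per-position probabilities are taken
as given, and two points the README does not specify are assumed here, namely
that "within 500 bp" means a gap of at most 500 bp between neighbouring
candidates and that candidates are grouped by chaining such neighbours.

```python
FLANK = 1300        # bp of flanking sequence required on each side (from the README)
CLUSTER_DIST = 500  # candidates this close together are clustered (from the README)


def searchable_positions(seq_len, flank=FLANK):
    """Return the 0-based positions lying at least `flank` bp from both ends."""
    return range(flank, max(flank, seq_len - flank))


def pick_tss(scores, threshold, cluster_dist=CLUSTER_DIST):
    """Cluster candidate positions and keep the best-scoring one per cluster.

    `scores` maps a 0-based position to its probability score.
    Assumption: sorted candidates are chained into one cluster while the gap
    to the previous candidate is at most `cluster_dist`.
    """
    candidates = sorted(p for p, s in scores.items() if s > threshold)
    clusters = []
    for pos in candidates:
        if clusters and pos - clusters[-1][-1] <= cluster_dist:
            clusters[-1].append(pos)   # close enough: extend the current cluster
        else:
            clusters.append([pos])     # too far away: start a new cluster
    return [max(c, key=lambda p: scores[p]) for c in clusters]
```

With scores such as {1500: 0.6, 1700: 0.9, 2400: 0.7} and a threshold of 0.5,
the first two positions fall into one cluster and the third forms its own, so
one position is reported per cluster.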
CoreBoost is not intended for genome-wide searching. We
recommend using prior information to first localize the search
region to about 2.5 kb and then applying CoreBoost.
Several kinds of prior information can be used to localize the search, including
Pol II ChIP-chip data,
EST or mRNA alignments, and regions predicted by
gene-finding programs such as Genscan.
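
Once a region has been localized, it has to be cut out of the longer genomic
sequence together with its 1.3 kb flanks. The sketch below shows one way to do
that in Python. It is an assumption-laden illustration: the function name and
the half-open, 0-based coordinate convention are hypothetical choices, and the
region passed in is taken to exclude the flanks.

```python
FLANK = 1300  # flanking bp needed on each side of the search segment (from the README)


def cut_with_flanks(chrom_seq, start, end, flank=FLANK):
    """Return (subsequence, offset) for the half-open region [start, end).

    `offset` is the coordinate in `chrom_seq` of the first returned base, so a
    0-based position p in the subsequence maps back to offset + p.
    """
    if start - flank < 0 or end + flank > len(chrom_seq):
        raise ValueError("not enough flanking sequence around the region")
    offset = start - flank
    return chrom_seq[offset:end + flank], offset
```

Keeping the offset makes it straightforward to convert a predicted position in
the submitted sequence back to a coordinate in the original sequence.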
The input sequence file must be in FASTA format. Sequence names
must not contain spaces. The results are displayed on the
screen and, if requested, are also sent to the email address you provide. The first
number in the result section is the 0-based position of the predicted TSS
on the positive strand and the second number is the probability score. If
you don’t get any hits, try a lower threshold. By default only the positive
strand is searched; users can also choose to search the negative strand.
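
Before submitting a file, it can help to check it against the input rules
above. The following Python sketch is a hypothetical pre-flight check, not part
of CoreBoost: it flags sequence names that are empty or contain whitespace, and
sequences too short to leave any position between the two 1.3 kb flanks. It
assumes a sequence needs more than 2 x 1300 bp to have a searchable position.

```python
def fasta_problems(text, flank=1300):
    """Return a list of human-readable problems found in FASTA-formatted text."""
    records = []  # [name, length] pairs in file order
    for line in text.splitlines():
        line = line.rstrip()
        if line.startswith(">"):
            records.append([line[1:], 0])      # new record; name is the header text
        elif line:
            if not records:
                return ["sequence data before the first '>' header"]
            records[-1][1] += len(line)        # accumulate sequence length

    problems = []
    for name, length in records:
        if not name or any(ch.isspace() for ch in name):
            problems.append(f"name {name!r} is empty or contains whitespace")
        if length <= 2 * flank:                # assumption: need > 2 * flank bp
            problems.append(f"{name!r}: {length} bp leaves no searchable position")
    return problems
```

An empty list means the file passed these checks; it does not guarantee that
the file is otherwise acceptable to the program.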
CoreBoost was developed by Xiaoyue Zhao, Zhenyu Xuan & Michael Zhang
at Cold Spring Harbor Laboratory. Please send inquiries and bug reports to
zhaox AT cshl DOT org.
CoreBoost is provided without any warranty. The authors assume no
legal liability or responsibility for the results it produces or conclusions
based thereupon. It is distributed free of charge for academic use only. Please
do not distribute this software to others without permission of the developers.
ALL RIGHTS RESERVED.