README

 

CoreBoost is a program for predicting the location of transcription start sites (TSSs) for human polymerase II promoters. It is based on two classifiers, one for CpG-related promoters and the other for non-CpG-related promoters. Both classifiers apply boosting with decision stumps and select important small-scale as well as large-scale sequence features, such as position-specific core promoter elements and the flexibility of promoter sequences. The current version achieves more than 30% sensitivity and positive predictive value at 50 bp resolution.

 

The program works by sliding a 300 bp window along the input sequences. Because it uses large-scale features, the current version requires users to supply 1.3 kb of flanking sequence on each side of the segment they want to search. Only positions at least 1.3 kb from the start and from the end of each input sequence are searched for putative TSSs. Positions whose probability exceeds the pre-specified threshold are considered candidates. Candidates within 500 bp of one another are clustered, and the one with the best score is output as the putative TSS.
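
The searchable range and the clustering step can be illustrated with a short Python sketch. This is not CoreBoost's own code: the function names, the exact boundary handling, and the grouping rule (a new cluster starts when the gap to the previous candidate exceeds 500 bp) are assumptions made for illustration, and the scores are made-up values.

```python
FLANK = 1300        # flanking sequence required on each side (from the README)
CLUSTER_DIST = 500  # candidates within this distance are clustered (from the README)


def searchable_range(seq_len, flank=FLANK):
    """Return (start, end) of the 0-based positions that are searched, end exclusive.

    Assumption: the first and last `flank` positions are skipped.
    """
    if seq_len <= 2 * flank:
        return (0, 0)  # sequence too short: nothing is searched
    return (flank, seq_len - flank)


def cluster_candidates(scores, threshold, max_gap=CLUSTER_DIST):
    """Keep the best-scoring candidate from each cluster.

    scores: dict mapping position -> probability (hypothetical input format).
    Assumption: sorted candidates are grouped greedily, and a new cluster
    starts when the gap to the previous candidate exceeds `max_gap`.
    """
    hits = sorted((p, s) for p, s in scores.items() if s > threshold)
    best, cluster = [], []
    for pos, score in hits:
        if cluster and pos - cluster[-1][0] > max_gap:
            best.append(max(cluster, key=lambda h: h[1]))
            cluster = []
        cluster.append((pos, score))
    if cluster:
        best.append(max(cluster, key=lambda h: h[1]))
    return best


example = {1400: 0.6, 1500: 0.9, 1700: 0.7, 2500: 0.8, 2550: 0.3}
print(searchable_range(3900))            # (1300, 2600)
print(cluster_candidates(example, 0.5))  # [(1500, 0.9), (2500, 0.8)]
```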

 

CoreBoost is not intended for genome-wide searching. We recommend using prior information to first narrow the search region to about 2.5 kb and then applying CoreBoost. Many sources of prior information are available for this, including Pol II ChIP-chip data, EST or mRNA alignments, and regions predicted by gene-finding programs such as Genscan.
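
As a rough guide to preparing the input, a localized region of about 2.5 kb plus 1.3 kb of flank on each side gives an input sequence of about 5.1 kb. The Python sketch below computes the interval to extract. The function name and the 0-based, end-exclusive coordinate convention are assumptions for illustration, not part of CoreBoost.

```python
FLANK = 1300  # flank required on each side of the search segment (from the README)


def input_interval(region_start, region_end, chrom_len, flank=FLANK):
    """Return the 0-based, end-exclusive interval to extract as input.

    If the region lies closer than `flank` to a chromosome end, the interval
    is clamped, and part of the region will then fall outside the searched range.
    """
    if not 0 <= region_start < region_end <= chrom_len:
        raise ValueError("region must lie within the chromosome")
    start = max(0, region_start - flank)
    end = min(chrom_len, region_end + flank)
    return start, end


start, end = input_interval(100_000, 102_500, 1_000_000)
print(start, end, end - start)  # 98700 103800 5100
```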

 

The input sequence file must be in FASTA format. Sequence names must not contain spaces. The results are displayed on the screen and, on request, are also sent to the email address you enter. In the result section, the first number is the position of the predicted TSS on the positive strand (0-based index) and the second number is the probability score. If you do not get any hits, try a lower threshold. The positive strand is searched by default; users can also choose to search the negative strand.
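
Before submitting, the input can be checked against the requirements above. The following Python sketch is a hypothetical helper, not part of CoreBoost. It flags any whitespace in a header line and any sequence too short to leave a searchable region between the two 1.3 kb flanks. Treating the whole header line as the sequence name is an assumption.

```python
def check_fasta(text, flank=1300):
    """Return a list of problems found in FASTA-formatted text (empty if none)."""
    records = []   # each entry is [header, sequence_length]
    problems = []
    for line in text.splitlines():
        if line.startswith(">"):
            records.append([line[1:], 0])
        elif line.strip():
            if not records:
                problems.append("sequence data before the first '>' header")
                continue
            records[-1][1] += len(line.strip())
    if not records:
        problems.append("no FASTA records found")
    for header, length in records:
        if not header or any(c.isspace() for c in header):
            problems.append(f"{header!r}: name is empty or contains whitespace")
        if length <= 2 * flank:
            problems.append(f"{header!r}: {length} bp leaves no searchable region")
    return problems


good = ">seq1\n" + "ACGT" * 1000 + "\n"
bad = ">my seq\nACGT\n"
print(check_fasta(good))       # []
print(len(check_fasta(bad)))   # 2
```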

 

CoreBoost was developed by Xiaoyue Zhao, Zhenyu Xuan, and Michael Zhang at Cold Spring Harbor Laboratory. Please send inquiries and bug reports to zhaox AT cshl DOT org.

 

CoreBoost is provided without any warranty. The authors assume no legal liability or responsibility for the results it produces or for conclusions based on them. It is distributed free of charge for academic use only. Please do not distribute this software to others without the permission of the developers. ALL RIGHTS RESERVED.