README
CoreBoost_HM is a program for predicting the location of human polymerase II (Pol II) core promoters. It builds on the human Pol II core-promoter predictor CoreBoost (http://rulai.cshl.edu/tools/CoreBoost) by integrating specific histone modification profiles with DNA sequence features. Like CoreBoost, CoreBoost_HM is based on two classifiers, one for CpG-related promoters and the other for non-CpG-related promoters. Both classifiers apply boosting with stumps and select important sequence features and histone modification features. Our analysis suggests that integrating histone modification profiles provides much higher sensitivity, specificity, and resolution for human core-promoter prediction.
Input and Output:
Users should provide the chromosome ID, start position, end position, and strand of the genomic region they want to search. Providing the correct DNA strand is essential for a CoreBoost_HM search. Positions whose prediction score exceeds the pre-specified threshold are considered possible candidates. Candidates within the clustering distance (e.g. 500 bp) are clustered, and the one with the best score in each cluster is output as the putative transcription start site (TSS). The results are displayed on the screen and, if requested, are also sent to the email address provided. In the result section, the first number is the position of the predicted TSS and the second number is its prediction score. Users can also browse the prediction score profile in the UCSC Genome Browser by clicking the link on the output page.
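The thresholding and clustering step described above can be sketched in a few lines of Python. This is only an illustration, not the CoreBoost_HM implementation: the function name call_putative_tss, the (position, score) input format, and the rule that a candidate joins a cluster when it lies within the clustering distance of the previous candidate are all assumptions made for the example.

```python
from typing import List, Tuple

Candidate = Tuple[int, float]  # (genomic position, prediction score)


def call_putative_tss(scores: List[Candidate],
                      threshold: float,
                      cluster_distance: int = 500) -> List[Candidate]:
    """Hypothetical helper: threshold, cluster, and pick the best per cluster."""
    # Keep only positions whose score exceeds the threshold, sorted by position.
    candidates = sorted((pos, s) for pos, s in scores if s > threshold)

    # Assumed clustering rule: a candidate joins the current cluster if it is
    # within cluster_distance of the previous candidate in that cluster.
    clusters: List[List[Candidate]] = []
    for pos, s in candidates:
        if clusters and pos - clusters[-1][-1][0] <= cluster_distance:
            clusters[-1].append((pos, s))
        else:
            clusters.append([(pos, s)])

    # Report the best-scoring candidate of each cluster as the putative TSS.
    return [max(cluster, key=lambda c: c[1]) for cluster in clusters]


example = [(100, 0.2), (150, 0.9), (400, 0.7),
           (1200, 0.6), (1300, 0.8), (5000, 0.1)]
for position, score in call_putative_tss(example, threshold=0.5):
    print(position, score)
```

Each printed line follows the layout of the result section: position first, prediction score second.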
Limitations:
CoreBoost_HM is not intended for genome-wide searching. The maximum acceptable size of the genomic region is 100 kb. We recommend using prior information to first narrow the search region to about 2.5 kb and then applying CoreBoost_HM. Several sources of prior information can help localize the search, including Pol II ChIP-chip data, EST or mRNA alignments, and regions predicted by gene-finding programs such as Genscan.
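Before submitting a region, it can be convenient to check it against the size limit locally. The sketch below is a hypothetical pre-check, not part of CoreBoost_HM: the function name check_region, the "+"/"-" strand encoding, the 1-based inclusive coordinates, and the reading of 100 kb as 100,000 bp are assumptions made for the example.

```python
MAX_REGION_BP = 100_000  # assumed reading of the 100 kb limit


def check_region(chrom: str, start: int, end: int, strand: str) -> int:
    """Hypothetical pre-check; returns the region length in bp if acceptable."""
    if not chrom:
        raise ValueError("a chromosome ID is required")
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    length = end - start + 1  # assumes 1-based, inclusive coordinates
    if length > MAX_REGION_BP:
        raise ValueError(f"region is {length} bp; the limit is {MAX_REGION_BP} bp")
    return length


# A region of the recommended size (about 2.5 kb).
print(check_region("chr1", 1000, 3499, "+"))
```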
How to choose the predictors?
CoreBoost_HM consists of two predictors, trained separately on CpG-related and non-CpG-related promoters. We suggest that users try the CpG promoter predictor first, because about 3/4 of human gene promoters are CpG related and prediction is more accurate for CpG-related promoters than for non-CpG-related promoters. If it gives no satisfactory prediction, try the non-CpG predictor. Users should also be aware that the same prediction score has a different meaning depending on which predictor produced it. Typically, a score from the CpG-related promoter predictor is more convincing than the same score from the non-CpG-related promoter predictor.
How to choose the threshold?
We provide empirical sensitivity and positive predictive value (PPV) estimates at different thresholds for users' reference (http://rulai.cshl.edu/tools/CoreBoost_HM/Threshold.htm). Users should keep in mind that these values were estimated from all promoters in the training set, across all expression levels. Typically, inactive promoters may have weaker histone modification signals and may therefore get relatively low CoreBoost_HM prediction scores, but the core-promoter regions of these genes can still score higher than their flanking background sequence. We therefore suggest that users take a brief look at the prediction score profile in the Genome Browser and choose the predictions at local maxima.
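For users who have the score profile as a list of numbers, picking local maxima can be sketched as follows. This is an illustration only: the function name local_maxima, the (position, score) input format, and the 250 bp window are assumptions made for the example, not values used by CoreBoost_HM.

```python
from typing import List, Tuple


def local_maxima(profile: List[Tuple[int, float]],
                 window: int = 250) -> List[Tuple[int, float]]:
    """Hypothetical helper: keep positions that strictly outscore every
    other position within +/- window bp."""
    peaks = []
    for pos, score in profile:
        # Scores of all other profile positions inside the window.
        neighbours = [s for p, s in profile
                      if p != pos and abs(p - pos) <= window]
        if all(score > s for s in neighbours):
            peaks.append((pos, score))
    return peaks


# The second peak has a low absolute score but still stands out
# from its flanking background.
profile = [(0, 0.1), (100, 0.3), (200, 0.2),
           (600, 0.05), (700, 0.15), (800, 0.1)]
for position, score in local_maxima(profile):
    print(position, score)
```

A fixed global threshold of, say, 0.2 would miss the second peak in this toy profile, which is the situation the advice above is meant to address.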
Availability:
The easiest way to use CoreBoost_HM is through our website (http://rulai.cshl.edu/tools/CoreBoost_HM). Because CoreBoost_HM has to access both the sequence and the histone modification profiles of the human genome, no easy-to-use stand-alone version is available yet for download and quick installation. Please contact us if you want to set up a local copy of CoreBoost_HM.
Please send inquiries and bug reports to xwwang@tsinghua.edu.cn
CoreBoost_HM is provided without any warranty. The authors assume no legal liability or responsibility for the results it produces or conclusions based thereupon. It is distributed free of charge for academic use only. Please do not distribute this software to others without permission of the developers. ALL RIGHTS RESERVED.
References:
1. Zhao X, Xuan Z, Zhang MQ: Boosting with stumps for predicting transcription start sites. Genome Biol 2007, 8(2):R17.
2. Wang X, Xuan Z, Zhao X, Li Y, Zhang MQ: High-resolution human core-promoter prediction with CoreBoost_HM. Genome Res 2009, 19(2):266-275.