The RMAP software for short-read mapping

2009-08-18

What's New

RMAP is designed to map reads from next-generation sequencing technologies accurately. RMAP can map reads with or without error-probability information (quality scores) and supports mapping of paired-end and bisulfite-treated reads. There are no limitations on read width or on the number of mismatches. RMAP can now map more than 8 million reads in an hour at full sensitivity to 2 mismatches.

Download and System Requirements

Download

RMAP source code can be downloaded here: rmap_v2.05.tbz2

System Requirements

64-bit machine and GCC >= 4.1 (to support TR1)

Install

To install RMAP, download the compressed archive and unpack it with a command similar to:

$ tar -jxvf rmap_vX.X.tbz2


Change into the unpacked source directory and type:

$ make install

Quick Usage Guide

Here are some examples showing how to run RMAP. The complete list of parameters for each program is printed by running it with -help. More details are given in the RMAP manual.


1. To map next-generation sequencing reads to a reference genome, use -o to specify the output file (BED format) and -c to specify the target file or the directory containing the chromosome sequence files (FASTA format). The last argument is a FASTA or FASTQ file containing the read sequences. Add -v to show mapping progress. Please note that each read must occupy a single line; RMAP stops with an error message if a read sequence spans several lines.

$ rmap -o mapped_locations.bed -c chromosomes_dir reads.fa

or

$ rmap -o mapped_locations.bed -c chromosomes_dir reads.fq
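Because RMAP rejects reads whose sequence is wrapped over several lines, a wrapped FASTA file has to be joined first. The Python sketch below is not part of RMAP: the function names are hypothetical, and it handles FASTA only (wrapped FASTQ would need a quality-aware parser).

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs, joining sequence lines that were wrapped."""
    name: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)  # continuation of the current sequence
    if name is not None:
        yield name, "".join(chunks)


def unwrap_fasta(lines: Iterable[str]) -> str:
    """Return FASTA text with every sequence on a single line."""
    return "".join(f">{name}\n{seq}\n" for name, seq in read_fasta(lines))
```

Passing an open file object (for example one opened on a hypothetical wrapped.fa) to unwrap_fasta returns text that can be written out as reads.fa.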


2. To set the maximum number of mismatches allowed in the mapping (-m, default: 10) and to specify the seed structure, use -S for the number of seeds (default: 3) and -h for the seed weight (default: 11).

$ rmap -S 4 -h 8 -m 20 -o mapped_locations.bed -c chromosomes_dir reads.fa
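To make the meaning of -m concrete, the sketch below counts mismatches between a read and every genome window of the same length, and can optionally skip windows where no spaced seed matches exactly. In this sketch a seed is a hypothetical pattern string in which '1' marks a position that must match and '0' a position that is ignored, so the number of '1's plays the role of the seed weight. This is a generic brute-force illustration of seed filtering on the forward strand only, not RMAP's actual seed design or index.

```python
from typing import List, Optional, Sequence, Tuple


def count_mismatches(read: str, window: str) -> int:
    """Number of positions where two equal-length strings differ."""
    return sum(1 for a, b in zip(read, window) if a != b)


def seed_hit(read: str, window: str, pattern: str) -> bool:
    """True if read and window agree at every '1' position of the seed pattern."""
    return all(r == w for r, w, p in zip(read, window, pattern) if p == "1")


def naive_map(read: str, genome: str, max_mismatches: int,
              seeds: Optional[Sequence[str]] = None) -> List[Tuple[int, int]]:
    """Return (position, mismatches) for forward-strand windows within the limit.

    If seeds are given, a window is compared in full only when at least one
    seed matches it exactly; this filter can miss hits if the seeds are not
    designed for the mismatch limit.
    """
    hits = []
    n = len(read)
    for pos in range(len(genome) - n + 1):
        window = genome[pos:pos + n]
        if seeds is not None and not any(seed_hit(read, window, s) for s in seeds):
            continue  # no exact seed match: skip this window
        mm = count_mismatches(read, window)
        if mm <= max_mismatches:
            hits.append((pos, mm))
    return hits
```

For example, with the genome AAACGTACGTTT and at most one mismatch, the read ACTT has hits at positions 2 and 6; with seeds=["0011"] both are missed, which is why the seed structure matters for sensitivity.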


3. To output ambiguously mapped reads, use -a; the amb_mapped.txt file contains only read names. By default, reads that map to two or more locations are considered ambiguously mapped. Adding -M x to the command changes this threshold: reads mapped to more than x locations are reported in amb_mapped.txt, while reads mapped to at most x locations are reported in the output file with every mapped location.

$ rmap -a amb_mapped.txt -M 10 -o mapped_locations.bed -c chromosomes_dir reads.fa
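The -M rule can be sketched as a split of each read's list of hits. This Python sketch assumes that reads mapped to at most x locations are reported with all of their locations, so the default behaviour corresponds to x = 1 (only uniquely mapped reads are reported); the function name and the (chromosome, start) data layout are hypothetical.

```python
from typing import Dict, List, Tuple

Location = Tuple[str, int]  # hypothetical (chromosome, start) pair


def split_by_multiplicity(
    hits: Dict[str, List[Location]], max_locations: int = 1
) -> Tuple[Dict[str, List[Location]], List[str]]:
    """Split reads into those reported with all locations and ambiguous read names.

    max_locations plays the role of x in -M x.
    """
    reported: Dict[str, List[Location]] = {}
    ambiguous: List[str] = []
    for name, locations in hits.items():
        if not locations:
            continue  # unmapped read: appears in neither output in this sketch
        if len(locations) > max_locations:
            ambiguous.append(name)  # like amb_mapped.txt: name only
        else:
            reported[name] = locations  # like the BED output: every location
    return reported, ambiguous
```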


4. To use the full quality-score information in a PRB file from the Illumina/Solexa pipeline (four quality scores per nucleotide, one for each possible base), use -p prb_filename.

$ rmap -p reads.prb -o mapped_locations.bed -c chromosomes_dir reads.fa
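Each PRB line holds four scores per sequenced position. RMAP reads the PRB file itself, so the Python sketch below is only a way to inspect such data: it groups one line into per-position rows and picks the highest-scoring base. It assumes whitespace-separated integers with the four scores in A, C, G, T order; check your pipeline's documentation before relying on that order. The function names are hypothetical.

```python
from typing import List

BASES = "ACGT"  # assumed order of the four scores at each position


def parse_prb_line(line: str) -> List[List[int]]:
    """Group one PRB line into per-position rows of four integer scores."""
    values = [int(v) for v in line.split()]
    if len(values) % 4 != 0:
        raise ValueError("expected four scores per position")
    return [values[i:i + 4] for i in range(0, len(values), 4)]


def called_bases(rows: List[List[int]]) -> str:
    """Pick the highest-scoring base at each position (first one wins ties)."""
    return "".join(BASES[row.index(max(row))] for row in rows)
```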


5. When quality scores are available (the reads must be in a FASTQ file or accompanied by a PRB file), one can also use the wildcard matching method (-W), with or without a user-defined cutoff, or the weight-matrix matching method (-Q).

$ rmap -W -o mapped_locations.bed -c chromosomes_dir reads.fq

$ rmap -P 10 -o mapped_locations.bed -c chromosomes_dir reads.fq

$ rmap -Q -p reads.prb -o mapped_locations.bed -c chromosomes_dir reads.fa
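One simple reading of wildcard matching is that a read base whose quality is below the cutoff acts as a wildcard that matches any genome base, so it never counts as a mismatch. The Python sketch below illustrates that idea only; the function and its arguments are hypothetical, qualities are assumed to be already decoded to integers, and it is not RMAP's internal code.

```python
from typing import Sequence


def wildcard_mismatches(read: str, quals: Sequence[int], window: str, cutoff: int) -> int:
    """Count mismatches, treating read bases with quality below cutoff as wildcards."""
    return sum(
        1
        for base, q, ref_base in zip(read, quals, window)
        if q >= cutoff and base != ref_base  # low-quality bases never count
    )
```

With the read ACGT, qualities [30, 5, 30, 30] and the genome window AGGA, a cutoff of 10 turns the low-quality C into a wildcard, leaving one mismatch instead of two.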


6. To map paired-end reads, use rmappe and specify the minimum and maximum separation between the ends with -min-sep and -max-sep (defaults: 0 and 200, respectively). Please note that there should be only one input file, containing both ends: the two ends of each pair are concatenated into one read sequence, so with 36 nt ends each read in pe_reads.fa should be 72 nt long.

$ rmappe -min-sep 200 -max-sep 600 -o mapped_pe_locations.bed -c chromosomes_dir pe_reads.fa
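If the two ends are stored in separate FASTA files, they can be merged into the single input file that rmappe expects. This Python sketch is hypothetical: it assumes both files list the pairs in the same order, that both ends have the same width, and that the ends are concatenated as-is (end 1, then end 2). Check the RMAP manual for the orientation rmappe expects for the second end.

```python
from itertools import zip_longest
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str]  # (read name, sequence)


def concat_pairs(end1: Iterable[Record], end2: Iterable[Record]) -> Iterator[str]:
    """Yield one FASTA record per pair, with end 2 appended to end 1 as-is."""
    for first, second in zip_longest(end1, end2):
        if first is None or second is None:
            raise ValueError("the two files contain different numbers of reads")
        (name, seq1), (_, seq2) = first, second
        # Assumption: both ends have the same width (e.g. 36 + 36 = 72 nt).
        if len(seq1) != len(seq2):
            raise ValueError(f"ends of {name} differ in length")
        yield f">{name}\n{seq1}{seq2}\n"
```

The (name, sequence) records can come from any FASTA parser; writing every yielded record to one file produces a pe_reads.fa in the concatenated layout described above.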


7. To map bisulfite-treated reads, use rmapbs; there is no need to convert Cs to Ts in the reads or in the reference genome. Please note that rmapbs can only map single-end bisulfite-treated reads.

$ rmapbs -o mapped_bs_locations.bed -c chromosomes_dir bs_reads.fa
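No conversion is needed because a bisulfite-aware mapper can treat a T in the read as compatible with a C in the genome: unmethylated Cs are read as Ts after bisulfite treatment. The Python sketch below shows that comparison for reads from the forward strand only (reverse-strand reads would need the complementary G/A rule). The function name is hypothetical and this is an illustration of the general idea, not the rmapbs implementation.

```python
def bs_mismatches(read: str, window: str) -> int:
    """Count mismatches for a forward-strand bisulfite read.

    A T in the read may match a C in the genome; the reverse case
    (read C, genome T) still counts as a mismatch.
    """
    mismatches = 0
    for r, g in zip(read, window):
        if r == g or (r == "T" and g == "C"):
            continue  # exact match or bisulfite-converted C
        mismatches += 1
    return mismatches
```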


More Information

Questions or comments

Andrew D Smith
andrewds@usc.edu
http://www.cmb.usc.edu/people/andrewds

Wen-Yu Chung
chung@cshl.edu

Citations

Smith AD, Chung WY, Hodges E, Kendall J, Hannon G, Hicks J, Xuan Z and Zhang MQ (2009) Updates to the RMAP short-read mapping software. Bioinformatics 25(21):2841-2842.

Smith AD, Xuan Z and Zhang MQ (2008) Using quality scores and longer reads improves accuracy of Solexa read mapping. BMC Bioinformatics 9:128.


For older versions of RMAP, please see "RMAP: A program for mapping Solexa reads".