Part 1: How the conservation data was obtained
These links will be used to help illustrate certain concepts. I'll tell you when to click on each.
Click here first to reset your UCSC Genome Browser preferences,
then hit the back button on your web browser.
Consult these references for more details about the algorithms,
statistics and programs underlying conservation in the UCSC Genome
Browser:
Part 2: Finding conserved non-coding sequences
These exercises will get you started using the UCSC Genome Browser to
identify conserved non-coding sequences that are good candidates for
containing functional regulatory elements. You can follow each of the
steps below exactly, or explore a little yourself. The links in each
section below will help you get back on track if you explore too much.
The main example will examine the human albumin (ALB) and
alpha-fetoprotein (AFP) genes, but you should be able to follow
analogous steps to look at any other gene. Open the links on this page
in another web browser window or tab so you can easily refer back to
this page when needed.
Getting started
These steps will get you started, and let you look at the organization
of the ALB and AFP genes.
- Human Genome Browser Gateway. This link resets your UCSC Genome Browser
history, so click it again if you change your browser settings
and need to return to the default settings.
- Search results for ALB.
Fortunately, the one we want is at the top of the list.
- Human albumin in the Genome Browser with the default tracks. Notice
that the conservation track shows its highest peaks under the exons,
but there are also some high peaks in introns.
- Zoom out ten times to see AFP just downstream of ALB.
- Clear away ("hide") some unneeded tracks (RefSeq Genes, SNPs,
Human mRNAs, Spliced ESTs, MGC Genes and STS Markers) and add ("pack")
the Most Conserved track. Here's a shortcut.
- Zoom in on the ALB promoter, where there is a lot of conservation,
then zoom back out (zooming should be easy).
Conserved non-coding elements
Now we will use the Table Browser tools to exclude those
conserved elements that are in exons. In practice we would not want to
exclude elements that lie in UTRs, because many important regulatory
elements also exist in UTRs (check your understanding: why are
regulatory elements more common in UTRs than other parts of
exons?). You can start here.
- Click the Tables link at the top of the page (in the blue
navigation bar).
- Select group: Comparative Genomics and track:
Most Conserved. The only option in the table menu will
be phastConsElements17way.
- Click summary/statistics to check out the number of
elements in the region we are examining, along with information about
scores for those elements. Notice that these scores are different from
the LOD scores shown in the browser beside each of the most conserved
elements. Hit the back button in your web browser to get back to the
main Table Browser page.
- Click the create button beside intersection. You
will be taken to a new page, where you can "intersect" the data from
this track with some other track.
- Select group: mRNA and EST Tracks and track: Human
mRNAs and table: Human mRNAs (all_mrna).
- Select the radio button for All Most Conserved
records that have no overlap with Human mRNAs, and then click
Submit. You will be taken back to the Table Browser main page,
and you will see intersection with all_mrna where intersection
used to be.
- Click summary/statistics again to get a summary
of the changes. Notice that there are now far fewer elements.
Hit the back button on your web browser to return to the Table Browser
main page.
- Select output format: custom track, then click the get output button.
- You will be taken to a page that allows you to name the custom track.
In the box beside name= enter conserved_non_coding,
and enter something like
PhastCons most conserved elements, excluding those in mRNAs
in the box beside description=.
- Click get custom track in genome browser to see the
conserved non-coding sequences in the browser.
You should end up with something like this.
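The "no overlap" intersection used in the steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not how the Table Browser is implemented. The function names and the coordinates are made up, and intervals are assumed to be half-open (start included, end excluded), as in BED.

```python
from typing import List, Tuple

# (chromosome, start, end); assumed half-open, as in BED
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if a and b are on the same chromosome and share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def without_overlap(elements: List[Interval], excluded: List[Interval]) -> List[Interval]:
    """Keep only the elements that overlap none of the excluded intervals."""
    return [e for e in elements if not any(overlaps(e, x) for x in excluded)]


# Made-up coordinates, for illustration only
conserved = [("chr4", 100, 200), ("chr4", 300, 400), ("chr4", 500, 600)]
mrnas = [("chr4", 150, 350)]

non_coding = without_overlap(conserved, mrnas)
print(non_coding)  # [('chr4', 500, 600)]
```

The first two elements each share bases with the mRNA interval, so only the third survives. The quadratic scan is fine for a region this small; a genome-wide version would want sorted intervals or an interval tree.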
Conserved non-coding, non-repeat elements
Now we are going to further refine our custom track to eliminate
repeat elements. Repeats are less likely to be under selection, and so
less likely to appear among the conserved elements, but we want to make
sure none remain in our data set. Click here to start.
- Click the Tables link (top of page) again.
- Select group: all tracks and track: conserved_non_coding and
table: conserved_non_coding.
- Click the create button beside intersection.
- Select group: Variation and Repeats and track: RepeatMasker
and table: RepeatMasker (rmsk).
- Select the radio button for All conserved_non_coding that have no overlap with RepeatMasker,
and then click the submit button. You will be taken back to the Table Browser main page,
and you will see intersection with rmsk where intersection used to be.
- Select output format: custom track, then click the get output button.
- You will be taken to a page that allows you to name this new custom track.
In the box beside name= enter conserved_non_coding_non_repeat,
and enter something like
PhastCons most conserved elements, excluding those in mRNAs or repeats
in the box beside description=.
- Click get custom track in genome browser to see the
conserved non-coding non-repeat sequences in the browser.
- Select hide for the conserved_non_coding custom
track (we don't need it anymore). Then click the refresh button.
You should end up with something like this.
Try to zoom in on the promoters of the ALB and AFP genes.
Filter for highest scoring
Now we are going to apply a filter to select only the highest scoring
of the most conserved elements. You can click here to start.
- Click the Tables link (top of page) again.
- Select group: all tracks and track: conserved_non_coding_non_repeat and
table: conserved_non_coding_non_repeat.
- Click the create button beside filter.
You will be taken to a new page, where you can "filter" the data from
this track according to some associated values.
- Select ">=" beside score and enter 300 in
the corresponding box. Then click the submit button.
- Click summary/statistics to check out the number of
elements remaining after filtering on the score. Hit the back button
in your web browser to get back to the main Table Browser page.
- Select output format: custom track, then click the get output button.
- Click get custom track in genome browser to see
where the remaining high-scoring elements are in the browser.
You should end up with something like this.
There are 6 remaining elements, each upstream of the ALB transcription
start site. These would be good places to start looking for new
regulatory elements.
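The score filter we just applied amounts to keeping each element whose score is at least the chosen cutoff. Here is a minimal Python sketch of that step. The Element record, the element names and the scores are all invented for illustration; only the ">= 300" cutoff comes from the steps above.

```python
from typing import List, NamedTuple


class Element(NamedTuple):
    """One conserved element (hypothetical record layout)."""
    chrom: str
    start: int
    end: int
    name: str
    score: int


def at_least(elements: List[Element], min_score: int) -> List[Element]:
    """Keep the elements whose score is >= min_score."""
    return [e for e in elements if e.score >= min_score]


# Invented elements, for illustration only
elements = [
    Element("chr4", 1000, 1100, "elem_a", 250),
    Element("chr4", 2000, 2300, "elem_b", 300),
    Element("chr4", 5000, 5400, "elem_c", 512),
]

high_scoring = at_least(elements, 300)
print([e.name for e in high_scoring])  # ['elem_b', 'elem_c']
```

Note that the comparison is inclusive, so an element scoring exactly 300 is kept, matching the ">=" choice in the filter form.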
The known regulatory elements for ALB and AFP
Now we will look at some data on verified functional regulatory
elements that regulate transcription of ALB and AFP. You can click
here to start.
- Download this file.
Data for two user-defined tracks is in this file, including the 6 "most conserved" elements
that remained after the filtering step we just did. I put those in the same
file as the new track because the browser currently has problems
maintaining user-defined/custom tracks from multiple sources. This
file is in the extremely simple BED format, in which
the minimum information for each data point is just the chromosome,
start and end of the sequence.
- Click the custom tracks button and you will be taken to a
page containing a form where you can either paste in track data or
upload it from a file.
- Click the Browse... button, and select the file you just downloaded.
Then click the Submit button.
You should end up with something like this.
The transcription factors whose sites are visible are mostly
well-known liver regulators, and both ALB and AFP have liver-specific
functions.
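To make the file format concrete, here is a small Python sketch that reads custom track text with more than one track in it, using only the three required BED fields (chromosome, start, end). The track names, descriptions and coordinates below are invented and are not the contents of the file you downloaded. The sketch assumes each track begins with a "track name=..." line, as in the custom tracks we named earlier.

```python
import shlex
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end)


def parse_custom_tracks(text: str) -> Dict[str, List[Interval]]:
    """Group minimal BED records by the name given on the preceding track line."""
    tracks: Dict[str, List[Interval]] = {}
    current = "unnamed"  # used if records appear before any track line
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        first = line.split()[0]
        if first == "browser":
            continue  # browser lines carry display settings, not data
        if first == "track":
            # shlex keeps quoted values such as description="..." together
            attrs = dict(tok.split("=", 1) for tok in shlex.split(line)[1:] if "=" in tok)
            current = attrs.get("name", "unnamed")
            tracks.setdefault(current, [])
            continue
        fields = line.split()
        # Only the first three fields are required; extra fields are ignored here
        tracks.setdefault(current, []).append((fields[0], int(fields[1]), int(fields[2])))
    return tracks


# Invented example text, for illustration only
example = """\
track name=high_scoring description="Filtered conserved elements"
chr4 100 200
chr4 300 450
track name=known_elements description="Known regulatory elements"
chr4 1000 1020 siteA
"""

tracks = parse_custom_tracks(example)
for name, intervals in tracks.items():
    print(name, len(intervals))
```

Putting both tracks in one text block mirrors the way the downloaded file combines two user-defined tracks in a single upload.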
Part 3: Some interesting examples
Gill Bejerano's ultraconserved elements
Bejerano et al. (2004) described the "ultraconserved elements", which
are perfectly conserved (no gaps or substitutions) across many
species. Dr. Bejerano's home page has a link to more information
about these, and a custom track for these elements (in hg17) can be
downloaded here.
See also this paper for Dr. Bejerano's very interesting recent work.
Adam Woolfe's highly-conserved trans-dev regulatory regions
Woolfe et al. (2004) identified nearly 1,400 highly
conserved non-coding sequences by comparing human and Fugu. A custom
track for these regions (mapped to the hg18 genome) is available here. Check to
see if any are near genes involved in your own research.
Some of my stuff
The DME algorithm
CREAD
Questions or comments? Contact Andrew Smith.
Last updated: Sunday September 24, 2006