Part 1: How the conservation data was obtained

These links will be used to help illustrate certain concepts. I'll tell you when to click on each. Click here first to reset your UCSC Genome Browser preferences, then hit the back button on your web browser. Consult these references for more details about the algorithms, statistics and programs underlying conservation in the UCSC Genome Browser:

Part 2: Finding conserved non-coding sequences

These exercises will get you started using the UCSC Genome Browser to identify conserved non-coding sequences that are good candidates for containing functional regulatory elements. You can follow each of the steps below exactly, or explore a little yourself. The links in each section below will help you get back on track if you explore too much.

The main example will examine the human albumin (ALB) and alpha-fetoprotein (AFP) genes, but you should be able to follow analogous steps to look at any other gene. Open the links on this page in another web browser window or tab so you can easily refer back to this page when needed.

Getting started

These steps will get you started, and let you look at the organization of the ALB and AFP genes.

Conserved non-coding elements

Now we will use the Table Browser tools to exclude conserved elements that lie in exons. Because mRNA alignments include the UTRs, this step also removes conserved elements in UTRs. In practice we would not want to exclude those, because many important regulatory elements lie in UTRs (check your understanding: why are regulatory elements more common in UTRs than in other parts of exons?). You can start here.

  1. Click the Tables link at the top of the page (in the blue navigation bar).
  2. Select group: Comparative Genomics and track: Most Conserved. The only option in the table menu will be phastConsElements17way.
  3. Click summary/statistics to check out the number of elements in the region we are examining, along with information about scores for those elements. Notice that these scores are different from the LOD scores shown in the browser beside each of the most conserved elements. Hit the back button in your web browser to get back to the main Table Browser page.
  4. Click the create button beside intersection. You will be taken to a new page, where you can "intersect" the data from this track with some other track.
  5. Select group: mRNA and EST Tracks and track: Human mRNAs and table: Human mRNAs (all_mrna).
  6. Select the radio button for All Most Conserved records that have no overlap with Human mRNAs, and then click Submit. You will be taken back to the Table Browser main page, and you will see intersection with all_mrna where intersection used to be.
  7. Click summary/statistics again to get a summary of the changes. Notice that there are now far fewer elements. Hit the back button on your web browser to return to the Table Browser main page.
  8. Select output format: custom track, then click the get output button.
  9. You will be taken to a page that allows you to name the custom track. In the box beside name= enter conserved_non_coding, and enter something like PhastCons most conserved elements, excluding those in mRNAs in the box beside description=.
  10. Click get custom track in genome browser to see the conserved non-coding sequences in the browser.
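The "no overlap" intersection in steps 4 to 6 can be sketched in a few lines of Python. This is a minimal illustration, not the Table Browser's implementation. It assumes each record is a (chrom, start, end) tuple in BED-style 0-based half-open coordinates. The function names and coordinates are hypothetical.

```python
from typing import Iterable, List, Tuple

# A genomic interval: (chromosome, start, end), 0-based half-open like BED.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def exclude_overlapping(elements: Iterable[Interval],
                        mask: Iterable[Interval]) -> List[Interval]:
    """Keep only elements that overlap no mask interval (hypothetical helper).

    A simple O(n*m) scan, which is adequate for a single locus.
    """
    mask = list(mask)
    return [e for e in elements if not any(overlaps(e, m) for m in mask)]


# Hypothetical coordinates, for illustration only.
conserved = [("chr4", 100, 150), ("chr4", 300, 340), ("chr4", 500, 520)]
mrnas = [("chr4", 280, 400)]

print(exclude_overlapping(conserved, mrnas))
# [('chr4', 100, 150), ('chr4', 500, 520)]
```

To keep elements in UTRs, as suggested above, you would pass only coding-region intervals (for example, each transcript's CDS) as `mask` instead of whole mRNA spans.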

You should end up with something like this.

Conserved non-coding, non-repeat elements

Now we are going to refine our custom track further by removing elements that overlap repeats. Repeat-derived sequences are less likely to be under selection, but we want to make sure none remain in our data set. Click here to start.

  1. Click the Tables link (top of page) again.
  2. Select group: all tracks and track: conserved_non_coding and table: conserved_non_coding.
  3. Click the create button beside intersection.
  4. Select group: Variation and Repeats and track: RepeatMasker and table: RepeatMasker (rmsk).
  5. Select the radio button for All conserved_non_coding that have no overlap with RepeatMasker, and then click the submit button. You will be taken back to the Table Browser main page, and you will see intersection with rmsk where intersection used to be.
  6. Select output format: custom track, then click the get output button.
  7. You will be taken to a page that allows you to name this new custom track. In the box beside name= enter conserved_non_coding_non_repeat, and enter something like PhastCons most conserved elements, excluding those in mRNAs or repeats in the box beside description=.
  8. Click get custom track in genome browser to see the conserved non-coding non-repeat sequences in the browser.
  9. Select hide for the conserved_non_coding custom track (we don't need it anymore). Then click the refresh button.
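RepeatMasker annotates a very large number of intervals. A scalable way to reproduce this "no overlap" step is to merge the repeat intervals on each chromosome into disjoint, sorted runs, and then binary-search them for each conserved element. The sketch below is an illustration under that assumption, not a description of how the Table Browser works internally. Records are assumed to be (chrom, start, end) tuples in half-open coordinates, and the function names and coordinates are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[str, int, int]                 # (chrom, start, end), 0-based half-open
Index = Dict[str, Tuple[List[int], List[int]]]  # chrom -> (run starts, run ends)


def build_index(intervals: Iterable[Interval]) -> Index:
    """Sort and merge intervals per chromosome into disjoint runs with sorted ends."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    index: Index = {}
    for chrom, spans in by_chrom.items():
        spans.sort()
        starts: List[int] = []
        ends: List[int] = []
        for start, end in spans:
            if ends and start <= ends[-1]:
                ends[-1] = max(ends[-1], end)  # extend the current merged run
            else:
                starts.append(start)
                ends.append(end)
        index[chrom] = (starts, ends)
    return index


def has_overlap(index: Index, iv: Interval) -> bool:
    """True if iv shares at least one base with any indexed interval."""
    chrom, start, end = iv
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_right(ends, start)  # first run that ends after iv starts
    return i < len(starts) and starts[i] < end


def exclude_overlapping(elements: Iterable[Interval],
                        repeats: Iterable[Interval]) -> List[Interval]:
    """Keep elements with no overlap with any repeat (hypothetical helper)."""
    index = build_index(repeats)
    return [e for e in elements if not has_overlap(index, e)]


# Hypothetical coordinates, for illustration only.
elements = [("chr4", 10, 20), ("chr4", 40, 60), ("chr4", 95, 100), ("chr5", 0, 5)]
repeats = [("chr4", 50, 70), ("chr4", 65, 90), ("chr4", 100, 120)]
print(exclude_overlapping(elements, repeats))
# [('chr4', 10, 20), ('chr4', 95, 100), ('chr5', 0, 5)]
```

Merging first matters here. After merging, the runs are disjoint, so their end coordinates are sorted too, and a single binary search on the ends finds the only run that could overlap a given element.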

You should end up with something like this. Try to zoom in on the promoters of the ALB and AFP genes.

Filter for highest scoring

Now we are going to apply a filter to select only the highest scoring of the most conserved elements. You can click here to start.

  1. Click the Tables link (top of page) again.
  2. Select group: all tracks and track: conserved_non_coding_non_repeat and table: conserved_non_coding_non_repeat.
  3. Click the create button beside filter. You will be taken to a new page, where you can "filter" the data from this track according to some associated values.
  4. Select ">=" beside score and enter 300 in the corresponding box. Then click the submit button.
  5. Click summary/statistics to check out the number of elements remaining after filtering on the score. Hit the back button in your web browser to get back to the main Table Browser page.
  6. Select output format: custom track, then click the get output button.
  7. Click get custom track in genome browser to see where the remaining high-scoring elements are in the browser.
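The score filter and the summary/statistics check in these steps amount to a threshold comparison followed by a few aggregates. The following is a minimal sketch, assuming BED5-style records (chrom, start, end, name, score). The record type, helper names, element names and scores are hypothetical and are not values from the tutorial.

```python
from statistics import mean
from typing import Dict, List, NamedTuple, Optional


class BedRecord(NamedTuple):
    """A BED5-style record: 0-based half-open coordinates plus name and score."""
    chrom: str
    start: int
    end: int
    name: str
    score: int


def filter_by_score(records: List[BedRecord], min_score: int) -> List[BedRecord]:
    """Keep records whose score is >= min_score (the '>=' filter in step 4)."""
    return [r for r in records if r.score >= min_score]


def summarize(records: List[BedRecord]) -> Dict[str, Optional[float]]:
    """Rough analogue of 'summary/statistics': count, total length, score range.

    Total length simply sums record lengths, so it assumes records do not overlap.
    """
    scores = [r.score for r in records]
    return {
        "items": len(records),
        "bases": sum(r.end - r.start for r in records),
        "min_score": min(scores) if scores else None,
        "max_score": max(scores) if scores else None,
        "mean_score": mean(scores) if scores else None,
    }


# Hypothetical elements and scores, for illustration only.
elements = [
    BedRecord("chr4", 100, 150, "elem1", 280),
    BedRecord("chr4", 300, 360, "elem2", 412),
    BedRecord("chr4", 700, 720, "elem3", 350),
]

kept = filter_by_score(elements, 300)
print([r.name for r in kept])  # ['elem2', 'elem3']
print(summarize(kept))
```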

You should end up with something like this. There are 6 remaining elements, each upstream of the ALB transcription start site. These would be good places to start looking for new regulatory elements.
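Whether an element is "upstream" of a transcription start site depends on the gene's strand. For a plus-strand gene, upstream means lower coordinates. For a minus-strand gene, it means higher coordinates. The following is a minimal strand-aware check, assuming half-open element coordinates and a TSS given as the 0-based position of the first transcribed base. The function name and coordinates are hypothetical.

```python
from typing import Optional, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open


def upstream_distance(element: Interval, tss_chrom: str, tss: int,
                      strand: str) -> Optional[int]:
    """Bases between element and TSS if the element lies wholly upstream, else None.

    tss is the 0-based position of the first transcribed base (hypothetical helper).
    """
    chrom, start, end = element
    if chrom != tss_chrom:
        return None
    if strand == "+":
        # Plus strand: upstream means the element ends at or before the TSS base.
        return tss - end if end <= tss else None
    if strand == "-":
        # Minus strand: the TSS is the highest transcribed coordinate,
        # so upstream means the element starts after it.
        return start - tss - 1 if start > tss else None
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


print(upstream_distance(("chr4", 900, 950), "chr4", 1000, "+"))    # 50
print(upstream_distance(("chr4", 1050, 1100), "chr4", 1000, "-"))  # 49
```

For a UCSC gene record in half-open coordinates, the TSS base would be txStart on the plus strand and txEnd - 1 on the minus strand.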

The known regulatory elements for ALB and AFP

Now we will look at some data on verified functional regulatory elements that regulate transcription of ALB and AFP. You can click here to start.

  1. Download this file. It contains data for two user-defined tracks: the known regulatory elements and the 6 "most conserved" elements that remained after the filtering step we just did. I put both in the same file because the browser currently has problems maintaining user-defined/custom tracks from multiple sources. The file is in the very simple BED format, in which each line minimally gives just the chromosome, start and end of a sequence.
  2. Click the custom tracks button and you will be taken to a page containing a form where you can either paste in track data or upload it from a file.
  3. Click the Browse... button, and select the file you just downloaded. Then click the Submit button.
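A file holding two custom tracks is plain text, with each track's BED lines placed under its own `track` line. BED start coordinates are 0-based and end coordinates are exclusive. The sketch below writes and re-reads such a file using only the chromosome, start and end columns. The track names, descriptions and coordinates are hypothetical and are not the contents of the downloadable file.

```python
import shlex
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open


def write_custom_tracks(tracks: Dict[str, List[Interval]],
                        descriptions: Dict[str, str]) -> str:
    """Serialize several BED3 tracks into one text blob, each under its own track line."""
    lines: List[str] = []
    for name, records in tracks.items():
        lines.append(f'track name={name} description="{descriptions[name]}"')
        for chrom, start, end in records:
            lines.append(f"{chrom}\t{start}\t{end}")
    return "\n".join(lines) + "\n"


def read_custom_tracks(text: str) -> Dict[str, List[Interval]]:
    """Parse the blob back into {track name: [intervals]}, ignoring extra BED columns."""
    tracks: Dict[str, List[Interval]] = {}
    current = None
    for line in text.splitlines():
        if not line.strip() or line.startswith(("#", "browser")):
            continue
        if line.startswith("track"):
            # shlex.split keeps the quoted description together as one token.
            attrs = dict(tok.split("=", 1) for tok in shlex.split(line)[1:])
            current = attrs["name"]
            tracks[current] = []
            continue
        if current is None:
            raise ValueError("data line found before any track line")
        chrom, start, end = line.split()[:3]
        tracks[current].append((chrom, int(start), int(end)))
    return tracks


# Hypothetical track names and coordinates, for illustration only.
tracks = {
    "known_sites": [("chr4", 1000, 1012), ("chr4", 1500, 1510)],
    "top_conserved": [("chr4", 900, 960)],
}
descriptions = {
    "known_sites": "Hypothetical verified regulatory elements",
    "top_conserved": "Hypothetical high-scoring conserved elements",
}

text = write_custom_tracks(tracks, descriptions)
print(text)
print(read_custom_tracks(text) == tracks)  # True
```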

You should end up with something like this. The transcription factors whose sites are visible are mostly well-known liver regulators, and both ALB and AFP have liver-specific functions.

Part 3: Some interesting examples

Gill Bejerano's ultraconserved elements

Bejerano et al. (2004) described the "ultraconserved elements", which are at least 200 bp long and perfectly conserved (no gaps or substitutions) across the orthologous regions of the human, mouse and rat genomes. Dr. Bejerano's home page has a link to more information about these, and a custom track for these elements (in hg17) can be downloaded here.
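The "perfectly conserved" criterion can be illustrated by scanning a multiple alignment for runs of columns where every sequence has the same base and none has a gap. This is a simplified sketch, not Bejerano et al.'s actual pipeline. It assumes the alignment is supplied as equal-length strings (one per species). It treats ambiguous bases such as N as not conserved and reports alignment-column ranges, not genome coordinates. The function name is hypothetical.

```python
from typing import List, Tuple


def perfectly_conserved_runs(alignment: List[str],
                             min_len: int = 200) -> List[Tuple[int, int]]:
    """Half-open column ranges of length >= min_len in which every row has the
    same nucleotide (A/C/G/T, case-insensitive) and no row has a gap."""
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned sequences must have the same length")

    runs: List[Tuple[int, int]] = []
    run_start = None
    for col in range(width):
        column = {row[col].upper() for row in alignment}
        # Conserved only if all rows agree on a real base (gaps and N fail).
        conserved = len(column) == 1 and next(iter(column)) in {"A", "C", "G", "T"}
        if conserved and run_start is None:
            run_start = col
        elif not conserved and run_start is not None:
            if col - run_start >= min_len:
                runs.append((run_start, col))
            run_start = None
    if run_start is not None and width - run_start >= min_len:
        runs.append((run_start, width))
    return runs


# Toy three-species alignment with a short threshold, for illustration only.
print(perfectly_conserved_runs(["ACGTACGT-A", "ACGTACGTTA", "ACGAACGTTA"], min_len=4))
# [(4, 8)]
```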

See also this paper for Dr. Bejerano's very interesting recent work.

Adam Woolfe's highly-conserved trans-dev regulatory regions

Woolfe et al. (2004) identified nearly 1,400 highly conserved non-coding sequences by comparing human and Fugu. A custom track for these regions (mapped to the hg18 genome) is available here. Check to see if any are near genes involved in your own research.
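Checking whether any of these regions sit near genes you care about comes down to finding, for each region, the closest gene on the same chromosome. The following is a minimal linear-scan sketch, assuming regions and genes are given as (chrom, start, end) in half-open coordinates. The gene names, coordinates and helper names are hypothetical.

```python
from typing import Dict, Optional, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open


def interval_distance(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Bases separating two half-open intervals; 0 if they overlap or touch."""
    if a_end <= b_start:
        return b_start - a_end
    if b_end <= a_start:
        return a_start - b_end
    return 0


def nearest_gene(region: Interval,
                 genes: Dict[str, Interval]) -> Optional[Tuple[str, int]]:
    """(gene name, distance) of the closest gene on the region's chromosome, or None."""
    chrom, start, end = region
    best: Optional[Tuple[str, int]] = None
    for name, (g_chrom, g_start, g_end) in genes.items():
        if g_chrom != chrom:
            continue
        d = interval_distance(start, end, g_start, g_end)
        if best is None or d < best[1]:
            best = (name, d)
    return best


# Hypothetical genes and region, for illustration only.
genes = {
    "GENE_A": ("chr2", 1000, 5000),
    "GENE_B": ("chr2", 20000, 30000),
    "GENE_C": ("chr7", 0, 100),
}
print(nearest_gene(("chr2", 15000, 15500), genes))  # ('GENE_B', 4500)
```

Keeping only regions whose nearest gene is on your own list, within whatever distance you consider meaningful, gives a quick first pass at the question above.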

Some of my stuff

The DME algorithm
CREAD

Questions or comments? Contact Andrew Smith. Last updated: