Wed, Oct 30, 2013 – plotting, fitting, RNAseq, and mapping

HW overview

Let’s go over the homework briefly.


I have a full de novo mRNAseq pipeline here:

It involves many of the same steps that you have already run in other guises: filter and trim reads, normalize, and assemble. It also includes calling gene families, annotating (giving the sequences names), and calculating differential expression.

It takes about a week so I’m not going to ask you to run it :).

However, for the homework, one option will be to do some differential expression analysis, so I thought I’d show you around the data.

Data loading, plotting, fitting

See notebook at:


Note, you can explore functioning code from the matplotlib gallery by using %loadpy inside of IPython Notebook – for example, put


into a notebook and execute the cell.

So, the point of all this Python stuff is that at some point you may want to investigate your data by hand! (gasp) In that case you will need to know, minimally, how to load your data in and poke at it.
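As a rough illustration of "load your data in and poke at it", here is a minimal sketch that uses only the Python standard library. The column names (x, y), the example values, and the helper names load_columns and fit_line are all made up for illustration; they are not taken from the class notebook. With a real file you would pass open("yourfile.txt") instead of the in-memory example.

```python
import csv
import io

# Hypothetical tab-separated data standing in for a real file on disk.
EXAMPLE_DATA = """x\ty
1\t2.1
2\t3.9
3\t6.2
4\t7.8
"""


def load_columns(handle):
    """Read a tab-separated table with a header row into two lists of floats."""
    reader = csv.DictReader(handle, delimiter="\t")
    xs, ys = [], []
    for row in reader:
        xs.append(float(row["x"]))
        ys.append(float(row["y"]))
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs, ys = load_columns(io.StringIO(EXAMPLE_DATA))
slope, intercept = fit_line(xs, ys)
print("loaded %d points" % len(xs))
print("fit: y = %.3f * x + %.3f" % (slope, intercept))
```

Once the numbers are in plain lists like this, plotting them or trying a different fit is a small step.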

Mapping and variant calling

While we don’t have a general protocol for variant calling, it’s in some ways easier. Basically, you “map” the reads to a reference genome, and then look to see where the reads differ from it.
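To make the "map, then look for differences" idea concrete, here is a toy sketch. It is not how real mappers or variant callers work internally (they use indexed, gap-aware alignment and proper statistics). It simply slides each read along the reference, keeps the placement with the fewest mismatches, piles up the bases, and reports positions where the most common base disagrees with the reference. The reference, the reads, the thresholds, and the function names map_read and call_variants are all invented for illustration.

```python
from collections import Counter


def map_read(read, reference):
    """Return (position, mismatches) for the best ungapped placement of read."""
    best_pos, best_mm = None, None
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mm = sum(1 for a, b in zip(read, window) if a != b)
        if best_mm is None or mm < best_mm:
            best_pos, best_mm = pos, mm
    return best_pos, best_mm


def call_variants(reads, reference, max_mismatches=1, min_depth=2):
    """Pile up mapped reads; report sites where the majority base is not the reference base."""
    pileup = [Counter() for _ in reference]
    for read in reads:
        pos, mm = map_read(read, reference)
        if pos is None or mm > max_mismatches:
            continue  # read does not fit anywhere well enough; skip it
        for offset, base in enumerate(read):
            pileup[pos + offset][base] += 1

    variants = []
    for i, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to say anything at this position
        base, n = counts.most_common(1)[0]
        if base != reference[i]:
            variants.append((i, reference[i], base, n, depth))
    return variants


# Made-up reference and reads; the reads come from a "sample" that
# carries a T instead of an A at 0-based position 7.
REFERENCE = "ACGTTGCATGCCGATAGCTA"
READS = ["ACGTTGCT", "GTTGCTTG", "TGCTTGCC", "CTTGCCGA", "GCCGATAG", "GATAGCTA"]

for pos, ref_base, alt_base, n, depth in call_variants(READS, REFERENCE):
    print("position %d: %s -> %s (%d of %d reads)" % (pos, ref_base, alt_base, n, depth))
```

The min_depth and max_mismatches cutoffs are arbitrary here; with real data, choosing them (and dealing with sequencing errors) is most of the work.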

We’re going to be looking at data from one of Lenski’s experiments: Genomic analysis of a key innovation in an experimental E. coli population.

Genome of the reference strain here.

Tablet viewer