As I've discussed in previous blog posts, I am working in the lab to characterize major histocompatibility complex (MHC) genes in yellow-bellied marmots. These genes code for molecules that recognize and bind proteins floating in and around our cells. MHC molecules distinguish between self and non-self proteins, and initiate an immune response when a non-self protein (e.g. from a pathogen) is detected.
MHC proteins must therefore be able to accurately differentiate a wide array of molecules and pathogens specific to the population of interest. As a result, these genes are the targets of enormous selective pressure (making them a great model for evolutionary genetics studies) with numerous, ecologically important consequences. An individual's MHC genotype may determine how it chooses mates, how it responds to parasites, and even how long it lives.
So, that all sounds great. But how do we actually determine MHC genotype? Well, we have to sequence the DNA of each individual marmot at this particular locus (i.e. site on the genome). However, when we sequence using traditional Sanger sequencing methods, both copies of the gene (the one you inherited from mom and the one you inherited from dad) are pooled in the same reaction. So, if your alleles at this locus look like this...
from mom AGATT
from dad AGAAT
...the sequence I'm viewing after Sanger sequencing looks like this...
AGA?T where ? is a mix of T and A
If I showed you just the Sanger sequencing output and asked what the possible sequences for this locus are, you could easily reconstruct mom and dad's individual alleles based on simple ideas of segregation. But what if I showed you this sequence...
AGA?T? where each ? is a mix of T and A
...it becomes a bit more complicated, right?
There are more than two possible alleles in this case, so was mom AGATTT and dad AGAATA? Or was mom AGATTA and dad AGAATT?
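To make that ambiguity concrete, here is a minimal Python sketch that lists every pair of alleles consistent with a pooled read. The function name `possible_allele_pairs` and the use of `?` for a two-base site are my own illustrative conventions (not the output format of any sequencing software), and the sketch assumes every ambiguous site is a mix of the same two bases, as in the toy example above.

```python
from itertools import product


def possible_allele_pairs(consensus, ambiguous_bases=("T", "A")):
    """List every pair of alleles consistent with a pooled read.

    Assumption: '?' marks a heterozygous site, and every such site is a
    mix of the same two bases (a simplification for illustration).
    """
    sites = [i for i, base in enumerate(consensus) if base == "?"]
    if not sites:
        # No ambiguous sites: both copies are identical.
        return [(consensus, consensus)]

    first, second = ambiguous_bases
    pairs = []
    # Pin the first ambiguous site of allele A to `first` so each pair is
    # listed once (swapping the two alleles gives the same genotype).
    for rest in product((first, second), repeat=len(sites) - 1):
        allele_a = list(consensus)
        allele_b = list(consensus)
        for site, base in zip(sites, (first,) + rest):
            allele_a[site] = base
            # Allele B always carries the other base at this site.
            allele_b[site] = second if base == first else first
        pairs.append(("".join(allele_a), "".join(allele_b)))
    return pairs


for pair in possible_allele_pairs("AGA?T?"):
    print(pair)
```

For AGA?T? this prints the two candidate pairs described above. The number of candidate pairs doubles with each additional ambiguous site (2^(k-1) pairs for k sites), so a read with 15 such sites already has 16,384 possible pairs.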
Now imagine that you are working with sequences that are hundreds of base pairs long with 10-20 variable positions. How could you possibly figure out the alleles present in an entire population of marmots?
The answer is cloning. I take my PCR product (containing many copies of both alleles) and stick it into bacteria that only accept a single copy of the gene at a time. I plate and grow these bacteria into colonies (i.e. piles and piles of clones containing the same, single allele from my PCR product) and sequence this DNA to determine individual alleles. To ensure that I have sequences of each individual allele, I must grow, pick, and sequence many, many colonies (see below for a small fraction of my clone army!).
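Why so many colonies? A rough back-of-the-envelope sketch in Python gives some intuition. It assumes an idealized case that real PCR and cloning only approximate: a marmot carrying exactly two different alleles, each equally likely to end up in any given colony, with colonies picked independently and no PCR errors. The function name `chance_both_alleles_seen` is hypothetical, just for this illustration.

```python
def chance_both_alleles_seen(n_colonies):
    """Probability that n sequenced colonies include both alleles.

    Assumptions (idealized): exactly two alleles, each colony carries
    either one with probability 0.5, and colonies are independent.
    """
    if n_colonies < 1:
        return 0.0
    # The only way to miss an allele is for every colony to carry the
    # same one: 2 possible alleles * (0.5 ** n) chance for each.
    return 1.0 - 2 * 0.5 ** n_colonies


for n in (2, 4, 8, 16):
    print(n, "colonies:", round(chance_both_alleles_seen(n), 4))
```

Under these assumptions, two colonies give only a 50% chance of seeing both alleles, while eight give better than 99%. Unequal amplification or cloning efficiency would push the required number higher.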
Conveniently, the bacteria we use are designed with the PCR product insertion site in a gene that causes them to produce a blue color. A successful insertion disrupts that gene, so colonies that have accepted the insert (and are now part E. coli, part marmot) turn white, while colonies that have not accepted the insert retain their blue color (see below). This makes the clone screening process much more efficient, though it is still time-consuming and labor-intensive.
And in the end, if all goes according to plan, I am left with sequences of individual alleles that can be used in analyses, as well as lots and lots of marmoty bacteria.