Tuesday, May 24, 2016

Inconstant lines

If you order chemicals, the supplier provides a certificate of analysis, which shows the amounts of impurities or their limits of detection.  For physics experiments, one can purchase components which have been carefully cast or machined to precise dimensions. Barring errors by the manufacturers, these reagents and components can be relied upon, as their consistency is known.  Alas, for biological systems, such constancy is often a mirage.

When I was a rotation student in the late Bill Gelbart's lab, he kept two different populations of the "standard" Drosophila strain Oregon-R, each obtained from a different source.  I don't remember what was different about them, but Bill (who was fanatical about his fly stocks) knew of a difference.  Such differences over time are probably unavoidable, as these are biological systems with very good -- but still imperfect -- replication.

Later in my career, I was at Millennium, and much of the company's effort was devoted to understanding the mechanism(s) of action of Velcade in multiple myeloma.  The drug targets the proteasome, but as to downstream effects, there were (and I think still are) multiple competing (and perhaps not strictly mutually exclusive) theories.  Many attempts were made to study cell lines.  An important property of a cell line is whether it grows adherent to the plasticware or simply as cells in suspension.  Depending on your assays, one is often preferable to the other, and so a corollary to Murphy's Law is that you will desire the opposite property from the cell lines you have available.  All of the available cell lines grew in suspension, and the desire was for an adherent line, so these lines were searched for rare adherent cells.  Success was achieved, and for several years great effort was taken to characterize the effects of Velcade using this adherent line.  Just before I was shed, another one of the genomics-focused groups decided to profile all the cell lines which were in use using Affymetrix SNP chips.  Sadly, this revealed the "adherent myeloma" line to be none other than HCT116, an adherent line -- but from colon cancer.  So all the work on that line was garbage from the point-of-view of understanding myeloma.

This is a recurring story in biology -- because there were once no methods to rapidly verify the specific identity of cells or organisms, we have relied too heavily on careful technique alone. The most notorious mistaken cell line is HeLa, which has tainted many publications by masquerading as something different, but the problem of mislabeled cell lines goes far beyond HeLa, as my story above illustrates.  Even once rapid methods were developed, such as forensic-style VNTR typing, far too few labs started using them routinely (such methods certainly predated the "adherent myeloma line" fiasco).
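
For a sense of how little is involved in a routine identity check once genotype calls are in hand, here is a minimal Python sketch. It is only an illustration, not how any particular lab or vendor does it: the marker names, genotype calls, and line labels are all invented, and the functions `normalize`, `concordance` and `rank_references` are hypothetical helpers written for this example.

```python
from typing import Dict, List, Tuple

# A profile maps a marker ID to a genotype call, e.g. {"snp1": "AG"}.
# All marker names and calls below are invented for illustration.
Profile = Dict[str, str]


def normalize(call: str) -> str:
    """Treat 'AG' and 'GA' as the same unphased genotype."""
    return "".join(sorted(call.upper()))


def concordance(sample: Profile, reference: Profile) -> Tuple[float, int]:
    """Return (fraction of matching calls, number of shared markers)."""
    shared = [m for m in sample if m in reference]
    if not shared:
        return 0.0, 0
    matches = sum(normalize(sample[m]) == normalize(reference[m]) for m in shared)
    return matches / len(shared), len(shared)


def rank_references(
    sample: Profile, references: Dict[str, Profile]
) -> List[Tuple[str, float, int]]:
    """Score the sample against every reference, best match first."""
    scored = [(name, *concordance(sample, ref)) for name, ref in references.items()]
    return sorted(scored, key=lambda t: t[1], reverse=True)


# Hypothetical reference profiles: the line the tube is labelled as,
# and some other line that also lives in the same incubator.
references = {
    "labelled_line": {"snp1": "AA", "snp2": "AG", "snp3": "CC", "snp4": "TT"},
    "other_line": {"snp1": "AG", "snp2": "GG", "snp3": "CT", "snp4": "TT"},
}

# Hypothetical calls from the culture actually growing in the flask.
sample = {"snp1": "GA", "snp2": "GG", "snp3": "TC", "snp4": "TT"}

ranking = rank_references(sample, references)
for name, score, n_markers in ranking:
    print(f"{name}: {score:.2f} concordance over {n_markers} markers")
```

In this toy case the culture matches "other_line" at every marker and the line on its label at only one of four, which is the kind of red flag the SNP chip profiling raised. A real check would use far more markers and would need to tolerate missing calls and genotyping error.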

A recent paper illustrates this problem yet again, but this time at the scale of entire mice.  The inbred mouse line C57BL/6 is heavily used in research.  A research group had generated a knockout affecting sialic acid metabolism and backcrossed it ten generations into C57BL/6.  This strain exhibited altered B-cell development. Further backcrossing into C57BL/6, however, eliminated the phenotype.  Generating the knockout again in C57BL/6 produced mice with the correct biochemical defect, but no B-cell developmental defects.

Given these strange results, in which C57BL/6 didn't seem to be equivalent to C57BL/6, the group decided to investigate -- by both SNP mapping and whole genome sequencing.  Strikingly, they found a segmental duplication covering two exons of the gene Dock2.  Digging in the literature, they found another report of such a Dock2 duplication in a C57BL/6 model. That group had used SNP mapping (alas, not described in detail -- presumably an array) to identify a similar Dock2 duplication -- and Dock2 is important in B-cell signalling.

Given the unlikelihood of the exact same mutation occurring independently in two embryonic cell lines (used for generating the knockouts), the researchers started scrutinizing the C57BL/6 stocks used for backcrossing.  This identified one supplier, formerly known as Harlan Sprague Laboratories and now Envigo Biosciences, as having C57BL/6 stocks which were homozygous for the defective Dock2 -- whereas other suppliers (and indeed, the reference C57BL/6 sequence generated by the Sanger Institute) were homozygous for wildtype Dock2.  As the authors state, this raises the awful shadow of
"concerns that many other lines of gene-targeted mice bearing hematopoietic phenotypes may have been inadvertently compromised by backcrosses involving the use of C57BL/6NHsd mice".
If it could happen once, it certainly could happen multiple times.  Will the knockout mouse community start demanding Certificates of Analysis for their mouse shipments, consisting of SNP profiles showing that the delivered mice really are what they are claimed to be?  What more subtle changes are hiding amongst commonly used mouse lines?  Should exome sequencing or whole genome sequencing be run on every major breeding stock?  The alternative is for confounding mutations to go undetected for extended periods, leading to false results and wasted efforts.  Sequencing ain't free, but irreproducibility isn't cheap either.
