Sunday, May 14, 2017

SFAF & I'm Not Dead Yet Technologies

Jonathan Jacobs posted his annual reminder that the Sequencing, Finishing and Analysis in the Future Meeting (SFAF) will be this week.  Alas, in past years that meeting hasn't had many more tweeters than Jonathan, but perhaps this year there will be more.  There's a glut of genomics conferences to track, compile tweets from and opine on: besides London Calling, there's been SMRT Leiden and Biology of Genomes, all in the span of two weeks!  This post is short on actual writing; mostly it flags some talks at SFAF that grabbed my attention.  In compiling it, I realized that several SFAF talks show that technologies I consider effectively dead still retain significant attention.

I'll admit that I tend to write technologies off after a certain point; once I've decided that a better tech exists, I expect everyone else to see it the same way.  But that isn't reasonable.  Sometimes I'm just plain premature in my assessment, and other times a variety of factors keep a technology alive.  For example, once a large pipeline is set up to handle a certain type of data, there is a significant switching cost (both actual and psychological) to moving to a different technology.

Fluorescent Sanger sequencing is not a technology I have fully written off, but one I see as confined to a few niches.  One of those is sequencing in small batches that just aren't cost-effective on any newer technology due to library construction costs (artisanal sequencing?).  Have one small construct to validate?  Sanger is likely the route.

I'm not involved in clinical genomics, but clearly there is significant debate over the utility of using Sanger to validate variants found by Illumina or other massively parallel sequencing approaches. The anti-Sanger argument is that sufficient depth of coverage ensures good calls.  For the pro-Sanger argument I've seen a couple of justifications.  First, Sanger is largely orthogonal in nature and so can scotch false positives caused by various flavors of Illumina artifacts.  Second, running a completely separate analysis on a different sample from the patient mitigates the risk of sample swaps.  The genomics group at Baylor College of Medicine has a talk on large-scale primer design for variant validation.
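
To make the depth argument a bit more concrete, here is a minimal Python sketch of what "sufficient depth" buys under an idealized model. It assumes a heterozygous site where each read independently shows the variant allele with probability 0.5, plus a hypothetical calling threshold of at least 5 supporting reads; neither number comes from any real pipeline. Note that it models only random sampling, so the systematic Illumina artifacts and sample swaps that the pro-Sanger camp worries about are exactly what it cannot capture.

```python
from math import comb

def prob_at_least_k_alt(depth: int, k: int, allele_fraction: float = 0.5) -> float:
    """P(X >= k) where X ~ Binomial(depth, allele_fraction).

    Idealized model: each read samples one of the two alleles independently.
    Systematic errors (sequencing or mapping artifacts) are NOT modeled.
    """
    p, q = allele_fraction, 1.0 - allele_fraction
    return sum(comb(depth, i) * p**i * q**(depth - i) for i in range(k, depth + 1))

# Hypothetical threshold: call a het variant only with >= 5 alt-supporting reads
for depth in (10, 20, 30, 50):
    print(f"depth {depth:>2}: P(>=5 alt reads) = {prob_at_least_k_alt(depth, 5):.4f}")
```

The probability climbs toward 1 quickly as depth rises, which is the intuition behind the anti-Sanger position; it says nothing about correlated errors.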

Mate-pair technology really is behind my eight ball, as it never really solved any problems for me.  We tried it a few times with very modest success.  Personal feelings aside, I think long-read technologies are simply erasing the value of mate pairs.  Certainly in the small-genome world, for similar input material and dollar cost, one can generate a long-read library that matches or exceeds the results of a conventional mate-pair library.

For larger genomes or metagenomes, a mate-pair library may still deliver greater physical coverage than long reads on PacBio or nanopore.  But in that space, linked reads from 10x Genomics, optical maps or Hi-C methods are in my view far more likely than typical mate pairs to deliver significant improvements in scaffolding.  Long reads themselves shouldn't be discounted either: if $1K can buy you 8X coverage or so of a human-class genome with 20kb+ reads, that's likely to have a significant impact on the contig N50 of a short-read assembly.  I guess that's another bias on my part: scaffolds are nice, but contigs are more important.  You may feel differently.  In any case, an abstract on insect genomes, which has an interesting bit about inexpensively sequencing museum specimens, also talks quite a lot about mate-pair construction.  Another abstract discusses new algorithms for scaffolding with mate pairs.
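
The coverage arithmetic behind that $1K scenario is simple enough to sketch in Python. The ~3.1 Gb genome size is my assumption for "human-class", and the helper name `coverage_arithmetic` is purely illustrative. The uncovered-fraction estimate uses the idealized Lander-Waterman model, in which reads land uniformly at random so the expected fraction of uncovered bases at coverage c is e^-c; real long-read data, with coverage biases and repeats, will not match it exactly.

```python
import math

def coverage_arithmetic(genome_size_bp: float, coverage: float, read_len_bp: float):
    """Return (total bases, read count, idealized uncovered fraction).

    Lander-Waterman idealization: reads land uniformly at random, so the
    expected fraction of the genome with zero coverage is exp(-coverage).
    """
    total_bases = genome_size_bp * coverage
    n_reads = total_bases / read_len_bp
    uncovered = math.exp(-coverage)
    return total_bases, n_reads, uncovered

# Assumed inputs: ~3.1 Gb "human-class" genome, 8X coverage, 20 kb reads
total, reads, gap_frac = coverage_arithmetic(3.1e9, 8, 20_000)
print(f"{total / 1e9:.1f} Gb sequenced, ~{reads:,.0f} reads, "
      f"idealized uncovered fraction {gap_frac:.5f}")
```

With these inputs that works out to roughly 24.8 Gb in about 1.24 million reads, with an idealized uncovered fraction of about 0.03%; in practice uncovered regions are likely to be larger than that figure suggests.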

As noted above, BioNano optical maps are currently a good way to generate long-range information for de novo assemblies.  The conference also features several talks using Nabsys 2.0's electronic mapping technology.   In the microbial genome space I am skeptical of the value of mapping technologies, as long-read sequencing has become so routine and sequences will always outclass maps in biological value.  For the counterargument, Nabsys has a talk on using their technology for Bordetella genomes, and someone from the Centers for Disease Control will speak on the CDC's experience with optical mapping, with over 2000 maps generated.

Another talk illustrating a tug-of-war between technologies tries to optimize primer design for pathogen detection.  That's a very valuable goal, but given the plummeting cost of metagenomic sequencing, any such PCR-based method would need to be tested head-to-head against sequencing to compare speed-to-result and sensitivity/specificity.  Perhaps the day of sequencing-only detection isn't here yet, but it shouldn't be many years off.

There are a lot of other interesting-sounding talks at SFAF, from both a technology and a biology standpoint. My rough notes on the SFAF abstract book are below, though they cover just a sampling of the abstract titles.  I'll probably generate Storify pages of the tweets from the conference, though I won't fight anyone for that glory :-)

Notes

SFAF has sponsorship from many vendors, who get speaking slots: not only big players like Illumina, but also companies such as Nabsys.

Full abstract book is online, as well as a hyperlinked program.

Susan Dutcher talk on ciliopathies -- if I remember correctly, she once worked on Chlamydomonas, which I worked with as an undergraduate

The BD CLiC – a fully integrated miniaturized library prep system enabling PCR-free whole genome library prep --  I saw CLiC at Marco Island back when it was still a separate Irish company.  Interesting fluidics technology, not quite microfluidics (1 µl droplets).

Sequencing the largest existing collection of historic commercial solventogenic clostridia strains to dissect industrial acetone-butanol-ethanol (ABE) fermentations -- historical note: it was such strains that kept Britain in WWI, by supplying the acetone needed to produce key explosives

De novo assembly of complete chloroplast genomes -- short-read assemblies.  Single-read nanopore chloroplast genomes are certainly plausible, but that's the subject of an embryonic post.

MinION Nanopore Sequencer for Human Identification or Sample Source Attribution -- nanopore talk from Rachel Spurbeck at Battelle.  Winston Timp is also giving Assembly and Analysis of Concurrent XDR and HMV K. pneumo Substrains Using Nanopore Sequencing -- I think last year the conference had no nanopore talks.

Next Generation Sequencing: Signature Sequence Detection For In Silico Primer Design -- always a battle between just sequencing the heck out of things and using clever primer design to enrich.  Here's the latest on clever primer design, from Noblis.

SPAdes Family of Tools for Genome Assembly and Analysis: What's New? -- but the other SPAdes group talk has a better title: SPAdes: is there anything new we could develop?  That talk covers dealing with hairy repeats, particularly those in non-ribosomal peptide synthetases and polyketide synthases (very near and dear to my mission)

Zika virus, drug discovery, and student projects in bioinformatics -- really cool-sounding curriculum from Sandra Porter

A Method to Streamline and Miniaturize Library Preparation for Next-Gen Sequencing Using the Labcyte Echo® 525 Liquid Handler -- one of several talks from Labcyte on the Echo -- I sooooo wish these machines were more affordable, as they are utterly cool.

I'm not dead yet technology talk #1: Primer Design Pipeline for Large Scale Sanger Validation

I'm not dead yet technology talk #2: Sequencing insect genomes on a budget -- abstract has a lot of talk of mate pairs -- but also ~$200/sample to prepare & sequence butterflies from museum collections

I'm not dead yet technology talk #3: A Global Optimization Approach for Scaffolding and Completing Genome Assemblies -- scaffolding using mate pairs

Bacterial genome reduction as a result of short read sequence assembly -- cautionary talk on how short read assemblies can lose important information

Assembly of heterozygous genomes -- yet another de Bruijn assembler (Bwise), claiming both to better leverage paired-end reads and other long-range information and to assemble a human genome with only 50 GB of RAM.

1 comment:

Jonathan Jacobs said...

Thanks for the props Keith! I would have missed this post if it weren't for one of my staff members pointing me to it. HA!

SFAF is an awesome conference. Great mix of deep-dive technical computational biology talks, high-level "real world use" keynotes, and tons of government PMs making an appearance (mainly because the conference is free) to learn more about genomics and look for new tech to fund/support.