Wednesday, December 14, 2016

Let's Stop Talking Consensus Accuracy

A common approach for comparing sequencing platforms and assemblies is to report consensus accuracy, just as raw read accuracy is often reported for the platforms themselves.  I'm going to go on record with my opinion that stating these as "99.9% accurate" is a terrible habit which must be kicked, as it interferes with proper comparisons.
Before anyone thinks I've completely lost my mind, I'm not arguing we shouldn't compare the quality of assemblies, as that would certainly be an absurd position.  What I take issue with is reporting these in terms of accuracies such as "99.9%" or "99.995%".

The problem with these numbers is that they really aren't intuitively useful.  What is the real difference between "99.8%" and "99.9%" accuracy?  Well, to answer that properly you end up performing some mental math.  It's sobering to do so and be reminded that 99.4% might be great for cleaning your body, but on a 4.6Mb E. coli genome that's almost 28,000 errors!  What I would propose is that we shortcut that calculation by reporting the more useful numbers up front.

So report error rates or intervals, not accuracy.  Of course I know that 1 - accuracy = error rate, but if everyone is doing that calculation anyway, why report it in the "wrong" units?  This should have been all-but-settled years ago with phred scores, which are a compact representation -- a phred score is -10 times the log10 of the error probability.  But somewhere in NGS platform land, the vendors decided to switch to the accuracy numbers, I fear because they thought they could dazzle people not familiar with the system.
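
If you want to see the conversion spelled out, here's a minimal sketch in Python (the function name and example values are my own illustration, not pulled from any particular tool):

    import math

    def phred_from_accuracy(accuracy):
        # A phred score is -10 times the log10 of the error probability
        error_rate = 1.0 - accuracy
        return -10.0 * math.log10(error_rate)

    print(round(phred_from_accuracy(0.999), 1))    # 30.0
    print(round(phred_from_accuracy(0.99995), 1))  # 43.0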

But even phred scores aren't ideal, because they are almost always mentally converted to a more useful measure: bases per error.  In other words, on average, what is the interval of correct sequence before hitting an error?  (We're ignoring here any clustering of errors or other non-randomness.)  So a phred score of 30 translates to about 1,000 bases per error, whereas 40 is about 10,000 per error.
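
That arithmetic runs easily in the other direction too; again, just an illustrative sketch:

    def bases_per_error(phred):
        # Expected number of bases between errors at a given phred score
        return 10 ** (phred / 10.0)

    print(bases_per_error(30))  # 1000.0
    print(bases_per_error(40))  # 10000.0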

This is valuable because distances along DNA are something with biological relevance; we're in an appropriate length scale.  For example, many bacterial genes are less than 400 amino acids long, or about 1,200 bases.  So an assembly with 1,000 bases per error is going to mean an awful lot of genes will have an error.  Go up a level in scale: the gene cluster for the immunosuppressant rapamycin is around 90Kb, so if you have an error every 10,000 bases then we expect 9 errors in the cluster sequence.  If you want that cluster nearly perfect, you'll need to shoot for about phred 50.  Of course, this scales to complete genomes; on a 10Mb actinomycete genome even phred 60 translates to an expectation of 10 errors.  Knowing quality is important when asking scientific questions; sometimes a 1 in 500 error rate genome is good enough, but other times you'll really want to nail down a gene or region to near perfection.  Thinking in error interval space, in my opinion, is much more likely to drive you to an appropriate choice than thinking in terms of accuracy.
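
To make that back-of-the-envelope arithmetic explicit (the region sizes are the ones mentioned above, and this sketch assumes errors fall independently along the sequence):

    def expected_errors(region_length_bp, phred):
        # Expected error count = region length * per-base error probability
        return region_length_bp * 10 ** (-phred / 10.0)

    print(expected_errors(90000, 40))      # ~9 errors in a 90Kb cluster at phred 40
    print(expected_errors(90000, 50))      # ~0.9 errors at phred 50
    print(expected_errors(10000000, 60))   # ~10 errors in a 10Mb genome at phred 60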

So please, if you are reporting consensus quality, report the error frequency.  It's a far more intuitive scale for assessing whether an approach or assembly is good enough for the biological questions you wish to ask.  It's not that accuracy is unimportant, it's that it is so important that we must be very careful in how we think about it.

1 comment:

homolog.us said...

Good point.

Would you please point out which discussion you are criticizing? I am missing the context for this blog post.