Commit

Revise README with new info
rrwick committed Sep 20, 2017
1 parent cec26ea commit a075782
Showing 1 changed file with 1 addition and 2 deletions.
README.md: 3 changes (1 addition & 2 deletions)
@@ -234,8 +234,7 @@ Finally, Nanonet seems a bit dated and should probably be avoided. However, it d

The training set used for any given basecaller may have a huge impact on the quality of the results. [Tim Massingham](https://github.com/tmassingham-ont) (the author of Scrappie) mentioned [here](https://github.com/rrwick/Basecalling-comparison/issues/1) that Albacore v2.0.2 and Scrappie raw v1.1.0 rgrgr_r94 are very similar. The key difference is that Scrappie was trained with a human-only dataset, which possibly explains why Albacore v2.0.2 did much better for my _Klebsiella_ sample.


-Methylation may also be an important factor related to training. A methylated base would be expected to produce a different signal in the pore than its unmethylated counterpart, but I have heard (by word-of-mouth, so I could be wrong) that Albacore/Scrappie training sets only use PCRed DNA (which lacks methylation). This could mean that the basecaller is confused by signals resulting from methylated bases, explaining much of the residual error in assemblies. If this is the case, two solutions jump to mind: 1) PCR your DNA before ONT sequencing, or 2) train the basecaller's neural network using native DNA (_with_ methylation). I don't like the first solution (more wet lab work), but the second seems promising. If it really is as simple as using a better training set, then significant basecaller improvements are potentially just around the corner.
+Methylation may also be an important factor related to training, as a methylated base would be expected to produce a different signal in the pore than its unmethylated counterpart. If a basecaller was trained with mostly or entirely unmethylated DNA, then we might expect it to give low accuracy on heavily methylated samples. This may explain much of the residual error I saw in my assemblies. If this is the case, two solutions jump to mind: 1) PCR your DNA before ONT sequencing, or 2) train the basecaller's neural network using DNA (_with_ methylation) which is more like your sample. I don't like the first solution (more wet lab work), but the second seems promising. If it really is as simple as using a better training set, then significant basecaller improvements are potentially just around the corner.


