Bacteriophage Genome Assembly

Recently, we published a paper in PeerJ titled “Assessing Illumina technology for the high-throughput sequencing of bacteriophage genomes”. The work on this problem started several years ago, when I was actively encouraged (forced) to write a blog post on the Warwick Microbiology and Infection Unit blog (http://blogs.warwick.ac.uk/microbialunderground/). At the time, I looked at some basic parameters that influenced the assembly of bacterial genomes and showed that, above a certain coverage, there was no benefit in increasing sequencing depth alone.

Given my interest in bacteriophage genomes and some work we were doing on sequencing multiple phage genomes, I started to look at the amount of sequence coverage required to successfully assemble bacteriophage genomes. Then I received a throwaway comment on a grant review saying that “it won’t work anyway, as most phage genomes cannot be assembled”. That gave me enough motivation to look at this in far more detail, largely out of frustration and the wish to prove the reviewer wrong.

Using in silico modelling of ~2000 complete phage genomes from ENA (some were later removed because the initial assemblies contained too many ambiguous bases), we investigated how increasing the depth of coverage and the insert size affected the assembly of phage genomes when using SPAdes as the assembly algorithm. We demonstrated that the majority of bacteriophage genomes could be assembled without errors and went on to test this in vitro by sequencing some novel phage genomes.

The resulting paper was submitted to PeerJ, where I experienced open peer review for the first time, as two of the four reviewers signed their reviews. An interesting observation was that the two signed reviews provided the most useful critique by far. The feedback from the reviewers undoubtedly improved the manuscript, as they asked us to test our hypothesis further using two additional assembly algorithms (Ray and Velvet) across a variety of sequencing depths and insert sizes.

As a result, after completing ~75,000 phage assemblies with a range of insert sizes, sequencing depths and assembly algorithms, we demonstrated that:

  • The majority of phage genomes (>98%) can be completely assembled at 30x coverage.
  • Multiple phages can be combined in a single sequencing library.
  • The sequenced phage genomes from combined libraries can be completely reconstructed.
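
To give a feel for what 30x coverage means when pooling phages in one library, here is a minimal Python sketch of the usual coverage arithmetic (coverage = total sequenced bases / genome length). It is not taken from the paper: the 250 bp paired-end read length, the function name and the three genome sizes are illustrative assumptions, and it ignores uneven pooling, quality trimming and host DNA contamination.

```python
import math


def read_pairs_for_coverage(genome_length_bp, coverage, read_length_bp=250):
    """Read pairs needed so that total bases / genome length >= coverage.

    read_length_bp is an assumed paired-end read length (each pair
    contributes 2 * read_length_bp bases); it is not a value from the paper.
    """
    bases_needed = genome_length_bp * coverage
    return math.ceil(bases_needed / (2 * read_length_bp))


# Hypothetical pool of three phages; names and genome sizes are made up.
pool = {"phage_A": 40_000, "phage_B": 170_000, "phage_C": 60_000}

# Read pairs each genome needs to reach 30x on its own.
pairs = {
    name: read_pairs_for_coverage(length, coverage=30)
    for name, length in pool.items()
}

# Minimum read pairs for the whole library, assuming perfectly even pooling.
total_pairs = sum(pairs.values())

for name, n in pairs.items():
    print(f"{name}: {n} read pairs")
print(f"total: {total_pairs} read pairs")
```

Under these assumptions the whole pool needs only a few thousand read pairs per genome, which is a tiny fraction of a typical sequencing run. In practice you would aim well above this minimum, because the proportion of reads each phage receives in a pooled library is rarely even.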

For full details, check out the paper:
Assessing Illumina technology for the high-throughput sequencing of bacteriophage genomes.
Rihtman B, Meaden S, Clokie MR, Koskella B, Millard AD.
PeerJ. 2016 Jun 1;4:e2055. doi: 10.7717/peerj.2055. eCollection 2016.