Supplementing and Colouring vConTACT2 Clusters

Supplementing vConTACT2 Clusters

vConTACT2, available here, is a fantastic tool for inferring the taxonomy of viral sequences. The pipeline clusters the user’s sequences together with a RefSeq reference database on the basis of shared protein clusters.

As the RefSeq database contains relatively few genomes, more context can be given to a network by adding more reference genomes.

Phage genomes are collected each month and made publicly available here. Their GenBank accessions are then used to extract metadata, which we (@CyanoNey) recently used to produce annotation files for phage phylogenetic trees, including viral family and bacterial host, available here.

Proteins were predicted on 12,892 phage genomes (30/05/2020) using Prodigal and input files for vConTACT2 were produced.

Simply combine these protein sequence FASTA and gene-to-genome mapping files with your own, and the reference genomes will be added to the network:

(Warning: files are quite large)
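
Combining the files could be sketched roughly as below. This is a minimal sketch, not part of vConTACT2 itself: it assumes the gene-to-genome mapping files are comma-separated with a single, identical header row, and the function names are hypothetical. Files are streamed line by line because the reference files are large.

```python
import csv


def combine_fasta(paths, out_path):
    """Concatenate protein FASTA files into one file.

    A newline is added after any file that lacks a trailing one, so the
    first header of the next file never ends up glued to a sequence line.
    """
    with open(out_path, "w") as out:
        for path in paths:
            last_line = ""
            with open(path) as handle:
                for line in handle:
                    out.write(line)
                    last_line = line
            if last_line and not last_line.endswith("\n"):
                out.write("\n")


def combine_gene_to_genome(paths, out_path):
    """Merge gene-to-genome CSV files, writing the header row only once.

    Assumption: every input file starts with the same header row.
    A mismatch raises an error rather than silently mixing layouts.
    """
    header = None
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        for path in paths:
            with open(path, newline="") as handle:
                reader = csv.reader(handle)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip empty files
                if header is None:
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"Header of {path} differs from the first file")
                for row in reader:
                    if row:  # ignore blank lines
                        writer.writerow(row)
```

For example, `combine_fasta(["reference_proteins.faa", "my_proteins.faa"], "combined.faa")` and `combine_gene_to_genome(["reference_g2g.csv", "my_g2g.csv"], "combined_g2g.csv")` would produce the two inputs for vConTACT2; all of these file names are placeholders.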

Quickly Identifying Reference Genera

To quickly identify clusters of reference genomes at the genus level, we (@RyanCookAMR) have produced a mapping file that will colour in these reference sequences:

To use the annotation file:

  1. Load your network into Cytoscape
  2. File -> Import -> Table from File…, and select the annotation file
  3. Then move to the “Style” portion of the control panel
  4. Fill Colour -> Mapping (middle box) -> Column = Colour_Hex -> Mapping = Passthrough Mapping

And the resultant network will look something like this:

Alternatively, you can choose your own colours for specific genera of interest by selecting Column = Subfamily/Genera -> Mapping = Discrete Mapping, and then manually selecting colours.
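
If you would rather prepare your own colours outside Cytoscape, a table with a Colour_Hex column can be generated and then imported with the Passthrough Mapping described above. The following is only a sketch under stated assumptions: the `Genome` and `Genus` column names and both function names are hypothetical, and the genome identifiers must match the node names in your network.

```python
import colorsys
import csv


def genus_palette(genera, saturation=0.65, value=0.9):
    """Return {genus: '#RRGGBB'} using evenly spaced hues."""
    unique = sorted(set(genera))
    palette = {}
    for index, genus in enumerate(unique):
        hue = index / len(unique)
        red, green, blue = colorsys.hsv_to_rgb(hue, saturation, value)
        palette[genus] = "#{:02X}{:02X}{:02X}".format(
            round(red * 255), round(green * 255), round(blue * 255)
        )
    return palette


def write_colour_table(genome_to_genus, genera_of_interest, out_path,
                       default_colour="#CCCCCC"):
    """Write a CSV with a Colour_Hex column for import into Cytoscape.

    Genera of interest get a distinct colour; every other genome
    falls back to the default grey.
    """
    palette = genus_palette(genera_of_interest)
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        # Hypothetical column names; only Colour_Hex is taken from the text.
        writer.writerow(["Genome", "Genus", "Colour_Hex"])
        for genome, genus in sorted(genome_to_genus.items()):
            writer.writerow([genome, genus, palette.get(genus, default_colour)])
    return palette
```

Evenly spaced hues keep a handful of genera visually distinct; with many genera the colours become hard to tell apart, so manual selection may work better.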

Happy clustering!