The recent PHROGs database from Terzian et al. is a great resource for phage annotation. We previously reformatted this database into HMMs suitable for use within Prokka (read about it HERE and download the HMMs for yourself HERE).
Ryan has added this resource to our INPHARED dataset to re-annotate the genomes of all cultured phages that we can identify in GenBank. The updated GenomesDB folder of INPHARED can be downloaded from here (warning: it’s a big tar file), with more than 19,000 genomes now annotated in a consistent manner. Because the PHROGs team provides standardised annotations, we have found them really useful for finding homologues with a simple string search of the annotation text.
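As a rough illustration of the kind of string search we mean, here is a minimal Python sketch. It assumes the annotated genomes are GenBank-format flat files with `/product="..."` qualifiers and a `.gbk` extension; the function names `find_products` and `search_folder` and the folder layout are made up for this example, so adjust them to whatever is actually in your download.

```python
import re
from pathlib import Path

# Matches the value of a /product qualifier, including values that wrap
# across several lines in a GenBank flat file.
PRODUCT_RE = re.compile(r'/product="([^"]*)"')


def find_products(genbank_text, keyword):
    """Return every product annotation that contains keyword (case-insensitive)."""
    hits = []
    for match in PRODUCT_RE.finditer(genbank_text):
        # Collapse line wraps and indentation into single spaces.
        product = " ".join(match.group(1).split())
        if keyword.lower() in product.lower():
            hits.append(product)
    return hits


def search_folder(folder, keyword, pattern="*.gbk"):
    """Map file name -> matching products for each file under folder.

    The "*.gbk" pattern is an assumption about how the files are named.
    """
    results = {}
    for path in sorted(Path(folder).rglob(pattern)):
        hits = find_products(path.read_text(), keyword)
        if hits:
            results[path.name] = hits
    return results
```

Calling something like `search_folder("GenomesDB", "terminase large subunit")` would then list every genome file with a matching annotation. This only works well because the annotation strings are standardised; with free-text annotations you would need a much fuzzier search.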
These annotations are fully automated, so for those who have spent hundreds of hours annotating a single phage, they are most likely not “better” annotations. They are, however, entirely consistent across all the phages we have re-annotated, which is what matters for the analyses we are interested in. Ryan has more specific details on how to update the database on his GitHub page. The PHROGs team provide a brilliant interactive site to explore all the PHROGs they annotated here.
Removal of incomplete phage genomes
Thanks to Evelien, who identified several hundred incomplete phage genomes in the database. These have been removed and added to the exclusion list. Full details of those excluded are on the GitHub page, and you can add accessions of any other incomplete phages that you spot here; these will be excluded in future versions.
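If you are working from an older download and want to apply the same exclusions yourself, the idea can be sketched in a few lines of Python. This is only an illustration: it assumes the exclusion list is plain text with one accession per line, and `load_exclusions` and `filter_accessions` are hypothetical helper names, not part of INPHARED itself. Check the GitHub page for the real file format.

```python
def load_exclusions(lines):
    """Collect accessions from an iterable of lines.

    Assumes one accession per line; blank lines and lines starting
    with '#' are skipped.
    """
    excluded = set()
    for line in lines:
        entry = line.strip()
        if entry and not entry.startswith("#"):
            # Drop any version suffix, e.g. "MN000001.1" -> "MN000001".
            excluded.add(entry.split(".")[0])
    return excluded


def filter_accessions(accessions, excluded):
    """Keep accessions that are not excluded, ignoring version suffixes."""
    return [acc for acc in accessions if acc.split(".")[0] not in excluded]
```

For example, with the list saved locally you could run `filter_accessions(my_accessions, load_exclusions(open("exclusion_list.txt")))`, where the file name is a placeholder for wherever you saved it.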