See our publication in PHAGE to read about how this dataset is produced and some of our analyses of it. Please consider citing this paper if you use the database provided on this webpage. You can also generate an up-to-date version of the database, with useful files for vConTACT2, MASH, and iTOL, using our Perl script available on GitHub. Updates to the script this month include a new column in the tsv outputs that contains anything identified as “host” or “lab_host” within the original GenBank files. However, these values may be inconsistent or downright bizarre, so please use them with caution.
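Because the host values are copied as-is from the GenBank records, it can help to tally them before relying on them. The following is a minimal sketch, not part of our Perl script: the column name `Host`, the `Accession` column, and the sample rows are assumptions made for illustration, so check the header of the tsv file you actually downloaded.

```python
import csv
import io
from collections import Counter

# Hypothetical column name; check the header of the tsv you actually downloaded.
HOST_COLUMN = "Host"


def summarize_hosts(tsv_text, column=HOST_COLUMN):
    """Count host labels after trimming whitespace and lowercasing them."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    counts = Counter()
    for row in reader:
        raw = (row.get(column) or "").strip()
        # Empty or absent values are grouped under one placeholder label.
        counts[raw.lower() if raw else "<missing>"] += 1
    return counts


# Made-up rows for illustration only; not taken from the real database.
SAMPLE = (
    "Accession\tHost\n"
    "ACC0001\tEscherichia coli\n"
    "ACC0002\tescherichia coli \n"
    "ACC0003\t\n"
    "ACC0004\tE. coli K-12\n"
)

if __name__ == "__main__":
    for label, n in summarize_hosts(SAMPLE).most_common():
        print(f"{n}\t{label}")
```

Scanning the resulting counts makes spelling variants, strain-level labels, and empty entries easy to spot, so you can decide how to clean them for your own analysis.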
We also recently added annotations using PHROGs (more details available here), and you can download the updated annotations from HERE. Please note that we won’t be re-uploading the updated annotations on a monthly basis, as the file is huge. Even so, having the first ~26,000 already annotated will save users a lot of time when they run the Perl script themselves.
If you don’t want to run the script yourself, you can download all of the ready-made files below: