Submitting a Bacteriophage Genome to ENA – Part 2

5. Preparing the Files

Genome assemblies can only be submitted through the Webin Command Line Submission Interface (CLI). As bacteriophage genomes generally assemble into a single contig, you should submit your assembly as a chromosome. The files required for a chromosome assembly submission are as follows:

  • 1 x manifest file – this contains the metadata about the sequencing run and assembly needed for the submission,
  • 1 x flat file – this is the “.embl” file containing your assembled genome sequence and associated annotations,
  • 1 x chromosome list file – this is required to let the submission service know which sequences belong to which chromosomes (in the case of phage genomes, there is only one chromosome but it is still required).

5.1 Manifest File

The manifest file is a tab-separated file consisting of two columns: field name and field value. The required fields are listed in the table below:

Field Name        Field Value
STUDY             Study accession number (beginning with PRJEB).
SAMPLE            Sample accession number (beginning with ERS).
ASSEMBLYNAME      Unique name of the assembly (must not include the name of the organism).
ASSEMBLY_TYPE     “clone or isolate” (enter this exactly as quoted, not “clone” or “isolate”).
COVERAGE          Estimated depth of sequencing coverage.
PROGRAM           Program used to assemble the genome.
PLATFORM          Platform used to sequence the genome.
FLATFILE          The name of the “.embl” file containing the genome and annotations.
CHROMOSOME_LIST   The name of the chromosome list file.

Here is an example of a manifest file for a bacteriophage genome submission:

STUDY   PRJEB32519
SAMPLE  ERS3527802
ASSEMBLYNAME    Assembly 1
ASSEMBLY_TYPE   clone or isolate
COVERAGE        426
PROGRAM SPAdes
PLATFORM        Illumina MiSeq
FLATFILE        vB_Eco_SLUR29_genome.embl.gz
CHROMOSOME_LIST vB_Eco_SLUR29_genome.cl.gz
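
Because the manifest must be tab-separated, and some values (such as “clone or isolate”) contain spaces, it can be safer to generate the file with a script than to type it by hand. The following is a minimal Python sketch, not part of the Webin-CLI tooling: the helper name manifest_text is hypothetical, and the values are those from the example above.

```python
# Field order follows the table of required fields in this guide.
REQUIRED_FIELDS = [
    "STUDY", "SAMPLE", "ASSEMBLYNAME", "ASSEMBLY_TYPE", "COVERAGE",
    "PROGRAM", "PLATFORM", "FLATFILE", "CHROMOSOME_LIST",
]


def manifest_text(fields):
    """Return the manifest as text: one 'NAME<TAB>value' line per field.

    Hypothetical helper; raises ValueError if a required field is absent.
    """
    missing = [name for name in REQUIRED_FIELDS if name not in fields]
    if missing:
        raise ValueError("Missing manifest fields: " + ", ".join(missing))
    lines = [f"{name}\t{fields[name]}" for name in REQUIRED_FIELDS]
    return "\n".join(lines) + "\n"


# Values copied from the example manifest above.
example = {
    "STUDY": "PRJEB32519",
    "SAMPLE": "ERS3527802",
    "ASSEMBLYNAME": "Assembly 1",
    "ASSEMBLY_TYPE": "clone or isolate",
    "COVERAGE": "426",
    "PROGRAM": "SPAdes",
    "PLATFORM": "Illumina MiSeq",
    "FLATFILE": "vB_Eco_SLUR29_genome.embl.gz",
    "CHROMOSOME_LIST": "vB_Eco_SLUR29_genome.cl.gz",
}

print(manifest_text(example), end="")
```

To save the result, write the returned text to a file of your choice, for example with pathlib.Path("manifest.txt").write_text(manifest_text(example)).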

5.2 Chromosome List File

As mentioned above, most bacteriophage genomes assemble into a single contig. For the purposes of submission, however, a bacteriophage genome assembly should be submitted as a chromosome, which makes it clear that the genome is complete with no gaps in the sequence. The chromosome list file should therefore contain a single tab-separated line, as below:

chr01   1       Chromosome
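
The manifest example above refers to a gzip-compressed chromosome list (“.cl.gz”). The following Python sketch shows one way to produce that single tab-separated line and compress it. The function name chromosome_list_bytes is hypothetical. The reading of the three columns as sequence name, chromosome name and chromosome type is my understanding of the ENA format rather than something stated in this guide, so check that the first column matches the sequence name used in your flat file.

```python
import gzip


def chromosome_list_bytes(object_name="chr01", chromosome_name="1",
                          chromosome_type="Chromosome"):
    """Return gzip-compressed bytes for a one-line chromosome list file.

    Hypothetical helper; the defaults reproduce the example line above.
    """
    line = "\t".join([object_name, chromosome_name, chromosome_type]) + "\n"
    return gzip.compress(line.encode("utf-8"))


payload = chromosome_list_bytes()

# Decompress again to confirm the content is the expected single line.
print(repr(gzip.decompress(payload).decode("utf-8")))
```

The bytes in payload can then be written in binary mode to the file named in the CHROMOSOME_LIST field of the manifest.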

5.3 EMBL-Bank Flat File

6. Submitting to the ENA

As previously mentioned, genome assemblies must be submitted through the Webin Command Line Interface. The latest version can be downloaded from the following link:

https://github.com/enasequence/webin-cli/releases

Prior to attempting to submit your files, you can perform a validation to ensure the files are formatted correctly using the “-validate” option of the Webin-CLI program:

java -jar webin-cli-<version>.jar \
-context genome \
-userName Webin-<XXXXX> \
-password <password> \
-manifest <manifest_file> \
-validate

The error messages returned by the validation option can be difficult to interpret and fix. However, if your files have been formatted as described above, there should be few issues.
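
Some formatting slips can be caught before Webin-CLI is run at all. The Python sketch below checks a manifest only against the conventions described in this guide (tab separation, the nine required fields, and the exact ASSEMBLY_TYPE value). It is a hypothetical pre-flight helper, not a replacement for “-validate”: Webin-CLI applies its own rules, which may be stricter or more lenient than these.

```python
REQUIRED_FIELDS = (
    "STUDY", "SAMPLE", "ASSEMBLYNAME", "ASSEMBLY_TYPE", "COVERAGE",
    "PROGRAM", "PLATFORM", "FLATFILE", "CHROMOSOME_LIST",
)


def check_manifest(text):
    """Return a list of problems found in the manifest text (empty if none)."""
    problems = []
    seen = {}
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        # Split on the first tab only, since values may contain spaces.
        name, separator, value = line.partition("\t")
        if not separator:
            problems.append(f"line {number}: no tab between field name and value")
            continue
        seen[name.strip()] = value.strip()
    for name in REQUIRED_FIELDS:
        if not seen.get(name):
            problems.append(f"missing or empty field: {name}")
    assembly_type = seen.get("ASSEMBLY_TYPE")
    if assembly_type and assembly_type != "clone or isolate":
        problems.append('ASSEMBLY_TYPE must be exactly "clone or isolate"')
    return problems


# A manifest typed with a space instead of a tab on its only line.
for problem in check_manifest("STUDY PRJEB32519\n"):
    print(problem)
```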

Once the files have passed validation, they can be submitted using the “-submit” option of the Webin-CLI program:

java -jar webin-cli-<version>.jar \
-context genome \
-userName Webin-<XXXXX> \
-password <password> \
-manifest <manifest_file> \
-submit
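
If you submit several genomes, the two commands above can be wrapped in a short script so that submission is attempted only after validation succeeds. The Python sketch below uses only the options shown in this guide. The function names and the WEBIN_PASSWORD environment variable are hypothetical, and the script assumes that Webin-CLI exits with a non-zero status when validation fails.

```python
import os
import subprocess


def webin_command(jar, user, password, manifest, action):
    """Build the Webin-CLI argument list for '-validate' or '-submit'."""
    if action not in ("-validate", "-submit"):
        raise ValueError(f"unsupported action: {action}")
    return [
        "java", "-jar", jar,
        "-context", "genome",
        "-userName", user,
        "-password", password,
        "-manifest", manifest,
        action,
    ]


def validate_then_submit(jar, user, manifest):
    """Run validation, then submission; stop at the first failing step."""
    # Hypothetical variable name; keeps the password out of the script itself.
    password = os.environ["WEBIN_PASSWORD"]
    for action in ("-validate", "-submit"):
        # check=True raises CalledProcessError on a non-zero exit status,
        # so '-submit' is never reached if '-validate' fails.
        subprocess.run(
            webin_command(jar, user, password, manifest, action), check=True
        )
```

Calling validate_then_submit with the path to your downloaded jar file, your Webin user name and your manifest file runs both steps in order.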