5. Preparing the Files
Genome assemblies can only be submitted through the Webin Command Line Submission Interface (CLI). As bacteriophage genomes generally assemble into a single contig, you should submit your assembly as a chromosome. The files required for a chromosome assembly submission are as follows:
- 1 x manifest file – this contains the metadata about the sequencing run and assembly needed for the submission,
- 1 x flat file – this is the “.embl” file containing your assembled genome sequence and associated annotations,
- 1 x chromosome list file – this is required to let the submission service know which sequences belong to which chromosomes (in the case of phage genomes, there is only one chromosome but it is still required).
5.1 Manifest File
The manifest file is a tab-separated file consisting of two columns: field name and field value. Below is a table of the required fields:
|Field Name|Field Value|
|---|---|
|STUDY|Study accession number (beginning with PRJEB).|
|SAMPLE|Sample accession number (beginning with ERS).|
|ASSEMBLYNAME|Unique name of the assembly. (Must not include the name of the organism).|
|ASSEMBLY_TYPE|“clone or isolate” (Enter this exactly as in the quote, not “clone” or “isolate”).|
|COVERAGE|Estimated depth of sequencing coverage.|
|PROGRAM|Program used to assemble the genome.|
|PLATFORM|Platform used to sequence the genome.|
|FLATFILE|The name of the “.embl” file containing the genome and annotations.|
|CHROMOSOME_LIST|The name of the chromosome list file.|
Here is an example of a manifest file for a bacteriophage genome submission:
STUDY	PRJEB32519
SAMPLE	ERS3527802
ASSEMBLYNAME	Assembly 1
ASSEMBLY_TYPE	clone or isolate
COVERAGE	426
PROGRAM	SPAdes
PLATFORM	Illumina MiSeq
FLATFILE	vB_Eco_SLUR29_genome.embl.gz
CHROMOSOME_LIST	vB_Eco_SLUR29_genome.cl.gz
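Because the manifest is just field name, tab, field value on each line, it can also be generated by a short script, which avoids accidentally typing spaces instead of tabs. The following is a minimal Python sketch, not part of Webin-CLI: the function names `format_manifest` and `write_manifest` are hypothetical, and the check only covers the required fields listed in the table above.

```python
from pathlib import Path

# Required manifest fields, in the order used in the table above.
REQUIRED_FIELDS = (
    "STUDY", "SAMPLE", "ASSEMBLYNAME", "ASSEMBLY_TYPE", "COVERAGE",
    "PROGRAM", "PLATFORM", "FLATFILE", "CHROMOSOME_LIST",
)


def format_manifest(fields):
    """Return the manifest text: one 'NAME<TAB>value' line per required field.

    Hypothetical helper. Keys of `fields` that are not in REQUIRED_FIELDS
    are ignored in this sketch.
    """
    missing = [name for name in REQUIRED_FIELDS if name not in fields]
    if missing:
        raise ValueError("missing manifest fields: " + ", ".join(missing))
    if fields["ASSEMBLY_TYPE"] != "clone or isolate":
        raise ValueError('ASSEMBLY_TYPE must be exactly "clone or isolate"')
    return "".join(f"{name}\t{fields[name]}\n" for name in REQUIRED_FIELDS)


def write_manifest(fields, path):
    """Write the manifest as plain UTF-8 text to `path`."""
    Path(path).write_text(format_manifest(fields), encoding="utf-8")
```

For example, calling `write_manifest(fields, "manifest.txt")` with a dictionary holding the values from the example above would produce the same nine lines; the file name `manifest.txt` is only an illustration.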
5.2 Chromosome List File
As mentioned above, most bacteriophage genomes assemble into a single contig. For the purposes of submission, however, the assembly should be declared as a chromosome, which makes it clear that the genome is complete with no gaps in the sequence. The chromosome list file should therefore contain a single tab-separated line, as below:
chr01 1 Chromosome
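The example manifest refers to a gzipped chromosome list (`.cl.gz`), so the single line has to be written with real tab characters and then compressed. A minimal Python sketch of that step follows. The helper names are hypothetical, and the reading of the three columns as sequence name, chromosome name, and chromosome type is an assumption that should be checked against the ENA documentation; in particular, the first column is assumed to have to match the sequence name used in your flat file.

```python
import gzip


def chromosome_list_line(object_name, chromosome_name="1",
                         chromosome_type="Chromosome"):
    """Build the single tab-separated line of the chromosome list file.

    The column meanings are an assumption (see the text above); the
    default values reproduce the example line 'chr01 1 Chromosome'.
    """
    return "\t".join([object_name, chromosome_name, chromosome_type]) + "\n"


def write_chromosome_list(path, object_name):
    """Write the line to a gzip-compressed text file at `path`."""
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        handle.write(chromosome_list_line(object_name))
```

Calling `write_chromosome_list("vB_Eco_SLUR29_genome.cl.gz", "chr01")` would create a file matching the CHROMOSOME_LIST entry of the example manifest.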
5.3 EMBL-Bank Flat File
6. Submitting to the ENA
As previously mentioned, genome assemblies must be submitted through the Webin Command Line Interface. The latest version can be downloaded from the following link:
Prior to attempting to submit your files, you can perform a validation to ensure the files are formatted correctly using the “-validate” option of the Webin-CLI program:
java -jar webin-cli-<version>.jar \
  -context genome \
  -userName Webin-<XXXXX> \
  -password <password> \
  -manifest <manifest_file> \
  -validate
The error messages returned by the validation option can be difficult to interpret and fix. However, if your files have been formatted as described above, few issues should arise.
Once the files have been validated, they can be submitted using the “-submit” option of the Webin-CLI program:
java -jar webin-cli-<version>.jar \
  -context genome \
  -userName Webin-<XXXXX> \
  -password <password> \
  -manifest <manifest_file> \
  -submit
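The validate-then-submit sequence can also be scripted so that submission is only attempted after validation has succeeded. The following Python sketch builds the same two commands shown above using only the options that appear in this guide. The function names are hypothetical, and the sketch assumes that Webin-CLI exits with a non-zero status when validation fails, which you should confirm for your version.

```python
import subprocess


def webin_cli_command(jar, user, password, manifest, submit=False):
    """Return the Webin-CLI argument list for a genome validation or submission."""
    mode = "-submit" if submit else "-validate"
    return [
        "java", "-jar", jar,
        "-context", "genome",
        "-userName", user,
        "-password", password,
        "-manifest", manifest,
        mode,
    ]


def validate_then_submit(jar, user, password, manifest):
    """Run validation first; submit only if validation exits with status 0.

    Assumption: a failed validation gives a non-zero exit status, in which
    case check=True raises CalledProcessError and nothing is submitted.
    """
    subprocess.run(webin_cli_command(jar, user, password, manifest), check=True)
    subprocess.run(
        webin_cli_command(jar, user, password, manifest, submit=True),
        check=True,
    )
```

Passing the arguments as a list rather than a single string avoids shell quoting problems if the password or file names contain spaces or special characters.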