Beginner Guide to Bacteriophage Genome Assembly: 2. Files and workflow

Files and Workflow

What files do you start off with?

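The sequencing used on my DNA sample was paired-end sequencing, so for one DNA sample I have two files. Each DNA fragment was read from both ends: the reads from one end are in the first file, and the reads from the opposite end are in the second.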

Read more: Different types of sequencing.

~/Documents/SLUR200/R1.fq and ~/Documents/SLUR200/R2.fq
This is where my two fastq files are located on the server I am using. Read the beginner bash website on my previous page to understand why the location of the files on the server is as important as their names.
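Because the two files belong together, a quick sanity check is that they list the same reads in the same order. Each fastq record (described below) starts with a header line beginning with @, and for a matching pair the part of that header before the first space is normally identical in R1 and R2 (older Illumina files instead end it in /1 and /2). The sketch below assumes that naming convention; the function names are hypothetical, not part of any tool used in this guide.

```python
from itertools import zip_longest
from pathlib import Path


def record_ids(lines):
    """Yield the read identifier from the header (line 1) of each 4-line fastq record."""
    for line_number, line in enumerate(lines):
        if line_number % 4 != 0:
            continue  # skip the sequence, '+' and quality lines
        header = line.rstrip("\r\n")
        if not header.startswith("@"):
            raise ValueError(f"expected '@' at line {line_number + 1}, got {header[:20]!r}")
        read_id = header[1:].split()[0]  # drop the comment after the first space
        if read_id.endswith(("/1", "/2")):  # older Illumina-style mate suffix
            read_id = read_id[:-2]
        yield read_id


def count_matched_pairs(r1_lines, r2_lines):
    """Return the number of read pairs; raise if R1 and R2 fall out of step."""
    count = 0
    for id1, id2 in zip_longest(record_ids(r1_lines), record_ids(r2_lines)):
        if id1 is None or id2 is None:
            raise ValueError("R1 and R2 contain different numbers of reads")
        if id1 != id2:
            raise ValueError(f"pair {count + 1} does not match: {id1} vs {id2}")
        count += 1
    return count


def check_pair_files(r1_path, r2_path):
    """Open two fastq files and check they pair up. Python's open() does not expand ~ itself."""
    with open(Path(r1_path).expanduser()) as r1, open(Path(r2_path).expanduser()) as r2:
        return count_matched_pairs(r1, r2)
```

Calling `check_pair_files("~/Documents/SLUR200/R1.fq", "~/Documents/SLUR200/R2.fq")` would return the number of pairs, or stop with an error at the first read that has no partner. The `expanduser()` call is there because `~` is a shell shortcut: a program that opens files directly has to be given the full location.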

These two files are the raw data output from a sequencer using paired-end sequencing on one bacteriophage DNA sample, called SLUR200. The sequencer outputs fastq (.fq) files. The DNA was fragmented before sequencing, and each fragment was sequenced individually. With paired-end sequencing every DNA fragment gives two reads, one from each end: one read is in R1.fq and its partner is in R2.fq, in the same position in the file. Both .fq files are just massive lists of fastq records, one per read. Each record has four lines of text:

  1. Begins with an @ symbol and contains the identifier for the sequencer and DNA fragment
  2. DNA sequence
  3. A + symbol (sometimes followed by the identifier again)
  4. Quality score for each base in the sequence, written as one character per base

https://en.wikipedia.org/wiki/FASTQ_format
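The four-line layout above is simple enough to read with a short parser. This is a minimal sketch, assuming the common Phred+33 encoding (used by current Illumina machines), where the ASCII code of each quality character minus 33 is the Phred score Q, and Q relates to the chance that the base call is wrong by P = 10^(-Q/10). `FastqRecord`, `parse_fastq` and `error_probability` are hypothetical names; established libraries such as Biopython handle more edge cases.

```python
from typing import Iterable, Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    read_id: str          # line 1 without the leading '@'
    sequence: str         # line 2
    qualities: List[int]  # line 4 decoded to Phred quality scores


def parse_fastq(lines: Iterable[str], offset: int = 33) -> Iterator[FastqRecord]:
    """Yield one FastqRecord per four-line block (Phred+33 encoding assumed by default)."""
    lines = iter(lines)
    for header in lines:
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines, e.g. at the very end of a file
        sequence = next(lines, "").rstrip("\r\n")
        plus = next(lines, "").rstrip("\r\n")
        quality = next(lines, "").rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at {header[:30]!r}")
        if len(quality) != len(sequence):
            raise ValueError(f"{header[1:]}: {len(sequence)} bases but {len(quality)} quality scores")
        # Each quality character's ASCII code minus the offset is the Phred score Q.
        yield FastqRecord(header[1:], sequence, [ord(c) - offset for c in quality])


def error_probability(q: int) -> float:
    """Convert a Phred score Q to the probability that the base call is wrong: 10^(-Q/10)."""
    return 10 ** (-q / 10)
```

For example, `next(parse_fastq(open(Path("~/Documents/SLUR200/R1.fq").expanduser())))` would give the identifier, bases and per-base scores of the first read in R1. A score of 30 means a 1 in 1,000 chance that the base is wrong, which is why trimming tools cut off the low-scoring ends of reads.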

What is the workflow?

-Trimming Fastq files with Sickle
-Creating a contig assembly with SPAdes
-Visualising the contigs with Bandage
-Mapping sequencing reads onto the contig assembly with BWA
-Converting the BWA output (.sam file) to a .bam file with Samtools
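The steps above can be strung together in a small driver script. This is a hypothetical sketch, not the exact commands used for SLUR200: the output file names, the thread count and the `-t sanger` (Phred+33) quality setting are assumptions, and the flags are the commonly documented ones for `sickle pe`, `spades.py`, `bwa` and `samtools`, so check them against the versions installed on your server. Bandage is left out because it is normally used through its graphical interface. `bwa mem` writes its alignments in SAM format to standard output, which is why the script redirects that step to a file and then uses Samtools to turn it into a sorted, indexed BAM file.

```python
import shlex
import subprocess
from pathlib import Path


def build_commands(r1, r2, workdir, threads=4):
    """Return the workflow as (argv, stdout_file) pairs; stdout_file is None unless redirected.

    All output names inside workdir are hypothetical placeholders.
    """
    r1, r2, w = Path(r1).expanduser(), Path(r2).expanduser(), Path(workdir).expanduser()
    t1, t2, singles = w / "trimmed_R1.fq", w / "trimmed_R2.fq", w / "trimmed_singles.fq"
    spades_dir = w / "spades"
    contigs = spades_dir / "contigs.fasta"  # SPAdes writes its final contigs here
    sam, bam, sorted_bam = w / "aln.sam", w / "aln.bam", w / "aln.sorted.bam"
    steps = [
        # 1. Trim low-quality ends from both reads of each pair (Phred+33 assumed).
        (["sickle", "pe", "-f", r1, "-r", r2, "-t", "sanger",
          "-o", t1, "-p", t2, "-s", singles], None),
        # 2. Assemble the trimmed pairs into contigs.
        (["spades.py", "-1", t1, "-2", t2, "-o", spades_dir, "-t", threads], None),
        # 3. Bandage is a GUI: open spades/assembly_graph.fastg in it by hand.
        # 4. Index the contigs, then map the trimmed pairs back onto them.
        (["bwa", "index", contigs], None),
        (["bwa", "mem", "-t", threads, contigs, t1, t2], sam),  # SAM goes to stdout
        # 5. Convert SAM to BAM, then sort and index the BAM.
        (["samtools", "view", "-b", "-o", bam, sam], None),
        (["samtools", "sort", "-o", sorted_bam, bam], None),
        (["samtools", "index", sorted_bam], None),
    ]
    # subprocess needs plain strings, and it does not expand ~ (no shell is involved).
    return [([str(a) for a in argv], None if out is None else str(out)) for argv, out in steps]


def run_workflow(steps, workdir):
    """Run each step in order, stopping at the first command that fails."""
    Path(workdir).expanduser().mkdir(parents=True, exist_ok=True)
    for argv, stdout_file in steps:
        print("+", shlex.join(argv))
        if stdout_file is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_file, "w") as out:
                subprocess.run(argv, check=True, stdout=out)
```

Calling `run_workflow(build_commands("~/Documents/SLUR200/R1.fq", "~/Documents/SLUR200/R2.fq", "~/Documents/SLUR200/assembly"), "~/Documents/SLUR200/assembly")` would print each command before running it (the `assembly` folder name is a hypothetical choice). Keeping the command list separate from the code that runs it makes it easy to print the commands and type them into bash yourself instead, which is a good way to learn what each tool is doing.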
