Written by: KSE

Last updated: 20220316 (KSE)


<aside> 💡 This protocol describes in detail how to analyze wild isolate sequence data in the Andersen Lab. Several Nextflow pipelines are run in sequence; however, several manual steps and checks are still required to ensure proper analysis and a well-organized data structure that everyone involved can understand and contribute to.

</aside>


(Optional) Update genome data with new species/project/build

<aside> 💡 NOTE: This step is not necessary for every sequencing analysis, only when we want to change the WormBase (WS) version (e.g., from WS276 to WS280). When you do create a new genome version, you will need to change the defaults in several pipelines. This process also looks different for species where we generate our own genome data, such as C. briggsae and C. tropicalis.

</aside>

Default - use for the C. elegans N2 reference or other species references obtained directly from WormBase

  1. Run the genomes-nf pipeline
    1. Create a new folder for your project analysis

    2. Run the pipeline with the following command:

      nextflow run andersenlab/genomes-nf \
      --projects <species>/<projectID> \
      --wb_version <WSXXX>

      <aside> 💡 NOTE: You can also clone the git repo into your personal folder and run it locally; however, we recommend running the pipeline directly from GitHub (as above) because Nextflow then records the git branch and commit used for the run, which gives the most reproducible results. To run a specific commit, add the -r XXX option, where XXX is the commit ID from GitHub.

      </aside>

      1. Note: For more help running this pipeline, refer to the GitHub page or the dry-guide documentation (e.g., you might need a specific version of Nextflow installed).
  2. Update pipeline file paths:
    1. The genomes-nf pipeline should automatically write all the genome data to /projects/b1059/data/<species>/genomes/<project>/<WSXXX>/, so you do not need to move any files; still, check that everything looks correct there.
    2. The one file that does need to be moved is csq/<species>.gff, which replaces the current file in NemaScan/input_data/<species>/annotations/.
    3. Change the default reference sequence in the following files:
      1. alignment-nf/main.nf
      2. wi-gatk/main.nf
      3. annotation-nf/main.nf
    4. Files to add to CeNDR:
      1. See “Update a new release” below.
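Steps 2.1 and 2.2 above can be scripted. The following is a minimal, unofficial bash sketch, not part of genomes-nf: the function names and the DATA_ROOT override are hypothetical, and it assumes that csq/<species>.gff sits inside the build directory and that the current NemaScan annotation file has the same name, so copying over it replaces it.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of genomes-nf) for the checks after a genomes-nf run.
# DATA_ROOT is an assumed override; it defaults to the lab data directory.
DATA_ROOT="${DATA_ROOT:-/projects/b1059/data}"

# Print the directory genomes-nf should have written:
#   <DATA_ROOT>/<species>/genomes/<project>/<build>
genome_dir() {
  local species=$1 project=$2 build=$3
  printf '%s/%s/genomes/%s/%s\n' "$DATA_ROOT" "$species" "$project" "$build"
}

# List the build directory so its contents can be checked by eye.
# Returns non-zero if the directory does not exist.
check_genome_dir() {
  local dir
  dir=$(genome_dir "$1" "$2" "$3")
  if [[ ! -d $dir ]]; then
    echo "Missing genome directory: $dir" >&2
    return 1
  fi
  ls -lh "$dir"
}

# Copy csq/<species>.gff (assumed to live inside the build directory) over the
# current annotation file in <nemascan_dir>/input_data/<species>/annotations/.
# Copying instead of moving leaves the original in the genome directory.
update_nemascan_gff() {
  local species=$1 project=$2 build=$3 nemascan_dir=$4
  local src dest
  src="$(genome_dir "$species" "$project" "$build")/csq/${species}.gff"
  dest="${nemascan_dir}/input_data/${species}/annotations"
  if [[ ! -f $src ]]; then
    echo "Missing source GFF: $src" >&2
    return 1
  fi
  if [[ ! -d $dest ]]; then
    echo "Missing NemaScan annotation folder: $dest" >&2
    return 1
  fi
  cp "$src" "$dest/"
}
```

After sourcing the file, you might run check_genome_dir <species> <projectID> <WSXXX>, inspect the listing, and then run update_nemascan_gff with the same three values plus the path to your NemaScan clone. The sketch copies rather than moves so that the genome directory stays complete; switch cp to mv if you prefer to follow the step literally.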

Alternative - use for manually curated C. briggsae or C. tropicalis references generated in the lab (and possibly C. elegans wild isolate references in the future)

  1. Run the pipeline with the following command:

    nextflow run andersenlab/genomes-nf \
    --genome <path>.genome.fa \
    --gff <path>.gff \
    --species <species> \
    --projects <project, e.g. NIC58_nanopore> \
    --ws_build <version, e.g. June2021>

    <aside> 💡 NOTE: You can also clone the git repo into your personal folder and run it locally; however, we recommend running the pipeline directly from GitHub (as above) because Nextflow then records the git branch and commit used for the run, which gives the most reproducible results. To run a specific commit, add the -r XXX option, where XXX is the commit ID from GitHub.

    </aside>

    1. Note: For more help running this pipeline, refer to the GitHub page or the dry-guide documentation (e.g., you might need a specific version of Nextflow installed).
  2. Update pipeline file paths:
    1. The genomes-nf pipeline should automatically write all the genome data to /projects/b1059/data/<species>/genomes/<project>/<WSXXX>/, so you do not need to move any files; still, check that everything looks correct there.
    2. The one file that does need to be moved is csq/<species>.gff, which replaces the current file in NemaScan/input_data/<species>/annotations/.
    3. Change the default reference sequence in the following files:
      1. alignment-nf/main.nf
      2. wi-gatk/main.nf
      3. annotation-nf/main.nf
    4. Files to add to CeNDR:
      1. See “Update a new release” below.
      2. NOTE: This may differ for manually curated genomes; we have not yet had a C. briggsae or C. tropicalis data release on CeNDR.
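For step 3 (changing the default reference), one way to find every place that still points at the previous build is to search the three main.nf files for the old build label. This is a hedged sketch: the function name is hypothetical, and it assumes the old default contains a recognizable label (such as a WSXXX version or a previous build name) and that alignment-nf, wi-gatk, and annotation-nf are cloned side by side under one directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every line of the three pipelines' main.nf files
# that still mentions the old build label, as file:line:text.
# Assumes alignment-nf, wi-gatk and annotation-nf are cloned under one directory.
find_old_build() {
  local old_build=$1 repos_dir=$2
  local pipeline
  local files=()
  for pipeline in alignment-nf wi-gatk annotation-nf; do
    files+=("${repos_dir}/${pipeline}/main.nf")
  done
  # -n: show line numbers; -F: treat the build label as a fixed string, not a regex
  grep -n -F -- "$old_build" "${files[@]}"
}
```

Review each hit by hand rather than replacing text blindly, because the same label can also appear in comments or in parameters unrelated to the reference. grep exits with status 1 when nothing matches, which is a quick way to confirm that no stale references remain.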

Process raw FASTQ files directly from the sequencer

  1. Download raw FASTQ files
    1. NUSeq puts FASTQ files directly onto QUEST in /projects/b1059/fromNUSeq.
    2. The Duke server sends an email when the project is ready. Download the data with the ddsclient tool (see the ddsclient documentation).
    3. For other sequencing projects, the provider may email a download link.
      1. To download large files in the background on QUEST, use: wget -bqc <url> (-b runs the download in the background, -q suppresses output, and -c resumes a partially downloaded file).
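When an email contains several links, the wget command above can be started once per URL. A minimal sketch, assuming you paste the links into a plain-text file with one URL per line; the function name download_all is hypothetical. Each download runs in the background, and rerunning the same function resumes any interrupted files because of -c.

```shell
#!/usr/bin/env bash
# Hypothetical helper: start one background, resumable wget per URL in a list file.
# Blank lines and lines starting with '#' are skipped.
download_all() {
  local url_file=$1
  local url
  # "|| [[ -n $url ]]" also handles a last line without a trailing newline
  while IFS= read -r url || [[ -n $url ]]; do
    url=${url%$'\r'}                      # drop a Windows carriage return, if any
    [[ -z $url || $url == \#* ]] && continue
    # -b: run in the background, -q: quiet, -c: continue a partial download
    wget -bqc "$url"
  done < "$url_file"
}
```

For example, run download_all urls.txt from the folder where the files should land (urls.txt being whatever you named the list). Because the downloads detach immediately, check on them later, for example with ls -lh to watch file sizes grow.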