Written by: KSE
Last updated: 20220316 (KSE)
<aside> 💡 This protocol provides a detailed description of how to analyze wild isolate sequence data in the Andersen Lab. Several Nextflow pipelines are run sequentially; however, there are still several manual steps and checks that must be done to ensure proper analysis and an organized data structure that everyone involved can understand and contribute to.
</aside>
<aside> 💡 NOTE: This step is not necessary for every sequencing analysis, only when we want to change the WS version (e.g. from WS276 to WS280). When you do create a new genome version, you will need to change the defaults in several different pipelines. Also, this process will look different for species where we generate our own genome data, such as C. briggsae and C. tropicalis.
</aside>
Create a new folder for your project analysis
Run the pipeline with the following command:
nextflow run andersenlab/genomes-nf \
--projects <species>/<projectID> \
--wb_version <WSXXX>
<aside>
💡 NOTE: You can also clone the git repo into your personal folder and run it locally; however, we recommend running the pipeline remotely because it lets Nextflow store information about the git branch and commit of the run, which makes results more reproducible. You can run a specific commit using the -r XXX option, where XXX is the commit ID from GitHub.
</aside>
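As a minimal sketch of pinning a run to one commit, the helper below assembles the command with `-r` and prints it for review before launching. The function name `build_genomes_cmd`, the commit ID `abc1234`, and the project and version values are illustrative placeholders, not values from this protocol.

```shell
#!/usr/bin/env bash
# Sketch: assemble the genomes-nf command pinned to a single revision.
# build_genomes_cmd is a hypothetical helper; every argument is a placeholder.
build_genomes_cmd() {
    local revision="$1"    # commit ID from GitHub (passed to -r)
    local project="$2"     # <species>/<projectID>
    local wb_version="$3"  # WormBase version, e.g. WS280
    printf 'nextflow run andersenlab/genomes-nf -r %s --projects %s --wb_version %s\n' \
        "$revision" "$project" "$wb_version"
}

# Print the command so it can be checked before it is run.
build_genomes_cmd "abc1234" "c_elegans/PRJNA13758" "WS280"
```

Printing the command first keeps the exact revision visible in your notes; once it looks right, paste it into the terminal to start the run.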
Update pipeline file paths:
The output is written to /projects/b1059/data/<species>/genomes/<project>/<WSXXX>/, so there is no need to move any files, but check that everything looks good there.
The output includes csq/<species>.gff, which will replace the current file in NemaScan/input_data/<species>/annotations/.
Update the defaults in the following pipeline files:
alignment-nf/main.nf
wi-gatk/main.nf
annotation-nf/main.nf
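The check-and-replace step for the GFF can be sketched as a small shell helper. This assumes that csq/<species>.gff sits inside the genome output directory and that NemaScan expects the same file name; `stage_csq_gff` and its arguments are hypothetical, so adjust them to your own layout.

```shell
#!/usr/bin/env bash
# Sketch: confirm the new GFF exists, then copy it into a NemaScan checkout.
# stage_csq_gff is a hypothetical helper, not part of any pipeline.
stage_csq_gff() {
    local genome_dir="$1"    # .../genomes/<project>/<WSXXX>
    local species="$2"       # e.g. c_elegans
    local nemascan_dir="$3"  # path to your NemaScan checkout
    local gff="${genome_dir}/csq/${species}.gff"
    local dest="${nemascan_dir}/input_data/${species}/annotations"

    # Stop early if the GFF is missing or empty.
    if [ ! -s "$gff" ]; then
        echo "missing or empty: $gff" >&2
        return 1
    fi
    # Stop early if the NemaScan annotations folder is not where we expect it.
    if [ ! -d "$dest" ]; then
        echo "missing directory: $dest" >&2
        return 1
    fi
    cp "$gff" "$dest/"
    echo "copied $gff -> $dest/"
}
```

It would be called with the genome output directory, the species name, and the location of your NemaScan clone, for example `stage_csq_gff "$genome_dir" c_elegans "$HOME/NemaScan"`.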
Run the pipeline with the following command:
nextflow run andersenlab/genomes-nf \
--genome <path>.genome.fa \
--gff <path>.gff \
--species <species> \
--projects <project, i.e. NIC58_nanopore> \
--ws_build <version, i.e. June2021>
<aside>
💡 NOTE: You can also clone the git repo into your personal folder and run it locally; however, we recommend running the pipeline remotely because it lets Nextflow store information about the git branch and commit of the run, which makes results more reproducible. You can run a specific commit using the -r XXX option, where XXX is the commit ID from GitHub.
</aside>
Update pipeline file paths:
The output is written to /projects/b1059/data/<species>/genomes/<project>/<WSXXX>/, so there is no need to move any files, but check that everything looks good there.
The output includes csq/<species>.gff, which will replace the current file in NemaScan/input_data/<species>/annotations/.
Update the defaults in the following pipeline files:
alignment-nf/main.nf
wi-gatk/main.nf
annotation-nf/main.nf
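One way to confirm that no pipeline file still points at the previous genome version is to search the files for the old version string. This is a rough sketch: `find_old_version` is a hypothetical helper, and it assumes the version (for example WS276) appears literally in each main.nf.

```shell
#!/usr/bin/env bash
# Sketch: list lines in pipeline files that still mention the old genome version.
# find_old_version is a hypothetical helper; pass the old version, then the files to scan.
find_old_version() {
    local old_version="$1"
    shift
    # -n prints line numbers, -H prints file names, -F matches the version as a literal string.
    # grep exits non-zero when nothing matches, which here means the files are clean.
    grep -nHF -- "$old_version" "$@"
}
```

For example, `find_old_version WS276 alignment-nf/main.nf wi-gatk/main.nf annotation-nf/main.nf` prints every remaining reference as `file:line:text`, and prints nothing once all defaults have been updated.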
/projects/b1059/fromNUSeq
ddsclient tool (info here)
wget -bqc <url>