COVID-19: variation analysis on ARTIC PE data

Annotation: Call variants from ampliconic paired-end reads.

Step / Annotation
Step 1: Input dataset
select at runtime
Step 2: Input dataset
select at runtime
Step 3: Input dataset
select at runtime
Step 4: Input dataset collection
select at runtime
A paired collection of fastq datasets to call variants from
Step 5: fastp
Paired Collection
Output dataset 'output' from step 4
Adapter Trimming Options:
False
Empty.
Empty.
Global trimming options:
Not available.
Not available.
Not available.
Not available.
Overrepresented Sequence Analysis:
False
Not available.
Filter Options:
Quality filtering options:
False
Not available.
Not available.
Not available.
Length filtering options:
False
Not available.
Not available.
Low complexity filtering options:
False
Not available.
Read Modification Options:
Automatic trimming for Illumina NextSeq/NovaSeq data
Not available.
Disable polyX trimming
UMI processing:
False
Empty.
Not available.
Empty.
Per read cutting by quality options:
False
False
Not available.
Not available.
Base correction by overlap analysis options:
False
Output Options:
True
True
Step 6: Map with BWA-MEM
Use a genome from history and build index
Output dataset 'output' from step 3
Auto. Let BWA decide the best algorithm to use
Paired Collection
Output dataset 'output_paired_coll' from step 5
Empty.
Do not set
1. Simple Illumina mode
Step 7: Samtools view
Output dataset 'bam_output' from step 6
A filtered/subsampled selection of reads
Configure filters:
No
No
20
Empty.
Not available.
Read is paired
Read is unmapped; Mate is unmapped; Alignment of the read is not primary
Nothing selected.
Configure subsampling:
Specify a downsampling factor
1.0
Not available.
All reads retained after filtering and subsampling
False
Read Reformatting Options:
Strip read tags from outputs
False
BAM (-b)
No, see help (-output-fmt-option no_ref)
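The read filter configured in Step 7 can be sketched in a few lines of Python. This is only an illustration, and it rests on assumptions the listing does not state explicitly: that the value 20 is the minimum mapping quality, that "Read is paired" is the required flag, and that the three remaining flags are the excluded ones. The bit values are the standard SAM flag bits; the function name `keep_read` is hypothetical.

```python
# Standard SAM flag bits (from the SAM specification).
PAIRED = 0x1          # read is paired
UNMAPPED = 0x4        # read is unmapped
MATE_UNMAPPED = 0x8   # mate is unmapped
SECONDARY = 0x100     # alignment of the read is not primary

# Assumed interpretation of the Step 7 settings.
REQUIRE = PAIRED
EXCLUDE = UNMAPPED | MATE_UNMAPPED | SECONDARY
MIN_MAPQ = 20


def keep_read(flag: int, mapq: int) -> bool:
    """Return True if a read with this FLAG and MAPQ passes the filter."""
    if mapq < MIN_MAPQ:
        return False
    if flag & REQUIRE != REQUIRE:
        return False  # at least one required bit is missing
    if flag & EXCLUDE:
        return False  # at least one excluded bit is set
    return True


if __name__ == "__main__":
    # FLAG 99: paired, properly paired, mate reverse strand, first in pair.
    print(keep_read(99, 60))
    # FLAG 73: paired, mate unmapped, first in pair.
    print(keep_read(73, 60))
```

Under the same assumptions, the corresponding command line would be along the lines of `samtools view -b -q 20 -f 1 -F 268`, since 4 + 8 + 256 = 268.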
Step 8: ivar trim
Output dataset 'outputsam' from step 7
History
Output dataset 'output' from step 2
1
0
4
True
Step 9: Samtools stats
Output dataset 'outputsam' from step 7
No
False
One single summary file
Do not filter
Not available.
Not available.
Not available.
Not available.
Not available.
No
No
False
False
Not available.
Step 10: Realign reads
Output dataset 'output_bam' from step 8
History
Output dataset 'output' from step 3
Advanced options:
False
Keep unchanged
2
Step 11: Insert indel qualities
Output dataset 'realigned' from step 10
Dindel
History
Output dataset 'output' from step 3
Step 12: QualiMap BamQC
Output dataset 'realigned' from step 10
All (whole genome)
False
Reads flagged as duplicates in input
Settings affecting specific plots:
400
True
Nothing selected.
3
Step 13: Call variants
Output dataset 'output' from step 11
History
Output dataset 'output' from step 3
Whole reference
SNVs and indels
Configure settings
Coverage:
5
1000000
Paired reads:
False
Base-calling quality:
30
30
Use original base qualities
Base alignment quality:
Yes, and prefer existing alignment qualities encoded in input
Base and indel alignment qualities (BAQ and IDAQ)
True
Mapping quality:
20
Yes, incorporate MAPQ into joint quality score
255
Source quality:
No, don't incorporate source quality into joint quality score
Joint quality:
0
0
0
Custom filter settings/combinations
0.0005
0
False
Step 14: Flatten Collection
Output dataset 'raw_data' from step 12
underscore ( _ )
Step 15: Lofreq filter
Output dataset 'variants' from step 13
SNVs and Indels
Quality-based filter options:
No, don't apply call quality filter
No, don't apply call quality filter
Coverage-based filter options:
5
0
Allele frequency filter options:
0.05
0.95
Strand bias filter options:
No, don't apply strand-bias filter
Drop variants not passing one or more filters
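The coverage and allele-frequency thresholds of Step 15 amount to a simple per-variant test. The sketch below is not LoFreq's code; it assumes that the values 5 and 0 are the minimum and maximum coverage, that 0.05 and 0.95 are the minimum and maximum allele frequency, that a maximum of 0 means "no upper limit", and that the bounds are inclusive. It reads the `DP` and `AF` keys that LoFreq writes to the VCF INFO column; the helper names are hypothetical.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO string such as 'DP=120;AF=0.31' into a dict."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value  # flag-type entries get an empty string
    return fields


def passes_filter(depth: int, af: float,
                  cov_min: int = 5, cov_max: int = 0,
                  af_min: float = 0.05, af_max: float = 0.95) -> bool:
    """Apply coverage and allele-frequency bounds (0 = bound disabled, assumed)."""
    if depth < cov_min:
        return False
    if cov_max > 0 and depth > cov_max:
        return False
    if af < af_min:
        return False
    if af_max > 0 and af > af_max:
        return False
    return True


if __name__ == "__main__":
    info = parse_info("DP=120;AF=0.31;SB=0")
    print(passes_filter(int(info["DP"]), float(info["AF"])))
```

With "Drop variants not passing one or more filters" selected, records for which such a test fails would be removed from the output rather than merely flagged.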
Step 16: MultiQC
Results
Results 1
fastp
Output dataset 'report_json' from step 5
Results 2
Samtools
Samtools outputs
Samtools output 1
stats
Output dataset 'output' from step 9
Results 3
Qualimap (BamQC or RNASeq output)
Output dataset 'output' from step 14
Empty.
Empty.
False
False
Step 17: ivar removereads
Output dataset 'output' from step 11
Output dataset 'outvcf' from step 15
Output dataset 'output' from step 2
Output dataset 'output' from step 1
Step 18: Call variants
Output dataset 'output_bam' from step 17
History
Output dataset 'output' from step 3
Whole reference
SNVs and indels
Configure settings
Coverage:
5
1000000
Paired reads:
False
Base-calling quality:
30
30
Use original base qualities
Base alignment quality:
Yes, and prefer existing alignment qualities encoded in input
Base and indel alignment qualities (BAQ and IDAQ)
True
Mapping quality:
20
Yes, incorporate MAPQ into joint quality score
255
Source quality:
No, don't incorporate source quality into joint quality score
Joint quality:
0
0
0
Custom filter settings/combinations
0.0005
0
False
Step 19: VCF-VCFintersect:
Output dataset 'variants' from step 18
Output dataset 'variants' from step 13
History
Output dataset 'output' from step 3
Intersect
True
0
False
Don't use advanced options
Step 20: Replace Text
Output dataset 'out_file1' from step 19
Replacements
Replacement 1
^(##reference.+)$
\1\n##INFO=<ID=AmpliconBias,Number=0,Type=Flag,Description="Indicates that the AF value of the variant could not be corrected for potential amplicon bias.">
Replacement 2
^([^#].+)$
\1;AmpliconBias
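The two replacements of Step 20 can be reproduced with Python's `re` module to see what they do to a VCF. This is a sketch, not the Galaxy tool itself: it assumes the replacements are applied line by line and in order, and the function name `annotate` is hypothetical. The patterns and replacement strings are the ones listed above.

```python
import re

INFO_LINE = (
    '##INFO=<ID=AmpliconBias,Number=0,Type=Flag,Description="Indicates that '
    'the AF value of the variant could not be corrected for potential '
    'amplicon bias.">'
)

# (pattern, replacement) pairs, applied in order to every line.
REPLACEMENTS = [
    # 1: add the AmpliconBias INFO declaration after the ##reference header.
    (re.compile(r"^(##reference.+)$"), r"\1\n" + INFO_LINE),
    # 2: append the AmpliconBias flag to every non-header line.
    (re.compile(r"^([^#].+)$"), r"\1;AmpliconBias"),
]


def annotate(vcf_text: str) -> str:
    """Apply both replacements to each line of a VCF given as a string."""
    out = []
    for line in vcf_text.split("\n"):
        for pattern, replacement in REPLACEMENTS:
            line = pattern.sub(replacement, line)
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    example = "\n".join([
        "##fileformat=VCFv4.0",
        "##reference=ref.fa",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "NC_045512.2\t241\t.\tC\tT\t100\tPASS\tDP=50;AF=0.9",
    ])
    print(annotate(example))
```

Appending `;AmpliconBias` to the end of each record only lands in the INFO field when INFO is the last column, which holds for VCFs without FORMAT and sample columns, as in the example record.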
Step 21: VCF-VCFintersect:
Output dataset 'outfile' from step 20
Output dataset 'variants' from step 18
History
Output dataset 'output' from step 3
Union
False
0
False
Don't use advanced options
Step 22: Replace Text
Output dataset 'out_file1' from step 21
Replacements
Replacement 1
^(##reference.+)$
\1\n##INFO=<ID=AmpliconBias,Number=0,Type=Flag,Description="Indicates that the AF value of the variant could not be corrected for potential amplicon bias.">
Step 23: SnpEff eff:
Output dataset 'outfile' from step 22
VCF
NC_045512.2: COVID19 Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1
VCF (only if input is VCF)
False
No upstream / downstream intervals (0 bases)
Use 'EFF' field compatible with older versions (instead of 'ANN'); Use Classic Effect names and amino acid variant annotations (NON_SYNONYMOUS_CODING vs missense_variant and G180R vs p.Gly180Arg/c.538G>C)
select at runtime
select at runtime
Do not show DOWNSTREAM changes; Do not show INTERGENIC changes; Do not show UPSTREAM changes
No
Use default (based on input type)
Empty.
True
True
Step 24: Lofreq filter
Output dataset 'snpeff_output' from step 23
SNVs and Indels
Quality-based filter options:
No, don't apply call quality filter
No, don't apply call quality filter
Coverage-based filter options:
0
0
Allele frequency filter options:
0.0
0.0
Strand bias filter options:
Yes, filter on multiple testing corrected strand-bias p-value (lofreq default)
0.001
False-discovery rate
True
False
Keep variants, but indicate failed filters in output FILTER column
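The strand-bias filter of Step 24 tests multiple-testing-corrected p-values against 0.001 using a false-discovery-rate correction. The sketch below shows the standard Benjamini-Hochberg procedure as one way such a correction works. It is an assumption that this matches LoFreq's implementation in detail, and the function names are hypothetical. LoFreq reports strand bias (`SB`) as a Phred-scaled value, so a conversion helper is included.

```python
def phred_to_pvalue(phred: float) -> float:
    """Convert a Phred-scaled value to a p-value: p = 10^(-Q/10)."""
    return 10 ** (-phred / 10)


def benjamini_hochberg(pvalues, alpha=0.001):
    """Return one bool per p-value: True if significant at FDR level alpha."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])

    # Find the largest rank k with p_(k) <= alpha * k / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= alpha * rank / m:
            cutoff_rank = rank

    # Every p-value ranked at or below the cutoff is significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[idx] = True
    return significant


if __name__ == "__main__":
    strand_bias_p = [0.5, 0.0001, 0.9, 0.0004]
    print(benjamini_hochberg(strand_bias_p, alpha=0.001))
```

In this reading, variants marked significant show strand bias and fail the filter; with the final setting above they would be kept in the output with the failed filter recorded in the FILTER column.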