COVID-19: variation analysis reporting

Annotation: Generate variant reports for the output of SARS-CoV-2 variation analysis workflows

Step / Annotation
Step 1: Input parameter
Not available.
Number of clusters into which your samples are grouped in the Variant Frequency Plot.
Step 2: Input dataset
select at runtime
Tabular file with two columns specifying SnpEffcovid19 transcript identifiers to consider in the report (first column; e.g., GU280_gp06) and the names they should be translated to in the report (second column; e.g., orf6)
Step 3: Input dataset collection
select at runtime
A collection of VCF datasets to report. Note: Preexisting FILTER entries of variants will be applied on top of the workflow-specific filter options. Remove FILTER values that you do not want to be applied before running this workflow!
Step 4: Input parameter
Not available.
Minimum allele-frequency threshold
Step 5: Input parameter
Not available.
Minimum depth-of-coverage threshold
Step 6: Input parameter
Not available.
Minimum number of variant-supporting reads (calculated from the DP4 field)
Step 7: Compose text parameter value
components
components 1
Text Parameter
(AF <
components 2
Float Parameter
Not available.
components 3
Text Parameter
) | (DP <
components 4
Integer Parameter
Not available.
components 5
Text Parameter
) | (DP4[2] + DP4[3] <
components 6
Integer Parameter
Not available.
components 7
Text Parameter
)
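The concatenation performed in Step 7 can be sketched in Python as follows. This is an illustration only: the function name and the example thresholds are hypothetical, and the spacing around the inserted numbers is an assumption, since the listing does not show trailing whitespace in the text components.

```python
def compose_filter_expression(min_af: float, min_dp: int, min_dp_alt: int) -> str:
    """Join the seven components of Step 7 in the order they are listed.

    Spacing around the numbers is assumed; the workflow's text
    components may carry different whitespace.
    """
    components = [
        "(AF < ", str(min_af),                    # components 1-2
        ") | (DP < ", str(min_dp),                # components 3-4
        ") | (DP4[2] + DP4[3] < ", str(min_dp_alt),  # components 5-6
        ")",                                      # component 7
    ]
    return "".join(components)


# Hypothetical thresholds, chosen only for illustration.
expression = compose_filter_expression(0.05, 1, 10)
print(expression)
```

The resulting string has the form of the expression that Step 9 applies to each variant: a variant is selected when any one of the three conditions holds.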
Step 8: Compose text parameter value
components
components 1
Text Parameter
min_af_
components 2
Float Parameter
Not available.
components 3
Text Parameter
|min_dp_
components 4
Integer Parameter
Not available.
components 5
Text Parameter
|min_dp_alt_
components 6
Integer Parameter
Not available.
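Step 8 builds a second string from the same three thresholds. A minimal Python sketch, with a hypothetical function name and hypothetical example values:

```python
def compose_filter_label(min_af: float, min_dp: int, min_dp_alt: int) -> str:
    """Join the six components of Step 8 in the order they are listed."""
    return (
        "min_af_" + str(min_af)
        + "|min_dp_" + str(min_dp)
        + "|min_dp_alt_" + str(min_dp_alt)
    )


# Hypothetical thresholds, chosen only for illustration.
label = compose_filter_label(0.05, 1, 10)
print(label)
```

A string of this shape records which thresholds were in force, so it is a plausible candidate for the value that Step 9 adds to the FILTER field; the listing itself does not show that connection explicitly.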
Step 9: SnpSift Filter
Output dataset 'output' from step 3
Simple expression
Not available.
False
Add a value to the FILTER field of selected variants
Not available.
Step 10: SnpSift Extract Fields
Output dataset 'output' from step 9
CHROM POS FILTER REF ALT DP AF DP4 SB EFF[*].IMPACT EFF[*].FUNCLASS EFF[*].EFFECT EFF[*].GENE EFF[*].CODON EFF[*].AA EFF[*].TRID
True
Empty.
.
Step 11: Replace column
Output dataset 'output' from step 10
Output dataset 'output' from step 2
16
1
Tab
Skip / Do not print
#
Map effects to desired transcripts
Step 12: Datamash
Output dataset 'outfile_replace' from step 11
1,2,3,4,5,6,7,8,9
True
True
False
False
False
Operation to perform on each groups
Operation to perform on each group 1
Combine all values
10
Operation to perform on each group 2
Combine all values
11
Operation to perform on each group 3
Combine all values
12
Operation to perform on each group 4
Combine all values
13
Operation to perform on each group 5
Combine all values
14
Operation to perform on each group 6
Combine all values
15
Operation to perform on each group 7
Combine all values
16
Collapse to one line per variant
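The grouping in Step 12 can be pictured with a small Python sketch. It is not how datamash is implemented: it ignores datamash's sorting and header options and simply groups rows on the first nine columns, joining the values of the remaining columns with commas. The function name and the toy rows are hypothetical.

```python
def collapse_effects(rows, n_key=9):
    """Group rows on their first n_key fields and comma-join the rest.

    Groups are emitted in order of first appearance.
    """
    groups = {}
    for row in rows:
        key = tuple(row[:n_key])
        # One bucket per collapsed column, created on first sight of the key.
        buckets = groups.setdefault(key, [[] for _ in row[n_key:]])
        for bucket, value in zip(buckets, row[n_key:]):
            bucket.append(value)
    return [
        list(key) + [",".join(bucket) for bucket in buckets]
        for key, buckets in groups.items()
    ]


# Toy example with two key columns instead of nine, for brevity.
toy_rows = [
    ["chr", "241", "HIGH", "geneA"],
    ["chr", "241", "LOW", "geneB"],
    ["chr", "3037", "LOW", "geneC"],
]
collapsed = collapse_effects(toy_rows, n_key=2)
```

In the toy example, the two rows for position 241 become one row whose effect columns read `HIGH,LOW` and `geneA,geneB`.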
Step 13: Replace
Output dataset 'out_file' from step 12
\t([^\t,]+),[^\t]+\t([^\t,]+),[^\t]+\t([^\t,]+),[^\t]+\t([^\t,]+),[^\t]+\t([^\t,]+),[^\t]+\t([^\t,]+),[^\t]+\t([^\t,]+),[^\s]+
\t$1\t$2\t$3\t$4\t$5\t$6\t$7
True
True
False
False
True
entire line
Keep only first (highest impact) effect per variant
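The effect of the Step 13 pattern can be tried out in Python. The regular expression below is the one from the listing; only the replacement is expressed differently (a function that joins the seven captured groups with tabs, equivalent to `\t$1\t$2\t$3\t$4\t$5\t$6\t$7`). The example line uses made-up placeholder values.

```python
import re

# Six fields of the form "first,rest" followed by a seventh one.
FIELD = r"\t([^\t,]+),[^\t]+"
FIRST_EFFECT = re.compile(FIELD * 6 + r"\t([^\t,]+),[^\s]+")


def keep_first_effect(line: str) -> str:
    """Replace seven consecutive comma-separated fields by their first items."""
    return FIRST_EFFECT.sub(lambda m: "\t" + "\t".join(m.groups()), line)


# Placeholder values; nine leading columns, then seven collapsed effect columns.
line = "\t".join([
    "NC_045512.2", "241", "PASS", "C", "T", "100", "0.95", "0,0,50,45", "0",
    "MODIFIER,LOW", "NONE,SILENT", "e1,e2", "g1,g2", "c1,c2", "a1,a2", "t1,t2",
])
trimmed = keep_first_effect(line)
```

Two details are worth noting. The comma-containing DP4 column is left alone because the pattern only matches where seven comma-containing fields follow each other. A variant with a single annotated effect contains no commas in those columns, so its line passes through unchanged.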
Step 14: Replace
Output dataset 'outfile' from step 13
(GroupBy|collapse)\(([^)]+)\)
$2
True
True
False
False
False
entire line
Remove datamash details from header line
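The header cleanup of Step 14 can be reproduced in Python with the same pattern. The header text in the example is a shortened, hypothetical one.

```python
import re

DATAMASH_WRAPPER = re.compile(r"(GroupBy|collapse)\(([^)]+)\)")


def strip_datamash_wrappers(header: str) -> str:
    """Turn 'GroupBy(X)' and 'collapse(X)' back into plain 'X'."""
    return DATAMASH_WRAPPER.sub(r"\2", header)


# Shortened, hypothetical header line.
header = "GroupBy(CHROM)\tGroupBy(POS)\tcollapse(EFF[*].IMPACT)"
clean_header = strip_datamash_wrappers(header)
```

After this step the `EFF[*].` prefix is still present in the column names; it is removed separately in Step 18.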
Step 15: Collapse Collection
Output dataset 'outfile' from step 14
True
True
Same line and each line in dataset
Step 16: Compute
str(c5) + '>' + str(c6)
Output dataset 'output' from step 15
NO
yes
change
no
Append "change" column
Step 17: Compute
str(int(c3)) + ':' + str(c18)
Output dataset 'out_file1' from step 16
NO
yes
change_with_pos
no
Add "position:change" column
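The two Compute steps (16 and 17) amount to simple string concatenation. The sketch below assumes a column layout that is inferred, not stated, in the listing: a sample-name column in position 1 (presumably added when the collection is collapsed in Step 15), followed by the Step 10 fields, so that c3 is POS, c5 is REF and c6 is ALT. The function name and the example row are hypothetical.

```python
def append_change_columns(row):
    """Append 'change' (REF>ALT) and 'change_with_pos' (POS:REF>ALT).

    Assumes row[2] is POS, row[4] is REF and row[5] is ALT
    (columns c3, c5 and c6 in the workflow's 1-based numbering).
    """
    change = str(row[4]) + ">" + str(row[5])
    change_with_pos = str(int(row[2])) + ":" + change
    return row + [change, change_with_pos]


# Hypothetical, truncated row: Sample, CHROM, POS, FILTER, REF, ALT.
example = append_change_columns(["sample1", "NC_045512.2", "241", "PASS", "C", "T"])
```

In the full table the first new value lands in column 18, which is why Step 17 refers to `c18`, and the second one in column 19, which later steps use as the per-variant grouping and join key.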
Step 18: Replace
Output dataset 'out_file1' from step 17
EFF[*].
Empty.
False
True
False
False
False
entire line
Remove SnpSift details from header line
Step 19: Filter
Output dataset 'outfile' from step 18
c4=='PASS' or c4=='.'
1
Get filter-passing variants
Step 20: Datamash
Output dataset 'outfile' from step 18
19
True
True
True
False
False
Operation to perform on each groups
Operation to perform on each group 1
Count Unique values
1
Operation to perform on each group 2
minimum
8
Operation to perform on each group 3
maximum
8
Operation to perform on each group 4
Combine all values
1
Operation to perform on each group 5
Combine all values
8
Calculate cross-sample per-variant allele stats
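The five operations of Step 20 can be sketched in Python. The sketch works on simplified `(sample, variant, af)` triples instead of full rows, assuming that column 1 holds the sample name, column 8 the allele frequency and column 19 the `position:change` key; the function name and the data are hypothetical.

```python
def per_variant_stats(observations):
    """Per variant: unique sample count, min AF, max AF, all samples, all AFs."""
    grouped = {}
    for sample, variant, af in observations:
        grouped.setdefault(variant, []).append((sample, af))

    stats = {}
    for variant, pairs in grouped.items():
        samples = [sample for sample, _ in pairs]
        afs = [af for _, af in pairs]
        stats[variant] = (
            len(set(samples)),             # Count Unique values (column 1)
            min(afs),                      # minimum (column 8)
            max(afs),                      # maximum (column 8)
            ",".join(samples),             # Combine all values (column 1)
            ",".join(str(af) for af in afs),  # Combine all values (column 8)
        )
    return stats


# Hypothetical observations.
stats = per_variant_stats([
    ("s1", "241:C>T", 0.95),
    ("s2", "241:C>T", 0.40),
    ("s1", "3037:C>T", 0.99),
])
```

Step 21 follows the same scheme but groups on the position column (column 3) rather than on the variant key.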
Step 21: Datamash
Output dataset 'outfile' from step 18
3
True
True
True
False
False
Operation to perform on each groups
Operation to perform on each group 1
Count Unique values
1
Operation to perform on each group 2
minimum
8
Operation to perform on each group 3
maximum
8
Operation to perform on each group 4
Count Unique values
18
Operation to perform on each group 5
Count Unique values
12
Calculate cross-sample per-variant position stats
Step 22: Datamash
Output dataset 'out_file1' from step 19
3
True
True
True
False
False
Operation to perform on each groups
Operation to perform on each group 1
Count Unique values
1
Get positions at which at least one sample has a filter-passing variant
Step 23: Datamash
Output dataset 'out_file1' from step 19
19
True
True
True
False
False
Operation to perform on each groups
Operation to perform on each group 1
Count Unique values
1
Get variants that are filter-passing in at least one sample
Step 24: Join
Output dataset 'out_file1' from step 19
19
Output dataset 'out_file' from step 20
1
Both 1st & 2nd file.
True
False
0
Keep only variants that passed SnpSift filters
Step 25: Join
Output dataset 'outfile' from step 18
3
Output dataset 'out_file' from step 22
1
Both 1st & 2nd file.
True
False
0
Get per-sample variants at positions that are passing filters in at least one sample
Step 26: Join
Output dataset 'out_file1' from step 17
19
Output dataset 'out_file' from step 23
1
Both 1st & 2nd file.
True
False
0
Get per-sample variants that are passing filters in at least one sample
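Steps 19, 23 and 26 together select every per-sample record of a variant that passes the filters in at least one sample. A compact Python sketch of that logic, assuming the header line has already been set aside; the function name, the default column indices (FILTER in column 4, variant key in column 19) and the toy rows are assumptions made for illustration.

```python
def variants_passing_somewhere(rows, filter_col=3, key_col=18):
    """Keep all rows whose variant key passes filters in at least one row.

    Column indices are 0-based; the defaults correspond to c4 and c19.
    """
    # Steps 19 and 23: variant keys of rows with FILTER 'PASS' or '.'.
    passing = {
        row[key_col] for row in rows if row[filter_col] in ("PASS", ".")
    }
    # Step 26: inner join of all rows against those keys.
    return [row for row in rows if row[key_col] in passing]


# Toy rows: Sample, FILTER, variant key.
toy = [
    ["s1", "PASS", "241:C>T"],
    ["s2", "min_af_0.05", "241:C>T"],
    ["s2", "min_af_0.05", "3037:C>T"],
]
kept = variants_passing_somewhere(toy, filter_col=1, key_col=2)
```

The second toy row is kept although it fails the filter in sample `s2`, because the same variant passes in `s1`; the third row is dropped because that variant passes nowhere.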
Step 27: Datamash
Output dataset 'output' from step 24
1
True
True
True
True
False
Operation to perform on each groups
Operation to perform on each group 1
Combine all unique values
2
Step 28: Join
Output dataset 'output' from step 25
1
Output dataset 'out_file' from step 21
1
Both 1st & 2nd file.
True
False
0
Merge variants with their per-position stats
Step 29: Cut
c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13,c14,c15,c16,c17,c18
Tab
Output dataset 'output' from step 26
Step 30: Cut
c4,c6,c7,c12,c13,c14,c15,c16,c17,c18,c20,c21,c22,c25,c23,c24,c19
Tab
Output dataset 'out_file' from step 27
Step 31: Cut
c2,c1,c4,c5,c6,c7,c8,c10,c9,c11,c12,c13,c14,c15,c16,c17,c22,c23,c24,c25,c18
Tab
Output dataset 'output' from step 28
Step 32: Split file
Tabular
Output dataset 'out_file1' from step 29
1
By column
1
(.*)
\1
Step 33: Replace
Output dataset 'out_file1' from step 30
unique\(Sample\)\tcollapse\(Sample\)\tcollapse\(AF\)
SAMPLES(above-thresholds)\tSAMPLES(all)\tAFs(all)
True
True
False
False
False
entire line
Replace datamash details in header line with readable column names
Step 34: Sort
Output dataset 'out_file1' from step 31
1
Column selections
Column selections 1
1
Ascending order
Alphabetical sort
Column selections 2
2
Ascending order
Fast numeric sort (-n)
False
False
Step 35: Variant Frequency Plot
Output dataset 'list_output_tab' from step 32
0.0
Image Properties:
SVG
0.67
Set3
Yes
Not available.
ward.D2
Step 36: Sort
Output dataset 'outfile' from step 33
1
Column selections
Column selections 1
1
Ascending order
General numeric sort ( scientific notation -g)
False
False