fastq.bz2
fastq.gz
fastq
fasta.bz2
fasta.gz
fasta
Should be of datatype "fastq.gz" or "fasta"
Read 1 Adapters
3' (End) Adapters
Sequence of an adapter ligated to the 3' end (paired data: of the first read). The adapter and subsequent bases are trimmed. If a '$' character is appended ('anchoring'), the adapter is only found if it is a suffix of the read. To search for a linked adapter, separate the 2 sequences with 3 dots (ADAPTER1...ADAPTER2), see Help below.
5' (Front) Adapters
Sequence of an adapter ligated to the 5' end (paired data: of the first read). The adapter and any preceding bases are trimmed. Partial matches at the 5' end are allowed. If a '^' character is prepended ('anchoring'), the adapter is only found if it is a prefix of the read. To search for a linked adapter, separate the 2 sequences with 3 dots (ADAPTER1...ADAPTER2), see Help below.
5' or 3' (Anywhere) Adapters
Sequence of an adapter that may be ligated to the 5' or 3' end (paired data: of the first read). Both types of matches as described under 3' and 5' Adapters are allowed. If the first base of the read is part of the match, the behavior is as with 5' Adapters, otherwise as with 3' Adapters. This option is mostly for rescuing failed library preparations; do not use it if you know which end your adapter was ligated to!
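The anchoring and linked-adapter notation described above can be illustrated with a small classifier. This is only a sketch of how the entered strings could be read: the function name `describe_adapter` and the `kind` labels are hypothetical, and the tool's own parser is not shown in this help text.

```python
def describe_adapter(spec: str, kind: str) -> dict:
    """Classify an adapter string as entered in the form.

    kind is "3prime" or "5prime" (hypothetical labels for this sketch).
    """
    # Three dots separate the two parts of a linked adapter.
    if "..." in spec:
        first, second = spec.split("...", 1)
        return {"type": "linked", "parts": (first, second)}
    # A trailing '$' anchors a 3' adapter: it must be a suffix of the read.
    if kind == "3prime" and spec.endswith("$"):
        return {"type": "anchored 3'", "sequence": spec[:-1]}
    # A leading '^' anchors a 5' adapter: it must be a prefix of the read.
    if kind == "5prime" and spec.startswith("^"):
        return {"type": "anchored 5'", "sequence": spec[1:]}
    return {"type": "regular", "sequence": spec}


print(describe_adapter("ACGT$", "3prime"))
print(describe_adapter("ADAPTER1...ADAPTER2", "3prime"))
```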
Adapter Handling Options
Trim: trim adapter and upstream or downstream sequence
Retain: the read is trimmed, but the adapter sequence is not removed
Mask: mask adapters with 'N' characters instead of trimming them
Lowercase: convert to lowercase
Crop: trim upstream and downstream sequences, i.e. retain adapter sequences only
None: leave unchanged
Maximum allowed error rate (no. of errors divided by the length of the matching region). (--error-rate)
Do not allow indels in the alignments. That is, allow only mismatches. This option is currently only supported for anchored 5' adapters ('^ADAPTER') (default: both mismatches and indels are allowed). (--no-indels)
Try to remove adapters at most COUNT times. Useful when an adapter gets appended multiple times. (--times)
Minimum overlap length. If the overlap between the adapter and the sequence is shorter than LENGTH, the read is not modified. This reduces the number of bases trimmed purely due to short random adapter matches. (--overlap)
Interpret IUPAC wildcards in reads (--match-read-wildcards)
Interpret IUPAC wildcards in adapters. (--no-match-adapter-wildcards)
Check both the read and its reverse complement for adapter matches. If match is on reverse-complemented version, output that one. Default: check only read. (--rc)
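How the error rate and the minimum overlap interact can be sketched as follows. The function `accept_match` is hypothetical; it only restates the two rules given above (errors divided by the length of the matching region, and a minimum overlap length), with defaults taken from the example report further down (--error-rate=0.1, --overlap=3). The tool's actual alignment code is not shown here.

```python
def accept_match(match_length: int, errors: int,
                 error_rate: float = 0.1, min_overlap: int = 3) -> bool:
    """Decide whether a candidate adapter match would be accepted."""
    # Matches shorter than the minimum overlap leave the read unmodified.
    if match_length == 0 or match_length < min_overlap:
        return False
    # Error rate = number of errors divided by the length of the matching region.
    return errors / match_length <= error_rate


print(accept_match(13, 1))  # 1/13 is below 0.1
print(accept_match(9, 1))   # 1/9 is above 0.1
print(accept_match(2, 0))   # shorter than the minimum overlap
```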
Other Read Trimming Options
Remove bases from each read (first read only if paired). If positive, remove bases from the beginning. If negative, remove bases from the end. This is applied *before* adapter trimming. (--cut)
For paired-end data, you can define here a cut value to apply to R2 reads. Usage is identical to the R1 setting. Default: 0; ignored for single-end data. (-U)
Trim low-quality bases from 5' and/or 3' ends of each read before adapter removal. If one value is given, only the 3' end is trimmed. If two comma-separated cutoffs are given, the 5' end is trimmed with the first cutoff, the 3' end with the second. (--quality-cutoff)
- optional
For paired-end data, you can set here a separate quality cutoff to apply to R2 reads specifically. Leave empty to reuse the R1 cutoff setting. Ignored for single-end data. Syntax is identical to the R1 setting. (-Q)
Experimental option for quality trimming of NextSeq data. This is necessary because that machine cannot distinguish between G and reaching the end of the fragment (it encodes G as ‘black’). This option works like regular quality trimming (where one would use -q 20 instead), except that the qualities of G bases are ignored. (--nextseq-trim)
Trim N's on ends of reads. (--trim-n)
Note: this trims poly-T 'heads' on R2. (--poly-a)
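For the 3' quality cutoff described above, a running-sum approach is commonly used: subtract the cutoff from every quality value, sum from the 3' end, and cut where that sum is smallest. The sketch below assumes this is how the trimming position is chosen; the function name `quality_trim_3prime` is hypothetical and the input is a list of already decoded quality values.

```python
from typing import Sequence


def quality_trim_3prime(qualities: Sequence[int], cutoff: int) -> int:
    """Return how many bases to keep after 3' quality trimming (assumed algorithm)."""
    running = 0          # sum of (quality - cutoff) from the 3' end
    best = 0             # smallest running sum seen so far
    keep = len(qualities)
    for i in range(len(qualities) - 1, -1, -1):
        running += qualities[i] - cutoff
        if running > 0:
            # Good bases now outweigh bad ones; stop scanning.
            break
        if running < best:
            best = running
            keep = i
    return keep


quals = [42, 40, 26, 27, 8, 7, 11, 4, 2, 3]
print(quality_trim_3prime(quals, 10))  # keeps the first 4 bases
```

Note that the isolated quality of 11 near the 3' end does not stop the trimming, because the low-quality bases around it dominate the running sum.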
Shortening reads to a fixed length
Disabled
Enabled
If you want to remove a fixed number of bases from each read, use the --cut option instead.
Separate shortening of R2 reads to a fixed length?
Treat R2 reads the same as R1 reads
Separate shortening of R2 reads
For paired-end data, shortening of R2 reads can be handled separately. Ignored for single-end data.
Read Filtering Options
Discard reads that contain the adapter instead of trimming them. Use the 'Minimum overlap length' option in order to avoid throwing away too many randomly matching reads! (--discard-trimmed)
Discard reads that do not contain the adapter. (--discard-untrimmed)
Discard reads that, after processing, are shorter than LENGTH. Note: You can set this parameter to zero to keep empty reads (with zero-length sequence and quality string) in the output, but some downstream tools may have problems with these. Default: 1 (--minimum-length)
- optional
For paired-end data, you can specify here a separate minimum length cutoff to apply to R2 reads. Leave empty to reuse the R1 cutoff set above. Ignored for single-end data.
- optional
Discard trimmed reads that are longer than LENGTH. Reads that are too long even before adapter removal are also discarded. (--maximum-length)
- optional
For paired-end data, you can specify here a separate maximum length cutoff to apply to R2 reads. Leave empty to reuse the R1 cutoff set above. Ignored for single-end data.
- optional
Discard reads with more than this number of 'N' bases. A number between 0 and 1 is interpreted as a fraction of the read length. (--max-n)
- optional
Discard reads whose expected number of errors (computed from quality values) exceeds this value. (--max-ee)
- optional
As --max-expected-errors (see above), but divided by length to account for reads of varying length (--max-aer)
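The expected number of errors is usually computed from Phred quality values as the sum of the per-base error probabilities 10^(-Q/10). The sketch below assumes that definition and takes already decoded quality values; both function names are hypothetical.

```python
from typing import Sequence


def expected_errors(qualities: Sequence[int]) -> float:
    """Sum of per-base error probabilities for Phred-scaled qualities."""
    return sum(10 ** (-q / 10) for q in qualities)


def average_error_rate(qualities: Sequence[int]) -> float:
    """Expected errors divided by read length (0.0 for an empty read)."""
    if not qualities:
        return 0.0
    return expected_errors(qualities) / len(qualities)


quals = [10, 20, 30]
print(round(expected_errors(quals), 3))     # 0.1 + 0.01 + 0.001
print(round(average_error_rate(quals), 3))
```

Dividing by the read length makes the threshold comparable between short and long reads, which is the point of the length-normalized variant.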
Discard reads that did not pass CASAVA filtering (header has :Y:). (--discard-casava)
Any: a read pair is discarded (or redirected) if one of the reads (R1 or R2) fulfills the filtering criterion.
Both: filtering criteria must apply to both reads in order for a read pair to be discarded.
First: will make a decision about the read pair by inspecting whether the filtering criterion applies to the first read, ignoring the second read.
Which of the reads in a read pair must match the filtering criteria above in order for the pair to be filtered. Default: any (--pair-filter)
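The three pair-filter modes amount to a simple boolean rule. The sketch below restates them; `discard_pair` is a hypothetical helper, and `r1_fails` / `r2_fails` stand for "this read fulfills the filtering criterion".

```python
def discard_pair(r1_fails: bool, r2_fails: bool, mode: str = "any") -> bool:
    """Return True if the read pair should be discarded (or redirected)."""
    if mode == "any":
        return r1_fails or r2_fails
    if mode == "both":
        return r1_fails and r2_fails
    if mode == "first":
        # Only R1 is inspected; R2 is ignored.
        return r1_fails
    raise ValueError(f"unknown pair-filter mode: {mode}")


# R1 is too short, R2 is fine:
for mode in ("any", "both", "first"):
    print(mode, discard_pair(True, False, mode))
```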
Read Modification Options
- optional
Remove this suffix from read names if present. (--strip-suffix)
- optional
Search for TAG followed by a decimal number in the name of the read (description/comment field of the FASTA or FASTQ file). Replace the decimal number with the correct length of the trimmed read. For example, use --length-tag 'length=' to search for fields like 'length=123'. (--length-tag)
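The effect of the length tag can be sketched with a regular expression. This is an illustration of the described behavior, not the tool's code; `update_length_tag` and the example header are made up.

```python
import re


def update_length_tag(header: str, tag: str, new_length: int) -> str:
    """Replace the number following `tag` with the trimmed read length."""
    pattern = re.compile(re.escape(tag) + r"\d+")
    # A function is used as replacement so that `tag` is inserted literally.
    return pattern.sub(lambda match: f"{tag}{new_length}", header)


header = "read1 length=123 sample=A"
print(update_length_tag(header, "length=", 87))
```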
- optional
This option can be used to rename both single-end and paired-end reads. (--rename)
(--zero-cap)
- optional
Additional Options
Send an email notification when the job completes.
Help
What it does
Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads.
Cleaning your data in this way is often required. Reads from small-RNA sequencing contain the 3’ sequencing adapter because the read is longer than the molecule that is sequenced, as in microRNA or CRISPR data. Poly-A tails are useful for pulling out RNA from your sample, but often you don’t want them to be in your reads.
Cutadapt helps with these trimming tasks by finding the adapter or primer sequences in an error-tolerant way. It can also modify and filter reads in various ways. Cutadapt searches for the adapter in all reads and removes it when it finds it. Unless you use a filtering option, all reads that were present in the input file will also be present in the output file, some of them trimmed, some of them not. Even reads that were trimmed entirely (because the adapter was found in the very beginning) are output. All of this can be changed with options in the tool form above.
Normally, the tool looks for adapters on R1 and R2 reads independently. That is, the best matching R1 adapter of each type (3' End, 5' End, Anywhere) is removed from R1 and the best matching R2 adapter of each type is removed from R2.
To change this, you can use the Pairwise adapter search (--pair-adapters) option, which causes each R1 adapter to be paired up with its corresponding R2 adapter. The first R1 adapter of a given type that you specify will be paired up with the first R2 adapter of that type, and so on. The adapters are then always removed in pairs from a read pair.
For example, if you specify the following two 3'-end adapters for the R1 reads:
AAAAA
GGGG
and these two 3'-end adapters for the R2 reads:
CCCCC
TTTT
then, with this option enabled, the tool will trim a pair of reads only if:
either AAAAA is found in R1 and CCCCC is found in R2,
or GGGG is found in R1 and TTTT is found in R2.
Two limitations exist in this mode:
You need to provide equal numbers of R1 and R2 adapters of each type to allow pair formation, or the tool run will fail.
The algorithm identifies the best-matching R1 adapter first and then checks whether it can find its corresponding R2 adapter. If not, the read pair remains unchanged, even though it is, in theory, possible that a different R1 adapter that does not fit as well would have had a corresponding R2 adapter present, i.e., some legitimate adapter pairs might remain unhandled.
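The pairwise behavior and its second limitation can be sketched as follows. This is a deliberately simplified model: it uses exact substring search instead of the tool's error-tolerant matching and treats the longest matching R1 adapter as the "best" one. Both simplifications, and the names `find_3prime` and `trim_pair`, are assumptions made for illustration.

```python
def find_3prime(read: str, adapter: str) -> int:
    """Exact-match stand-in for the error-tolerant search; -1 if absent."""
    return read.find(adapter)


def trim_pair(r1, r2, r1_adapters, r2_adapters):
    """Trim a read pair only if an R1 adapter and its R2 partner are both found."""
    if len(r1_adapters) != len(r2_adapters):
        raise ValueError("need equal numbers of R1 and R2 adapters")
    # Pick the "best" R1 adapter first (here: the longest one with an exact hit).
    hits = [(len(a), i) for i, a in enumerate(r1_adapters) if find_3prime(r1, a) >= 0]
    if not hits:
        return r1, r2
    _, best = max(hits)
    # Then check only the R2 adapter paired with that R1 adapter.
    pos2 = find_3prime(r2, r2_adapters[best])
    if pos2 < 0:
        return r1, r2  # partner missing: the pair is left unchanged
    pos1 = find_3prime(r1, r1_adapters[best])
    return r1[:pos1], r2[:pos2]


r1_adapters = ["AAAAA", "GGGG"]
r2_adapters = ["CCCCC", "TTTT"]
print(trim_pair("ACGTAAAAATT", "TGCACCCCCGG", r1_adapters, r2_adapters))
# AAAAA is the best R1 hit, but its partner CCCCC is absent from R2, so the
# pair stays untrimmed although GGGG/TTTT would have formed a valid pair.
print(trim_pair("ACGTAAAAAGGGG", "TGCATTTT", r1_adapters, r2_adapters))
```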
Optionally, under Output Options you can choose to output
Report
Info file
Report
Cutadapt can output per-adapter statistics if you select to generate the report above.
Example:
This is cutadapt 3.4 with Python 3.9.2
Command line parameters: -j=1 -a AGATCGGAAGAGC -A AGATCGGAAGAGC --output=out1.fq.gz --paired-output=out2.fq.gz --error-rate=0.1 --times=1
--overlap=3 --action=trim --minimum-length=30:40 --pair-filter=both --cut=0 bwa-mem-fastq1_assimetric_fq_gz.fq.gz bwa-mem-fastq2_assimetric_fq_gz.fq.gz
Processing reads on 1 core in paired-end mode ...
Finished in 0.01 s (129 µs/read; 0.46 M reads/minute).
=== Summary ===
Total read pairs processed: 99
Read 1 with adapter: 2 (2.0%)
Read 2 with adapter: 4 (4.0%)
Pairs that were too short: 3 (3.0%)
Pairs written (passing filters): 96 (97.0%)
Total basepairs processed: 48,291 bp
Read 1: 24,147 bp
Read 2: 24,144 bp
Total written (filtered): 48,171 bp (99.8%)
Read 1: 24,090 bp
Read 2: 24,081 bp
Info file
The info file contains information about the found adapters. The output is a tab-separated text file. Each line corresponds to one read of the input file.
Columns contain the following data:
1st: Read name
2nd: Number of errors
3rd: 0-based start coordinate of the adapter match
4th: 0-based end coordinate of the adapter match
5th: Sequence of the read to the left of the adapter match (can be empty)
6th: Sequence of the read that was matched to the adapter
7th: Sequence of the read to the right of the adapter match (can be empty)
8th: Name of the found adapter
9th: Quality values corresponding to sequence left of the adapter match (can be empty)
10th: Quality values corresponding to sequence matched to the adapter (can be empty)
11th: Quality values corresponding to sequence to the right of the adapter (can be empty)
The concatenation of columns 5-7 yields the full read sequence. Column 8 identifies the found adapter. Adapters without a name are numbered starting from 1. Fields 9-11 are empty if quality values are not available. Concatenating them yields the full sequence of quality values.
If no adapter was found, the format is as follows:
Read name
The value -1
The read sequence
Quality values
When parsing the file, be aware that additional columns may be added in the future. Note also that some fields can be empty, resulting in consecutive tabs within a line.
If the --times option is used with a value greater than 1, each read can appear more than once in the info file. There will be one line for each found adapter, all with identical read names. Only for the first of those lines will the concatenation of columns 5-7 be identical to the original read sequence (and accordingly for columns 9-11). For subsequent lines, the sequences shown are the ones that were used in subsequent rounds of adapter trimming, that is, they get successively shorter.
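A minimal parser for the info file, following the column layout described above, might look like this. The function names and the two example lines are made up for illustration; the parser tolerates empty fields and ignores any extra columns that may be added in the future.

```python
def parse_info_line(line: str) -> dict:
    """Parse one tab-separated line of the info file."""
    fields = line.rstrip("\n").split("\t")
    name, errors = fields[0], int(fields[1])
    if errors == -1:
        # No adapter found: name, -1, read sequence, quality values.
        return {
            "name": name,
            "adapter_found": False,
            "sequence": fields[2],
            "qualities": fields[3] if len(fields) > 3 else "",
        }
    return {
        "name": name,
        "adapter_found": True,
        "errors": errors,
        "start": int(fields[2]),
        "end": int(fields[3]),
        "left": fields[4],
        "match": fields[5],
        "right": fields[6],
        "adapter": fields[7],
        # Columns 9-11 may be empty or missing if no qualities are available.
        "qualities": tuple(fields[8:11]),
    }


def read_info(text: str) -> list:
    """Parse all non-empty lines of an info file given as a string."""
    return [parse_info_line(line) for line in text.splitlines() if line]


example = (
    "read1\t1\t4\t9\tACGT\tAAAAT\tTT\t1\tIIII\tIIIII\tII\n"
    "read2\t-1\tACGTACGT\tIIIIIIII\n"
)
for record in read_info(example):
    print(record["name"], record["adapter_found"])
```

With --times greater than 1, several records can share the same read name, so grouping by name is needed before reconstructing the original read.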
Renaming Reads
The --rename option expects a template string such as {id} extra_info {adapter_name} as a parameter. It can contain regular text and placeholders that consist of a name enclosed in curly braces ({placeholdername}).
The read name will be set to the template string in which the placeholders are replaced with the actual values relevant for the current read.
The following placeholders are currently available for single-end reads:
{header} – the full, unchanged header
{id} – the read ID, that is, the part of the header before the first whitespace
{comment} – the part of the header after the whitespace following the ID
{adapter_name} – the name of the adapter that was found in this read, or no_adapter if there was no adapter match. If you use --times to do multiple rounds of adapter matching, this is the name of the last found adapter.
{match_sequence} – the sequence of the read that matched the adapter (including errors). If there was no adapter match, this is set to an empty string. If you use a linked adapter, this is set to the two matching strings, separated by a comma.
{cut_prefix} – the prefix removed by the --cut (or -u) option (that is, when used with a positive length argument)
{cut_suffix} – the suffix removed by the --cut (or -u) option (that is, when used with a negative length argument)
{rc} – this is replaced with the string rc if the read was reverse complemented. This only applies when reverse complementing was requested.
If the --rename option is used with paired-end data, the template is applied separately to both R1 and R2. That is, for R1, the placeholders are replaced with values from R1, and for R2, the placeholders are replaced with values from R2. For example, {comment} becomes R1’s comment in R1 and it becomes R2’s comment in R2.
For paired-end data, the placeholder {rn} is available (“read number”), and it is replaced with 1 in R1 and with 2 in R2.
In addition, it is possible to write a placeholder as {r1.placeholdername} or {r2.placeholdername}, which always takes the replacement value from R1 or R2, respectively.
The {r1.placeholder} and {r2.placeholder} notation is available for all placeholders except {rn} and {id} because the read ID needs to be identical for both reads.
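The placeholder substitution for paired-end data can be sketched with a small template renderer. This is an illustration of the rules above, not the tool's implementation; `render_name` and the example values are hypothetical, and each read is represented as a dictionary of placeholder values.

```python
import re


def render_name(template: str, r1: dict, r2: dict, read_number: int) -> str:
    """Fill a --rename style template for R1 (read_number=1) or R2 (read_number=2)."""
    current = r1 if read_number == 1 else r2

    def replace(match):
        key = match.group(1)
        if key == "rn":
            return str(read_number)
        # {r1.name} / {r2.name} always take the value from that read.
        if key.startswith("r1."):
            return r1[key[3:]]
        if key.startswith("r2."):
            return r2[key[3:]]
        # Plain placeholders take the value from the read being renamed.
        return current[key]

    return re.sub(r"\{([^{}]+)\}", replace, template)


r1 = {"id": "read7", "comment": "1:N:0", "adapter_name": "adaptA"}
r2 = {"id": "read7", "comment": "2:N:0", "adapter_name": "no_adapter"}
template = "{id} rn={rn} {r1.adapter_name} {comment}"
print(render_name(template, r1, r2, 1))
print(render_name(template, r1, r2, 2))
```

Using {r1.adapter_name} in the template lets both reads of a pair carry the name of the adapter found in R1, while {comment} still differs between the two reads.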