OpenMS
Create a decoy peptide database from standard FASTA databases.
Decoy databases are useful for controlling false discovery rates and thus for estimating score cutoffs for identified spectra.
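For illustration, the usual target-decoy estimate can be sketched in Python. At a score threshold t, the FDR is estimated as (decoy hits with score >= t) / (target hits with score >= t), and the cutoff is the lowest threshold whose estimate stays within the allowed FDR. This step happens downstream of DecoyDatabase, not in this tool. The function name and the (score, is_decoy) input format are assumptions, and higher scores are assumed to be better.

```python
from itertools import groupby


def score_cutoff_at_fdr(psms, max_fdr=0.01):
    """Return the lowest score threshold whose estimated FDR is <= max_fdr, or None.

    psms: iterable of (score, is_decoy) pairs; higher scores are better.
    FDR at threshold t is estimated as (#decoys with score >= t) / (#targets with score >= t).
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    cutoff = None
    # Group tied scores: a threshold t always admits every hit with score >= t.
    for score, group in groupby(ranked, key=lambda p: p[0]):
        for _, is_decoy in group:
            if is_decoy:
                decoys += 1
            else:
                targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff
```

With hits scored 10, 9, 8 (decoy), 7, 6 (decoy), a 50% FDR limit yields a cutoff of 7, since at 6 the estimate rises to 2/3.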
The decoy can be generated either by reversing or by shuffling each peptide of a sequence (peptides are defined by the given enzyme). When reversing, the N- and C-terminal residues of each peptide are kept in place by default.
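Both methods can be sketched in Python as below. This is not OpenMS code: the cleavage rule is a simplified trypsin rule (cut after K or R unless followed by P), and the helper names, the seed value, the positional identity measure and the 0.5 threshold (standing in for shuffle_sequence_identity_threshold) are all assumptions. The shuffle follows the description of the -method option: one fixed seed for every sequence, so identical inputs give identical decoys, and reshuffling when the output is too similar to the input. Unlike the reversal, this sketch shuffles all residues of a peptide, including its termini.

```python
import random
import re


def trypsin_digest(protein):
    """Split after K or R unless the next residue is P (simplified trypsin rule, an assumption)."""
    return [pep for pep in re.split(r"(?<=[KR])(?!P)", protein) if pep]


def reverse_keep_termini(peptide):
    """Reverse a peptide while keeping its first and last residues in place."""
    if len(peptide) <= 2:
        return peptide
    return peptide[0] + peptide[1:-1][::-1] + peptide[-1]


def identity(a, b):
    """Fraction of positions where two equal-length sequences agree (assumed similarity measure)."""
    if not a:
        return 1.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def shuffle_peptide(peptide, seed=42, identity_threshold=0.5, max_attempts=10):
    """Shuffle a peptide, reshuffling while the result is too similar to the input."""
    # Fresh generator with the same seed for every peptide: identical inputs give identical decoys.
    rng = random.Random(seed)
    residues = list(peptide)
    best, best_identity = peptide, 1.0
    for _ in range(max_attempts):
        rng.shuffle(residues)
        candidate = "".join(residues)
        score = identity(peptide, candidate)
        if score < best_identity:
            best, best_identity = candidate, score
        if score <= identity_threshold:
            break
    # Falls back to the least similar candidate seen (e.g. "AAAA" cannot change at all).
    return best


def make_decoy(protein, method="reverse"):
    """Build a decoy protein by transforming each enzymatic peptide and re-joining them."""
    transforms = {"reverse": reverse_keep_termini, "shuffle": shuffle_peptide}
    transform = transforms[method]
    return "".join(transform(pep) for pep in trypsin_digest(protein))
```

For example, make_decoy("MKWVTFISLLR") digests into MK and WVTFISLLR and returns MKWLLSIFTVR: each peptide keeps its first residue and its C-terminal K or R.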
For a contaminants database, see http://www.thegpm.org/crap/index.html, or find or create your own contaminant database.
Multiple databases can be provided as input; they are concatenated internally before decoy generation. This lets you specify your target database plus a contaminant file and obtain a concatenated target-decoy database with a single call, e.g., DecoyDatabase -in human.fasta crap.fasta -out human_TD.fasta
By default, a combined database is created in which target and decoy sequences are interleaved (i.e., target1, decoy1, target2, decoy2, ...). If you need all targets before the decoys, use -only_decoy and concatenate the files externally.
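A minimal Python sketch of this workflow: all inputs are concatenated first, then each target is written immediately followed by its decoy, with the decoy string attached as a prefix or suffix. The function names are hypothetical, make_decoy stands for whichever decoy method is used, and treating the first whitespace-delimited token of the FASTA header as the protein identifier is an assumption.

```python
from itertools import chain


def parse_fasta(text):
    """Return (header, sequence) pairs from FASTA text; the header excludes the '>'."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def decoy_header(header, decoy_string="DECOY_", position="prefix"):
    """Attach decoy_string to the identifier (assumed: first token of the header)."""
    identifier, sep, description = header.partition(" ")
    if position == "prefix":
        identifier = decoy_string + identifier
    elif position == "suffix":
        identifier = identifier + decoy_string
    else:
        raise ValueError("position must be 'prefix' or 'suffix'")
    return identifier + sep + description


def build_target_decoy(fasta_texts, make_decoy, decoy_string="DECOY_",
                       position="prefix", only_decoy=False):
    """Concatenate all inputs, then emit target1, decoy1, target2, decoy2, ..."""
    out = []
    for header, seq in chain.from_iterable(parse_fasta(t) for t in fasta_texts):
        if not only_decoy:
            out.append((header, seq))
        out.append((decoy_header(header, decoy_string, position), make_decoy(seq)))
    return out


def to_fasta(records):
    """Serialize (header, sequence) pairs back to FASTA text."""
    return "".join(f">{h}\n{s}\n" for h, s in records)
```

The decoy_string, position and only_decoy arguments are modeled on the -decoy_string, -decoy_string_position and -only_decoy options listed below.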
The tool keeps track of all protein identifiers and reports duplicates.
The tool also checks the input files for existing decoys (based on the most common decoy prefixes and suffixes) and terminates if any are found.
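The two checks can be sketched in Python as follows. The affix lists are illustrative assumptions, not the list OpenMS actually uses, and the function name is hypothetical; raising an exception stands in for terminating the program.

```python
from collections import Counter

# Illustrative affixes only (an assumption); OpenMS's own list is not reproduced here.
DECOY_PREFIXES = ("DECOY_", "REV_", "XXX_")
DECOY_SUFFIXES = ("_DECOY", "_REV")


def validate_target_identifiers(identifiers):
    """Return duplicate identifiers; raise ValueError if any identifier already looks like a decoy."""
    found = [i for i in identifiers
             if i.upper().startswith(DECOY_PREFIXES) or i.upper().endswith(DECOY_SUFFIXES)]
    if found:
        raise ValueError(f"input already contains decoys: {found}")
    counts = Counter(identifiers)
    return sorted(i for i, n in counts.items() if n > 1)
```

The affix match is case-insensitive, so rev_P2 and REV_P2 are both flagged.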
The command line parameters of this tool are:
DecoyDatabase -- Create decoy sequence database from forward sequence database.
Full documentation: http://www.openms.de/doxygen/release/3.1.0/html/TOPP_DecoyDatabase.html
Version: 3.1.0 Oct 18 2023, 10:27:18, Revision: 17a07f8
To cite OpenMS:
 + Rost HL, Sachsenberg T, Aiche S, Bielow C et al. OpenMS: a flexible open-source software platform for 
   mass spectrometry data analysis. Nat Meth. 2016; 13, 9: 741-748. doi:10.1038/nmeth.3959.
Usage:
  DecoyDatabase <options>
This tool has algorithm parameters that are not shown here! Please check the ini file for a detailed
description or use the --helphelp option
Options (mandatory options marked with '*'):
  -in <file(s)>*                   Input FASTA file(s), each containing a database. It is recommended to
                                   include a contaminant database as well. (valid formats: 'fasta')
  -out <file>*                     Output FASTA file where the decoy database will be written to. (valid
                                   formats: 'fasta')
  -decoy_string <string>           String that is combined with the accession of the protein identifier to 
                                   indicate a decoy protein. (default: 'DECOY_')
  -decoy_string_position <choice>  Should the 'decoy_string' be prepended (prefix) or appended (suffix) to 
                                   the protein accession? (default: 'prefix') (valid: 'prefix', 'suffix')
  -only_decoy                      Write only decoy proteins to the output database instead of a combined 
                                   database.
  -type <choice>                   Type of sequence. RNA sequences may contain modification codes, which
                                   will be handled correctly if this is set to 'RNA'. (default: 'protein')
                                   (valid: 'protein', 'RNA')
  -method <choice>                 Method by which decoy sequences are generated from target sequences. Note 
                                   that all sequences are shuffled using the same random seed, ensuring that 
                                   identical sequences produce the same shuffled decoy sequences. Shuffled 
                                   sequences that produce highly similar output sequences are shuffled again 
                                   (see shuffle_sequence_identity_threshold). (default: 'reverse') (valid: 
                                   'reverse', 'shuffle')
  -enzyme <enzyme>                 Enzyme used for the digestion of the sample. Only applicable if parameter
                                   'type' is 'protein'. (default: 'Trypsin') (valid: 'Arg-C', 'Arg-C/P',
                                   'Asp-N_ambic', 'Chymotrypsin', 'Chymotrypsin/P', 'CNBr', 'Formic_acid',
                                   'Lys-C', 'Lys-N', 'Lys-C/P', 'PepsinA', 'TrypChymo', 'Trypsin/P', 'V8-DE',
                                   'V8-E', 'Alpha-lytic protease', 'leukocyte elastase',
                                   'proline endopeptidase', 'glutamyl endopeptidase', '2-iodobenzoate',
                                   'iodosobenzoate', 'staphylococcal protease/D', 'proline-endopeptidase/HKR',
                                   'Glu-C+P', 'PepsinA + P', 'cyanogen-bromide', 'Clostripain/P',
                                   'elastase-trypsin-chymotrypsin', 'Asp-N/B', 'Asp-N', 'Trypsin',
                                   'no cleavage', 'unspecific cleavage')
                                   
Common TOPP options:
  -ini <file>                      Use the given TOPP INI file
  -threads <n>                     Sets the number of threads allowed to be used by the TOPP tool (default: 
                                   '1')
  -write_ini <file>                Writes the default configuration file
  --help                           Shows options
  --helphelp                       Shows all options (including advanced)
The following configuration subsections are valid:
 - Decoy   Decoy parameters section
You can write an example INI file using the '-write_ini' option.
Documentation of subsection parameters can be found in the doxygen documentation or the INIFileEditor.
For more information, please consult the online documentation for this tool:
  - http://www.openms.de/doxygen/release/3.1.0/html/TOPP_DecoyDatabase.html
INI file documentation of this tool: