OpenMS
This could be merged in the future with the general IDMergerAlgorithm, since the two share a lot. For that, IDMergerAlgorithm needs additional methods to produce multiple runs as output. It also needs to store an extended mapping internally to distribute the PeptideIDs to the right output run according to origin and label. Finally, it should have non-copying/moving overloads for inserting PeptideIDs, since we probably do not want to distribute the PeptideIDs to the features again. In general, detaching IDs from features would be of great help here.
Untested for TMT/iTRAQ data, where you usually have one identification run per file, but a single file may contain multiple multiplexed conditions that you might want to split for inference. Problem: there is only one PeptideIdentification object per feature, and it is representative for all "sub maps" (in this case the labels/reporter ions). A lookup is therefore necessary to check whether the reporter ion had non-zero intensity; if so, the peptide ID needs to be duplicated for every new (condition-based) IdentificationRun it is supposed to be used in, according to the mapping.
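The lookup-and-duplicate step described above could look roughly like the following Python sketch. It is not OpenMS code: the function name `split_ids_by_condition`, the dictionary layout of a feature, and the `channel_to_condition` mapping are all hypothetical and only illustrate the idea of copying one peptide ID into every condition whose reporter channel has non-zero intensity.

```python
from collections import defaultdict
from copy import deepcopy


def split_ids_by_condition(features, channel_to_condition):
    """Distribute peptide IDs to condition-based runs (illustrative only).

    features: list of dicts with the hypothetical keys
        "peptide_id"  -> any object standing in for a PeptideIdentification
        "intensities" -> {reporter_channel: intensity}
    channel_to_condition: {reporter_channel: condition_name}
    """
    runs = defaultdict(list)
    for feature in features:
        # Look up which conditions actually saw the peptide
        # (reporter ion intensity greater than zero).
        conditions = {
            channel_to_condition[channel]
            for channel, intensity in feature["intensities"].items()
            if intensity > 0 and channel in channel_to_condition
        }
        # Duplicate the ID once per condition so each run owns its copy.
        for condition in sorted(conditions):
            runs[condition].append(deepcopy(feature["peptide_id"]))
    return dict(runs)


if __name__ == "__main__":
    features = [
        {"peptide_id": {"sequence": "PEPTIDEK"}, "intensities": {"126": 10.0, "127": 0.0}},
        {"peptide_id": {"sequence": "SAMPLER"}, "intensities": {"126": 5.0, "127": 3.0}},
    ]
    mapping = {"126": "control", "127": "treated"}
    for condition, ids in split_ids_by_condition(features, mapping).items():
        print(condition, [pid["sequence"] for pid in ids])
```

Copying the ID per condition keeps the runs independent, at the cost of extra memory; this is the duplication the note above says is needed.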
Fix output in parallel mode, change assignment of charges to threads, add parallel TOPP test (Marc)
Implement user-specified seed lists support (Marc)
Implement reading of pepXML and protXML (Andreas)
Allow reading of zipped XML files (David, Hiwi)
Implement support for labeled MRM experiments, Q1 m/z value and charges. (Andreas)
Implement support for more than one mass delta, e.g. from missed cleavages and so on (Andreas)
Test performance and make fitGumbelGauss available via parameters.
Allow charge-state-based fitting
Allow semi-supervised fitting by using decoy annotations
Allow non-parametric fitting via kernel density estimation
ProteinInference -- Protein inference based on an aggregation of the scores of the identified peptides.
Full documentation: http://www.openms.de/doxygen/release/3.1.0/html/TOPP_ProteinInference.html
Version: 3.1.0 Oct 18 2023, 10:27:18, Revision: 17a07f8
To cite OpenMS:
+ Rost HL, Sachsenberg T, Aiche S, Bielow C et al. OpenMS: a flexible open-source software platform for mass spectrometry data analysis. Nat Meth. 2016; 13, 9: 741-748. doi:10.1038/nmeth.3959.
Usage:
ProteinInference <options>
Options (mandatory options marked with '*'):
-in <file>* Input file(s) (valid formats: 'idXML', 'consensusXML')
-out <file>* Output file (valid formats: 'idXML', 'consensusXML')
-out_type <file> Output file type (valid: 'idXML', 'consensusXML')
-merge_runs <choice> If your idXML contains multiple runs, merge them beforehand? Otherwise performs inference separately per run. (default: 'all') (valid: 'no', 'all')
-protein_fdr <option> Additionally calculate the target-decoy FDR on protein-level after inference (default: 'false') (valid: 'true', 'false')
Merging:
-Merging:annotate_origin <choice> If true, adds a map_index MetaValue to the PeptideIDs to annotate the IDRun they came from. (default: 'true') (valid: 'true', 'false')
-Merging:allow_disagreeing_settings Force merging of disagreeing runs. Use at your own risk.
Algorithm:
-Algorithm:min_peptides_per_protein <number> Minimal number of peptides needed for a protein identification. If set to zero, unmatched proteins get a score of -Infinity. If bigger than zero, proteins with less peptides are filtered and evidences removed from the PSMs. PSMs that do not reference any proteins anymore are removed but the spectrum info is kept. (default: '1') (min: '0')
-Algorithm:score_aggregation_method <choice> How to aggregate scores of peptides matching to the same protein? (default: 'best') (valid: 'best', 'product', 'sum', 'maximum')
-Algorithm:treat_charge_variants_separately <choice> If this is true, different charge variants of the same peptide sequence count as individual evidences. (default: 'true') (valid: 'true', 'false')
-Algorithm:treat_modification_variants_separately <choice> If this is true, different modification variants of the same peptide sequence count as individual evidences. (default: 'true') (valid: 'true', 'false')
-Algorithm:use_shared_peptides <choice> If this is true, shared peptides are used as evidences. Note: shared_peptides are not deleted and potentially resolved in postprocessing as well. (default: 'true') (valid: 'true', 'false')
-Algorithm:skip_count_annotation If this is set, peptide counts won't be annotated at the proteins.
-Algorithm:annotate_indistinguishable_groups <choice> If this is true, calculates and annotates indistinguishable protein groups. (default: 'true') (valid: 'true', 'false')
-Algorithm:greedy_group_resolution If this is true, shared peptides will be associated to best proteins only (i.e. become potentially quantifiable razor peptides).
Common TOPP options:
-ini <file> Use the given TOPP INI file
-threads <n> Sets the number of threads allowed to be used by the TOPP tool (default: '1')
-write_ini <file> Writes the default configuration file
--help Shows options
--helphelp Shows all options (including advanced)
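
To make the Algorithm options above more concrete, here is a minimal Python sketch of aggregating peptide scores per protein. It is not the OpenMS implementation. The function `aggregate_protein_scores`, its input layout, and the `higher_is_better` flag are hypothetical, and the reading of 'best' as "maximum or minimum depending on score orientation" is an assumption. The `min_peptides` argument only mimics the filtering part of min_peptides_per_protein; the -Infinity score for unmatched proteins is not modeled.

```python
import math
from collections import defaultdict


def aggregate_protein_scores(peptide_hits, method="best",
                             higher_is_better=True, min_peptides=1):
    """Aggregate peptide scores per protein (illustrative only).

    peptide_hits: iterable of (peptide_sequence, score, [protein_accessions])
    method: one of 'best', 'product', 'sum', 'maximum'
    """
    pick = max if higher_is_better else min

    # Keep one score per (protein, peptide): the best one seen so far,
    # so that repeated matches of the same peptide count as one evidence.
    per_protein = defaultdict(dict)
    for sequence, score, accessions in peptide_hits:
        for accession in accessions:
            known = per_protein[accession]
            known[sequence] = pick(known[sequence], score) if sequence in known else score

    combine = {
        "best": pick,          # assumption: orientation-aware best score
        "maximum": max,
        "sum": sum,
        "product": math.prod,
    }[method]

    # Drop proteins supported by fewer distinct peptides than required.
    return {
        accession: combine(scores.values())
        for accession, scores in per_protein.items()
        if len(scores) >= min_peptides
    }


if __name__ == "__main__":
    hits = [
        ("PEPTIDEK", 0.9, ["P1", "P2"]),  # shared peptide
        ("SAMPLER", 0.5, ["P1"]),
        ("PEPTIDEK", 0.7, ["P1"]),        # repeated match, lower score
    ]
    for method in ("best", "product", "sum", "maximum"):
        print(method, aggregate_protein_scores(hits, method=method))
```

In this sketch a shared peptide contributes to every protein it maps to, which corresponds to leaving use_shared_peptides at 'true'; charge and modification variants are not distinguished.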
INI file documentation of this tool:
Document which metavalues of Protein/PeptideHit are filled when reading ProtXML (Chris)
Writing of protXML is currently not supported
Handle Modifications (Andreas)
Complete rewrite of the parser (and those of InsPecT and PepNovo), the code is bullshit... (Andreas)