OpenMS
PeptideAndProteinQuant Class Reference

Helper class for peptide and protein quantification based on feature data annotated with IDs.

#include <OpenMS/ANALYSIS/QUANTITATION/PeptideAndProteinQuant.h>


Classes

struct PeptideData
 Quantitative and associated data for a peptide.

struct ProteinData
 Quantitative and associated data for a protein.

struct Statistics
 Statistics for processing summary.

Public Types

typedef std::map< UInt64, double > SampleAbundances
 Mapping: sample ID -> abundance.

typedef std::map< AASequence, PeptideData > PeptideQuant
 Mapping: peptide sequence (modified) -> peptide data.

typedef std::map< String, ProteinData > ProteinQuant
 Mapping: protein accession -> protein data.

Public Member Functions

 PeptideAndProteinQuant ()
 Constructor.

 ~PeptideAndProteinQuant () override
 Destructor.

void readQuantData (FeatureMap &features, const ExperimentalDesign &ed)
 Read quantitative data from a feature map.

void readQuantData (ConsensusMap &consensus, const ExperimentalDesign &ed)
 Read quantitative data from a consensus map.

void readQuantData (std::vector< ProteinIdentification > &proteins, PeptideIdentificationList &peptides, const ExperimentalDesign &ed)
 Read quantitative data from identification results (for quantification via spectral counting).

void quantifyPeptides (const PeptideIdentificationList &peptides=PeptideIdentificationList())
 Compute peptide abundances.

void quantifyProteins (const ProteinIdentification &proteins=ProteinIdentification())
 Compute protein abundances.

std::map< OpenMS::String, OpenMS::String > mapAccessionToLeader (const OpenMS::ProteinIdentification &proteins) const

const Statistics & getStatistics ()
 Get summary statistics.

const PeptideQuant & getPeptideResults ()
 Get peptide abundance data.

const ProteinQuant & getProteinResults ()
 Get protein abundance data.

void annotateQuantificationsToProteins (const ProteinQuant &protein_quants, ProteinIdentification &proteins, bool remove_unquantified=true)
 Annotate protein quant results as meta data to protein ids.
 
- Public Member Functions inherited from DefaultParamHandler
 DefaultParamHandler (const String &name)
 Constructor with name that is displayed in error messages.

 DefaultParamHandler (const DefaultParamHandler &rhs)
 Copy constructor.

virtual ~DefaultParamHandler ()
 Destructor.

DefaultParamHandler & operator= (const DefaultParamHandler &rhs)
 Assignment operator.

virtual bool operator== (const DefaultParamHandler &rhs) const
 Equality operator.

void setParameters (const Param &param)
 Sets the parameters.

const Param & getParameters () const
 Non-mutable access to the parameters.

const Param & getDefaults () const
 Non-mutable access to the default parameters.

const String & getName () const
 Non-mutable access to the name.

void setName (const String &name)
 Mutable access to the name.

const std::vector< String > & getSubsections () const
 Non-mutable access to the registered subsections.
 

Private Member Functions

PeptideHit getAnnotation_ (PeptideIdentificationList &peptides)
 Get the "canonical" annotation (a single peptide hit) of a feature/consensus feature from the associated list of peptide identifications.

void quantifyFeature_ (const FeatureHandle &feature, size_t fraction, const String &filename, const PeptideHit &hit, Int channel_or_label)
 Gather quantitative information from a feature.

bool getBest_ (const std::map< Int, std::map< String, std::map< Int, std::map< Int, double >>>> &peptide_abundances, std::tuple< size_t, String, size_t, Int > &best)
 Determine fraction, filename, charge state, and channel of a peptide with the highest number of abundances.

template<typename T >
void orderBest_ (const std::map< T, SampleAbundances > &abundances, std::vector< T > &result)
 Order keys (charges/peptides for peptide/protein quantification) according to how many samples they allow to quantify, breaking ties by total abundance.

void normalizePeptides_ ()
 Normalize peptide abundances across samples by (multiplicative) scaling to equal medians.

void transferPeptideDataToProteins_ (const ProteinIdentification &proteins)
 Transfer peptide-level quantitative data to protein-level data structures.

std::vector< String > selectPeptidesForQuantification_ (const String &protein_accession, Size top_n, bool fix_peptides)
 Select peptides for protein quantification based on filtering criteria.

double aggregateAbundances_ (const std::vector< double > &abundances, const String &method) const
 Aggregate abundances using the specified mathematical method.

void calculateProteinAbundances_ (const String &protein_accession, const std::vector< String > &selected_peptides, const String &aggregate_method, Size top_n, bool include_all)
 Calculate protein abundances for a single protein using selected peptides.

void calculateFileAndChannelLevelProteinAbundances_ (const String &protein_accession, const std::vector< String > &selected_peptides, const String &aggregate_method, Size top_n, bool include_all, const std::map< String, String > &accession_to_leader)
 Calculate detailed protein abundances at channel level using selected peptides.

void performIbaqNormalization_ (const ProteinIdentification &proteins)
 Perform iBAQ normalization on protein abundances.

String getAccession_ (const std::set< String > &pep_accessions, const std::map< String, String > &accession_to_leader) const
 Get the "canonical" protein accession from the list of protein accessions of a peptide.

void countPeptides_ (PeptideIdentificationList &peptides)
 Count the number of identifications (best hits only) of each peptide sequence.

size_t getSampleIDFromFilenameAndChannel_ (const String &filename, Int channel_or_label, const ExperimentalDesign &ed) const
 Map (filename, channel) to sample using ExperimentalDesign.

void updateMembers_ () override
 Clear all data when parameters are set.
 

Private Attributes

Statistics stats_
 Processing statistics for output in the end.

PeptideQuant pep_quant_
 Peptide quantification data.

ProteinQuant prot_quant_
 Protein quantification data.

ExperimentalDesign experimental_design_
 Experimental design for filename/channel to sample mapping.
 

Additional Inherited Members

- Static Public Member Functions inherited from DefaultParamHandler
static void writeParametersToMetaValues (const Param &write_this, MetaInfoInterface &write_here, const String &key_prefix="")
 Writes all parameters to meta values.

- Protected Member Functions inherited from DefaultParamHandler
void defaultsToParam_ ()
 Updates the parameters after the defaults have been set in the constructor.

- Protected Attributes inherited from DefaultParamHandler
Param param_
 Container for current parameters.

Param defaults_
 Container for default parameters. This member should be filled in the constructor of derived classes!

std::vector< String > subsections_
 Container for registered subsections. This member should be filled in the constructor of derived classes!

String error_name_
 Name that is displayed in error messages during the parameter checking.

bool check_defaults_
 If this member is set to false, no checking of parameters is done.

bool warn_empty_defaults_
 If this member is set to false, no warning is emitted when defaults are empty.
 

Detailed Description

Helper class for peptide and protein quantification based on feature data annotated with IDs.

This class is used by the ProteinQuantifier tool; see its documentation for further details.

Parameters of this class are:

method (string; default: top; valid values: top, iBAQ)
 - top: quantify based on the three most abundant peptides (the number can be changed via 'top:N').
 - iBAQ (intensity-based absolute quantification): calculate the sum of all peptide peak intensities divided by the number of theoretically observable tryptic peptides (https://rdcu.be/cND1J). Warning: only consensusXML or featureXML input is allowed!

best_charge_and_fraction (string; default: false; valid values: true, false)
 Distinguish between fractions and charge states of a peptide. For peptides, abundances will be reported separately for each fraction and charge state; for proteins, abundances will be computed based only on the most prevalent charge state observed for each peptide (over all fractions). By default, abundances are summed over all charge states.

top:N (int; default: 3; min: 0)
 Calculate protein abundance from this number of proteotypic peptides (most abundant first; '0' for all).

top:aggregate (string; default: median; valid values: median, mean, weighted_mean, sum)
 Aggregation method used to compute protein abundances from peptide abundances.

top:include_all (string; default: false; valid values: true, false)
 Include results for proteins with fewer proteotypic peptides than indicated by 'N' (no effect if 'N' is 0 or 1).

consensus:normalize (string; default: false; valid values: true, false)
 Scale peptide abundances so that the medians of all samples are equal.

consensus:fix_peptides (string; default: false; valid values: true, false)
 Use the same peptides for protein quantification across all samples. With 'N' = 0, all peptides that occur in every sample are considered. Otherwise ('N' > 0), the N peptides that occur in the most samples (independently of each other) are selected, breaking ties by total abundance (there is no guarantee that the best co-occurring peptides are chosen!).
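The interplay of 'top:N' and 'top:include_all' can be illustrated with a small, self-contained C++ sketch for a single sample. It is not the OpenMS implementation: selectTopN is a hypothetical helper, peptide abundances are plain doubles, and selection is assumed to happen independently for each sample (in line with the per-sample top_n parameter of calculateProteinAbundances_). The selected values would then be combined using the method chosen in 'top:aggregate'.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper (not part of OpenMS): choose the peptide abundances that
// feed into one protein's value for ONE sample, following 'top:N' and
// 'top:include_all'. An empty result means "do not report this protein".
std::vector<double> selectTopN(std::vector<double> peptide_abundances,
                               std::size_t n, bool include_all)
{
  // Most abundant peptides first.
  std::sort(peptide_abundances.begin(), peptide_abundances.end(),
            std::greater<double>());
  if (n == 0) return peptide_abundances;  // 'N' = 0: use all peptides
  if (peptide_abundances.size() < n)
  {
    // Fewer peptides than 'N': report only if include_all is set. With N = 1
    // this branch is reached only for empty input, so include_all has no
    // effect there either.
    return include_all ? peptide_abundances : std::vector<double>{};
  }
  peptide_abundances.resize(n);  // keep the N most abundant
  return peptide_abundances;
}
```

For example, with N = 3 and abundances {5, 1, 3, 4}, the values 5, 4 and 3 are kept; a protein with only two peptides is dropped unless include_all is true.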


Member Typedef Documentation

◆ PeptideQuant

typedef std::map<AASequence, PeptideData> PeptideQuant

Mapping: peptide sequence (modified) -> peptide data.

◆ ProteinQuant

typedef std::map<String, ProteinData> ProteinQuant

Mapping: protein accession -> protein data.

◆ SampleAbundances

typedef std::map<UInt64, double> SampleAbundances

Mapping: sample ID -> abundance.

Constructor & Destructor Documentation

◆ PeptideAndProteinQuant()

PeptideAndProteinQuant ( )

Constructor.

◆ ~PeptideAndProteinQuant()

~PeptideAndProteinQuant ( )
inline, override

Destructor.

Member Function Documentation

◆ aggregateAbundances_()

double aggregateAbundances_ ( const std::vector< double > & abundances,
const String & method
) const
private

Aggregate abundances using the specified mathematical method.

Parameters
abundances: Vector of abundance values to aggregate
method: Aggregation method ("median", "mean", "weighted_mean", "sum")
Returns
Aggregated abundance value
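A minimal sketch of three of these aggregation methods follows. aggregateSketch is a hypothetical free function, not OpenMS code; "weighted_mean" is deliberately left out because this page does not say how its weights are chosen, and returning 0 for empty input is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <string>
#include <vector>

// Illustrative sketch (not the OpenMS implementation) of the "median",
// "mean" and "sum" aggregation methods. "weighted_mean" is omitted because
// its weighting scheme is not described on this page.
double aggregateSketch(std::vector<double> values, const std::string& method)
{
  if (values.empty()) return 0.0;  // assumption: no input -> 0
  if (method == "sum")
    return std::accumulate(values.begin(), values.end(), 0.0);
  if (method == "mean")
    return std::accumulate(values.begin(), values.end(), 0.0) / values.size();
  if (method == "median")
  {
    std::sort(values.begin(), values.end());
    const std::size_t mid = values.size() / 2;
    // Odd count: middle element; even count: mean of the two middle elements.
    return values.size() % 2 == 1 ? values[mid]
                                  : (values[mid - 1] + values[mid]) / 2.0;
  }
  throw std::invalid_argument("unsupported aggregation method: " + method);
}
```

The median is the default in 'top:aggregate'; it is less sensitive than the mean or sum to a single unusually intense peptide.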

◆ annotateQuantificationsToProteins()

void annotateQuantificationsToProteins ( const ProteinQuant & protein_quants,
ProteinIdentification & proteins,
bool remove_unquantified = true
)

Annotate protein quant results as meta data to protein ids.

◆ calculateFileAndChannelLevelProteinAbundances_()

void calculateFileAndChannelLevelProteinAbundances_ ( const String & protein_accession,
const std::vector< String > & selected_peptides,
const String & aggregate_method,
Size top_n,
bool include_all,
const std::map< String, String > & accession_to_leader
)
private

Calculate detailed protein abundances at channel level using selected peptides.

Parameters
protein_accession: The protein accession
selected_peptides: Vector of peptide sequences to use for quantification
aggregate_method: Method to aggregate peptide abundances
top_n: Maximum number of peptides to use per sample
include_all: Whether to include proteins with insufficient peptides
accession_to_leader: Map for resolving protein group leaders

◆ calculateProteinAbundances_()

void calculateProteinAbundances_ ( const String & protein_accession,
const std::vector< String > & selected_peptides,
const String & aggregate_method,
Size top_n,
bool include_all
)
private

Calculate protein abundances for a single protein using selected peptides.

Parameters
protein_accession: The protein accession
selected_peptides: Vector of peptide sequences to use for quantification
aggregate_method: Method to aggregate peptide abundances
top_n: Maximum number of peptides to use per sample
include_all: Whether to include proteins with insufficient peptides

◆ countPeptides_()

void countPeptides_ ( PeptideIdentificationList & peptides)
private

Count the number of identifications (best hits only) of each peptide sequence.

The peptide hits in peptides are sorted by score in the process.
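Spectral counting as described here boils down to sorting each identification's hits and tallying the sequence of the top hit. The sketch below uses simplified stand-in types (Hit and Identification, with a per-identification score orientation flag); these are hypothetical and only mimic the role of PeptideHit and PeptideIdentification, so this is not the OpenMS code.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for PeptideHit / PeptideIdentification.
struct Hit
{
  std::string sequence;
  double score;
};

struct Identification
{
  std::vector<Hit> hits;
  bool higher_score_better;  // score orientation of this search
};

// Spectral counting sketch: sort the hits of every identification by score
// (modifying the input, as described above) and count how often each
// sequence is the best hit.
std::map<std::string, int> countBestHits(std::vector<Identification>& ids)
{
  std::map<std::string, int> counts;
  for (Identification& id : ids)
  {
    if (id.hits.empty()) continue;
    const bool higher_better = id.higher_score_better;
    std::sort(id.hits.begin(), id.hits.end(),
              [higher_better](const Hit& a, const Hit& b) {
                return higher_better ? a.score > b.score : a.score < b.score;
              });
    ++counts[id.hits.front().sequence];  // best hit only
  }
  return counts;
}
```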

◆ getAccession_()

String getAccession_ ( const std::set< String > &  pep_accessions,
const std::map< String, String > &  accession_to_leader 
) const
private

Get the "canonical" protein accession from the list of protein accessions of a peptide.

Parameters
pep_accessions: Protein accessions of a peptide
accession_to_leader: Captures information about indistinguishable proteins (maps accession to accession of group leader)

If there is no information about indistinguishable proteins (from protXML) available, a canonical accession exists only for proteotypic peptides: it is the single accession of the respective peptide.

Otherwise, a peptide has a canonical accession if it maps only to proteins of one indistinguishable group. In this case, the canonical accession is that of the group leader.

If there is no canonical accession, the empty string is returned.
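The rules above translate almost directly into code. This sketch uses std::string and standard containers instead of OpenMS::String; canonicalAccession is a hypothetical name, and treating an accession that is missing from accession_to_leader as its own group leader is an assumption not stated on this page.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Sketch of the canonical-accession rule (not the OpenMS implementation).
std::string canonicalAccession(const std::set<std::string>& pep_accessions,
                               const std::map<std::string, std::string>& accession_to_leader)
{
  if (pep_accessions.empty()) return "";
  if (accession_to_leader.empty())
  {
    // No grouping information: only proteotypic peptides have a canonical accession.
    return pep_accessions.size() == 1 ? *pep_accessions.begin() : "";
  }
  std::string leader;
  for (const std::string& acc : pep_accessions)
  {
    auto it = accession_to_leader.find(acc);
    // Assumption: an accession missing from the map is its own group leader.
    const std::string& this_leader = (it != accession_to_leader.end()) ? it->second : acc;
    if (leader.empty()) leader = this_leader;
    else if (leader != this_leader) return "";  // peptide spans several groups
  }
  return leader;
}
```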

◆ getAnnotation_()

PeptideHit getAnnotation_ ( PeptideIdentificationList & peptides)
private

Get the "canonical" annotation (a single peptide hit) of a feature/consensus feature from the associated list of peptide identifications.

Only the best-scoring peptide hit of each ID in peptides is taken into account. The hits of each ID must already be sorted! If there's more than one ID and the best hits are not identical by sequence, or if there's no peptide ID, an empty peptide hit (for "ambiguous/no annotation") is returned. Protein accessions from identical peptide hits are accumulated.
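A hedged sketch of this annotation rule with simplified stand-in types (Hit and Identification, both hypothetical, not the OpenMS classes). It assumes that identifications without any hits are ignored, which this page does not specify.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins for PeptideHit / PeptideIdentification.
struct Hit
{
  std::string sequence;              // empty = "ambiguous/no annotation"
  std::set<std::string> accessions;  // protein accessions of the hit
};

struct Identification
{
  std::vector<Hit> hits;  // must already be sorted, best hit first
};

// Look only at the best hit of each ID; return an empty Hit if there is no ID
// or the best hits disagree by sequence; otherwise merge the protein
// accessions of the (identical) best hits.
Hit canonicalAnnotation(const std::vector<Identification>& ids)
{
  Hit result;
  bool have_hit = false;
  for (const Identification& id : ids)
  {
    if (id.hits.empty()) continue;  // assumption: IDs without hits are ignored
    const Hit& best = id.hits.front();
    if (!have_hit)
    {
      result = best;
      have_hit = true;
    }
    else if (best.sequence != result.sequence)
    {
      return Hit{};  // ambiguous annotation
    }
    else
    {
      result.accessions.insert(best.accessions.begin(), best.accessions.end());
    }
  }
  return result;  // still empty if no usable ID was found
}
```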

◆ getBest_()

bool getBest_ ( const std::map< Int, std::map< String, std::map< Int, std::map< Int, double >>>> &  peptide_abundances,
std::tuple< size_t, String, size_t, Int > &  best 
)
private

Determine fraction, filename, charge state, and channel of a peptide with the highest number of abundances.

Parameters
peptide_abundances: Const input map fraction -> filename -> charge -> channel -> abundance
best: Output parameter that additionally receives the best fraction, filename, charge state, and channel
Returns
true if at least one abundance was found, false otherwise

◆ getPeptideResults()

const PeptideQuant& getPeptideResults ( )

Get peptide abundance data.

◆ getProteinResults()

const ProteinQuant& getProteinResults ( )

Get protein abundance data.

◆ getSampleIDFromFilenameAndChannel_()

size_t getSampleIDFromFilenameAndChannel_ ( const String & filename,
Int channel_or_label,
const ExperimentalDesign & ed
) const
private

Map (filename, channel) to sample using ExperimentalDesign.

Parameters
filename: The base filename (without path/extension)
channel_or_label: The channel/label identifier
ed: The experimental design containing the mapping information
Returns
The sample ID corresponding to the filename and channel

◆ getStatistics()

const Statistics& getStatistics ( )

Get summary statistics.

◆ mapAccessionToLeader()

std::map<OpenMS::String, OpenMS::String> mapAccessionToLeader ( const OpenMS::ProteinIdentification & proteins) const

◆ normalizePeptides_()

void normalizePeptides_ ( )
private

Normalize peptide abundances across samples by (multiplicative) scaling to equal medians.
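Median scaling can be sketched as follows: compute each sample's median peptide abundance and multiply all values in that sample by a common target median divided by the sample median. This page does not say which target median is used; the sketch assumes the median of the sample with the smallest ID, and keys peptides by plain strings rather than AASequence. normalizeToEqualMedians is a hypothetical name, not OpenMS code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Same shape as the SampleAbundances typedef: sample ID -> abundance.
using SampleAbundances = std::map<std::uint64_t, double>;

double medianOf(std::vector<double> v)  // v must not be empty
{
  std::sort(v.begin(), v.end());
  const std::size_t mid = v.size() / 2;
  return v.size() % 2 == 1 ? v[mid] : (v[mid - 1] + v[mid]) / 2.0;
}

// Multiplicative median normalization sketch.
// Assumption: every sample is scaled to the median of the smallest sample ID.
void normalizeToEqualMedians(std::map<std::string, SampleAbundances>& peptides)
{
  // Collect all abundances observed in each sample.
  std::map<std::uint64_t, std::vector<double>> per_sample;
  for (const auto& pep : peptides)
    for (const auto& entry : pep.second)
      per_sample[entry.first].push_back(entry.second);
  if (per_sample.empty()) return;

  std::map<std::uint64_t, double> medians;
  for (const auto& s : per_sample) medians[s.first] = medianOf(s.second);
  const double reference = medians.begin()->second;

  // Scale each value by (reference median / median of its sample).
  for (auto& pep : peptides)
    for (auto& entry : pep.second)
      if (medians[entry.first] > 0.0) entry.second *= reference / medians[entry.first];
}
```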

◆ orderBest_()

void orderBest_ ( const std::map< T, SampleAbundances > &  abundances,
std::vector< T > &  result 
)
inline, private

Order keys (charges/peptides for peptide/protein quantification) according to how many samples they allow to quantify, breaking ties by total abundance.

The keys of abundances are stored ordered in result, best first.
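The ordering criterion can be sketched with a stable sort on (number of samples, total abundance), both descending. orderBestSketch is a hypothetical stand-in for the private template, not its actual implementation; it assumes that every entry in a SampleAbundances map counts as a quantified sample.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

using SampleAbundances = std::map<std::uint64_t, double>;  // sample ID -> abundance

// More quantified samples first; ties broken by higher total abundance.
template <typename T>
void orderBestSketch(const std::map<T, SampleAbundances>& abundances, std::vector<T>& result)
{
  struct Entry
  {
    std::size_t n_samples;
    double total;
    T key;
  };
  std::vector<Entry> entries;
  for (const auto& kv : abundances)
  {
    double total = 0.0;
    for (const auto& s : kv.second) total += s.second;
    entries.push_back({kv.second.size(), total, kv.first});
  }
  std::stable_sort(entries.begin(), entries.end(), [](const Entry& a, const Entry& b) {
    if (a.n_samples != b.n_samples) return a.n_samples > b.n_samples;
    return a.total > b.total;
  });
  result.clear();
  for (const Entry& e : entries) result.push_back(e.key);  // best first
}
```

For charge states, a charge seen in two samples with total 21 ranks ahead of one seen in two samples with total 10, and both rank ahead of a charge seen in only one sample, however intense.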

◆ performIbaqNormalization_()

void performIbaqNormalization_ ( const ProteinIdentification & proteins)
private

Perform iBAQ normalization on protein abundances.

Parameters
proteins: Protein identification information containing sequences
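As a rough illustration of the iBAQ value described for the 'method' parameter, the sketch below divides a summed intensity by the number of theoretically observable tryptic peptides. Several details are assumptions rather than documented OpenMS behaviour: trypsin is modelled as cleaving after K or R unless followed by P, missed cleavages are ignored, and only peptides of 6 to 30 residues count as observable. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumptions (not documented OpenMS behaviour): trypsin cleaves after K or R
// unless the next residue is P, missed cleavages are ignored, and a peptide
// is "observable" if its length is within [min_len, max_len].
std::size_t countObservableTrypticPeptides(const std::string& protein,
                                           std::size_t min_len = 6,
                                           std::size_t max_len = 30)
{
  std::size_t count = 0;
  std::size_t start = 0;  // first residue of the current peptide
  for (std::size_t i = 0; i < protein.size(); ++i)
  {
    const bool last = (i + 1 == protein.size());
    const bool cleave = (protein[i] == 'K' || protein[i] == 'R') &&
                        !last && protein[i + 1] != 'P';
    if (cleave || last)
    {
      const std::size_t len = i + 1 - start;
      if (len >= min_len && len <= max_len) ++count;
      start = i + 1;
    }
  }
  return count;
}

// iBAQ-style value: summed peptide intensity / number of observable peptides.
double ibaqValue(double summed_intensity, const std::string& protein)
{
  const std::size_t n = countObservableTrypticPeptides(protein);
  return n > 0 ? summed_intensity / static_cast<double>(n) : 0.0;  // assumption: 0 if none
}
```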

◆ quantifyFeature_()

void quantifyFeature_ ( const FeatureHandle & feature,
size_t fraction,
const String & filename,
const PeptideHit & hit,
Int channel_or_label
)
private

Gather quantitative information from a feature.

Store quantitative information from feature in member pep_quant_, based on the peptide annotation in hit. If hit is empty ("ambiguous/no annotation"), nothing is stored.

Parameters
fraction: Use 0 for the first fraction (or if no fractionation was performed)
filename: The base filename (without path/extension) from which the feature originates
channel_or_label: The channel/label identifier (e.g., TMT channel; typically 1 for LFQ)

◆ quantifyPeptides()

void quantifyPeptides ( const PeptideIdentificationList & peptides = PeptideIdentificationList())

Compute peptide abundances.

Based on quantitative data for individual charge states (in member pep_quant_), overall abundances for peptides are computed (and stored again in pep_quant_).
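With the default setting (best_charge_and_fraction = false), abundances are summed over all charge states. A minimal sketch of that summation for one peptide, with charge states mapped to SampleAbundances-like maps; sumOverCharges is a hypothetical helper, not part of OpenMS.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Same shape as the SampleAbundances typedef: sample ID -> abundance.
using SampleAbundances = std::map<std::uint64_t, double>;

// Collapse one peptide's per-charge abundances into a single abundance per
// sample by summing over charge states.
SampleAbundances sumOverCharges(const std::map<int, SampleAbundances>& by_charge)
{
  SampleAbundances total;  // operator[] value-initializes missing entries to 0.0
  for (const auto& charge_entry : by_charge)
    for (const auto& sample_entry : charge_entry.second)
      total[sample_entry.first] += sample_entry.second;
  return total;
}
```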

Quantitative data must first be read via readQuantData().

Optional (peptide-level) protein inference information (e.g. from Fido or ProteinProphet) can be supplied via peptides. In that case, the peptide-to-protein associations, which are the basis for protein-level quantification, will also be read from peptides.

◆ quantifyProteins()

void quantifyProteins ( const ProteinIdentification & proteins = ProteinIdentification())

Compute protein abundances.

Peptide abundances must be computed first with quantifyPeptides(). Optional protein inference information (e.g. BasicProteinInference or Epifany) can be supplied via proteins.

Parameters
proteins: Optional protein inference information

◆ readQuantData() [1/3]

void readQuantData ( ConsensusMap & consensus,
const ExperimentalDesign & ed
)

Read quantitative data from a consensus map.

Parameters should be set before using this method, as setting parameters will clear all results.

◆ readQuantData() [2/3]

void readQuantData ( FeatureMap & features,
const ExperimentalDesign & ed
)

Read quantitative data from a feature map.

Parameters should be set before using this method, as setting parameters will clear all results.

◆ readQuantData() [3/3]

void readQuantData ( std::vector< ProteinIdentification > & proteins,
PeptideIdentificationList & peptides,
const ExperimentalDesign & ed
)

Read quantitative data from identification results (for quantification via spectral counting).

Parameters should be set before using this method, as setting parameters will clear all results.

◆ selectPeptidesForQuantification_()

std::vector<String> selectPeptidesForQuantification_ ( const String & protein_accession,
Size top_n,
bool fix_peptides
)
private

Select peptides for protein quantification based on filtering criteria.

Parameters
protein_accession: The protein accession to select peptides for
top_n: Maximum number of peptides to select (0 = no limit)
fix_peptides: Whether to use consistent peptides across samples
Returns
Vector of selected peptide sequences
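The 'consensus:fix_peptides' behaviour documented in the parameter list can be sketched for a single protein as below. selectFixedPeptides is hypothetical and not the OpenMS implementation; it takes the number of samples explicitly and assumes that a peptide occurs in a sample exactly when it has an entry in its SampleAbundances map.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using SampleAbundances = std::map<std::uint64_t, double>;  // sample ID -> abundance

std::vector<std::string> selectFixedPeptides(
    const std::map<std::string, SampleAbundances>& peptides,
    std::size_t n_samples, std::size_t top_n)
{
  std::vector<std::string> keys;
  if (top_n == 0)
  {
    // 'N' = 0: keep every peptide that is quantified in all samples.
    for (const auto& pep : peptides)
      if (pep.second.size() == n_samples) keys.push_back(pep.first);
    return keys;
  }
  for (const auto& pep : peptides) keys.push_back(pep.first);
  auto total = [&peptides](const std::string& key) {
    double sum = 0.0;
    for (const auto& entry : peptides.at(key)) sum += entry.second;
    return sum;
  };
  // Most samples first; ties broken by higher total abundance. Each peptide is
  // ranked on its own, so the chosen N need not co-occur in the same samples.
  std::stable_sort(keys.begin(), keys.end(),
                   [&](const std::string& a, const std::string& b) {
                     const std::size_t na = peptides.at(a).size();
                     const std::size_t nb = peptides.at(b).size();
                     if (na != nb) return na > nb;
                     return total(a) > total(b);
                   });
  if (keys.size() > top_n) keys.resize(top_n);
  return keys;
}
```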

◆ transferPeptideDataToProteins_()

void transferPeptideDataToProteins_ ( const ProteinIdentification & proteins)
private

Transfer peptide-level quantitative data to protein-level data structures.

This method populates prot_quant_ with peptide abundance and PSM count data.

Parameters
proteins: Protein identification information

◆ updateMembers_()

void updateMembers_ ( )
override, private, virtual

Clear all data when parameters are set.

Reimplemented from DefaultParamHandler.

Member Data Documentation

◆ experimental_design_

ExperimentalDesign experimental_design_
private

Experimental design for filename/channel to sample mapping.

◆ pep_quant_

PeptideQuant pep_quant_
private

Peptide quantification data.

◆ prot_quant_

ProteinQuant prot_quant_
private

Protein quantification data.

◆ stats_

Statistics stats_
private

Processing statistics for output in the end.