OpenMS
FragmentIndex Class Reference

Generates, from a set of FASTA files, a 2D data structure that stores the theoretical masses of all b and y ions of all peptides generated from those files. The data structure is built such that on one axis the fragments are sorted by their own mass and on the other axis by the mass of their precursor/protein. The fragment index (FI) has two modes: bottom-up and top-down. In the latter, digestion is skipped and the fragments refer directly to the mass of the proteins instead of digested peptides. More...

#include <OpenMS/ANALYSIS/ID/FragmentIndex.h>

Inherits DefaultParamHandler.

Classes

struct  Fragment
 One entry in the fragment index. More...
 
struct  Hit
 
struct  Peptide
 Compact descriptor of a peptide instance held by the FragmentIndex. More...
 
struct  SpectrumMatch
 Match between a query peak and an entry in the DB. More...
 
struct  SpectrumMatchesTopN
 container for SpectrumMatch. Also keeps count of total number of candidates and total number of matches. More...
 

Public Member Functions

 FragmentIndex ()
 Default constructor. More...
 
 ~FragmentIndex () override=default
 Default destructor. More...
 
bool isBuild () const
 Indicates whether the fragment index has been built. More...
 
const std::vector< Peptide > & getPeptides () const
 Returns a reference to the internal peptide container. More...
 
void build (const std::vector< FASTAFile::FASTAEntry > &fasta_entries)
 Given a set of FASTA entries, builds the fragment index data structure (FID). First, all fragments are sorted by their own mass. Next, they are placed in buckets and the minimum fragment mass of each bucket is stored. Finally, the fragments within each bucket are sorted by their originating precursor mass. More...
 
void clear ()
 Deletes the fragment index and sets is_build_ to false. More...
 
std::pair< size_t, size_t > getPeptidesInPrecursorRange (float precursor_mass, const std::pair< float, float > &window)
 
std::vector< Hit > query (const Peak1D &peak, const std::pair< size_t, size_t > &peptide_idx_range, uint16_t peak_charge)
 Queries one peak. More...
 
void querySpectrum (const MSSpectrum &spectrum, SpectrumMatchesTopN &sms)
 Queries one complete experimental spectrum against the database. Loops over all precursor charges, starting at min_precursor_charge and going up to max_precursor_charge. All peaks are queried once per precursor charge, each time with the corresponding precursor mass. More...
 
- Public Member Functions inherited from DefaultParamHandler
 DefaultParamHandler (const String &name)
 Constructor with name that is displayed in error messages. More...
 
 DefaultParamHandler (const DefaultParamHandler &rhs)
 Copy constructor. More...
 
virtual ~DefaultParamHandler ()
 Destructor. More...
 
DefaultParamHandler & operator= (const DefaultParamHandler &rhs)
 Assignment operator. More...
 
virtual bool operator== (const DefaultParamHandler &rhs) const
 Equality operator. More...
 
void setParameters (const Param &param)
 Sets the parameters. More...
 
const Param & getParameters () const
 Non-mutable access to the parameters. More...
 
const Param & getDefaults () const
 Non-mutable access to the default parameters. More...
 
const String & getName () const
 Non-mutable access to the name. More...
 
void setName (const String &name)
 Mutable access to the name. More...
 
const std::vector< String > & getSubsections () const
 Non-mutable access to the registered subsections. More...
 

Protected Member Functions

void updateMembers_ () override
 This method is used to update extra member variables at the end of the setParameters() method. More...
 
void generatePeptides (const std::vector< FASTAFile::FASTAEntry > &fasta_entries)
 Generates all peptides from the given FASTA entries. If bottom-up is set to false, digestion is skipped. If it is set to true, the digestion enzyme can be set in the parameters. Additionally introduces fixed and variable modifications for restrictive PSM search. More...
 
- Protected Member Functions inherited from DefaultParamHandler
void defaultsToParam_ ()
 Updates the parameters after the defaults have been set in the constructor. More...
 

Protected Attributes

bool is_build_ {false}
 true, if the database has been populated with fragments More...
 
std::vector< Peptide > fi_peptides_
 vector of all (digested) peptides More...
 
std::vector< Fragment > fi_fragments_
 vector of all theoretical fragments (b- and y- ions) More...
 
float fragment_min_mz_
 smallest fragment mz More...
 
float fragment_max_mz_
 largest fragment mz More...
 
size_t bucketsize_
 number of fragments per outer node More...
 
std::vector< float > bucket_min_mz_
 vector of the smallest fragment mz of each bucket More...
 
float precursor_mz_tolerance_
 
bool precursor_mz_tolerance_unit_ppm_ {true}
 
float fragment_mz_tolerance_
 
bool fragment_mz_tolerance_unit_ppm_ {true}
 
- Protected Attributes inherited from DefaultParamHandler
Param param_
 Container for current parameters. More...
 
Param defaults_
 Container for default parameters. This member should be filled in the constructor of derived classes! More...
 
std::vector< String > subsections_
 Container for registered subsections. This member should be filled in the constructor of derived classes! More...
 
String error_name_
 Name that is displayed in error messages during the parameter checking. More...
 
bool check_defaults_
 If this member is set to false, the parameters are not checked. More...
 
bool warn_empty_defaults_
 If this member is set to false, no warning is emitted when the defaults are empty. More...
 

Private Member Functions

void queryPeaks (SpectrumMatchesTopN &candidates, const MSSpectrum &spectrum, const std::pair< size_t, size_t > &candidates_range, const int16_t isotope_error, const uint16_t precursor_charge)
 Queries peaks for a given experimental spectrum with a set range of potential peptides, isotope error and precursor charge. Hits are transferred into a PSM list. Technically an adapter between query(...) and openSearch(...)/searchDifferentPrecursorRanges(...). More...
 
void searchDifferentPrecursorRanges (const MSSpectrum &spectrum, float precursor_mass, SpectrumMatchesTopN &sms, uint16_t charge)
 For closed search, loops over all isotope errors. In each iteration, all peaks are queried with queryPeaks. More...
 
void trimHits (SpectrumMatchesTopN &init_hits) const
 Places the k largest elements at the front of the input array. Neither the k largest elements nor the remaining elements are sorted among themselves. More...
 
bool isOpenSearchMode_ () const
 Helper function to determine if open search should be used based on tolerance. More...
 

Private Attributes

bool add_b_ions_
 
bool add_y_ions_
 
bool add_a_ions_
 
bool add_c_ions_
 
bool add_x_ions_
 
bool add_z_ions_
 
std::string digestion_enzyme_
 
size_t missed_cleavages_
 number of missed cleavages More...
 
float peptide_min_mass_
 
float peptide_max_mass_
 
size_t peptide_min_length_
 
size_t peptide_max_length_
 
StringList modifications_fixed_
 Modifications that are present on all peptides. More...
 
StringList modifications_variable_
 Variable modifications; all possible combinations are created. More...
 
size_t max_variable_mods_per_peptide_
 
uint16_t min_matched_peaks_
 PSMs with fewer hits are discarded. More...
 
int16_t min_isotope_error_
 Minimal possible isotope error. More...
 
int16_t max_isotope_error_
 Maximal possible isotope error (the minimal and maximal isotope errors are only used for closed search) More...
 
uint16_t min_precursor_charge_
 minimal possible precursor charge (usually 1) More...
 
uint16_t max_precursor_charge_
 maximal possible precursor charge More...
 
uint16_t max_fragment_charge_
 The maximal possible charge of the fragments. More...
 
uint32_t max_processed_hits_
 The number of PSMs that will be used; the rest are filtered out. More...
 
float open_precursor_window_lower_
 Defines the lower bound of the precursor-mass range. More...
 
float open_precursor_window_upper_
 Defines the upper bound of the precursor-mass range. More...
 

Additional Inherited Members

- Static Public Member Functions inherited from DefaultParamHandler
static void writeParametersToMetaValues (const Param &write_this, MetaInfoInterface &write_here, const String &key_prefix="")
 Writes all parameters to meta values. More...
 

Detailed Description

Generates, from a set of FASTA files, a 2D data structure that stores the theoretical masses of all b and y ions of all peptides generated from those files. The data structure is built such that on one axis the fragments are sorted by their own mass and on the other axis by the mass of their precursor/protein. The fragment index (FI) has two modes: bottom-up and top-down. In the latter, digestion is skipped and the fragments refer directly to the mass of the proteins instead of digested peptides.
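
To make the stored quantities concrete, the following minimal sketch computes singly charged b- and y-ion m/z values for one unmodified peptide. It is an illustration only, not the OpenMS implementation: residueMass, IonSeries and singlyChargedIons are hypothetical names, the residue masses are standard monoisotopic values rounded to five decimals, and the proton and water masses are rounded constants.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Monoisotopic residue masses in Da (standard values, rounded to 5 decimals).
inline double residueMass(char aa)
{
  static const std::unordered_map<char, double> masses = {
    {'G', 57.02146},  {'A', 71.03711},  {'S', 87.03203},  {'P', 97.05276},
    {'V', 99.06841},  {'T', 101.04768}, {'C', 103.00919}, {'L', 113.08406},
    {'I', 113.08406}, {'N', 114.04293}, {'D', 115.02694}, {'Q', 128.05858},
    {'K', 128.09496}, {'E', 129.04259}, {'M', 131.04049}, {'H', 137.05891},
    {'F', 147.06841}, {'R', 156.10111}, {'Y', 163.06333}, {'W', 186.07931}};
  const auto it = masses.find(aa);
  // Sketch limitation: only the 20 standard residues are supported.
  assert(it != masses.end() && "unknown residue");
  return it->second;
}

// Hypothetical container for the two ion series of one peptide.
struct IonSeries
{
  std::vector<double> b; // b1 .. b(n-1), singly charged m/z
  std::vector<double> y; // y1 .. y(n-1), singly charged m/z
};

IonSeries singlyChargedIons(const std::string& peptide)
{
  constexpr double kProton = 1.007276; // approximate proton mass in Da
  constexpr double kWater = 18.010565; // approximate monoisotopic H2O mass in Da

  IonSeries ions;

  // b ions: running sum of N-terminal residues plus one proton.
  double prefix = 0.0;
  for (std::size_t i = 0; i + 1 < peptide.size(); ++i)
  {
    prefix += residueMass(peptide[i]);
    ions.b.push_back(prefix + kProton);
  }

  // y ions: running sum of C-terminal residues plus water plus one proton.
  double suffix = 0.0;
  for (std::size_t i = peptide.size(); i-- > 1;)
  {
    suffix += residueMass(peptide[i]);
    ions.y.push_back(suffix + kWater + kProton);
  }
  return ions;
}
```

For a peptide of length n the sketch yields n-1 b ions and n-1 y ions; modifications and higher fragment charges are deliberately left out.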


Class Documentation

◆ OpenMS::FragmentIndex::SpectrumMatch

struct OpenMS::FragmentIndex::SpectrumMatch

Match between a query peak and an entry in the DB.

Collaboration diagram for FragmentIndex::SpectrumMatch:
[legend]
Class Members
int16_t  isotope_error_  The isotope error used for the performed search.
uint32_t  num_matched_  Number of peak-fragment hits.
size_t  peptide_idx_  The index of the peptide this match belongs to.
uint16_t  precursor_charge_  The precursor charge used for the performed search.

Constructor & Destructor Documentation

◆ FragmentIndex()

Default constructor.

Initializes an empty FragmentIndex. Call build() before using any query functions. After clear(), the index returns to this unbuilt state.

Thread-safety: constructing the object is thread-safe as long as the instance is not shared across threads before initialization completes.

◆ ~FragmentIndex()

~FragmentIndex ( )
override default

Default destructor.

Releases owned memory. If the index was built, all internal buffers and fragment buckets are freed. No exceptions are thrown.

Member Function Documentation

◆ build()

void build ( const std::vector< FASTAFile::FASTAEntry > &  fasta_entries)

Given a set of FASTA entries, builds the fragment index data structure (FID). First, all fragments are sorted by their own mass. Next, they are placed in buckets and the minimum fragment mass of each bucket is stored. Finally, the fragments within each bucket are sorted by their originating precursor mass.

Parameters
fasta_entries
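
The bucket layout described for build() can be sketched with the standard library alone. This is a minimal illustration, not the OpenMS implementation: SketchFragment and buildBuckets are hypothetical names, and the sketch assumes peptides are already sorted by precursor mass, so that sorting by peptide index is equivalent to sorting by precursor mass.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one index entry: fragment m/z plus owning peptide.
struct SketchFragment
{
  float mz;
  std::uint32_t peptide_idx; // assumed to follow precursor-mass order
};

// Reorders `fragments` into buckets and returns the minimum m/z of each bucket.
std::vector<float> buildBuckets(std::vector<SketchFragment>& fragments, std::size_t bucket_size)
{
  assert(bucket_size > 0);

  // Step 1: sort all fragments by their own m/z.
  std::sort(fragments.begin(), fragments.end(),
            [](const SketchFragment& a, const SketchFragment& b) { return a.mz < b.mz; });

  std::vector<float> bucket_min_mz;
  for (std::size_t start = 0; start < fragments.size(); start += bucket_size)
  {
    const std::size_t end = std::min(start + bucket_size, fragments.size());

    // Step 2: remember the smallest m/z of the bucket before reordering it.
    bucket_min_mz.push_back(fragments[start].mz);

    // Step 3: inside the bucket, order by the originating precursor (peptide index).
    std::sort(fragments.begin() + static_cast<std::ptrdiff_t>(start),
              fragments.begin() + static_cast<std::ptrdiff_t>(end),
              [](const SketchFragment& a, const SketchFragment& b) { return a.peptide_idx < b.peptide_idx; });
  }
  return bucket_min_mz;
}
```

With this layout a query can first binary-search the bucket minima by fragment m/z and then binary-search inside the matching buckets by peptide index.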

◆ clear()

void clear ( )

Deletes the fragment index and sets is_build_ to false.

◆ generatePeptides()

void generatePeptides ( const std::vector< FASTAFile::FASTAEntry > &  fasta_entries)
protected

Generates all peptides from the given FASTA entries. If bottom-up is set to false, digestion is skipped. If it is set to true, the digestion enzyme can be set in the parameters. Additionally introduces fixed and variable modifications for restrictive PSM search.

Parameters
fasta_entries
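
The bottom-up branch can be illustrated with a small in-silico digestion. The sketch below assumes trypsin-like cleavage (after K or R, but not before P) purely as an example enzyme. digestTryptic and its parameters are hypothetical, and it ignores modifications and mass filters, so it should not be read as the behaviour of generatePeptides().

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns all peptides with up to `missed_cleavages` missed cleavage sites
// whose length lies in [min_len, max_len].
std::vector<std::string> digestTryptic(const std::string& protein,
                                       std::size_t missed_cleavages,
                                       std::size_t min_len,
                                       std::size_t max_len)
{
  assert(min_len <= max_len);

  // Collect cut positions: sequence start, every position after K/R (unless
  // followed by P), and sequence end.
  std::vector<std::size_t> cuts{0};
  for (std::size_t i = 0; i + 1 < protein.size(); ++i)
  {
    const bool is_kr = protein[i] == 'K' || protein[i] == 'R';
    if (is_kr && protein[i + 1] != 'P')
    {
      cuts.push_back(i + 1);
    }
  }
  cuts.push_back(protein.size());

  // Join each fully cleaved piece with up to `missed_cleavages` following pieces.
  std::vector<std::string> peptides;
  for (std::size_t s = 0; s + 1 < cuts.size(); ++s)
  {
    for (std::size_t m = 0; m <= missed_cleavages && s + 1 + m < cuts.size(); ++m)
    {
      const std::size_t begin = cuts[s];
      const std::size_t len = cuts[s + 1 + m] - begin;
      if (len >= min_len && len <= max_len)
      {
        peptides.push_back(protein.substr(begin, len));
      }
    }
  }
  return peptides;
}
```

In top-down mode this step would be skipped and each protein sequence would be used as a single entry.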

◆ getPeptides()

const std::vector<Peptide>& getPeptides ( ) const

Returns a reference to the internal peptide container.

Provides read-only access to all peptides currently held by the index, typically populated during build().

Returns
const reference to the internal std::vector of Peptide.

Preconditions: The vector may be empty if build() has not been called yet. Thread-safety: read-only view; safe to access concurrently as long as no thread mutates the index (e.g., build()/clear()).

◆ getPeptidesInPrecursorRange()

std::pair<size_t, size_t> getPeptidesInPrecursorRange ( float  precursor_mass,
const std::pair< float, float > &  window 
)

Returns the index range of all candidate peptides/proteins, so that a vector fitting exactly that range can be allocated (which saves some memory).

Parameters
precursor_mass  The singly charged precursor mass (M+H)
window  Defines the lower and upper bound for the precursor mass. For closed search it contains only the tolerance. For open search it contains both the tolerance and the open-search window.
Returns
a pair of indices delimiting all peptides that the current peak could hit
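
A range lookup of this kind is typically a pair of binary searches over peptides sorted by precursor mass. The sketch below shows the idea under stated assumptions: peptideIndexRange is a hypothetical free function, the window is interpreted as absolute offsets in Da (negative lower, positive upper) added to the precursor mass, and the returned pair is half-open, [first, last). None of these conventions is confirmed by the documentation above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Returns the half-open index range [first, last) of all masses inside
// [precursor_mass + window.first, precursor_mass + window.second].
std::pair<std::size_t, std::size_t> peptideIndexRange(const std::vector<float>& sorted_masses,
                                                      float precursor_mass,
                                                      const std::pair<float, float>& window)
{
  // The binary searches below are only valid on ascending input.
  assert(std::is_sorted(sorted_masses.begin(), sorted_masses.end()));

  const float lower = precursor_mass + window.first;
  const float upper = precursor_mass + window.second;

  // First peptide whose mass is >= lower bound.
  const auto first = std::lower_bound(sorted_masses.begin(), sorted_masses.end(), lower);
  // First peptide after `first` whose mass is > upper bound.
  const auto last = std::upper_bound(first, sorted_masses.end(), upper);

  return {static_cast<std::size_t>(first - sorted_masses.begin()),
          static_cast<std::size_t>(last - sorted_masses.begin())};
}
```

The size of a per-spectrum candidate buffer is then simply last - first, which is the memory saving the description refers to.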

◆ isBuild()

bool isBuild ( ) const

Indicates whether the fragment index has been built.

Returns
true if build() has completed successfully and the index is ready for queries; false otherwise (e.g., after construction or after clear()).

Thread-safety: read-only and can be called concurrently with other read-only methods. Must not race with build()/clear() on the same instance.

◆ isOpenSearchMode_()

bool isOpenSearchMode_ ( ) const
inline private

Helper function to determine if open search should be used based on tolerance.

◆ query()

std::vector<Hit> query ( const Peak1D &  peak,
const std::pair< size_t, size_t > &  peptide_idx_range,
uint16_t  peak_charge 
)

Queries one peak.

Parameters
peak  The queried peak
peptide_idx_range  The range of precursors/peptides the peak could potentially belong to
peak_charge  The charge of the peak. It is used to calculate the mass from the mz
Returns
a vector of Hits (matching peptide_idx_range and matching fragment_mz_) containing the index of the matched peptide and the mass of the hit
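
The role of peak_charge and of the fragment tolerance can be sketched as two small helpers. Both are assumptions for illustration: toSinglyChargedMass and toleranceWindow are hypothetical names, the proton mass is a rounded constant, and whether query() normalises to the singly charged mass or to another convention is not stated in this reference.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Approximate proton mass in Da (rounded constant, for illustration only).
constexpr double kProtonMass = 1.007276;

// Converts an observed m/z at the given charge to the singly charged (M+H) mass.
double toSinglyChargedMass(double mz, std::uint16_t charge)
{
  assert(charge >= 1);
  return mz * charge - (charge - 1) * kProtonMass;
}

// Returns the [lower, upper] window around `mass`.
// `tolerance` is read as ppm if `unit_ppm` is true, otherwise as Da.
std::pair<double, double> toleranceWindow(double mass, double tolerance, bool unit_ppm)
{
  const double delta = unit_ppm ? mass * tolerance * 1e-6 : tolerance;
  return {mass - delta, mass + delta};
}
```

A fragment would count as a hit when its stored value falls inside the window and its peptide index lies inside peptide_idx_range.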

◆ queryPeaks()

void queryPeaks ( SpectrumMatchesTopN &  candidates,
const MSSpectrum &  spectrum,
const std::pair< size_t, size_t > &  candidates_range,
const int16_t  isotope_error,
const uint16_t  precursor_charge 
)
private

Queries peaks for a given experimental spectrum with a set range of potential peptides, isotope error and precursor charge. Hits are transferred into a PSM list. Technically an adapter between query(...) and openSearch(...)/searchDifferentPrecursorRanges(...).

Parameters
[out]  candidates  The n best spectrum matches
spectrum  The queried experimental spectrum
candidates_range  The range of precursors/peptides the peptide could potentially belong to
isotope_error  The applied isotope error
precursor_charge  The applied precursor charge

◆ querySpectrum()

void querySpectrum ( const MSSpectrum &  spectrum,
SpectrumMatchesTopN &  sms 
)

Queries one complete experimental spectrum against the database. Loops over all precursor charges, starting at min_precursor_charge and going up to max_precursor_charge. All peaks are queried once per precursor charge, each time with the corresponding precursor mass.

Parameters
spectrum  The experimental spectrum
[out]  sms  The n best spectrum matches
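
The charge loop can be pictured as enumerating one candidate precursor mass per assumed charge state. The sketch below is hypothetical: candidatePrecursorMasses is not an OpenMS function, and it uses the simple mz * charge relation quoted in the parameter notes of searchDifferentPrecursorRanges(); a real implementation may additionally correct for proton masses.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Returns (charge, precursor mass) pairs for every charge in [min_charge, max_charge].
std::vector<std::pair<std::uint16_t, float>> candidatePrecursorMasses(float precursor_mz,
                                                                      std::uint16_t min_charge,
                                                                      std::uint16_t max_charge)
{
  assert(min_charge >= 1);

  std::vector<std::pair<std::uint16_t, float>> candidates;
  // A wider loop counter avoids overflow if max_charge is the largest uint16_t.
  for (std::uint32_t z = min_charge; z <= max_charge; ++z)
  {
    candidates.emplace_back(static_cast<std::uint16_t>(z), precursor_mz * static_cast<float>(z));
  }
  return candidates;
}
```

Each pair would then drive one full pass over the peaks of the spectrum, which is why the cost of a query grows with the width of the charge range.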

◆ searchDifferentPrecursorRanges()

void searchDifferentPrecursorRanges ( const MSSpectrum &  spectrum,
float  precursor_mass,
SpectrumMatchesTopN &  sms,
uint16_t  charge 
)
private

For closed search, loops over all isotope errors. In each iteration, all peaks are queried with queryPeaks.

For open search, a precursor-mass window is applied instead.

Parameters
spectrum  The experimental query spectrum
precursor_mass  The mass of the precursor (mz * charge)
[out]  sms  The top m SpectrumMatches
charge  The applied charge
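
The difference between the two modes can be sketched as the set of precursor-mass windows that get searched. Everything here is an assumption made for illustration: MassWindow and precursorWindows are hypothetical, the tolerance is taken in Da, each isotope error is modelled as a shift of the precursor mass by a multiple of the approximate 13C-12C mass difference, and the sign of that shift may differ in the real code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct MassWindow
{
  double lower;
  double upper;
};

// Approximate mass difference between 13C and 12C in Da.
constexpr double kC13C12Diff = 1.0033548;

// Closed search: one narrow window per isotope error.
// Open search: a single wide window, widened by the tolerance on both sides.
std::vector<MassWindow> precursorWindows(double precursor_mass,
                                         double tolerance_da,
                                         bool open_search,
                                         double open_lower,
                                         double open_upper,
                                         std::int16_t min_isotope_error,
                                         std::int16_t max_isotope_error)
{
  std::vector<MassWindow> windows;
  if (open_search)
  {
    windows.push_back({precursor_mass + open_lower - tolerance_da,
                       precursor_mass + open_upper + tolerance_da});
    return windows;
  }

  assert(min_isotope_error <= max_isotope_error);
  for (int iso = min_isotope_error; iso <= max_isotope_error; ++iso)
  {
    // Assumed convention: undo a mis-assigned isotope peak by shifting the mass.
    const double center = precursor_mass - iso * kC13C12Diff;
    windows.push_back({center - tolerance_da, center + tolerance_da});
  }
  return windows;
}
```

Each window would be turned into a peptide index range and passed on to queryPeaks together with the isotope error and charge that produced it.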

◆ trimHits()

void trimHits ( SpectrumMatchesTopN &  init_hits) const
private

Places the k largest elements at the front of the input array. Neither the k largest elements nor the remaining elements are sorted among themselves.
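
The described behaviour matches what std::nth_element provides: a partial partition in linear average time without a full sort. The following sketch shows that technique on a hypothetical SketchMatch type; whether trimHits() uses std::nth_element internally, and which field it ranks by, is an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a spectrum match, ranked by matched peak count.
struct SketchMatch
{
  std::uint32_t num_matched;
  std::size_t peptide_idx;
};

// Moves the k matches with the highest num_matched to the front.
// Order inside the first k and inside the remainder is unspecified.
void moveTopKToFront(std::vector<SketchMatch>& matches, std::size_t k)
{
  if (k >= matches.size())
  {
    return; // fewer than k matches: nothing to trim
  }
  std::nth_element(matches.begin(),
                   matches.begin() + static_cast<std::ptrdiff_t>(k),
                   matches.end(),
                   [](const SketchMatch& a, const SketchMatch& b) { return a.num_matched > b.num_matched; });
}
```

The container keeps its size in this sketch; a caller that only wants the best k entries can resize it afterwards.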

◆ updateMembers_()

void updateMembers_ ( )
override protected virtual

This method is used to update extra member variables at the end of the setParameters() method.

Also call it at the end of the derived classes' copy constructor and assignment operator.

The default implementation is empty.

Reimplemented from DefaultParamHandler.

Member Data Documentation

◆ add_a_ions_

bool add_a_ions_
private

◆ add_b_ions_

bool add_b_ions_
private

◆ add_c_ions_

bool add_c_ions_
private

◆ add_x_ions_

bool add_x_ions_
private

◆ add_y_ions_

bool add_y_ions_
private

◆ add_z_ions_

bool add_z_ions_
private

◆ bucket_min_mz_

std::vector<float> bucket_min_mz_
protected

vector of the smallest fragment mz of each bucket

◆ bucketsize_

size_t bucketsize_
protected

number of fragments per outer node

◆ digestion_enzyme_

std::string digestion_enzyme_
private

◆ fi_fragments_

std::vector<Fragment> fi_fragments_
protected

vector of all theoretical fragments (b- and y- ions)

◆ fi_peptides_

std::vector<Peptide> fi_peptides_
protected

vector of all (digested) peptides

◆ fragment_max_mz_

float fragment_max_mz_
protected

largest fragment mz

◆ fragment_min_mz_

float fragment_min_mz_
protected

smallest fragment mz

◆ fragment_mz_tolerance_

float fragment_mz_tolerance_
protected

◆ fragment_mz_tolerance_unit_ppm_

bool fragment_mz_tolerance_unit_ppm_ {true}
protected

◆ is_build_

bool is_build_ {false}
protected

true, if the database has been populated with fragments

◆ max_fragment_charge_

uint16_t max_fragment_charge_
private

The maximal possible charge of the fragments.

◆ max_isotope_error_

int16_t max_isotope_error_
private

Maximal possible isotope error (the minimal and maximal isotope errors are only used for closed search)

◆ max_precursor_charge_

uint16_t max_precursor_charge_
private

maximal possible precursor charge

◆ max_processed_hits_

uint32_t max_processed_hits_
private

The number of PSMs that will be used; the rest are filtered out.

◆ max_variable_mods_per_peptide_

size_t max_variable_mods_per_peptide_
private

◆ min_isotope_error_

int16_t min_isotope_error_
private

Minimal possible isotope error.

◆ min_matched_peaks_

uint16_t min_matched_peaks_
private

PSMs with fewer hits are discarded.

◆ min_precursor_charge_

uint16_t min_precursor_charge_
private

minimal possible precursor charge (usually 1)

◆ missed_cleavages_

size_t missed_cleavages_
private

number of missed cleavages

◆ modifications_fixed_

StringList modifications_fixed_
private

Modifications that are present on all peptides.

◆ modifications_variable_

StringList modifications_variable_
private

Variable modifications; all possible combinations are created.

◆ open_precursor_window_lower_

float open_precursor_window_lower_
private

Defines the lower bound of the precursor-mass range.

◆ open_precursor_window_upper_

float open_precursor_window_upper_
private

Defines the upper bound of the precursor-mass range.

◆ peptide_max_length_

size_t peptide_max_length_
private

◆ peptide_max_mass_

float peptide_max_mass_
private

◆ peptide_min_length_

size_t peptide_min_length_
private

◆ peptide_min_mass_

float peptide_min_mass_
private

◆ precursor_mz_tolerance_

float precursor_mz_tolerance_
protected

◆ precursor_mz_tolerance_unit_ppm_

bool precursor_mz_tolerance_unit_ppm_ {true}
protected