grassp.pp.calculate_qc_metrics

calculate_qc_metrics(data, qc_vars=(), percent_top=(50, 100, 200, 500), layer=None, use_raw=False, inplace=True, log1p=True, var_type='proteins', expr_type='intensity', parallel=None)

Calculate quality control metrics.

Parameters:
data AnnData

The annotated data matrix with proteins as observations (rows).

qc_vars Union[Collection[str], str] (default: ())

Keys for boolean columns in .var that indicate a protein is a quality control protein.

percent_top Optional[Collection[int]] (default: (50, 100, 200, 500))

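Ranks at which to report the cumulative percentage of intensity, with proteins ranked by intensity. For example, 50 reports the percentage of total intensity contributed by the 50 highest-intensity proteins. Set to None to disable.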
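The following is a minimal NumPy sketch of what a "percentage of intensity from the top N proteins" metric means for a single sample. It is an illustration only, not the grassp implementation: the helper name `percent_top_sketch` and the toy values are hypothetical, and clipping ranks that exceed the number of proteins is a choice made for this sketch.

```python
import numpy as np

def percent_top_sketch(intensities, ranks):
    """Percentage of total intensity held by the top-n values, for each n in ranks.

    Hypothetical helper for illustration; not part of grassp.
    """
    # Sort intensities from highest to lowest.
    values = np.sort(np.asarray(intensities, dtype=float))[::-1]
    # Running total over the ranked values.
    cumulative = np.cumsum(values)
    total = cumulative[-1]
    # Ranks larger than the number of values are clipped (sketch-only choice).
    return {n: 100.0 * cumulative[min(n, len(values)) - 1] / total for n in ranks}

# Toy sample with four protein intensities summing to 100.
sample = [50.0, 25.0, 15.0, 10.0]
result = percent_top_sketch(sample, ranks=(1, 2, 3))
print(result)
```

With the default `percent_top=(50, 100, 200, 500)`, the same idea would be applied at ranks 50, 100, 200 and 500.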

layer Optional[str] (default: None)

If provided, use data.layers[layer] for expression values.

use_raw bool (default: False)

If True, use data.raw for expression values.

inplace bool (default: True)

If True, add the metrics to the input object; if False, return them instead.

log1p bool (default: True)

If True, compute log1p of expression values.
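As a reminder of what the log1p transform does, here is a short NumPy sketch. It only demonstrates the transform itself; which quantities the function transforms and where it stores them is not shown here, and the array `values` is a made-up example.

```python
import numpy as np

# Hypothetical values, including a zero to show why log1p is used.
values = np.array([0.0, 9.0, 99.0])

# log1p(x) = log(1 + x): zeros map to zero instead of -inf.
log1p_values = np.log1p(values)

# expm1 is the inverse transform and recovers the original values.
recovered = np.expm1(log1p_values)
```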

var_type str (default: 'proteins')

Name for variables (e.g. ‘proteins’, ‘genes’, etc).

expr_type str (default: 'intensity')

Name for expression values (e.g. ‘intensity’, ‘counts’, etc).

parallel Optional[bool] (default: None)

Whether to parallelize computation.

Return type:

tuple[DataFrame, DataFrame] | None

Returns:

If inplace is False, returns a tuple containing:
  • A DataFrame with protein-based metrics (var)
  • A DataFrame with sample-based metrics (obs)

If inplace is True, returns None and adds the metrics to the input object.

Notes

Calculates quality control metrics for both proteins and samples, including:
  • Number of samples expressing each protein
  • Total intensity per sample
  • Number of proteins detected per sample
  • Percentage of intensity from top proteins
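
The metrics listed above can be sketched on a toy matrix with plain NumPy. This is an illustrative sketch, not the grassp implementation: it assumes proteins as rows and samples as columns, treats any value greater than zero as "detected", and the variable and helper names are hypothetical.

```python
import numpy as np

# Toy intensity matrix: 4 proteins (rows) x 3 samples (columns).
X = np.array([
    [10.0,  0.0,  5.0],
    [ 0.0,  0.0,  5.0],
    [30.0, 20.0,  0.0],
    [60.0, 80.0, 10.0],
])

# Assumption for this sketch: a protein is "detected" if its intensity is > 0.
detected = X > 0

# Number of samples expressing each protein (one value per row).
n_samples_per_protein = detected.sum(axis=1)

# Total intensity per sample (one value per column).
total_intensity_per_sample = X.sum(axis=0)

# Number of proteins detected per sample (one value per column).
n_proteins_per_sample = detected.sum(axis=0)

def pct_in_top(matrix, n):
    """Percentage of each sample's intensity held by its n top proteins."""
    # Sort each column ascending, flip to descending, keep the first n rows.
    top = np.sort(matrix, axis=0)[::-1][:n]
    return 100.0 * top.sum(axis=0) / matrix.sum(axis=0)

pct_top_2 = pct_in_top(X, 2)
```

In the real function these values would be written to the input object or returned as DataFrames, depending on `inplace`.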