grassp.pp.calculate_enrichment_vs_untagged

calculate_enrichment_vs_untagged(data, covariates=None, subcellular_enrichment_column='subcellular_enrichment', untagged_name='UNTAGGED', original_intensities_key=None, drop_untagged=True, keep_raw=True)

Calculates enrichment scores and p-values by comparing tagged samples against untagged controls.

For each protein, a t-test assesses whether intensities in tagged samples differ significantly from those in untagged controls. The enrichment score is the log2 fold change of the median intensities, tagged relative to untagged.
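The two statistics described above can be illustrated outside the library. The following is a minimal sketch, not the grassp implementation: the helper name enrichment_vs_control, the (n_proteins, n_replicates) array layout, and the use of linear-scale intensities are all assumptions made for illustration. The source does not say which t-test variant is used, so the sketch relies on SciPy's default equal-variance two-sample test. If the intensities were already log2-transformed, the fold change would be a difference of medians rather than a log of their ratio.

```python
import numpy as np
from scipy import stats


def enrichment_vs_control(tagged, untagged):
    """Return (log2 fold change, p-value) per protein.

    Hypothetical helper: both inputs are (n_proteins, n_replicates) arrays
    of linear-scale intensities. This layout is an assumption of the sketch.
    """
    tagged = np.asarray(tagged, dtype=float)
    untagged = np.asarray(untagged, dtype=float)
    # Enrichment: log2 ratio of the per-protein median intensities.
    log2_fc = np.log2(np.median(tagged, axis=1) / np.median(untagged, axis=1))
    # Significance: two-sample t-test across replicates, one test per protein.
    pvals = stats.ttest_ind(tagged, untagged, axis=1).pvalue
    return log2_fc, pvals


# Made-up intensities: row 0 is enriched in tagged samples, row 1 is not.
tagged = np.array([[800.0, 820.0, 790.0],
                   [100.0, 110.0, 95.0]])
untagged = np.array([[100.0, 105.0, 98.0],
                     [102.0, 108.0, 97.0]])
log2_fc, pvals = enrichment_vs_control(tagged, untagged)
print(log2_fc, pvals)
```

In this toy data the first protein has a median ratio of 8, so its log2 fold change is 3 with a small p-value, while the second protein sits at background level and yields a fold change near zero with a large p-value.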

Parameters:
data AnnData

An AnnData object with protein intensities in .X.

covariates Optional[Sequence[str]] (default: None)

Column names in data.var used to group samples. If None, all columns whose names start with covariate_ are used.

subcellular_enrichment_column str (default: 'subcellular_enrichment')

The column in .var that contains subcellular enrichment labels.

untagged_name str (default: 'UNTAGGED')

The label in subcellular_enrichment_column that identifies untagged control samples.

original_intensities_key Optional[str] (default: None)

If specified, the original intensity values are stored in data.layers[original_intensities_key].

drop_untagged bool (default: True)

If True, untagged samples are removed from the returned AnnData object.

keep_raw bool (default: True)

If True, the original unaggregated data is stored in .raw.
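The default behaviour of covariates (used when it is None) amounts to a prefix filter over the column names of data.var. A minimal sketch, assuming a plain list of column names; the helper default_covariates and the example column names are hypothetical and are not part of grassp:

```python
def default_covariates(var_columns, prefix="covariate_"):
    """Pick the columns that would serve as grouping covariates by default."""
    return [name for name in var_columns if name.startswith(prefix)]


# Hypothetical column names for illustration only.
columns = ["covariate_replicate", "subcellular_enrichment", "covariate_batch"]
selected = default_covariates(columns)
print(selected)
```

Passing an explicit list to covariates bypasses this selection.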

Return type:

AnnData

Returns:

Aggregated AnnData object with enrichment scores and p-values, containing:

  • .X: log2 fold changes relative to untagged controls.

  • .layers["pvals"]: p-values from the t-tests.

  • .layers[original_intensities_key]: raw intensity values if original_intensities_key is set.