grassp.pp.aggregate_samples
- aggregate_samples(data, grouping_columns, agg_func=<function median>, keep_raw=False)
Aggregates expression values across groups of samples using a given function.
- Parameters:
- data
AnnData The annotated data matrix with proteins as observations (rows) and samples as variables (columns).
- grouping_columns
Union[str, List[str]] Column name(s) in data.var used to group samples.
- agg_func
Callable[[ndarray, Optional[int]], ndarray] (default: np.median) Function to use for aggregation.
- keep_raw
bool (default: False) Whether to keep the raw data in the returned AnnData object.
- Return type:
AnnData
- Returns:
A new AnnData object with aggregated expression values. The number of observations (proteins) remains the same, but the number of variables (samples) will correspond to the number of unique groups defined by grouping_columns.
Notes
This function is useful for combining replicates or creating an averaged profile across conditions. For each protein, it groups the samples based on the provided grouping_columns and then aggregates the expression values within each group using the specified agg_func.
Examples
>>> import grassp as gr
>>> adata = gr.datasets.hein_2024(enrichment="raw")
>>> adata.shape  # Original shape
(8538, 183)
>>> aggregated = gr.pp.aggregate_samples(
...     adata,
...     grouping_columns='subcellular_enrichment'
... )
>>> aggregated.shape  # Fewer samples after aggregation
(8538, 41)
>>> int(aggregated.var['n_merged_samples'].max())  # Max samples merged
12
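The group-then-aggregate behavior described in the Notes can be illustrated with plain NumPy and pandas. This is a minimal sketch, not grassp's implementation: the helper name aggregate_columns_sketch, the toy matrix, and the sample names are hypothetical. It also assumes that agg_func is called with the array first and the axis second, as the Callable[[ndarray, Optional[int]], ndarray] annotation suggests.

```python
import numpy as np
import pandas as pd


def aggregate_columns_sketch(X, var, grouping_columns, agg_func=np.median):
    """Collapse columns of X whose samples share the same grouping values.

    Hypothetical helper for illustration only; not grassp's implementation.
    X has shape (n_proteins, n_samples); var has one row per sample.
    """
    if isinstance(grouping_columns, str):
        grouping_columns = [grouping_columns]

    # Build one string key per sample from the grouping columns.
    keys = var[grouping_columns].astype(str).agg("_".join, axis=1).to_numpy()
    # labels: sorted unique keys; inverse: group index of each sample.
    labels, inverse = np.unique(keys, return_inverse=True)

    aggregated, n_merged = [], []
    for g in range(len(labels)):
        mask = inverse == g
        # Assumption: agg_func takes (array, axis); reduce across samples.
        aggregated.append(agg_func(X[:, mask], 1))
        n_merged.append(int(mask.sum()))

    new_X = np.column_stack(aggregated)
    new_var = pd.DataFrame({"n_merged_samples": n_merged}, index=pd.Index(labels))
    return new_X, new_var


# Toy data: 2 proteins (rows) x 5 samples (columns).
X = np.array([[1.0, 3.0, 10.0, 20.0, 60.0],
              [2.0, 4.0, 5.0, 7.0, 9.0]])
var = pd.DataFrame(
    {"subcellular_enrichment": ["cyto", "cyto", "nuc", "nuc", "nuc"]},
    index=["s1", "s2", "s3", "s4", "s5"],
)

median_X, median_var = aggregate_columns_sketch(X, var, "subcellular_enrichment")
mean_X, _ = aggregate_columns_sketch(X, var, "subcellular_enrichment", agg_func=np.mean)
```

In this sketch the protein axis is untouched while the five samples collapse into two groups, which mirrors the shape change in the example above. Swapping np.median for np.mean changes only how each group is reduced, not how the groups are formed.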