grassp.tl.calculate_cluster_enrichment

calculate_cluster_enrichment(data, cluster_key='leiden', gene_name_key='Gene_name_canonical', gene_sets='custom_goterms_genes_reviewed.gmt', obs_key_added='Cell_compartment', enrichment_ranking_metric='P-value', return_enrichment_res=True, inplace=True)

Gene-set enrichment for each cluster.

For every category in data.obs[cluster_key], the function runs an Enrichr analysis via gseapy on the gene symbols (taken from data.obs[gene_name_key]) of the proteins in that cluster. The top-ranked term for each cluster (according to enrichment_ranking_metric) is written back to data.obs under obs_key_added.
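The per-cluster procedure above can be sketched in plain Python. This is a simplified stand-in, not grassp's or gseapy's implementation: instead of calling Enrichr it scores each term with a one-sided hypergeometric (Fisher exact) over-representation test, keeps the term with the smallest p-value per cluster, and then writes that cluster-level term back to every protein in the cluster. The helper names hypergeom_sf and top_term_per_cluster, the toy gene sets, and the background gene universe are all hypothetical.

```python
from collections import defaultdict
from math import comb


def hypergeom_sf(k: int, M: int, n: int, N: int) -> float:
    """Return P(X >= k) for X ~ Hypergeometric(M total genes, n in the term, N drawn)."""
    upper = min(n, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / comb(M, N)


def top_term_per_cluster(clusters, genes, gene_sets, background):
    """Map each cluster label to its most significant term (smallest p-value)."""
    # Collect the gene symbols of the proteins in each cluster.
    members = defaultdict(set)
    for label, gene in zip(clusters, genes):
        members[label].add(gene)

    M = len(background)
    best = {}
    for label, cluster_genes in members.items():
        query = cluster_genes & background
        scored = []
        for term, term_genes in gene_sets.items():
            term_genes = term_genes & background
            k = len(query & term_genes)
            if k > 0:
                scored.append((hypergeom_sf(k, M, len(term_genes), len(query)), term))
        # Lowest p-value wins (ties broken alphabetically); no overlap -> None.
        best[label] = min(scored)[1] if scored else None
    return best


# Toy data: one entry per protein (observation).
clusters = ["0", "0", "0", "1", "1", "1"]
genes = ["NDUFA1", "NDUFA2", "COX4I1", "RPL3", "RPL4", "RPS6"]
gene_sets = {
    "mitochondrion": {"NDUFA1", "NDUFA2", "COX4I1", "ATP5F1A"},
    "ribosome": {"RPL3", "RPL4", "RPS6", "RPS3"},
}
background = set(genes) | {"ATP5F1A", "RPS3", "ACTB", "GAPDH"}

top = top_term_per_cluster(clusters, genes, gene_sets, background)
# Write the cluster-level term back to every observation, like obs_key_added.
annotations = [top[label] for label in clusters]
print(annotations)
```

Broadcasting one term per cluster to all of its members is what makes the resulting column usable as a per-protein annotation alongside cluster_key.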

Parameters:

data AnnData: Input AnnData with proteins as observations.
cluster_key str (default: 'leiden'): Categorical column in data.obs containing cluster labels.
gene_name_key str (default: 'Gene_name_canonical'): Column in data.obs holding the gene symbols that are passed to gseapy.
gene_sets str (default: 'custom_goterms_genes_reviewed.gmt'): Gene-set database used for the enrichment analysis.
obs_key_added str (default: 'Cell_compartment'): Name of the column that will store the top enriched term per cluster.
enrichment_ranking_metric Literal['P-value', 'Odds Ratio', 'Combined Score'] (default: 'P-value'): Column used to rank results within each cluster. Valid options are "P-value", "Odds Ratio" and "Combined Score".
return_enrichment_res bool (default: True): If True, return the full pandas.DataFrame of Enrichr results.
inplace bool (default: True): If True, annotate data in place; otherwise, return an annotated copy.
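The choice of enrichment_ranking_metric can change which term wins. The sketch below assumes, without confirmation from grassp's source, that a smaller "P-value" is better while a larger "Odds Ratio" or "Combined Score" is better. The rows mimic the column names of an Enrichr results table, but the numbers are made up so that each metric picks a different term; pick_top_term is a hypothetical helper.

```python
from operator import itemgetter
from typing import Literal

Metric = Literal["P-value", "Odds Ratio", "Combined Score"]

# Assumed ranking direction: small p-values are better,
# large odds ratios and combined scores are better.
SMALLER_IS_BETTER = {"P-value": True, "Odds Ratio": False, "Combined Score": False}


def pick_top_term(rows: list[dict], metric: Metric = "P-value") -> str:
    """Return the 'Term' of the best row in one cluster's Enrichr-style results."""
    if metric not in SMALLER_IS_BETTER:
        raise ValueError(f"unknown enrichment_ranking_metric: {metric!r}")
    choose = min if SMALLER_IS_BETTER[metric] else max
    return choose(rows, key=itemgetter(metric))["Term"]


# Made-up values, chosen so that each metric selects a different term.
rows = [
    {"Term": "mitochondrion", "P-value": 1e-6, "Odds Ratio": 12.0, "Combined Score": 150.0},
    {"Term": "nucleus", "P-value": 1e-3, "Odds Ratio": 30.0, "Combined Score": 120.0},
    {"Term": "cytosol", "P-value": 5e-2, "Odds Ratio": 2.0, "Combined Score": 400.0},
]

for metric in ("P-value", "Odds Ratio", "Combined Score"):
    print(metric, "->", pick_top_term(rows, metric))
```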

Return type:

Union[AnnData, DataFrame, None]

Returns:

Behaviour depends on inplace and return_enrichment_res:

inplace=True → annotate data in place; return the results DataFrame if return_enrichment_res is True, else None.
inplace=False → return a new annotated AnnData, or an (adata, results) tuple if return_enrichment_res is True.
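The four combinations of inplace and return_enrichment_res can be summarized in a short sketch. shape_return is a hypothetical helper, not part of grassp, and it assumes that the (adata, results) tuple is returned only when return_enrichment_res is True; strings stand in for the AnnData and the results DataFrame.

```python
from typing import Any


def shape_return(annotated: Any, results: Any, inplace: bool, return_enrichment_res: bool) -> Any:
    """Hypothetical helper: pick what to return for each documented combination.

    annotated is the annotated AnnData (the caller's object when inplace=True,
    a copy otherwise); results is the Enrichr results table.
    """
    if inplace:
        # The caller's object already carries the new obs column.
        return results if return_enrichment_res else None
    if return_enrichment_res:
        return annotated, results
    return annotated


# String placeholders stand in for an AnnData and a DataFrame.
for inplace in (True, False):
    for want_res in (True, False):
        out = shape_return("adata", "results", inplace, want_res)
        print(f"inplace={inplace}, return_enrichment_res={want_res} -> {out!r}")
```

With the documented defaults (inplace=True, return_enrichment_res=True), a call such as results = grassp.tl.calculate_cluster_enrichment(data) therefore annotates data.obs["Cell_compartment"] in place and returns only the results table.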
