grassp.tl.calculate_cluster_enrichment

calculate_cluster_enrichment(data, cluster_key='leiden', gene_name_key='Gene_name_canonical', gene_sets='custom_goterms_genes_reviewed.gmt', obs_key_added='Cell_compartment', enrichment_ranking_metric='P-value', return_enrichment_res=True, inplace=True)

Gene-set enrichment for each cluster.

For every category in data.obs[cluster_key] the function performs an Enrichr analysis via gseapy using the list of proteins (genes) present in that cluster. The most significant term (according to enrichment_ranking_metric) is written back to data.obs under obs_key_added.
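The per-cluster flow described above can be sketched in plain Python. This is only an illustration, not the library's implementation: `obs` is a hypothetical stand-in for `data.obs`, the `results` rows are made-up placeholders for what an Enrichr query might return, and `top_term` is a hypothetical helper. The ranking direction (smallest P-value wins, largest Odds Ratio or Combined Score wins) is an assumption.

```python
from collections import defaultdict

# Hypothetical stand-in for data.obs: one row per protein.
obs = [
    {"Gene_name_canonical": "TOMM20", "leiden": "0"},
    {"Gene_name_canonical": "VDAC1", "leiden": "0"},
    {"Gene_name_canonical": "LMNA", "leiden": "1"},
]

# Step 1: collect the gene symbols that belong to each cluster.
genes_by_cluster = defaultdict(list)
for row in obs:
    genes_by_cluster[row["leiden"]].append(row["Gene_name_canonical"])

# Step 2 (not run here): each gene list would be submitted to Enrichr.
# The rows below are made-up placeholders for such results.
results = {
    "0": [
        {"Term": "mitochondrion", "P-value": 1e-8},
        {"Term": "cytosol", "P-value": 0.02},
    ],
    "1": [{"Term": "nuclear envelope", "P-value": 3e-5}],
}


def top_term(rows, metric="P-value"):
    """Return the best term; assumes lower P-value or higher score is better."""
    best = min if metric == "P-value" else max
    return best(rows, key=lambda r: r[metric])["Term"]


# Step 3: pick the top-ranked term for every cluster.
top_by_cluster = {c: top_term(rows) for c, rows in results.items()}

# Step 4: write the label back to every protein in the cluster.
for row in obs:
    row["Cell_compartment"] = top_by_cluster[row["leiden"]]
```

After this runs, every entry in `obs` carries a `Cell_compartment` label taken from its cluster's top term, which mirrors what the function stores under obs_key_added.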

Parameters:
data AnnData

Input AnnData with proteins as observations.

cluster_key str (default: 'leiden')

Categorical column in data.obs containing cluster labels.

gene_name_key str (default: 'Gene_name_canonical')

Column in data.obs that holds gene symbols – required by gseapy.

gene_sets str (default: 'custom_goterms_genes_reviewed.gmt')

Gene set database to use for the enrichment analysis.

obs_key_added str (default: 'Cell_compartment')

Name of the column that will store the top enriched term per cluster.

enrichment_ranking_metric Literal['P-value', 'Odds Ratio', 'Combined Score'] (default: 'P-value')

Column used to rank results within each cluster. Valid options are "P-value", "Odds Ratio" and "Combined Score".

return_enrichment_res bool (default: True)

If True, return the full pandas.DataFrame of Enrichr results.

inplace bool (default: True)

If True (default), annotate data in place. Otherwise, a modified copy is returned.

Return type:

Union[AnnData, DataFrame, None]

Returns:

Behaviour depends on inplace and return_enrichment_res:

  • inplace=True → annotate data; return the results DataFrame if return_enrichment_res else None.

  • inplace=False → return either a new AnnData or a (adata, results) tuple.
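The four flag combinations can be summarised with a small sketch. `resolve_return` is a hypothetical helper written for illustration, not part of grassp, and it assumes that the tuple form applies when return_enrichment_res is True.

```python
def resolve_return(annotated, results, inplace=True, return_enrichment_res=True):
    """Hypothetical helper mirroring the documented return rules."""
    if inplace:
        # data was annotated in place; hand back only the results, if asked.
        return results if return_enrichment_res else None
    # Assumption: the tuple form is used when return_enrichment_res is True.
    return (annotated, results) if return_enrichment_res else annotated


# Placeholder strings stand in for the AnnData copy and the results DataFrame.
outcomes = {
    (inplace, ret): resolve_return("adata", "results", inplace, ret)
    for inplace in (True, False)
    for ret in (True, False)
}
```

Callers that pass inplace=False should therefore be ready to unpack a tuple whenever they also request the enrichment results.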