grassp.tl.calculate_cluster_enrichment
- calculate_cluster_enrichment(data, cluster_key='leiden', gene_name_key='Gene_name_canonical', gene_sets='custom_goterms_genes_reviewed.gmt', obs_key_added='Cell_compartment', enrichment_ranking_metric='P-value', return_enrichment_res=True, inplace=True)
Gene-set enrichment for each cluster.
For every category in data.obs[cluster_key], the function performs an Enrichr analysis via gseapy using the list of proteins (genes) present in that cluster. The most significant term (according to enrichment_ranking_metric) is written back to data.obs under obs_key_added.
- Parameters:
- data: AnnData
  Input AnnData with proteins as observations.
- cluster_key: str (default: 'leiden')
  Categorical column in data.obs containing cluster labels.
- gene_name_key: str (default: 'Gene_name_canonical')
  Column in data.obs that holds gene symbols, as required by gseapy.
- gene_sets: str (default: 'custom_goterms_genes_reviewed.gmt')
  Gene set database to use for enrichment analysis.
- obs_key_added: str (default: 'Cell_compartment')
  Name of the column that will store the top enriched term per cluster.
- enrichment_ranking_metric: Literal['P-value', 'Odds Ratio', 'Combined Score'] (default: 'P-value')
  Column used to rank results within each cluster. Valid options are "P-value", "Odds Ratio" and "Combined Score".
- return_enrichment_res: bool (default: True)
  If True, return the full pandas.DataFrame of Enrichr results.
- inplace: bool (default: True)
  If True (default), annotate data in place. Otherwise a modified copy is returned.
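The way enrichment_ranking_metric and obs_key_added interact can be illustrated with plain pandas. This is a minimal sketch, not the grassp implementation: the helper top_term_per_cluster, the "Cluster" column and all values are hypothetical, and the sort direction for each metric (smaller P-value is better, larger Odds Ratio and Combined Score are better) is an assumption.

```python
import pandas as pd

# Assumed direction for each metric: a smaller P-value is better,
# while larger odds ratios and combined scores are better.
ASCENDING = {"P-value": True, "Odds Ratio": False, "Combined Score": False}


def top_term_per_cluster(results, metric="P-value", cluster_col="Cluster"):
    """Return a Series mapping each cluster label to its best-ranked term."""
    if metric not in ASCENDING:
        raise ValueError(f"unknown ranking metric: {metric!r}")
    ranked = results.sort_values(metric, ascending=ASCENDING[metric])
    # After sorting, the first row kept for each cluster is its best term.
    best = ranked.drop_duplicates(subset=cluster_col, keep="first")
    return best.set_index(cluster_col)["Term"]


# Toy stand-in for concatenated per-cluster Enrichr results (made-up values).
results = pd.DataFrame({
    "Cluster": ["0", "0", "1", "1"],
    "Term": ["mitochondrion", "cytosol", "nucleus", "ribosome"],
    "P-value": [1e-8, 1e-3, 0.02, 1e-5],
    "Odds Ratio": [12.0, 3.0, 2.5, 9.0],
    "Combined Score": [220.0, 20.0, 9.8, 103.0],
})

# Toy stand-in for data.obs: one row per protein with its cluster label.
obs = pd.DataFrame(
    {"leiden": ["0", "1", "1", "0"]},
    index=["P1", "P2", "P3", "P4"],
)

top = top_term_per_cluster(results, metric="P-value")
# Broadcast the per-cluster term to every protein in that cluster.
obs["Cell_compartment"] = obs["leiden"].map(top)
print(obs)
```

Every protein in a cluster receives the same label, which is why the annotation is stored as one column in data.obs rather than as a per-cluster table.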
- Return type:
- Returns:
Behaviour depends on inplace and return_enrichment_res:
- inplace=True → annotate data in place; return the results DataFrame if return_enrichment_res is True, otherwise None.
- inplace=False → return a new AnnData, or an (adata, results) tuple when return_enrichment_res is True.
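The four combinations of inplace and return_enrichment_res can be summarised in a small dispatch function. This is a sketch of the documented contract only: resolve_return is a hypothetical helper, not part of grassp, and the string placeholders stand in for a real AnnData object and results DataFrame.

```python
def resolve_return(adata, results, inplace=True, return_enrichment_res=True):
    """Mimic the documented return contract (illustration only)."""
    if inplace:
        # The caller's object was annotated in place; only results may come back.
        return results if return_enrichment_res else None
    # Not in place: hand back the modified copy, optionally with the results.
    return (adata, results) if return_enrichment_res else adata


# Placeholders standing in for a real AnnData and a results DataFrame.
adata, results = "annotated_adata", "enrichr_results"

for inplace in (True, False):
    for return_res in (True, False):
        out = resolve_return(adata, results, inplace, return_res)
        print(f"inplace={inplace}, return_enrichment_res={return_res} -> {out!r}")
```

With the defaults (both True), a call therefore returns only the results table, and the annotation must be read from data.obs[obs_key_added] on the object that was passed in.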