grassp.pp.add_markers


add_markers(data, species, authors=None, uniprot_id_column=None, add_colors=True)

Annotate proteins with marker annotations from the literature.

Matches protein IDs in .obs against a collection of marker annotations from different authors. Note that marker IDs are species-specific and may not be UniProt accessions (see table below).

Marker annotations are sourced from:

authors                                                          source
---------------------------------------------------------------  ------
hein2024_gt_component                                            Marker list used in Hein et al. 2024, Cell, https://doi.org/10.1016/j.cell.2024.11.028
hein2024_component                                               Full annotations from Hein et al. 2024, Cell, https://doi.org/10.1016/j.cell.2024.11.028
lilley, christopher, geladaki, itzhak, villaneuva, christoforou  Obtained from pRoloc. See: https://bioconductor.org/packages/pRoloc/ and https://lgatto.github.io/pRoloc/reference/pRolocmarkers.html

Protein ID types by species:

Species Code  Common Name                       ID Type             Example ID
------------  --------------------------------  ------------------  -------------
atha          Arabidopsis thaliana              TAIR/Araport        AT1G01620
dmel          Drosophila melanogaster           UniProt             A1Z6P3
ggal          Gallus gallus (Chicken)           IPI                 IPI00570752.1
hsap          Homo sapiens (Human)              UniProt             A0AVT1
mmus          Mus musculus (Mouse)              UniProt             A2AJ15
scer          Saccharomyces cerevisiae (Yeast)  UniProt             D6VTK4
toxo          Toxoplasma gondii                 ToxoDB Gene IDs     TGME49_200250
tryp          Trypanosoma brucei                TriTrypDB Gene IDs  Tb11.v5.0162
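
Because matching depends on the ID type, it can help to check that your IDs look like the expected type before annotating. The following is a minimal sketch and not part of grassp: `ID_PATTERNS` and `fraction_matching` are hypothetical names, the UniProt pattern follows the accession format published by UniProt, and the remaining patterns are rough guesses based on the example IDs in the table above.

```python
import re

# UniProt accession format as published in the UniProt help pages.
UNIPROT = r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"

# Hypothetical, deliberately loose patterns (assumptions, not grassp's rules).
ID_PATTERNS = {
    "atha": r"AT[1-5CM]G\d{5}",
    "dmel": UNIPROT,
    "ggal": r"IPI\d{8}(?:\.\d+)?",
    "hsap": UNIPROT,
    "mmus": UNIPROT,
    "scer": UNIPROT,
    "toxo": r"TGME49_\d+",
    "tryp": r"Tb[\w.]+",
}


def fraction_matching(ids, species):
    """Return the fraction of IDs that look like the expected ID type."""
    pattern = re.compile(ID_PATTERNS[species])
    ids = [str(i) for i in ids]
    if not ids:
        return 0.0
    hits = sum(1 for i in ids if pattern.fullmatch(i))
    return hits / len(ids)
```

Calling `fraction_matching(adata.obs_names, 'hsap')` on a human dataset should give a value close to 1; a low value suggests the IDs are of a different type (for example gene symbols) and would probably not match the marker lists.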

This function modifies the AnnData object in-place by adding marker annotation columns to .obs.

Parameters:

data AnnData

AnnData object.

species str

Species code to determine which marker file to read. Examples: 'hsap' (human), 'mmus' (mouse), 'scer' (yeast), 'atha' (Arabidopsis), 'dmel' (fly), 'toxo' (Toxoplasma), 'tryp' (Trypanosoma), 'ggal' (chicken).

authors list[str] | str | None (default: None)

Specific author column(s) to include from the marker file. If None, includes all available author columns. Can be a single author name (string) or a list of author names.

uniprot_id_column str | None (default: None)

Column in .obs containing the protein IDs to match. Despite the parameter name, these need not be UniProt accessions; the ID type required for each species is listed in the table above. If None, uses .obs_names.

add_colors bool (default: True)

If True, automatically add color mappings to .uns for each marker column, following scanpy plotting conventions. Colors are stored as '{author}_colors' lists matching categorical order.
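
To make the roles of authors and uniprot_id_column concrete, here is a minimal sketch of how such arguments could be interpreted, written with plain Python containers instead of an AnnData object. It is not grassp's implementation: `resolve_authors`, `annotate`, and the data layout are hypothetical, and leaving unmatched proteins unannotated is an assumption.

```python
def resolve_authors(authors, available):
    """Normalize `authors` (None, str, or list of str) to a list of names."""
    if authors is None:
        return list(available)  # None means "all author columns"
    if isinstance(authors, str):
        return [authors]  # a single name becomes a one-item list
    return list(authors)


def annotate(obs_names, obs_columns, marker_table, authors=None, id_column=None):
    """Return {author: [annotation or None, one per protein]}.

    marker_table maps author -> {protein_id: annotation}.
    """
    # IDs come from the named column if given, otherwise from the row names.
    ids = obs_names if id_column is None else obs_columns[id_column]
    result = {}
    for author in resolve_authors(authors, marker_table):
        lookup = marker_table[author]
        # Assumption: proteins without a marker entry stay unannotated (None).
        result[author] = [lookup.get(pid) for pid in ids]
    return result


# Placeholder data: the compartment label is invented for illustration.
markers = {"lilley": {"A0AVT1": "compartment_x"}}
obs = {"protein_id": ["A0AVT1", "NOT_A_MARKER"]}
print(annotate(["row0", "row1"], obs, markers, authors="lilley", id_column="protein_id"))
# {'lilley': ['compartment_x', None]}
```

In this sketch the row names are ignored once an ID column is given, which mirrors the documented fallback: .obs_names are only used when uniprot_id_column is None.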

Return type:

None

Returns:

None. Modifies data.obs in-place by adding marker annotation columns (converted to categorical dtype). If add_colors=True, also adds color mappings to data.uns as '{author}_colors' lists.
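
Since the colors are stored as a list that matches the categorical order, a category-to-color lookup can be recovered by pairing the two sequences position by position. The sketch below shows this with plain lists; `category_colors` is a hypothetical helper, and the labels and hex values are placeholders rather than real grassp output.

```python
def category_colors(categories, colors):
    """Pair each category with the color stored at the same position."""
    categories, colors = list(categories), list(colors)
    if len(categories) != len(colors):
        raise ValueError("expected exactly one color per category")
    return dict(zip(categories, colors))


# Placeholder values for illustration only.
palette = category_colors(
    ["compartment_x", "compartment_y"],
    ["#1f77b4", "#ff7f0e"],
)
```

With an annotated AnnData object, the same helper could be called as `category_colors(adata.obs['christopher'].cat.categories, adata.uns['christopher_colors'])`, assuming the 'christopher' column was added with add_colors=True.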

Examples

>>> import grassp as gr
>>> import pandas as pd
>>> adata = gr.datasets.hein_2024(enrichment='raw')
>>> # Add specific author annotations
>>> gr.pp.add_markers(adata, species='hsap', authors=['christopher'])
Added christopher annotations for ...
>>> # Check categorical dtype and colors
>>> isinstance(adata.obs['christopher'].dtype, pd.CategoricalDtype)
True
>>> 'christopher_colors' in adata.uns
True
>>> # Disable automatic color mapping
>>> gr.pp.add_markers(adata, species='hsap', authors=['lilley'], add_colors=False)
Added lilley annotations for ...