grassp.pp.highly_variable_proteins


highly_variable_proteins(data, inplace=True, n_top_proteins=None, flavor='seurat', subset=False, batch_key=None, **kwargs)

Identify highly variable proteins.

Parameters:
data AnnData

The annotated data matrix with proteins as observations (rows).

inplace bool (default: True)

Whether to store results in data.obs or return them.

n_top_proteins Optional[int] (default: None)

Number of highly-variable proteins to keep. If None, use flavor-specific defaults.

flavor Literal['seurat', 'cell_ranger', 'seurat_v3', 'seurat_v3_paper'] (default: 'seurat')

Method for identifying highly variable proteins. Options are:

  • 'seurat': Seurat's method (default)

  • 'cell_ranger': Cell Ranger's method

  • 'seurat_v3': Seurat v3 method

  • 'seurat_v3_paper': Method from the Seurat v3 paper

subset bool (default: False)

Whether to subset the data to highly variable proteins.

batch_key Optional[str] (default: None)

If specified, highly-variable proteins are selected within each batch separately.

**kwargs

Additional arguments to pass to scanpy.pp.highly_variable_genes.

Return type:

DataFrame | None

Returns:

pandas.DataFrame or None

If inplace=False, returns a DataFrame of highly variable proteins. If inplace=True, returns None and stores the results in data.obs.
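A minimal usage sketch follows, based only on the signature and return behaviour documented above. The wrapper name select_variable_proteins, the import alias gr, and the value n_top_proteins=500 are illustrative assumptions, not part of grassp.

```python
def select_variable_proteins(data, n_top_proteins=500, flavor="seurat"):
    """Annotate `data` in place, then return a copy holding only the flagged proteins.

    Hypothetical helper for illustration; `data` is assumed to be an AnnData
    object with proteins as observations (rows).
    """
    # Imported lazily so this sketch can be defined without grassp installed.
    import grassp as gr  # alias `gr` is an assumption

    # With inplace=True (the default) the call returns None and writes the
    # results to data.obs, including the boolean `highly_variable` column.
    gr.pp.highly_variable_proteins(
        data, n_top_proteins=n_top_proteins, flavor=flavor
    )

    # Boolean mask over the rows (proteins) of the AnnData object.
    mask = data.obs["highly_variable"].to_numpy()
    return data[mask].copy()
```

Passing subset=True to highly_variable_proteins is documented to subset the data directly; the explicit mask above is shown only to make the role of data.obs["highly_variable"] visible.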

Notes

This function identifies highly variable proteins using methods adapted from single-cell RNA sequencing analysis. When inplace=True, the results are stored in data.obs with the following fields:

  • highly_variable: boolean indicator

  • means: mean expression

  • dispersions: dispersion of expression

  • dispersions_norm: normalized dispersion
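To give a feel for what the fields listed above represent, here is a deliberately simplified, pure-Python sketch. It is not the implementation used by grassp or scanpy: it assumes dispersion is the variance-to-mean ratio and normalizes it with a single z-score across all proteins, whereas the actual flavors may log-transform values and normalize within bins of similar mean. The function names and the toy data are hypothetical.

```python
from statistics import mean, pstdev, variance


def dispersion_stats(abundance):
    """Compute per-protein statistics from {protein: [values across samples]}."""
    stats = {}
    for protein, values in abundance.items():
        m = mean(values)
        # Simplifying assumption: dispersion = variance / mean (requires m > 0).
        stats[protein] = {"means": m, "dispersions": variance(values) / m}

    # Simplifying assumption: one global z-score instead of per-bin normalization.
    disp = [s["dispersions"] for s in stats.values()]
    center, spread = mean(disp), pstdev(disp)
    for s in stats.values():
        s["dispersions_norm"] = (
            (s["dispersions"] - center) / spread if spread else 0.0
        )
    return stats


def top_variable(stats, n_top):
    """Return the names of the n_top proteins with the highest normalized dispersion."""
    ranked = sorted(stats, key=lambda p: stats[p]["dispersions_norm"], reverse=True)
    return set(ranked[:n_top])


# Toy data: three proteins measured in four samples.
abundance = {
    "P1": [10.0, 10.0, 10.0, 10.0],  # flat profile
    "P2": [1.0, 5.0, 20.0, 14.0],    # strongly varying profile
    "P3": [8.0, 9.0, 10.0, 9.0],     # mildly varying profile
}
stats = dispersion_stats(abundance)
selected = top_variable(stats, n_top=1)
```

In this toy example the strongly varying protein P2 ranks first, which mirrors how n_top_proteins keeps the highest-ranked proteins.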