grassp.tl.svm_annotation



svm_annotation(data, gt_col='markers', C=None, gamma=None, fix_markers=False, min_probability=0.5, inplace=True, key_added='svm_annotation', params_key=None, class_weight='balanced')

Classify proteins using SVM with marker-based training.

Trains an SVM classifier on the marker proteins (rows with a non-NaN value in gt_col) and predicts a localization for every protein. Hyperparameters can be provided manually or loaded from the results of a prior svm_train() call.

Similar to knn_annotation(), but uses an SVM classifier instead of graph-based label propagation.
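A minimal sketch of the "train on markers, predict everything" idea, assuming the classifier is scikit-learn's `SVC` with an RBF kernel and probability estimates enabled. This is not grassp's implementation; the helper `fit_predict_svm` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.svm import SVC


def fit_predict_svm(X, labels, C=1.0, gamma="scale", class_weight="balanced", random_state=0):
    """Hypothetical helper: fit an RBF SVM on marker rows, return probabilities for all rows.

    X      : array of shape (n_proteins, n_features)
    labels : pandas Series of marker labels, NaN for proteins of unknown localization
    """
    is_marker = labels.notna().to_numpy()
    clf = SVC(
        kernel="rbf",
        C=C,
        gamma=gamma,
        class_weight=class_weight,
        probability=True,  # enables predict_proba
        random_state=random_state,
    )
    # Train only on the marker proteins
    clf.fit(X[is_marker], labels[is_marker].astype(str))
    # Score every protein, markers included
    proba = clf.predict_proba(X)
    return pd.DataFrame(proba, index=labels.index, columns=clf.classes_)


# Toy data: two well-separated clusters; the first half of each cluster is labelled
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (20, 5)), rng.normal(3.0, 0.3, (20, 5))])
labels = pd.Series([np.nan] * 40, index=[f"P{i}" for i in range(40)], dtype=object)
labels.iloc[0:10] = "ER"
labels.iloc[20:30] = "Mito"

proba = fit_predict_svm(X, labels)
predicted = proba.idxmax(axis=1)  # most probable compartment per protein
```

Each row of the resulting probability matrix sums to 1, and the unlabelled proteins in each cluster inherit that cluster's marker label.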

Parameters:
data AnnData

anndata.AnnData with feature matrix in .X.

gt_col str (default: 'markers')

Observation column with marker labels (NaN for unknowns).

C float | None (default: None)

SVM regularization parameter. If None, it is loaded from .uns (see params_key).

gamma float | str | None (default: None)

RBF kernel coefficient. If None, it is loaded from .uns (see params_key).

fix_markers bool (default: False)

If True, marker proteins retain their original labels with probability 1.0.

min_probability float (default: 0.5)

Confidence threshold; predictions below this are set to NaN.

inplace bool (default: True)

If True, modify data in place; otherwise, return a dict.

key_added str (default: 'svm_annotation')

Base name for the keys under which results are stored (see Returns).

params_key str | None (default: None)

Key in .uns from which to load hyperparameters. If None, "svm.params" is used.

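class_weight None | dict | Literal['balanced'] (default: 'balanced')

Class weights used when training the SVM. 'balanced' weights each class inversely proportional to its frequency among the marker proteins; None weights all classes equally.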
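The fix_markers and min_probability options can be read as a post-processing step on the probability matrix. The sketch below assumes that reading; `postprocess` is a hypothetical helper, and it only overrides the predicted label and the per-protein maximum probability, since the parameter descriptions do not say whether the full probability rows of markers change.

```python
import numpy as np
import pandas as pd


def postprocess(proba, labels, min_probability=0.5, fix_markers=False):
    """Hypothetical helper mirroring the documented fix_markers / min_probability behaviour.

    proba  : DataFrame (proteins x compartments) of predicted probabilities
    labels : Series of marker labels aligned with proba.index, NaN for unknowns
    Returns (predicted labels, maximum probability per protein).
    """
    pred = proba.idxmax(axis=1).astype(object)
    max_prob = proba.max(axis=1)
    if fix_markers:
        is_marker = labels.notna()
        # Markers keep their ground-truth label with probability 1.0
        pred[is_marker] = labels[is_marker]
        max_prob[is_marker] = 1.0
    # Predictions below the confidence threshold are left unassigned (NaN)
    pred[max_prob < min_probability] = np.nan
    return pred, max_prob


proba = pd.DataFrame(
    {"ER": [0.9, 0.45, 0.3], "Mito": [0.1, 0.55, 0.7]},
    index=["P1", "P2", "P3"],
)
labels = pd.Series(["Mito", np.nan, np.nan], index=proba.index, dtype=object)

pred, max_prob = postprocess(proba, labels, min_probability=0.6)
pred_fixed, max_prob_fixed = postprocess(proba, labels, min_probability=0.6, fix_markers=True)
```

Here P2 (best probability 0.55) falls below the 0.6 threshold and is left unassigned; with fix_markers=True the marker P1 reverts to its ground-truth label "Mito" even though the classifier preferred "ER".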

Return type:

dict | None

Returns:

None or dict. If inplace=True, returns None and modifies data with:

  • .obs[f"{key_added}"]: Predicted labels

  • .obs[f"{key_added}_probability"]: Max probability per protein

  • .obsm[f"{key_added}_probabilities"]: Full probability matrix

  • .uns[f"{key_added}_colors"]: Color scheme (copied from gt_col)

If inplace=False, returns a dict containing the predictions and probabilities.
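A small sketch of the documented output layout. It uses `types.SimpleNamespace` as a stand-in for `anndata.AnnData`, assumes the ground-truth colors live in `.uns[f"{gt_col}_colors"]` (the scanpy convention), and uses hypothetical key names for the inplace=False dict; `store_results` itself is a hypothetical helper.

```python
from types import SimpleNamespace

import pandas as pd


def store_results(adata, pred, max_prob, proba, key_added="svm_annotation", gt_col="markers", inplace=True):
    """Hypothetical helper writing results under the documented keys."""
    if not inplace:
        # Dict key names are assumptions for illustration
        return {"predictions": pred, "probabilities": proba}
    adata.obs[key_added] = pred                            # predicted labels
    adata.obs[f"{key_added}_probability"] = max_prob       # max probability per protein
    adata.obsm[f"{key_added}_probabilities"] = proba       # full probability matrix
    colors_key = f"{gt_col}_colors"                        # assumed scanpy-style color key
    if colors_key in adata.uns:
        adata.uns[f"{key_added}_colors"] = adata.uns[colors_key]
    return None


# Stand-in object with AnnData-like .obs / .obsm / .uns
adata = SimpleNamespace(
    obs=pd.DataFrame(index=["P1", "P2"]),
    obsm={},
    uns={"markers_colors": ["#1f77b4", "#ff7f0e"]},
)
pred = pd.Series(["ER", "Mito"], index=adata.obs.index, dtype=object)
max_prob = pd.Series([0.9, 0.6], index=adata.obs.index)
proba = pd.DataFrame({"ER": [0.9, 0.4], "Mito": [0.1, 0.6]}, index=adata.obs.index)

result = store_results(adata, pred, max_prob, proba)
```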

Raises:
  • ValueError – If no hyperparameters are found in .uns and none are provided manually.

  • KeyError – If gt_col is not found in .obs.
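The two error conditions imply an order for resolving C and gamma. A hypothetical sketch, assuming per-parameter fallback (an explicit value wins, a None falls back to `.uns[params_key]["best_params"]`, the structure shown in the svm_train() example) and "svm.params" as the default key; plain dicts and lists stand in for `.uns` and `.obs.columns`.

```python
def resolve_svm_params(uns, obs_columns, gt_col="markers", C=None, gamma=None, params_key=None):
    """Hypothetical helper: choose C and gamma from arguments or from a uns-like dict.

    uns         : dict standing in for adata.uns
    obs_columns : column names standing in for adata.obs.columns
    """
    if gt_col not in obs_columns:
        raise KeyError(f"{gt_col!r} not found in .obs")
    if C is not None and gamma is not None:
        return C, gamma  # fully manual: .uns is never consulted
    key = "svm.params" if params_key is None else params_key
    best = uns.get(key, {}).get("best_params")
    if best is None:
        raise ValueError(
            f"No hyperparameters in .uns[{key!r}]; pass C and gamma or run svm_train() first"
        )
    # Explicit arguments take precedence; missing ones fall back to the stored values
    return (
        C if C is not None else best["C"],
        gamma if gamma is not None else best["gamma"],
    )


uns = {"svm.params": {"best_params": {"C": 2.0, "gamma": 0.01}}}
print(resolve_svm_params(uns, ["markers"]))  # (2.0, 0.01)
```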

Examples

>>> import grassp as gr
>>> import scanpy as sc
>>> adata = gr.ds.hein_2024(enrichment="enriched")

##### Option 1: Annotate directly, with fixed hyperparameters #####
>>> gr.tl.svm_annotation(
...     adata,
...     gt_col="hein2024_gt_component",
...     min_probability=0.5,
...     C=10,
...     gamma=0.01,
... )
>>> sc.pl.umap(adata, color="svm_annotation")  # doctest: +SKIP

##### Option 2: Train SVM hyperparameters, then annotate #####
# When actually training, increase cv_repeats and cv_splits.
# We recommend >20 repeats with 5 splits.
>>> gr.tl.svm_train(adata, gt_col="hein2024_gt_component", cv_repeats=2, cv_splits=2, random_state=42)
Fitting 4 folds for each of 54 candidates, totalling 216 fits
>>> adata.uns["svm.params"]["best_params"]
{'C': 2.0, 'gamma': 0.01}
>>> gr.tl.svm_annotation(adata, gt_col="hein2024_gt_component", min_probability=0.5)
>>> sc.pl.umap(adata, color="svm_annotation")  # doctest: +SKIP
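The svm_train() log line matches scikit-learn's GridSearchCV progress output, and 2 repeats × 2 splits = 4 folds is consistent with a repeated stratified K-fold splitter. A hedged sketch of that kind of search follows; the parameter grid, the f1_macro scoring and the helper name `tune_svm` are assumptions, not grassp's actual settings (the grid behind its "54 candidates" is not reproduced).

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold
from sklearn.svm import SVC


def tune_svm(X_markers, y_markers, cv_splits=5, cv_repeats=20, random_state=42):
    """Hypothetical repeated-CV grid search over C and gamma (not grassp's svm_train)."""
    # Illustrative grid: 5 values of C x 3 values of gamma = 15 candidates
    param_grid = {"C": [0.5, 1.0, 2.0, 4.0, 8.0], "gamma": [0.001, 0.01, 0.1]}
    # n_splits * n_repeats folds in total, stratified by compartment
    cv = RepeatedStratifiedKFold(n_splits=cv_splits, n_repeats=cv_repeats, random_state=random_state)
    search = GridSearchCV(
        SVC(kernel="rbf", class_weight="balanced"),
        param_grid,
        cv=cv,
        scoring="f1_macro",  # assumed metric
    )
    search.fit(X_markers, y_markers)
    return search


# Toy marker set: two well-separated compartments
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (20, 5)), rng.normal(3.0, 0.3, (20, 5))])
y = np.array(["ER"] * 20 + ["Mito"] * 20)

search = tune_svm(X, y, cv_splits=2, cv_repeats=2)
best = search.best_params_  # e.g. a dict with keys "C" and "gamma"
```

With cv_splits=2 and cv_repeats=2 the search evaluates every candidate on 4 folds, mirroring the fold count in the log above; the recommended >20 repeats with 5 splits simply raises that to more than 100 folds per candidate.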