grassp.tl.calculate_interfacialness_score


calculate_interfacialness_score(data, compartment_annotation_column, neighbors_key=None, obsp=None, exclude_category=None)

Quantify interfacialness of proteins across compartment boundaries.

The score is based on a modified Jaccard index computed from each protein’s immediate neighbourhood:

  1. For a given protein, count how many of its neighbours belong to each compartment (the categories in compartment_annotation_column).

  2. Sort the counts and take the two highest, d1 and d2, which belong to compartments k1 and k2.

  3. Compute

    score = (d1 + d2) / (N_k1 + N_k2 - (d1 + d2))

    where N_k is the total number of proteins annotated as compartment k in the dataset. High scores indicate that a protein sits at an interface between two compartments.
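The three steps above can be sketched for a single protein in plain Python. This is an illustrative sketch, not the library's implementation: the function name interfacialness_score and its arguments are hypothetical, and the handling of edge cases (fewer than two compartments among the neighbours, a zero denominator) is an assumption that the description above does not specify.

```python
from collections import Counter
import math


def interfacialness_score(neighbour_labels, compartment_sizes, exclude=()):
    """Return (score, d1, d2, k1, k2) for one protein.

    neighbour_labels: compartment label of each neighbour of the protein.
    compartment_sizes: mapping compartment -> N_k (proteins with that label).
    exclude: labels to ignore when counting neighbours.
    """
    # Step 1: count neighbours per compartment, skipping excluded labels.
    counts = Counter(lab for lab in neighbour_labels if lab not in exclude)

    # Step 2: take the two highest counts.
    top_two = counts.most_common(2)
    # Assumption: pad with an empty compartment if fewer than two are present.
    top_two += [(None, 0)] * (2 - len(top_two))
    (k1, d1), (k2, d2) = top_two

    # Step 3: score = (d1 + d2) / (N_k1 + N_k2 - (d1 + d2)).
    denom = compartment_sizes.get(k1, 0) + compartment_sizes.get(k2, 0) - (d1 + d2)
    # Assumption: return NaN rather than raising when the denominator is zero.
    score = (d1 + d2) / denom if denom > 0 else math.nan
    return score, d1, d2, k1, k2


# Made-up example: 3 ER, 2 Golgi and 1 Mito neighbours, one "unknown" ignored.
neighbours = ["ER", "ER", "ER", "Golgi", "Golgi", "Mito", "unknown"]
sizes = {"ER": 50, "Golgi": 30, "Mito": 40}
score, d1, d2, k1, k2 = interfacialness_score(neighbours, sizes, exclude=("unknown",))
```

With these made-up numbers, d1 = 3 (ER) and d2 = 2 (Golgi), so the score is 5 / (50 + 30 - 5) = 5 / 75.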

New columns are appended to data.obs with the jaccard_ prefix.

Parameters:
data AnnData

anndata.AnnData with a neighbour graph and compartment annotations for each protein.

compartment_annotation_column str

Observation column containing the ground-truth compartment labels.

neighbors_key str | None (default: None)

Key in data.uns identifying which stored neighbour graph to use (mirrors Scanpy conventions).

obsp str | None (default: None)

Key in data.obsp identifying which neighbour graph matrix to use (mirrors Scanpy conventions).

exclude_category Union[str, List[str], None] (default: None)

One or multiple category labels (e.g. ‘unknown’) to ignore when counting neighbours.

Return type:

AnnData

Returns:

anndata.AnnData object with additional columns:

jaccard_score

Interfacialness score.

jaccard_d1, jaccard_d2

Neighbour counts for the two most frequent compartments among the protein's neighbours.

jaccard_k1, jaccard_k2

Corresponding compartment labels.
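
Once these columns exist, they can be inspected like any other observation columns. The sketch below uses a hand-made pandas table as a hypothetical stand-in for data.obs; the protein names and values are invented for illustration, and only the column names come from the list above.

```python
import pandas as pd

# Hypothetical stand-in for data.obs after scoring; all values are made up.
obs = pd.DataFrame(
    {
        "jaccard_score": [0.07, 0.31, 0.02],
        "jaccard_d1": [3, 6, 9],
        "jaccard_d2": [2, 5, 1],
        "jaccard_k1": ["ER", "Golgi", "Mito"],
        "jaccard_k2": ["Golgi", "ER", "ER"],
    },
    index=["protein_a", "protein_b", "protein_c"],
)

# Rank proteins from highest to lowest interfacialness score.
ranked = obs.sort_values("jaccard_score", ascending=False)

# Look up the strongest candidate and the compartment pair it sits between.
top = ranked.index[0]
pair = (ranked.loc[top, "jaccard_k1"], ranked.loc[top, "jaccard_k2"])
```

Here the highest-scoring row is protein_b, whose two dominant neighbour compartments are Golgi and ER.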