grassp.pp.filter_proteins_per_replicate

filter_proteins_per_replicate(data, grouping_columns, min_replicates=1, min_samples=1, inplace=True)

Filter proteins based on detection in replicates.

Parameters:

data AnnData: The annotated data matrix with proteins as observations (rows).
grouping_columns Union[str, List[str]]: Column name(s) in data.var used to group samples into replicates. Note: These are typically the columns that hold the sample (IP/fraction) information, not the column that holds the replicate information. Samples that share the same values in these columns are treated as replicates of one another.
min_replicates int (default: 1): Minimum number of replicates a protein must be detected in to pass filtering.
min_samples int (default: 1): Minimum number of sample groups a protein must be detected in to pass filtering.
inplace bool (default: True): Whether to modify data in place (True) or leave data unchanged and return a boolean mask of the proteins that passed filtering (False).

Return type:

ndarray | None

Returns:

numpy.ndarray or None

If inplace=False, returns boolean mask indicating which proteins passed filtering.
If inplace=True, returns None and modifies input data.

Notes

This function filters proteins based on their detection pattern across replicates. For each group of samples (defined by grouping_columns), it requires proteins to be detected in at least min_replicates samples. The protein must pass this threshold in at least min_samples groups to be kept.
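The two-level rule described above can be sketched with NumPy and pandas alone. This is a minimal illustration, not the grassp implementation: the helper name replicate_filter_mask is hypothetical, it takes a plain array and one group label per sample rather than an AnnData object, and it assumes a protein counts as "detected" when its value is finite and greater than zero, which may differ from how grassp defines detection.

```python
import numpy as np
import pandas as pd


def replicate_filter_mask(X, groups, min_replicates=1, min_samples=1):
    """Hypothetical sketch of the filtering rule (not the grassp source).

    X      : 2-D array with proteins in rows and samples in columns.
    groups : one group label per sample (i.e. per column of X).
    Assumption: "detected" means finite and greater than zero.
    """
    X = np.asarray(X, dtype=float)
    detected = np.isfinite(X) & (X > 0)

    # Transpose so samples are rows, then count detections per group.
    labels = np.asarray(groups)
    counts = pd.DataFrame(detected.T).groupby(labels).sum().T

    # A group passes for a protein if enough of its replicates detected it.
    groups_passed = (counts >= min_replicates).sum(axis=1)

    # Keep the protein if enough groups passed.
    return (groups_passed >= min_samples).to_numpy()


# Toy data: 4 proteins, 2 groups ("A", "B") with 2 replicates each.
X = [
    [1, 2, 0, 0],  # detected in both replicates of A only
    [1, 0, 3, 0],  # detected once in A and once in B
    [0, 0, 0, 0],  # never detected
    [5, 5, 5, 5],  # detected everywhere
]
groups = ["A", "A", "B", "B"]

print(replicate_filter_mask(X, groups, min_replicates=2, min_samples=1).tolist())
# [True, False, False, True]
print(replicate_filter_mask(X, groups, min_replicates=2, min_samples=2).tolist())
# [False, False, False, True]
```

In the toy data, the second protein is detected in two samples overall but never twice within the same group, so it fails as soon as min_replicates=2. This is the difference between counting detections per group and counting them across all samples.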

Examples

>>> import grassp as gr
>>> adata = gr.datasets.hein_2024(enrichment="raw")
>>> adata.shape
(8538, 183)
>>> gr.pp.filter_proteins_per_replicate(
...     adata,
...     grouping_columns='subcellular_enrichment',
...     min_replicates=2,
...     min_samples=3
... )
>>> adata.shape  # Fewer proteins after filtering
(7869, 183)
