grassp.pp.remove_cRAP_proteins

remove_cRAP_proteins(data, id_column=None, id_type='uniprot', inplace=True, verbose=True)

Remove cRAP (common Repository of Adventitious Proteins) contaminants.

This function removes common laboratory contaminants from proteomics datasets using the cRAP database maintained at https://ftp.thegpm.org/fasta/crap/. Protein IDs are matched against the cRAP database, with support for both UniProt accession IDs (e.g., P00330) and entry names (e.g., ADH1_YEAST).

Parameters:

data AnnData: The annotated data matrix with proteins as observations (rows).
id_column str | None (default: None): Column name in data.obs containing protein IDs to match against cRAP database. If None, uses data.obs_names (row index).
id_type Literal['uniprot', 'uniprot_entry_name'] (default: 'uniprot'): Type of protein identifier to match:
- 'uniprot': UniProt accession IDs (e.g., P00330)
- 'uniprot_entry_name': UniProt entry names (e.g., ADH1_YEAST)
inplace bool (default: True): If True, modify data in place. If False, leave data unchanged and return a filtered copy.
verbose bool (default: True): If True, print the list of removed protein IDs.
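
The effect of id_type can be illustrated with a small standalone sketch. This is not grassp's implementation: TOY_CRAP and contaminant_mask are hypothetical names, and the two-entry table is only a stand-in for the bundled cRAP database. It shows that each ID is compared against either the accession or the entry name, depending on id_type.

```python
# Toy stand-in for the cRAP table (hypothetical, not the bundled database):
# maps UniProt accession -> UniProt entry name.
TOY_CRAP = {
    "P00330": "ADH1_YEAST",
    "P02769": "ALBU_BOVIN",
}


def contaminant_mask(ids, id_type="uniprot"):
    """Return one boolean per ID: True if the ID is in the toy contaminant table."""
    if id_type == "uniprot":
        known = set(TOY_CRAP)            # compare against accessions
    elif id_type == "uniprot_entry_name":
        known = set(TOY_CRAP.values())   # compare against entry names
    else:
        raise ValueError(f"Unsupported id_type: {id_type!r}")
    return [pid in known for pid in ids]


# The same two proteins, written once as accessions and once as entry names.
accessions = ["P00330", "P04637"]
entry_names = ["ADH1_YEAST", "P53_HUMAN"]

print(contaminant_mask(accessions))                                   # [True, False]
print(contaminant_mask(entry_names, id_type="uniprot_entry_name"))    # [True, False]
```

Passing entry names with the default id_type='uniprot' would match nothing in this sketch, which is why the identifier type has to agree with the contents of the ID column.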

Return type:

AnnData | None

Returns:

If inplace=False, returns filtered data with cRAP proteins removed.
If inplace=True, modifies data in place and returns None.

Notes

Protein IDs with isoform suffixes (e.g., P00330-1) are automatically cleaned to base accession (P00330) before matching.
If no cRAP proteins are found in the dataset, a warning is issued but the function completes successfully.
The cRAP database is included with grassp. To update it, run: python -m grassp.datasets.marker_curation.update_cRAP
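
The first two notes can be sketched in plain Python. This is a minimal illustration under assumptions, not the library's code: strip_isoform and drop_contaminants are hypothetical helpers, the isoform suffix is assumed to be a hyphen followed by digits at the end of the ID, and the contaminant set is passed in by hand.

```python
import re
import warnings

# Assumption: an isoform suffix is "-<digits>" at the end of the ID (e.g. "P00330-1").
_ISOFORM_SUFFIX = re.compile(r"-\d+$")


def strip_isoform(protein_id):
    """Reduce an isoform ID such as 'P00330-1' to its base accession 'P00330'."""
    return _ISOFORM_SUFFIX.sub("", protein_id)


def drop_contaminants(ids, contaminants):
    """Return the IDs whose base accession is not in `contaminants`.

    Emits a warning, but still returns normally, when nothing was removed.
    """
    kept = [pid for pid in ids if strip_isoform(pid) not in contaminants]
    if len(kept) == len(ids):
        warnings.warn("No contaminant IDs found; nothing removed.")
    return kept


print(strip_isoform("P00330-1"))                               # P00330
print(drop_contaminants(["P00330-1", "P04637"], {"P00330"}))   # ['P04637']
```

Matching on the base accession means every isoform of a contaminant is dropped, while the original IDs of the retained proteins are left untouched.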
