grassp.pp.remove_cRAP_proteins

remove_cRAP_proteins(data, id_column=None, id_type='uniprot', inplace=True, verbose=True)

Remove cRAP (common Repository of Adventitious Proteins) contaminants.

This function removes common laboratory contaminants from proteomics datasets using the cRAP database maintained at https://ftp.thegpm.org/fasta/crap/. Protein IDs are matched against the cRAP database, with support for both UniProt accession IDs (e.g., P00330) and entry names (e.g., ADH1_YEAST).

Parameters:
data AnnData

The annotated data matrix with proteins as observations (rows).

id_column str | None (default: None)

Column name in data.obs containing protein IDs to match against cRAP database. If None, uses data.obs_names (row index).

id_type Literal['uniprot', 'uniprot_entry_name'] (default: 'uniprot')

Type of protein identifier to match:

  • 'uniprot': UniProt accession IDs (e.g., P00330)

  • 'uniprot_entry_name': UniProt entry names (e.g., ADH1_YEAST)

inplace bool (default: True)

If True, modify data in place. If False, leave data unchanged and return a filtered copy.

verbose bool (default: True)

If True, print the list of removed protein IDs.

Return type:

AnnData | None

Returns:

  • If inplace=False, returns filtered data with cRAP proteins removed.

  • If inplace=True, modifies data in place and returns None.

Notes

  • Protein IDs with isoform suffixes (e.g., P00330-1) are automatically reduced to their base accession (P00330) before matching.

  • If no cRAP proteins are found in the dataset, a warning is issued but the function completes successfully.

  • The cRAP database is included with grassp. To update it, run: python -m grassp.datasets.marker_curation.update_cRAP
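The matching behaviour described in these notes can be sketched without grassp. The snippet below is an illustration only, not the library's implementation: `CRAP_ACCESSIONS` is a hypothetical two-entry stand-in for the bundled database, `strip_isoform` and `split_contaminants` are made-up helper names, and the assumption that an isoform suffix is a trailing hyphen followed by digits is ours.

```python
import re
import warnings

# Hypothetical stand-in for the bundled cRAP accession list (NOT the real database).
CRAP_ACCESSIONS = {"P00330", "P02769"}

# Assumption: an isoform suffix is a trailing hyphen followed by digits, e.g. "-1".
_ISOFORM_SUFFIX = re.compile(r"-\d+$")


def strip_isoform(accession: str) -> str:
    """Reduce an accession such as 'P00330-1' to its base form 'P00330'."""
    return _ISOFORM_SUFFIX.sub("", accession.strip())


def split_contaminants(protein_ids):
    """Return (kept, removed) lists, matching on the base accession."""
    kept, removed = [], []
    for pid in protein_ids:
        if strip_isoform(pid) in CRAP_ACCESSIONS:
            removed.append(pid)
        else:
            kept.append(pid)
    if not removed:
        # Mirror the documented behaviour: warn, but still return normally.
        warnings.warn("No cRAP proteins found in the dataset.")
    return kept, removed


kept, removed = split_contaminants(["P00330-1", "P04637", "P02769", "P04637-2"])
print(kept)     # IDs that did not match the stand-in contaminant set
print(removed)  # IDs that matched after isoform stripping
```

Matching on the stripped accession while reporting the original ID keeps the output traceable to the input rows; whether grassp reports original or cleaned IDs is not stated here.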

See also

remove_contaminants

Remove contaminants based on custom filter columns.

Examples

Remove cRAP proteins using UniProt IDs from row index:

>>> import grassp as gr
>>> adata = gr.datasets.hein_2024(enrichment="raw")
>>> adata.shape
(8538, 183)
>>> gr.pp.remove_cRAP_proteins(adata)
>>> adata.shape  # 18 cRAP proteins removed
(8520, 183)
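
The id_type and inplace conventions documented above can also be illustrated with a small, library-independent sketch. Everything here is hypothetical and for illustration: `CRAP_TABLE` is a two-entry stand-in rather than the real database, `crap_ids` and `remove_crap` are made-up names operating on a plain list instead of an AnnData object, and raising `ValueError` for an unknown id_type is a choice of this sketch, not documented behaviour.

```python
# Hypothetical stand-in table of (accession, entry name) pairs, NOT the real cRAP database.
CRAP_TABLE = [("P00330", "ADH1_YEAST"), ("P02769", "ALBU_BOVIN")]


def crap_ids(id_type="uniprot"):
    """Pick the identifier column that corresponds to id_type."""
    if id_type == "uniprot":
        return {accession for accession, _ in CRAP_TABLE}
    if id_type == "uniprot_entry_name":
        return {entry_name for _, entry_name in CRAP_TABLE}
    raise ValueError(f"Unsupported id_type: {id_type!r}")


def remove_crap(ids, id_type="uniprot", inplace=True):
    """Filter a list of IDs; mutate it and return None when inplace=True,
    otherwise leave it untouched and return a filtered copy."""
    contaminants = crap_ids(id_type)
    kept = [pid for pid in ids if pid not in contaminants]
    if inplace:
        ids[:] = kept  # slice assignment mutates the caller's list
        return None
    return kept


names = ["ADH1_YEAST", "P53_HUMAN", "ALBU_BOVIN"]

# inplace=False: the input is unchanged and a filtered copy is returned.
filtered = remove_crap(names, id_type="uniprot_entry_name", inplace=False)
print(filtered, names)

# inplace=True: the input is modified and the return value is None.
result = remove_crap(names, id_type="uniprot_entry_name", inplace=True)
print(result, names)
```

Note that passing entry names with the default id_type='uniprot' would match nothing in this sketch, which is why the identifier type has to agree with the contents of the ID column.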