Grassp Centrifugation Workflow Tutorial

grassp is a Python package that facilitates the analysis of subcellular proteomics data, with an emphasis on graph-based analyses. In this tutorial, we analyze subcellular proteomics data produced by differential ultracentrifugation (DC).

# Spatial and single cell analysis
import grassp as gr
import scanpy as sc

# Data visualization
import seaborn as sns
import matplotlib.pyplot as plt

# Numerical computing and statistics
import numpy as np

Reading files

We’ll load the protein intensity matrix into an AnnData object. AnnData is a data structure with separate compartments for annotations and for alternative representations of the data. For a comprehensive introduction, refer to the Getting Started with AnnData guide.

# Grassp provides methods to read from common proteomics formats in the io module.
# Here we read a MaxQuant output file.
dc = gr.io.read_maxquant(
    "https://public.czbiohub.org/proteinxlocation/internal/proteinGroups.txt",
    intensity_column_prefixes=["LFQ intensity ", "MS/MS count "],
)

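# The last "_"-separated token of each sample name is its fraction (1K ... 80K, Cyt);
# harmonize the capitalization of the cytoplasmic fraction
dc.var["subcellular_enrichment"] = dc.var_names.str.split("_").str[-1]
dc.var["subcellular_enrichment"] = dc.var["subcellular_enrichment"].replace(
    "cyt", "Cyt"
)

# The last character of the first "_"-separated token is the biological replicate
dc.var["biological_replicate"] = dc.var_names.str.split("_").str[0].str[-1]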
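The two `dc.var` assignments above assume a naming pattern for the sample names. The fraction is the last `_`-separated token, and the replicate is the last character of the first token. Below is a minimal pure-Python sketch of the same parsing. The sample names and the helper `parse_sample_name` are hypothetical, and the real column names in the MaxQuant file may look different.

```python
# Hypothetical sample names; the real names in proteinGroups.txt may differ.
sample_names = ["DC1_1K", "DC1_cyt", "DC2_80K", "DC6_12K"]


def parse_sample_name(name):
    """Split a sample name into (fraction, replicate), mirroring the pandas .str calls."""
    tokens = name.split("_")
    fraction = tokens[-1]  # last token, e.g. "1K" or "cyt"
    if fraction == "cyt":
        fraction = "Cyt"  # same harmonization as .replace("cyt", "Cyt")
    replicate = tokens[0][-1]  # last character of the first token, e.g. "1"
    return fraction, replicate


parsed = [parse_sample_name(n) for n in sample_names]
print(parsed)
```

Taking only a single character for the replicate works for up to nine replicates. Data with ten or more replicates would need a different rule.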

The centrifugation data in this tutorial comes from the Elias lab at Stanford University (unpublished as of July 2025). Centrifugation-based subcellular fractionation experiments separate cellular components by spinning samples at increasing speeds (1K, 3K, 5K, 12K, 24K, 80K × g) to isolate organelles and subcellular structures based on their density and size, with the cytoplasmic fraction (Cyt) representing the final supernatant.

This approach differs from immunoprecipitation (IP) pull-downs, which use antibodies to specifically capture target proteins and their interacting partners.

Here we load the data from our online data repository, but grassp also ships with several example datasets. The lines below show how to load them, either as raw or as enriched data.

# example_load_raw = gr.datasets.hek_dc_2025(enrichment="raw")
# example_load_enr = gr.datasets.hek_dc_2025(enrichment="enriched")

dc

AnnData object with n_obs × n_vars = 10224 × 42
    obs: 'Protein IDs', 'Majority protein IDs', 'Peptide counts (all)', 'Peptide counts (razor+unique)', 'Peptide counts (unique)', 'Protein names', 'Gene names', 'Fasta headers', 'Number of proteins', 'Peptides', 'Razor + unique peptides', 'Unique peptides', 'Sequence coverage [%]', 'Unique + razor sequence coverage [%]', 'Unique sequence coverage [%]', 'Mol. weight [kDa]', 'Sequence length', 'Sequence lengths', 'Fraction average', 'Fraction 1', 'Fraction 2', 'Fraction 3', 'Q-value', 'Score', 'Intensity', 'iBAQ', 'MS/MS count', 'Only identified by site', 'Reverse', 'Potential contaminant', 'id', 'Peptide IDs', 'Peptide is razor', 'Mod. peptide IDs', 'Evidence IDs', 'MS/MS IDs', 'Best MS/MS', 'Oxidation (M) site IDs', 'Oxidation (M) site positions'
    var: 'subcellular_enrichment', 'biological_replicate'
    uns: 'RawInfo'
    layers: 'MS_MS count'

Let’s go through the information printed above:

n_obs is the number of observations (here, proteins). n_vars is the number of variables (here, samples, i.e. pulldowns or fractions).

AnnData object with n_obs × n_vars = 10224 × 42

Under obs we find the metadata for the proteins. Each entry is a column in a pandas DataFrame.

obs: ‘Protein IDs’, ‘Majority protein IDs’, ‘Peptide counts (all)’, ‘Peptide counts (razor+unique)’, ‘Peptide counts (unique)’, ‘Protein names’, ‘Gene names’, ‘Fasta headers’, ‘Number of proteins’, ‘Peptides’, ‘Razor + unique peptides’, ‘Unique peptides’, ‘Sequence coverage [%]’, ‘Unique + razor sequence coverage [%]’, ‘Unique sequence coverage [%]’, ‘Mol. weight [kDa]’, ‘Sequence length’, ‘Sequence lengths’, ‘Fraction average’, ‘Fraction 1’, ‘Fraction 2’, ‘Fraction 3’, ‘Q-value’, ‘Score’, ‘Intensity’, ‘iBAQ’, ‘MS/MS count’, ‘Only identified by site’, ‘Reverse’, ‘Potential contaminant’, ‘id’, ‘Peptide IDs’, ‘Peptide is razor’, ‘Mod. peptide IDs’, ‘Evidence IDs’, ‘MS/MS IDs’, ‘Best MS/MS’, ‘Oxidation (M) site IDs’, ‘Oxidation (M) site positions’

Under var we find the metadata for the samples (pulldowns/fractions).

var: ‘subcellular_enrichment’, ‘biological_replicate’

Preprocessing

Adding Compartment Annotations

Beyond the bare-bones AnnData object, it helps to add annotations that give the known (ground-truth) subcellular compartment of each protein. grassp provides a table of such annotations from several reference sources, in the hein2024_component, hein2024_gt_component and itzhak2016_component columns below. These labels serve as a reference for checking which organelles and cellular structures are enriched at each centrifugation speed.

annotations = gr.datasets.subcellular_annotations()
annotations.head()

	gene_symbol	hein2024_component	hein2024_gt_component	itzhak2016_component
uniprot_id
Q9NRG9	AAAS	Endoplasmic reticulum	NaN	NaN
Q86V21	AACS	Cytosol	NaN	NaN
Q6PD74	AAGAB	Cytosol	Cytosol	NaN
Q2M2I8	AAK1	Cell membrane	NaN	NaN
Q9H7C9	AAMDC	Cytosol	NaN	NaN

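# Left-join the reference annotations on gene symbol; every protein is kept
dc.obs = dc.obs.merge(
    annotations, left_on="Gene names", right_on="gene_symbol", how="left"
)
# Use the first ID of each protein group as the observation name
dc.obs_names = dc.obs["Protein IDs"].str.split(";").str[0]
dc.obs[10:20]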

	Protein IDs	Majority protein IDs	Peptide counts (all)	Peptide counts (razor+unique)	Peptide counts (unique)	Protein names	Gene names	Fasta headers	Number of proteins	Peptides	...	Mod. peptide IDs	Evidence IDs	MS/MS IDs	Best MS/MS	Oxidation (M) site IDs	Oxidation (M) site positions	gene_symbol	hein2024_component	hein2024_gt_component	itzhak2016_component
Protein IDs
A0AVF1	A0AVF1;A0AVF1-2;A0AVF1-3	A0AVF1;A0AVF1-2;A0AVF1-3	32;27;27	32;27;27	32;27;27	Intraflagellar transport protein 56	TTC26	sp\|A0AVF1\|IFT56_HUMAN Intraflagellar transport...	3	32	...	1887;5089;8906;25551;25552;28390;53785;66607;7...	56848;56849;56850;56851;56852;56853;56854;5685...	52336;52337;52338;52339;52340;52341;52342;5234...	52336;135460;240277;664836;664837;746337;13862...	nan	nan	NaN	NaN	NaN	NaN
A0AVI4	A0AVI4;A0AVI4-2	A0AVI4;A0AVI4-2	7;4	7;4	7;4	E3 ubiquitin-protein ligase TM129	TMEM129	sp\|A0AVI4\|TM129_HUMAN E3 ubiquitin-protein lig...	2	7	...	55660;55824;144788;146788;174433;188073;192255	1583329;1583330;1583331;1583332;1583333;158333...	1438165;1438166;1438167;1438168;1438169;144211...	1438167;1442114;3690573;3732941;4371818;474580...	nan	nan	NaN	NaN	NaN	NaN
A0AVT1	A0AVT1;A0AVT1-2;A0AVT1-4	A0AVT1;A0AVT1-2	64;43;16	64;43;16	45;43;0	Ubiquitin-like modifier-activating enzyme 6	UBA6	sp\|A0AVT1\|UBA6_HUMAN Ubiquitin-like modifier-a...	3	64	...	2362;7748;10112;17608;20090;20341;21159;23811;...	71297;71298;71299;71300;71301;71302;71303;7130...	65760;65761;65762;65763;65764;65765;65766;6576...	65763;206211;273363;469369;530843;536373;55531...	5;6;7;8;9;10	1;56;162;492;844;872	UBA6	Cytosol	NaN	NaN
A0AVT1-3	A0AVT1-3	A0AVT1-3	20	1	1	Ubiquitin-like modifier-activating enzyme 6	UBA6	sp\|A0AVT1-3\|UBA6_HUMAN Isoform 3 of Ubiquitin-...	1	20	...	17608;20341;21159;43342;58254;77570;95017;9639...	511881;511882;511883;511884;511885;511886;5118...	469368;469369;469370;469371;469372;469373;4693...	469369;536373;555312;1132399;1508519;2008975;2...	6;9;10	1;56;162	UBA6	Cytosol	NaN	NaN
A0FGR8-2	A0FGR8-2;A0FGR8;A0FGR8-4;A0FGR8-5	A0FGR8-2;A0FGR8;A0FGR8-4	44;40;24;16	44;40;24;16	4;0;0;0	Extended synaptotagmin-2	ESYT2	sp\|A0FGR8-2\|ESYT2_HUMAN Isoform 2 of Extended ...	4	44	...	1448;2255;2256;2951;8466;13383;24809;31419;492...	44692;44693;44694;44695;44696;44697;44698;4469...	41402;41403;41404;41405;41406;41407;41408;4140...	41407;63394;63408;83030;227479;360884;645585;8...	11;12;13	481;536;662	ESYT2	Endoplasmic reticulum	NaN	NaN
A0FGR8-6	A0FGR8-6	A0FGR8-6	42	2	2	Extended synaptotagmin-2	ESYT2	sp\|A0FGR8-6\|ESYT2_HUMAN Isoform 6 of Extended ...	1	42	...	2255;2256;2951;8466;13383;24809;31419;49245;50...	68611;68612;68613;68614;68615;68616;68617;6861...	63338;63339;63340;63341;63342;63343;63344;6334...	63394;63408;83030;227479;360884;645585;830038;...	11;12;13	509;585;711	ESYT2	Endoplasmic reticulum	NaN	NaN
A0JLT2	A0JLT2;A0JLT2-2	A0JLT2;A0JLT2-2	5;4	5;4	5;4	Mediator of RNA polymerase II transcription su...	MED19	sp\|A0JLT2\|MED19_HUMAN Mediator of RNA polymera...	2	5	...	46142;105894;116488;162058;182947	1323323;1323324;1323325;1323326;1323327;132332...	1205006;1205007;1205008;1205009;1205010;120501...	1205011;2742872;3008261;4065598;4602236	nan	nan	MED19	Nucleus	Nucleus	NaN
A0JNW5	A0JNW5;A0JNW5-2;Q32M92-2;Q32M92	A0JNW5	38;14;1;1	38;14;1;1	37;13;1;1	UHRF1-binding protein 1-like	UHRF1BP1L	sp\|A0JNW5\|UH1BL_HUMAN UHRF1-binding protein 1-...	4	38	...	5379;5943;11319;22940;23422;23604;32645;32756;...	158368;174282;174283;174284;174285;174286;1742...	143702;159031;159032;159033;159034;159035;3057...	143702;159034;305747;599261;612021;616995;8612...	nan	nan	NaN	NaN	NaN	NaN
A0MZ66	A0MZ66;A0MZ66-6;A0MZ66-5;A0MZ66-3;A0MZ66-4;A0M...	A0MZ66;A0MZ66-6;A0MZ66-5;A0MZ66-3;A0MZ66-4;A0M...	40;36;34;34;31;25;24;17	40;36;34;34;31;25;24;17	40;36;34;34;31;25;24;17	Shootin-1	KIAA1598	sp\|A0MZ66\|SHOT1_HUMAN Shootin-1 OS=Homo sapien...	8	40	...	15951;31302;31303;31304;31647;31648;38089;3809...	461921;461922;461923;461924;461925;461926;4619...	423414;423415;423416;423417;423418;423419;4234...	423424;827066;827067;827069;836245;836267;1003...	nan	nan	NaN	NaN	NaN	NaN
A0PJW6	A0PJW6	A0PJW6	7	7	7	Transmembrane protein 223	TMEM223	sp\|A0PJW6\|TM223_HUMAN Transmembrane protein 22...	1	7	...	6157;34709;62696;65715;103175;153339;177885	179936;179937;179938;179939;179940;179941;1799...	164054;164055;164056;164057;164058;164059;1640...	164059;912920;1628502;1704558;2674549;3882081;...	nan	nan	NaN	NaN	NaN	NaN

10 rows × 43 columns
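The merge above is a left join on an exact string match. Proteins whose "Gene names" entry lists several genes (for example "A;B") therefore receive no annotation. If a gene symbol appeared more than once in the annotation table, the merge would also duplicate proteins and change the number of rows. The toy pandas sketch below (made-up gene and protein names) shows these cases. It adds `validate="many_to_one"` as a safety check that is not part of the original code: pandas raises a MergeError if the right-hand keys are not unique.

```python
import pandas as pd

# Toy protein metadata (like dc.obs) and a toy annotation table (like `annotations`)
obs = pd.DataFrame(
    {
        "Protein IDs": ["P1;P1-2", "P2", "P3"],
        "Gene names": ["GENEA", "GENEB", "GENEC;GENED"],
    }
)
annotations = pd.DataFrame(
    {
        "gene_symbol": ["GENEA", "GENEB", "GENEX"],
        "hein2024_component": ["Cytosol", "Nucleus", "Golgi"],
    }
)

# how="left" keeps every protein; validate="many_to_one" fails loudly if a
# gene symbol is duplicated in `annotations` instead of silently adding rows
merged = obs.merge(
    annotations,
    left_on="Gene names",
    right_on="gene_symbol",
    how="left",
    validate="many_to_one",
)
# First protein ID of each group becomes the row name, as in the tutorial
merged.index = merged["Protein IDs"].str.split(";").str[0]
print(merged[["Gene names", "hein2024_component"]])
```

"GENEC;GENED" does not match any gene_symbol, so P3 keeps a missing (NaN) annotation.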

Adding QC metrics to the metadata

Before performing filtering and transformations, let’s add some quality control metrics of the raw data to the metadata, which we can plot later on.

gr.pp.calculate_qc_metrics(dc)
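The column names that calculate_qc_metrics adds resemble those of scanpy's QC metrics. Per protein (.obs) it adds columns such as n_samples_by_intensity, pct_dropout_by_intensity and log1p_total_intensity. Per sample (.var) it adds columns such as n_proteins_by_intensity. These appear in the printouts further down. Below is a minimal numpy sketch of what such summaries compute, assuming an intensity of 0 means "not detected". The helper `qc_metrics` is hypothetical, and grassp's exact definitions may differ.

```python
import numpy as np


def qc_metrics(X):
    """QC summaries for a proteins x samples intensity matrix (0 = not detected).

    Hypothetical helper; mirrors the column names, not necessarily grassp's code.
    """
    detected = X > 0
    protein_qc = {
        "n_samples_by_intensity": detected.sum(axis=1),
        "mean_intensity": X.mean(axis=1),
        "log1p_mean_intensity": np.log1p(X.mean(axis=1)),
        "pct_dropout_by_intensity": 100 * (1 - detected.mean(axis=1)),
        "total_intensity": X.sum(axis=1),
        "log1p_total_intensity": np.log1p(X.sum(axis=1)),
    }
    sample_qc = {
        "n_proteins_by_intensity": detected.sum(axis=0),
        "total_intensity": X.sum(axis=0),
        "log1p_total_intensity": np.log1p(X.sum(axis=0)),
        "pct_dropout_by_intensity": 100 * (1 - detected.mean(axis=0)),
    }
    return protein_qc, sample_qc


# Two proteins measured in three samples
X = np.array([[100.0, 0.0, 50.0], [0.0, 0.0, 10.0]])
protein_qc, sample_qc = qc_metrics(X)
print(protein_qc["n_samples_by_intensity"], sample_qc["pct_dropout_by_intensity"])
```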

Filtering

dc_filtered = dc.copy()

grassp.pp provides filtering functions to remove low-quality proteins. Here, we first remove proteins that MaxQuant flagged as potential contaminants, as reverse (decoy) hits, or as only identified by a modification site. We then keep only proteins that were detected in at least 4 of the 6 replicates in at least 2 of the 7 fractions.

print("Protein count before filtering: ", dc_filtered.shape[0])

# Remove proteins flagged with "+" in any of these MaxQuant columns from the working copy
contaminant_cols = ["Only identified by site", "Reverse", "Potential contaminant"]
gr.pp.remove_contaminants(
    dc_filtered, filter_columns=contaminant_cols, filter_value="+"
)
# The flag columns carry no information once the flagged proteins are gone
dc_filtered.obs.drop(
    columns=contaminant_cols,
    inplace=True,
)
print("Protein count after contaminant filtering: ", dc_filtered.shape[0])

# Keep proteins detected in at least 4 replicates of at least 2 fractions
gr.pp.filter_proteins_per_replicate(
    dc_filtered,
    grouping_columns="subcellular_enrichment",
    min_replicates=4,
    min_samples=2,
)
print("Protein count after replicate filtering: ", dc_filtered.shape[0])

Protein count before filtering:  10224
Protein count after contaminant filtering:  9605
Protein count after replicate filtering:  8807
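Below is one plausible reading of the replicate filter, written as a small numpy sketch. For each fraction, it counts in how many replicates a protein was detected (non-zero). A fraction "passes" if that count reaches min_replicates, and the protein is kept if at least min_samples fractions pass. This interpretation of the two parameters is our assumption, and replicate_filter_mask is a hypothetical helper, not part of grassp.

```python
import numpy as np


def replicate_filter_mask(X, groups, min_replicates=4, min_samples=2):
    """Boolean mask of proteins (rows) to keep; hypothetical helper.

    X: proteins x samples intensities, 0 meaning "not detected".
    groups: one group label (e.g. fraction) per sample/column.
    """
    groups = np.asarray(groups)
    detected = X > 0
    n_passing_groups = np.zeros(X.shape[0], dtype=int)
    for g in np.unique(groups):
        # In how many replicates of this fraction was each protein detected?
        n_detected = detected[:, groups == g].sum(axis=1)
        n_passing_groups += n_detected >= min_replicates
    return n_passing_groups >= min_samples


# Toy data: 2 fractions x 3 replicates, thresholds scaled down accordingly
groups = ["1K"] * 3 + ["Cyt"] * 3
X = np.array(
    [
        [1, 1, 1, 1, 1, 1],  # detected everywhere -> keep
        [1, 1, 0, 0, 1, 1],  # 2/3 replicates in both fractions -> keep
        [1, 1, 1, 1, 0, 0],  # only 1K passes -> drop
        [0, 0, 0, 0, 0, 0],  # never detected -> drop
    ],
    dtype=float,
)
mask = replicate_filter_mask(X, groups, min_replicates=2, min_samples=2)
print(mask)
```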

Transformations

Normalization (log1p transformation)

Methods such as PCA work best on approximately normally distributed data, but raw MS intensities are strongly right-skewed. We therefore apply a log1p transformation, log(1 + x), to the intensities. This reduces skewness and stabilizes the variance across the dynamic range of protein abundances.
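np.log1p computes the natural logarithm of 1 + x. Zeros (missing values) therefore stay at exactly 0, while large intensities are compressed onto a log scale. The sketch below uses the first two values of the first column in the printout further down, rounded as printed. It reproduces the transformed values shown there.

```python
import numpy as np

# Intensities from the printout below (first column, rows 1 and 2) plus a missing value
raw = np.array([0.0, 7.4430e07, 6.6941e09])
logged = np.log1p(raw)  # natural log of (1 + x); log1p(0) == 0
print(logged)
```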

# Keep a copy of the untransformed intensities
dc_filtered.layers["raw_intensities"] = dc_filtered.X.copy()
print(f"DC data before log transforming {dc_filtered.X[:10, :5]}")

# log1p(x) = ln(1 + x), so missing values (0) stay at 0
dc_filtered.X = np.log1p(dc_filtered.X)
print(f"DC data before imputing {dc_filtered.X[:10, :5]}")
dc_filtered.layers["log_intensities"] = dc_filtered.X.copy()

DC data before log transforming [[7.4430e+07 6.2590e+08 2.6638e+08 1.0273e+07 0.0000e+00]
 [6.6941e+09 3.1782e+10 1.3537e+10 1.6039e+09 1.3146e+08]
 [9.1884e+07 8.7600e+08 3.3295e+08 4.0916e+07 0.0000e+00]
 [8.7687e+07 1.2499e+08 6.8571e+07 2.1352e+08 5.4010e+08]
 [1.7278e+08 5.0152e+07 3.2616e+07 3.2176e+07 5.8874e+07]
 [1.1557e+08 7.7588e+08 3.3768e+08 0.0000e+00 0.0000e+00]
 [3.0980e+07 3.7040e+07 0.0000e+00 2.3960e+07 0.0000e+00]
 [3.4632e+07 0.0000e+00 1.4594e+07 1.4703e+08 3.2308e+08]
 [2.2813e+08 1.8855e+08 2.8589e+08 5.1585e+08 8.2965e+08]
 [0.0000e+00 2.0237e+07 5.8801e+07 5.3140e+07 2.3226e+07]]
DC data before imputing [[18.12537  20.254702 19.400434 16.14503   0.      ]
 [22.624493 24.182165 23.328693 21.195704 18.694214]
 [18.336037 20.590878 19.623503 17.52703   0.      ]
 [18.289284 18.643744 18.04338  19.179241 20.107265]
 [18.96753  17.73057  17.300314 17.286732 17.89091 ]
 [18.565388 20.46951  19.63761   0.        0.      ]
 [17.248852 17.42751   0.       16.991896  0.      ]
 [17.360289  0.       16.49612  18.806147 19.59341 ]
 [19.245426 19.054874 19.471117 20.061327 20.536514]
 [ 0.       16.823023 17.88967  17.78844  16.960783]]

Imputing

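Mass spectrometry data contain many missing values, because proteins whose abundance falls below the instrument's detection limit are not quantified. In this dataset, missing values are stored as zeros, which is why they remain 0 after the log1p transformation above.

Imputation addresses this technical limitation by estimating the missing values, which allows more complete downstream analysis. Here we use grassp's left-shifted Gaussian imputation (gr.pp.impute_gaussian). It replaces missing values with draws from a normal distribution shifted toward the low end of the observed intensities, mimicking signals below the detection limit. The distance argument sets how far that distribution is shifted downward. Other imputation methods can be used instead.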
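Below is a minimal numpy sketch of left-shifted Gaussian imputation in the style popularized by Perseus. We assume that distance is measured in standard deviations of each sample's observed log intensities. We also assume the spread of the sampling distribution is width times that standard deviation. Both width=0.3 and the per-sample computation are our assumptions, not taken from grassp, and impute_left_shifted_gaussian is a hypothetical helper.

```python
import numpy as np


def impute_left_shifted_gaussian(X, distance=1.8, width=0.3, seed=0):
    """Replace zeros in a float, log-scale proteins x samples matrix (hypothetical helper).

    For each sample (column), draw replacements from
    N(mean - distance * sd, (width * sd)^2), where mean and sd describe the
    observed (non-zero) values of that sample.
    """
    rng = np.random.default_rng(seed)
    X = X.copy()  # leave the input untouched
    for j in range(X.shape[1]):
        col = X[:, j]  # a view, so assignments below modify X
        missing = col == 0
        observed = col[~missing]
        if missing.any() and observed.size > 1:
            mu, sd = observed.mean(), observed.std()
            col[missing] = rng.normal(mu - distance * sd, width * sd, missing.sum())
    return X


X = np.array([[18.0, 20.0], [19.0, 0.0], [20.0, 21.0], [0.0, 22.0]])
imputed = impute_left_shifted_gaussian(X)
print(imputed)
```

Because the draws sit well below each sample's observed mean, the imputed values form a low-intensity tail rather than blending into the bulk of the distribution. The before/after histogram below checks for this pattern.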

gr.pp.impute_gaussian(dc_filtered, distance=1.8)
print(f"DC data after imputing {dc_filtered.X[:10, :5]}")

DC data after imputing [[18.12537   20.254702  19.400434  16.14503   17.027437 ]
 [22.624493  24.182165  23.328693  21.195704  18.694214 ]
 [18.336037  20.590878  19.623503  17.52703   15.061932 ]
 [18.289284  18.643744  18.04338   19.179241  20.107265 ]
 [18.96753   17.73057   17.300314  17.286732  17.89091  ]
 [18.565388  20.46951   19.63761   15.867263  15.1464205]
 [17.248852  17.42751   16.653584  16.991896  15.292814 ]
 [17.360289  16.801428  16.49612   18.806147  19.59341  ]
 [19.245426  19.054874  19.471117  20.061327  20.536514 ]
 [16.185604  16.823023  17.88967   17.78844   16.960783 ]]

Plotting histograms of the log intensities before and after imputation

# .flatten() converts each matrix into a 1D array of values
plt.hist(
    dc_filtered.X.flatten(), bins=100, alpha=0.5, label="After Imputation"
)
plt.hist(
    dc_filtered.layers["log_intensities"].flatten(),
    bins=100,
    alpha=0.5,
    label="Before Imputation",
)
plt.xlabel("log1p intensity")
plt.ylabel("Count")
plt.legend()
plt.show()

[Figure: histograms of log1p intensities before and after imputation]

QC plotting

Plotting transposed PCA to check sample clustering

In subcellular proteomics we focus on the relationships between proteins, which are stored as the observations (rows). For QC, however, we also want to compare samples. Transposing the AnnData object turns the samples into observations. A PCA on the transposed data then shows how the samples relate, so we can check that biological replicates of the same fraction cluster together as expected.
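Transposing swaps the roles of rows and columns, so PCA embeds samples instead of proteins. The small numpy sketch below uses synthetic data: two hypothetical fractions with three noisy replicates each. It shows the pattern this QC plot looks for, namely that replicates of the same fraction land on the same side of PC1. pca_scores is a hypothetical helper that computes PCA by SVD of the column-centered matrix. scanpy's implementation differs in details such as the solver and the number of components.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """PCA scores for the rows of X via SVD of the column-centered matrix."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


rng = np.random.default_rng(0)
n_proteins = 50
profile_a = rng.normal(20, 2, n_proteins)  # hypothetical fraction A
profile_b = rng.normal(20, 2, n_proteins)  # hypothetical fraction B

# proteins x samples: 3 replicates per fraction with small replicate noise
proteins_by_samples = np.column_stack(
    [profile_a + rng.normal(0, 0.1, n_proteins) for _ in range(3)]
    + [profile_b + rng.normal(0, 0.1, n_proteins) for _ in range(3)]
)

# Transpose so samples become the rows (observations), as dc_filtered.T does
scores = pca_scores(proteins_by_samples.T)
print(np.round(scores[:, 0], 2))
```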

# Samples become observations; use the log intensities from before imputation
dc_T = dc_filtered.T.copy()
dc_T.X = dc_filtered.layers["log_intensities"].T

sc.pp.pca(dc_T)
sc.pl.pca(dc_T, color="subcellular_enrichment", palette="cividis")

[Figure: PCA of the samples (transposed data), colored by subcellular_enrichment]

Violin plots of Log Intensities per Sample

Plotting the distribution of protein intensities in each sample helps to identify samples with unusual intensity distributions or technical issues. As expected, the later (higher-speed) fractions and the cytosolic supernatant have fewer detected proteins.

plot_df = dc_filtered.to_df(layer="log_intensities")

plt.figure(figsize=(20, 4))
sns.violinplot(data=plot_df, inner=None)
plt.xticks(rotation=90)
plt.title("Log1p Intensities per Sample")
plt.tight_layout()  # after the title and labels, so nothing is clipped
plt.show()

[Figure: violin plots titled "Log1p Intensities per Sample"]

Violin plot of QC metrics per fraction

Beyond sample-level QC, we can examine QC metrics grouped by fraction. These show how the number of detected proteins, the total intensity, and the dropout percentage vary across centrifugation speeds and compartments.

# One panel per sample-level QC metric, with samples grouped by fraction
fig, axs = plt.subplots(1, 3, figsize=(28, 6))
for key, ax in zip(
    ["n_proteins_by_intensity", "log1p_total_intensity", "pct_dropout_by_intensity"],
    axs,
):
    sc.pl.violin(
        dc_T,
        key,
        groupby="subcellular_enrichment",
        size=4,
        rotation=90,
        ax=ax,
        show=False,
    )
plt.tight_layout()
plt.show()

[Figure: violin plots of n_proteins_by_intensity, log1p_total_intensity and pct_dropout_by_intensity per fraction]

Enrichment (Log Fold Change)

grassp provides functions to calculate enrichment. The function used below, gr.pp.calculate_enrichment_vs_all, offers two enrichment methods:

Log fold change (lfc): the difference between the median intensity of the target sample and that of all other samples in the same condition. Because the data are log-transformed, this difference measures how many times higher or lower the protein's level is in the enriched fraction.
Proportion: the relative abundance as a fraction of the total intensity across target and control samples. It indicates what percentage of the protein's total signal comes from the enriched condition.
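Below is a minimal numpy sketch of the lfc idea followed by the median aggregation step, under our own assumptions. Each sample is compared with the median of the other fractions from the same biological replicate, which is our reading of covariates=["biological_replicate"]. The replicates are then collapsed with a median per fraction, as aggregate_samples(..., agg_func=np.median) does below. lfc_vs_all and median_by_fraction are hypothetical helpers, not grassp's implementation, and the proportion method is not shown.

```python
import numpy as np


def lfc_vs_all(X, fractions, replicates):
    """Per-sample log fold change vs. the median of the other fractions in its replicate.

    X: log-scale proteins x samples matrix; one fraction/replicate label per column.
    """
    fractions, replicates = np.asarray(fractions), np.asarray(replicates)
    lfc = np.empty_like(X)
    for j in range(X.shape[1]):
        others = (replicates == replicates[j]) & (fractions != fractions[j])
        lfc[:, j] = X[:, j] - np.median(X[:, others], axis=1)
    return lfc


def median_by_fraction(lfc, fractions):
    """Collapse replicates: median lfc per fraction (proteins x fractions)."""
    fractions = np.asarray(fractions)
    labels = list(dict.fromkeys(fractions))  # unique labels in first-seen order
    agg = np.column_stack([np.median(lfc[:, fractions == f], axis=1) for f in labels])
    return labels, agg


# One toy protein, 2 replicates x 3 fractions (log intensities)
fractions = ["1K", "12K", "Cyt", "1K", "12K", "Cyt"]
replicates = ["1", "1", "1", "2", "2", "2"]
X = np.array([[22.0, 18.0, 17.0, 21.0, 18.0, 16.0]])

lfc = lfc_vs_all(X, fractions, replicates)
labels, agg = median_by_fraction(lfc, fractions)
print(labels, agg)
```

In this toy example the protein is strongly enriched in the 1K fraction (median lfc 4.25) and depleted in the cytosol.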

Here we chose lfc over proportions because it yields values that are closer to normally distributed. As mentioned above, downstream tools such as PCA work best with approximately normally distributed features.

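# Per-sample log fold change vs. the other fractions, with replicate as a covariate
dc_filtered_enr = gr.pp.calculate_enrichment_vs_all(
    dc_filtered,
    subcellular_enrichment_column="subcellular_enrichment",
    covariates=["biological_replicate"],
    enrichment_method="lfc",
)

# Collapse the replicates: median enrichment per fraction (42 samples -> 7 fractions)
dc_filtered_enr = gr.pp.aggregate_samples(
    dc_filtered_enr, grouping_columns="subcellular_enrichment", agg_func=np.median
)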

dc_filtered_enr

AnnData object with n_obs × n_vars = 8807 × 7
    obs: 'Protein IDs', 'Majority protein IDs', 'Peptide counts (all)', 'Peptide counts (razor+unique)', 'Peptide counts (unique)', 'Protein names', 'Gene names', 'Fasta headers', 'Number of proteins', 'Peptides', 'Razor + unique peptides', 'Unique peptides', 'Sequence coverage [%]', 'Unique + razor sequence coverage [%]', 'Unique sequence coverage [%]', 'Mol. weight [kDa]', 'Sequence length', 'Sequence lengths', 'Fraction average', 'Fraction 1', 'Fraction 2', 'Fraction 3', 'Q-value', 'Score', 'Intensity', 'iBAQ', 'MS/MS count', 'id', 'Peptide IDs', 'Peptide is razor', 'Mod. peptide IDs', 'Evidence IDs', 'MS/MS IDs', 'Best MS/MS', 'Oxidation (M) site IDs', 'Oxidation (M) site positions', 'gene_symbol', 'hein2024_component', 'hein2024_gt_component', 'itzhak2016_component', 'n_samples_by_intensity', 'mean_intensity', 'log1p_mean_intensity', 'pct_dropout_by_intensity', 'total_intensity', 'log1p_total_intensity'
    var: 'subcellular_enrichment', 'n_merged_samples'
    uns: 'RawInfo'
    layers: 'MS_MS count', 'raw_intensities', 'log_intensities', 'original_intensities', 'pvals'

Dimensionality Reduction

PCA plots

Having filtered, transformed, and computed enrichments, we can move on to visualization and interpretation. The following plots show how proteins cluster in a reduced-dimensional space based on their enrichment profiles across fractions. They reveal groups of co-localized proteins and potential subcellular localization signatures.

# Scale each fraction to zero mean and unit variance, then compute the PCA
sc.pp.scale(dc_filtered_enr)
sc.pp.pca(dc_filtered_enr)
sc.pl.pca(
    dc_filtered_enr,
    color="hein2024_gt_component",
    title="DC hein 2024 Ground Truth PCA",
)
sc.pl.pca(
    dc_filtered_enr, color="hein2024_component", title="DC hein 2024 annotated PCA"
)

[Figure: DC hein 2024 Ground Truth PCA]

[Figure: DC hein 2024 annotated PCA]

UMAPs

While PCA provides a linear dimensionality reduction, UMAP offers a non-linear approach that can better preserve local neighborhood structures and reveal more complex patterns in protein localization data that might be missed by linear methods.
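sc.pp.neighbors(..., use_rep="X", n_neighbors=20) builds a k-nearest-neighbor graph on the scaled enrichment profiles, which UMAP then embeds. The resulting distances and connectivities end up in .obsp. scanpy may use approximate neighbor search and also computes connectivity weights. The sketch below shows only the core idea: an exact, brute-force kNN search in numpy using a hypothetical helper, knn_indices.

```python
import numpy as np


def knn_indices(X, k):
    """Indices of the k nearest neighbors (Euclidean) of each row, excluding itself."""
    sq = (X**2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T  # squared pairwise distances
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]


# Two tight pairs of points far apart from each other
points = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
print(knn_indices(points, k=1))
```

This brute-force version needs memory quadratic in the number of proteins. That is manageable for roughly 9,000 proteins, but it is one reason larger datasets rely on approximate search.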

# 20-nearest-neighbor graph on the scaled enrichment profiles, then UMAP embedding
sc.pp.neighbors(dc_filtered_enr, use_rep="X", n_neighbors=20)
sc.tl.umap(dc_filtered_enr)
sc.pl.umap(
    dc_filtered_enr,
    color="hein2024_gt_component",
    title="DC hein 2024 Ground Truth UMAP",
)
sc.pl.umap(
    dc_filtered_enr, color="hein2024_component", title="DC hein 2024 annotated UMAP"
)

[Figure: DC hein 2024 Ground Truth UMAP]

[Figure: DC hein 2024 annotated UMAP]

Compartment Annotation

The central question of subcellular proteomics is which cellular compartment each observed protein resides in. One way to annotate proteins with their compartments is to start from a set of ground-truth proteins with known localization. Their labels are then transferred to proteins with similar subcellular profiles. For this, grassp provides the knn_annotation function, which propagates labels across local neighborhoods in the protein-protein neighbor graph.
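Below is a simplified sketch of kNN label transfer by majority vote, given a precomputed array of neighbor indices. Each unlabeled protein takes the most common label among its labeled neighbors. This is an illustration under our own assumptions, not grassp's knn_annotation algorithm, which may weight neighbors differently (for example using the connectivities in .obsp). knn_label_transfer is a hypothetical helper.

```python
from collections import Counter

import numpy as np


def knn_label_transfer(neighbors, labels):
    """Majority-vote label transfer (simplified; not grassp's actual algorithm).

    neighbors: (n, k) array of neighbor indices.
    labels: list with a label per row, or None for unlabeled rows.
    Labeled rows keep their label; rows without labeled neighbors stay None.
    """
    out = list(labels)
    for i, nbrs in enumerate(neighbors):
        if labels[i] is not None:
            continue
        votes = Counter(labels[j] for j in nbrs if labels[j] is not None)
        if votes:
            out[i] = votes.most_common(1)[0][0]
    return out


neighbors = np.array([[1, 2], [0, 2], [0, 1], [4, 2], [3, 2]])
labels = ["ER", "ER", None, "Nucleus", None]
print(knn_label_transfer(neighbors, labels))
```

With pandas annotation columns such as hein2024_gt_component, unlabeled proteins appear as NaN, so they would need to be mapped to None (or checked with pd.isna) before using this sketch.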

gr.tl.knn_annotation(
    dc_filtered_enr, obs_ann_col="hein2024_gt_component", key_added="knn_annotation"
)

sc.pl.umap(dc_filtered_enr, color="knn_annotation", title="KNN Annotation")

[Figure: UMAP colored by knn_annotation, titled "KNN Annotation"]

After these steps, the new analysis results are stored in several AnnData compartments:

.obsm holds the PCA and UMAP coordinates.
.varm holds the PCA loadings.
The knn_annotation column of .obs holds the transferred labels.
.uns holds metadata carried over from the MaxQuant import (RawInfo), the PCA, neighbor and UMAP parameters, and the plotting colors.
.obsp holds protein-protein relationships as distance and connectivity matrices.

uns: ‘RawInfo’, ‘pca’, ‘hein2024_gt_component_colors’, ‘hein2024_component_colors’, ‘neighbors’, ‘umap’, ‘knn_annotation_colors’

obsm: ‘X_pca’, ‘X_umap’

varm: ‘PCs’

obsp: ‘distances’, ‘connectivities’

dc_filtered_enr

AnnData object with n_obs × n_vars = 8807 × 7
    obs: 'Protein IDs', 'Majority protein IDs', 'Peptide counts (all)', 'Peptide counts (razor+unique)', 'Peptide counts (unique)', 'Protein names', 'Gene names', 'Fasta headers', 'Number of proteins', 'Peptides', 'Razor + unique peptides', 'Unique peptides', 'Sequence coverage [%]', 'Unique + razor sequence coverage [%]', 'Unique sequence coverage [%]', 'Mol. weight [kDa]', 'Sequence length', 'Sequence lengths', 'Fraction average', 'Fraction 1', 'Fraction 2', 'Fraction 3', 'Q-value', 'Score', 'Intensity', 'iBAQ', 'MS/MS count', 'id', 'Peptide IDs', 'Peptide is razor', 'Mod. peptide IDs', 'Evidence IDs', 'MS/MS IDs', 'Best MS/MS', 'Oxidation (M) site IDs', 'Oxidation (M) site positions', 'gene_symbol', 'hein2024_component', 'hein2024_gt_component', 'itzhak2016_component', 'n_samples_by_intensity', 'mean_intensity', 'log1p_mean_intensity', 'pct_dropout_by_intensity', 'total_intensity', 'log1p_total_intensity', 'knn_annotation'
    var: 'subcellular_enrichment', 'n_merged_samples', 'mean', 'std'
    uns: 'RawInfo', 'pca', 'hein2024_gt_component_colors', 'hein2024_component_colors', 'neighbors', 'umap', 'knn_annotation_colors'
    obsm: 'X_pca', 'X_umap'
    varm: 'PCs'
    layers: 'MS_MS count', 'raw_intensities', 'log_intensities', 'original_intensities', 'pvals'
    obsp: 'distances', 'connectivities'
