grassp.pp.impute_gaussian
- impute_gaussian(data, width=0.3, distance=1.8, per_sample=True, random_state=0, inplace=True)
Impute missing values using a Gaussian distribution.
This function imputes missing values (zeros) in the data matrix using a Gaussian distribution. The parameters of the Gaussian are derived from the observed (non-zero) values, with the mean shifted downward by a specified number of standard deviations.
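Read together with the `width` and `distance` parameters documented below, the description corresponds to the following sampling rule. This is one reading of the text, not a statement taken from the source code; the symbols $\mu_{\text{obs}}$ and $\sigma_{\text{obs}}$ (mean and standard deviation of the observed, non-zero values) are introduced here for illustration.

```latex
x_{\text{imputed}} \sim \mathcal{N}\!\left(
    \mu_{\text{obs}} - \texttt{distance}\cdot\sigma_{\text{obs}},\;
    \left(\texttt{width}\cdot\sigma_{\text{obs}}\right)^{2}
\right)
```

With the defaults, imputed values are centered 1.8 standard deviations below the observed mean and spread with 0.3 times the observed standard deviation.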
- Parameters:
- data : AnnData
Annotated data matrix with proteins as observations (rows).
- width : float (default: 0.3)
Width of the Gaussian distribution used for imputation, as a fraction of the standard deviation of observed values.
- distance : float (default: 1.8)
Number of standard deviations below the mean of observed values at which to center the imputation distribution.
- per_sample : bool (default: True)
If True, calculate parameters separately for each sample (column). If False, use global parameters.
- random_state : Union[int, RandomState, None] (default: 0)
Seed for random number generation.
- inplace : bool (default: True)
If True, modify data in place. If False, return a copy.
- Return type:
ndarray | None
- Returns:
numpy.ndarray or None. If inplace=False, returns the imputed data matrix. If inplace=True, modifies the input data and returns None.
Notes
This implements a simple but effective imputation strategy commonly used in proteomics data analysis. Missing values are assumed to fall below the detection limit, so they are drawn from a Gaussian distribution whose parameters are derived from the observed values, with the mean shifted downward.
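The strategy described above can be sketched with plain NumPy. This is a minimal illustration under stated assumptions, not the grassp implementation: the function name `impute_gaussian_sketch` is hypothetical, it works on a bare array instead of an AnnData object, it always returns a copy, and it uses `numpy.random.default_rng` for seeding, which may differ from what the library does internally.

```python
import numpy as np


def impute_gaussian_sketch(X, width=0.3, distance=1.8, per_sample=True, random_state=0):
    """Hypothetical re-implementation for illustration; not the grassp source."""
    X = np.array(X, dtype=float)  # np.array copies, so the input is left untouched
    rng = np.random.default_rng(random_state)  # assumption: seeding scheme
    missing = X == 0  # zeros are treated as missing values

    # Column groups to process: each column on its own, or all columns together.
    if per_sample:
        groups = [np.array([j]) for j in range(X.shape[1])]
    else:
        groups = [np.arange(X.shape[1])]

    for cols in groups:
        block = X[:, cols]        # fancy indexing returns a copy
        mask = missing[:, cols]
        observed = block[~mask]
        if observed.size == 0 or not mask.any():
            continue  # nothing to estimate from, or nothing to fill
        mu, sigma = observed.mean(), observed.std()
        # Center `distance` standard deviations below the observed mean and
        # scale the spread to `width` times the observed standard deviation.
        block[mask] = rng.normal(mu - distance * sigma, width * sigma, size=mask.sum())
        X[:, cols] = block        # write the filled block back
    return X


# Rows are proteins, columns are samples; zeros mark missing intensities.
X = np.array([[20.0, 22.0],
              [0.0, 23.0],
              [21.0, 0.0],
              [19.0, 24.0]])
X_imputed = impute_gaussian_sketch(X)
```

Observed entries are left unchanged and only the zero entries are replaced. With `per_sample=False` the same mean and standard deviation, computed over all observed values, are used for every column.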