scatlastb_utils.pp.pseudobulk
- scatlastb_utils.pp.pseudobulk(adata, group_key, agg='sum', sep='--', group_cols=None, layer=None, min_cells=2, force_sparse=True, dtype='float32', use_legacy=False, **kwargs)
Aggregate an AnnData object into pseudobulk samples.
- Parameters:
  - adata (AnnData) – Input annotated data matrix (the function works on a copy).
  - group_key (str | Sequence[str]) – Column name(s) in adata.obs to group by. If a sequence is provided, the keys are concatenated with sep to form a single group label column.
  - agg (str (default: 'sum')) – Aggregation function name forwarded to scanpy.get.aggregate (common values: "sum", "mean").
  - sep (str (default: '--')) – Separator used when joining multiple group keys.
  - group_cols (list-like, optional) – Which adata.obs columns to preserve/aggregate. Defaults to all columns.
  - layer (str | None (default: None)) – Name of the layer to use for the expression matrix. If None, uses adata.X. When provided, both legacy and modern paths prefer the named layer for aggregation.
  - min_cells (int (default: 2)) – Minimum number of cells required for a group to be kept.
  - force_sparse (bool (default: True)) – If True, attempt to return the aggregated matrix in a sparse representation when appropriate.
  - dtype (str | dtype (default: 'float32')) – Data type to use for aggregation results.
  - use_legacy (bool (default: False)) – Force the legacy dask-backed aggregation path when True.
  - **kwargs – Forwarded to scanpy.get.aggregate.
- Return type: AnnData
- Returns: New AnnData whose X contains the aggregated matrix (groups x features) and whose obs contains aggregated metadata (including an n_agg column with per-group cell counts).
Notes
The function filters out groups with fewer than min_cells cells before aggregation. If multiple group_key values are provided, they are joined using sep and the combined label is used for grouping.
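
The grouping behaviour described in these notes can be illustrated with a small standard-library sketch. It is not the library's implementation: the helper pseudobulk_sketch, the n_cells field, and the toy donor/cell_type data are all hypothetical, plain lists stand in for adata.obs and adata.X, and only the 'sum' aggregation is shown.

```python
from collections import defaultdict


def pseudobulk_sketch(obs, counts, group_key, sep="--", min_cells=2):
    """Illustrative sum aggregation; NOT the library's implementation.

    obs:    list of dicts, one per cell (stand-in for adata.obs rows)
    counts: list of per-cell count vectors (stand-in for adata.X rows)
    """
    keys = [group_key] if isinstance(group_key, str) else list(group_key)

    # Join the selected obs values into one combined label per cell.
    labels = [sep.join(str(row[k]) for k in keys) for row in obs]

    # Count cells per label and drop small groups before aggregating.
    sizes = defaultdict(int)
    for label in labels:
        sizes[label] += 1
    kept = {label for label, n in sizes.items() if n >= min_cells}

    # Sum the count vectors of the remaining cells, group by group.
    sums = {}
    for label, vec in zip(labels, counts):
        if label not in kept:
            continue
        if label not in sums:
            sums[label] = [0] * len(vec)
        sums[label] = [a + b for a, b in zip(sums[label], vec)]

    # Pair each group's summed vector with its cell count
    # ("n_cells" is a hypothetical field name used only in this sketch).
    return {label: {"n_cells": sizes[label], "sum": sums[label]} for label in sums}


# Toy data: five cells, two features.
obs = [
    {"donor": "d1", "cell_type": "T"},
    {"donor": "d1", "cell_type": "T"},
    {"donor": "d1", "cell_type": "B"},
    {"donor": "d2", "cell_type": "T"},
    {"donor": "d2", "cell_type": "T"},
]
counts = [[1, 0], [2, 3], [5, 5], [0, 1], [4, 0]]

result = pseudobulk_sketch(obs, counts, ["donor", "cell_type"])
```

With the default min_cells=2, the single-cell group d1--B is dropped, leaving the groups d1--T and d2--T. Filtering before aggregation means small groups never contribute a row to the output, rather than being removed afterwards.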