scatlastb_utils.pp.pseudobulk

scatlastb_utils.pp.pseudobulk(adata, group_key, agg='sum', sep='--', group_cols=None, layer=None, min_cells=2, force_sparse=True, dtype='float32', use_legacy=False, **kwargs)

Aggregate an AnnData object into pseudobulk samples.

Parameters:
  • adata (AnnData) – Input annotated data matrix. The function operates on a copy, so the input is not modified.

  • group_key (str | Sequence[str]) – Column name(s) in adata.obs to group by. If a sequence is provided, the values of those columns are joined with sep to form a single group-label column.

  • agg (str (default: 'sum')) – Aggregation function name forwarded to scanpy.get.aggregate (common values: "sum", "mean").

  • sep (str (default: '--')) – Separator used when joining multiple group keys.

  • group_cols (list-like, optional) – The adata.obs columns to preserve or aggregate. If None, all columns are used.

  • layer (str | None (default: None)) – Name of the layer to use as the expression matrix. If None, adata.X is used. When provided, both the legacy and the modern aggregation paths prefer the named layer.

  • min_cells (int (default: 2)) – Minimum number of cells required for a group to be kept.

  • force_sparse (bool (default: True)) – If True, attempt to return the aggregated matrix in a sparse representation when appropriate.

  • dtype (str | dtype (default: 'float32')) – Data type of the aggregation results.

  • use_legacy (bool (default: False)) – If True, force the legacy dask-backed aggregation path.

  • **kwargs – Forwarded to scanpy.get.aggregate.
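The joining of several group_key columns with sep can be pictured with a short pandas sketch. It is an illustration under assumptions, not the library's implementation: combine_group_keys is a hypothetical helper name, and the sketch assumes each key column is converted to str before the values are joined in the order the keys are given.

```python
import pandas as pd


def combine_group_keys(obs: pd.DataFrame, group_key, sep: str = "--") -> pd.Series:
    """Hypothetical helper: build one group label per cell from obs columns."""
    if isinstance(group_key, str):
        # A single key is used as-is (cast to str for uniform labels).
        return obs[group_key].astype(str)
    keys = list(group_key)
    # Join the selected columns row-wise, in the given key order, with `sep`.
    labels = obs[keys[0]].astype(str)
    for key in keys[1:]:
        labels = labels + sep + obs[key].astype(str)
    return labels
```

For example, with keys ["donor", "cell_type"] and the default sep, a cell from donor d1 annotated as T would get the label "d1--T".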

Return type:

AnnData

Returns:

New AnnData whose X contains the aggregated matrix (groups x features) and whose obs contains aggregated metadata, including an n_agg column with per-group cell counts.
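As a rough illustration of the returned layout, the sketch below aggregates a dense cells x features NumPy matrix into a groups x features matrix plus an obs table with an n_agg column. It is a minimal sketch under stated assumptions: aggregate_groups is a hypothetical helper, it accepts dense input only, supports only agg="sum" and agg="mean", and does not carry over other obs metadata. The actual function forwards aggregation to scanpy.get.aggregate.

```python
import numpy as np
import pandas as pd


def aggregate_groups(X, labels, agg="sum", dtype="float32"):
    """Hypothetical helper: aggregate rows of a dense cells x features matrix by label.

    Returns (matrix, obs): matrix is groups x features, obs holds an n_agg column.
    """
    X = np.asarray(X, dtype=dtype)
    # Map each cell's label to an integer group code; groups are the sorted labels.
    codes, groups = pd.factorize(np.asarray(labels), sort=True)
    n_cells = X.shape[0]
    # Indicator matrix: indicator[g, i] == 1 when cell i belongs to group g.
    indicator = np.zeros((len(groups), n_cells), dtype=dtype)
    indicator[codes, np.arange(n_cells)] = 1
    counts = indicator.sum(axis=1)  # cells per group (the n_agg values)
    summed = indicator @ X  # groups x features
    if agg == "sum":
        out = summed
    elif agg == "mean":
        out = summed / counts[:, None]
    else:
        raise ValueError(f"unsupported agg in this sketch: {agg!r}")
    obs = pd.DataFrame(
        {"n_agg": counts.astype(int)},
        index=[str(g) for g in groups],
    )
    return out.astype(dtype), obs
```

The indicator-matrix product is one common way to express group sums; a sparse indicator would serve the same purpose for large or sparse inputs.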

Notes

The function filters out groups with fewer than min_cells cells before aggregation. If multiple group_key values are provided, they are joined using sep, and the combined label is used for grouping.
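The min_cells filter can be sketched as a boolean mask over per-cell group labels, computed before any aggregation takes place. filter_small_groups is a hypothetical helper, and the sketch assumes the labels are already a single pandas Series (for example, a combined label built from several group_key columns).

```python
import pandas as pd


def filter_small_groups(labels: pd.Series, min_cells: int = 2) -> pd.Series:
    """Hypothetical helper: True for cells whose group has at least min_cells cells."""
    # Look up each cell's group size from the per-label counts.
    group_sizes = labels.map(labels.value_counts())
    return group_sizes >= min_cells
```

In an AnnData workflow, the mask would be applied to the cells (for example, adata[mask.to_numpy()]) so that under-populated groups never reach the aggregation step.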