scatlastb_utils.pipeline.ModuleConfig.ModuleConfig

class scatlastb_utils.pipeline.ModuleConfig.ModuleConfig(module_name, config, parameters=None, default_target=None, wildcard_names=None, mandatory_wildcards=None, config_params=None, rename_config_params=None, explode_by=None, paramspace_kwargs=None, dont_inherit=None, dtypes=None, write_output_files=True, warn=False)

ModuleConfig class to handle module configuration and parameters.

This class is designed to encapsulate the configuration for a specific module in a Snakemake pipeline, including input files, parameters, and output targets. It provides methods to retrieve and manipulate these configurations, as well as to write output files based on the module’s parameters and wildcards.

The main use of this class is to initialise the input files, parameters, and any user-supplied configuration, and to provide a structured way to access and manipulate them within the scAtlasTb Snakemake workflow.

Parameters:
  • module_name (str)

  • config (dict)

  • parameters ([pd.DataFrame, str])

  • default_target ([str, Any])

  • wildcard_names (list)

  • mandatory_wildcards (list)

  • config_params (list)

  • rename_config_params (dict)

  • explode_by ([str, list])

  • paramspace_kwargs (dict)

  • dont_inherit (list)

  • dtypes (dict)

  • write_output_files (bool)

  • warn (bool)

Methods table

copy()

Create a copy of the ModuleConfig instance.

get_config()

Get complete config with parsed input files.

get_datasets()

Get config for datasets that use the module.

get_defaults([module_name])

Get defaults for module.

get_for_dataset(dataset, query[, default, warn])

Get any key from the config via query

get_from_parameters(query_dict, ...)

Retrieve a specific parameter from the parameters DataFrame.

get_input_file(dataset, file_id, **kwargs)

Retrieve a specific input file for a dataset.

get_input_file_wildcards()

Retrieve input file wildcards for the module.

get_input_files()

Retrieve all input files for the module.

get_input_files_per_dataset(dataset)

Retrieve input files for a specific dataset.

get_output_files([pattern, extra_wildcards, ...])

Get output file based on wildcards

get_parameters()

Retrieve the parameters DataFrame for the module.

get_paramspace(**kwargs)

Retrieve the parameter space for the module.

get_profile(wildcards)

Get the resource profile for the given wildcards.

get_resource(resource_key[, profile, ...])

Retrieve resource information from config['resources']

get_wildcard_names()

Retrieve wildcard names for the module.

get_wildcards(**kwargs)

Retrieve wildcard instances as dictionary

set_datasets([warn])

Set dataset configs

set_default_target([default_target, warn])

Set the default target for the module.

set_defaults([warn])

Set default entries for the module in the config.

set_defaults_per_dataset(dataset[, warn])

Set default entries for a specific dataset in the config.

update_inputs(dataset, input_files)

Update input files for a specific dataset.

update_parameters([wildcards_df, ...])

Update parameters and all dependent attributes

write_output_files()

Write output files mapping to a TSV file in the output directory.

Methods

ModuleConfig.copy()

Create a copy of the ModuleConfig instance.

ModuleConfig.get_config()

Get complete config with parsed input files.

Return type:

dict

ModuleConfig.get_datasets()

Get config for datasets that use the module.

Return type:

dict

ModuleConfig.get_defaults(module_name=None)

Get defaults for module.

Parameters:

module_name (str)

ModuleConfig.get_for_dataset(dataset, query, default=None, warn=False)

Get any key from the config via query

Parameters:
  • dataset (str) – dataset key in config['DATASETS']

  • query (list) – list of keys to walk down the config

  • default (Union[str, bool, float, int, dict, list, None], optional) – default value if the key is not found. Defaults to None.

  • warn (bool (default: False))

Return type:

str | bool | float | int | dict | list | None

Returns:

Union[str,bool,float,int,dict,list, None]: value of query in config
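
The lookup described here, walking a list of keys down a nested config, can be sketched in a few lines. This is a minimal stand-alone illustration, not the library's implementation: the helper name `walk_config` and the example keys (`my_dataset`, `integration`, `methods`) are hypothetical, and only the `config['DATASETS']` entry point is taken from the docstring.

```python
from typing import Any, Optional


def walk_config(config: dict, dataset: str, query: list, default: Optional[Any] = None) -> Any:
    """Follow `query` key by key, starting at config['DATASETS'][dataset]."""
    node = config.get("DATASETS", {}).get(dataset, {})
    for key in query:
        # Fall back to the default when a key is missing
        # or the current node is not a mapping.
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


# Hypothetical config layout, for illustration only.
config = {"DATASETS": {"my_dataset": {"integration": {"methods": ["scvi", "harmony"]}}}}

methods = walk_config(config, "my_dataset", ["integration", "methods"])
missing = walk_config(config, "my_dataset", ["integration", "batch"], default="none")
```

In this sketch a missing key at any depth yields `default` rather than raising, which matches the documented "default value if key not found" behaviour in spirit; how the real method handles the `warn` flag is not shown.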

ModuleConfig.get_from_parameters(query_dict, parameter_key, **kwargs)

Retrieve a specific parameter from the parameters DataFrame.

Parameters:
  • query_dict ([dict, Any])

  • parameter_key (str)
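
One plausible reading of this method is a row filter followed by a column lookup. The sketch below assumes that `query_dict` maps column names to required values, that `parameter_key` names the column to return, and that exactly one row should match; none of this is confirmed by the documentation. A list of dicts stands in for the parameters DataFrame so that the example runs without pandas. The function name and the column names are hypothetical.

```python
def lookup_parameter(rows, query_dict, parameter_key):
    """Return `parameter_key` from the single row matching every item in `query_dict`."""
    matches = [
        row for row in rows
        if all(row.get(col) == value for col, value in query_dict.items())
    ]
    # Assumption: an ambiguous or empty match is treated as an error.
    if len(matches) != 1:
        raise ValueError(f"expected exactly one matching row, found {len(matches)}")
    return matches[0][parameter_key]


# Hypothetical parameter rows, for illustration only.
rows = [
    {"dataset": "a", "file_id": "raw", "n_hvgs": 2000},
    {"dataset": "b", "file_id": "raw", "n_hvgs": 4000},
]

value = lookup_parameter(rows, {"dataset": "b", "file_id": "raw"}, "n_hvgs")
```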

ModuleConfig.get_input_file(dataset, file_id, **kwargs)

Retrieve a specific input file for a dataset.

ModuleConfig.get_input_file_wildcards()

Retrieve input file wildcards for the module.

ModuleConfig.get_input_files()

Retrieve all input files for the module.

ModuleConfig.get_input_files_per_dataset(dataset)

Retrieve input files for a specific dataset.

ModuleConfig.get_output_files(pattern=None, extra_wildcards=None, allow_missing=False, as_dict=False, as_records=False, return_wildcards=False, verbose=False, **kwargs)

Get output file based on wildcards

Parameters:
  • pattern ([str, Any] (default: None)) – output pattern, defaults to self.default_target

  • kwargs – arguments passed to WildcardParameters.get_wildcards

  • extra_wildcards (dict)

  • allow_missing (bool)

  • as_dict (bool)

  • as_records (bool)

  • return_wildcards (bool)

  • verbose (bool)

Return type:

list

ModuleConfig.get_parameters()

Retrieve the parameters DataFrame for the module.

ModuleConfig.get_paramspace(**kwargs)

Retrieve the parameter space for the module.

ModuleConfig.get_profile(wildcards)

Get the resource profile for the given wildcards.

Parameters:

wildcards ([dict, Any])

ModuleConfig.get_resource(resource_key, profile='cpu', attempt=1, attempt_to_cpu=1, factor=0.5, verbose=False)

Retrieve resource information from config['resources']

Parameters:
  • profile (str (default: 'cpu')) – resource profile, key under config['resources']

  • resource_key (str) – resource key, key under config['resources'][profile]

  • attempt (int)

  • attempt_to_cpu (int)

  • factor (float)

  • verbose (bool)

Return type:

[str, int, float]
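
The two keys map onto a nested lookup, `config['resources'][profile][resource_key]`. The sketch below shows only that lookup under a hypothetical config; the profile contents (`partition`, `mem_mb`) are invented for illustration, and the effect of `attempt`, `attempt_to_cpu` and `factor` is not documented on this page, so it is deliberately left out.

```python
# Hypothetical resources section of a config, for illustration only.
config = {
    "resources": {
        "cpu": {"partition": "cpu_p", "mem_mb": 8000},
        "gpu": {"partition": "gpu_p", "mem_mb": 32000},
    }
}


def lookup_resource(config, resource_key, profile="cpu"):
    """Plain nested lookup: config['resources'][profile][resource_key]."""
    return config["resources"][profile][resource_key]


cpu_mem = lookup_resource(config, "mem_mb")
gpu_partition = lookup_resource(config, "partition", profile="gpu")
```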

ModuleConfig.get_wildcard_names()

Retrieve wildcard names for the module.

ModuleConfig.get_wildcards(**kwargs)

Retrieve wildcard instances as dictionary

Parameters:

kwargs – arguments passed to WildcardParameters.get_wildcards

Return type:

[dict, pd.DataFrame]

Returns:

dictionary of wildcards that can be applied directly for expanding target files
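
To illustrate what "applied directly for expanding target files" can look like, the sketch below fills a pattern row by row from a dictionary of equal-length wildcard lists, similar in spirit to Snakemake's `expand(pattern, zip, ...)`. It uses only `str.format` so that it runs without Snakemake; the helper name, the pattern and the wildcard names are hypothetical, and the exact shape of the dictionary returned by `get_wildcards` is an assumption.

```python
def expand_zipped(pattern, wildcards):
    """Fill `pattern` once per row of the column-oriented `wildcards` dict."""
    names = list(wildcards)
    # Transpose the columns into rows: one tuple of values per target file.
    rows = zip(*(wildcards[name] for name in names))
    return [pattern.format(**dict(zip(names, row))) for row in rows]


# Hypothetical wildcard columns, for illustration only.
wildcards = {"dataset": ["a", "a", "b"], "file_id": ["raw", "filtered", "raw"]}

targets = expand_zipped("out/{dataset}/{file_id}.zarr", wildcards)
```

The row-wise (zipped) expansion is chosen here because each wildcard combination corresponds to one row of a parameter table; a full cross product would generate combinations that were never configured.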

ModuleConfig.set_datasets(warn=False)

Set dataset configs

Parameters:

warn (bool)

ModuleConfig.set_default_target(default_target=None, warn=False)

Set the default target for the module.

Parameters:
  • default_target ([str, Any] (default: None)) – default output pattern for the module; if None, config['output_map'] or the wildcard pattern is used

  • warn (bool (default: False)) – if True, warn if no default target is specified

ModuleConfig.set_defaults(warn=False)

Set default entries for the module in the config.

Parameters:

warn (bool)

ModuleConfig.set_defaults_per_dataset(dataset, warn=False)

Set default entries for a specific dataset in the config.

ModuleConfig.update_inputs(dataset, input_files)

Update input files for a specific dataset.

Parameters:
  • dataset (str)

  • input_files ([str, dict])

ModuleConfig.update_parameters(wildcards_df=None, parameters_df=None, wildcard_names=None, **paramspace_kwargs)

Update parameters and all dependent attributes

Parameters:
  • wildcards_df (DataFrame (default: None)) – dataframe with updated wildcards

  • parameters_df (DataFrame (default: None)) – dataframe with updated parameters

  • wildcard_names (list (default: None)) – list of wildcard names to subset the paramspace by

  • paramspace_kwargs – additional arguments for snakemake.utils.Paramspace

ModuleConfig.write_output_files()

Write output files mapping to a TSV file in the output directory.