scatlastb_utils.pipeline.ModuleConfig.ModuleConfig

class scatlastb_utils.pipeline.ModuleConfig.ModuleConfig(module_name, config, parameters=None, default_target=None, wildcard_names=None, mandatory_wildcards=None, config_params=None, rename_config_params=None, explode_by=None, paramspace_kwargs=None, dont_inherit=None, dtypes=None, write_output_files=True, warn=False)

ModuleConfig class to handle module configuration and parameters.

This class is designed to encapsulate the configuration for a specific module in a Snakemake pipeline, including input files, parameters, and output targets. It provides methods to retrieve and manipulate these configurations, as well as to write output files based on the module’s parameters and wildcards.

The class is mainly used to initialise the input files, parameters, and any user-supplied configuration, and to provide structured access to them within the scAtlasTb Snakemake workflow.

Parameters:

module_name (str)
config (dict)
parameters (pd.DataFrame | str)
default_target (str | Any)
wildcard_names (list)
mandatory_wildcards (list)
config_params (list)
rename_config_params (dict)
explode_by (str | list)
paramspace_kwargs (dict)
dont_inherit (list)
dtypes (dict)
write_output_files (bool)
warn (bool)

Methods table

`copy`()	Create a copy of the ModuleConfig instance.
`get_config`()	Get complete config with parsed input files.
`get_datasets`()	Get config for datasets that use the module.
`get_defaults`([module_name])	Get defaults for module.
`get_for_dataset`(dataset, query[, default, warn])	Get any value from the config via a list of query keys.
`get_from_parameters`(query_dict, ...)	Retrieve a specific parameter from the parameters DataFrame.
`get_input_file`(dataset, file_id, **kwargs)	Retrieve a specific input file for a dataset.
`get_input_file_wildcards`()	Retrieve input file wildcards for the module.
`get_input_files`()	Retrieve all input files for the module.
`get_input_files_per_dataset`(dataset)	Retrieve input files for a specific dataset.
`get_output_files`([pattern, extra_wildcards, ...])	Get output files based on wildcards.
`get_parameters`()	Retrieve the parameters DataFrame for the module.
`get_paramspace`(**kwargs)	Retrieve the parameter space for the module.
`get_profile`(wildcards)	Get the resource profile for the given wildcards.
`get_resource`(resource_key[, profile, ...])	Retrieve resource information from config['resources'].
`get_wildcard_names`()	Retrieve wildcard names for the module.
`get_wildcards`(**kwargs)	Retrieve wildcard instances as a dictionary.
`set_datasets`([warn])	Set dataset configs.
`set_default_target`([default_target, warn])	Set the default target for the module.
`set_defaults`([warn])	Set default entries for the module in the config.
`set_defaults_per_dataset`(dataset[, warn])	Set default entries for a specific dataset in the config.
`update_inputs`(dataset, input_files)	Update input files for a specific dataset.
`update_parameters`([wildcards_df, ...])	Update parameters and all dependent attributes.
`write_output_files`()	Write output files mapping to a TSV file in the output directory.

Methods

ModuleConfig.copy()

Create a copy of the ModuleConfig instance.

ModuleConfig.get_config()

Get complete config with parsed input files.

Return type: dict

ModuleConfig.get_datasets()

Get config for datasets that use the module.

Return type: dict

ModuleConfig.get_defaults(module_name=None)

Get defaults for module.

Parameters: module_name (str)

ModuleConfig.get_for_dataset(dataset, query, default=None, warn=False)

Get any value from the config via a list of query keys.

Parameters:

dataset (str) – dataset key in config['DATASETS']
query (list) – list of keys to walk down the config
default (str | bool | float | int | dict | list | None) – default value if the key is not found; defaults to None
warn (bool)

Return type:

str | bool | float | int | dict | list | None

Returns:

value of query in config
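The idea of walking a list of keys down a nested config can be sketched in plain Python. This is only an illustration, not the library's implementation: the helper `walk_config`, the dataset name, and the config keys below are all hypothetical.

```python
from typing import Any


def walk_config(config: dict, keys: list, default: Any = None) -> Any:
    """Follow `keys` one level at a time; return `default` if a key is missing.

    Hypothetical helper for illustration only.
    """
    node = config
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


# Hypothetical config layout: datasets live under config['DATASETS'].
config = {
    "DATASETS": {
        "my_dataset": {"integration": {"methods": ["scvi", "harmony"]}},
    }
}

dataset_config = config["DATASETS"]["my_dataset"]
methods = walk_config(dataset_config, ["integration", "methods"])
missing = walk_config(dataset_config, ["integration", "batch"], default="none")
```

Here `methods` holds the nested list and `missing` falls back to the supplied default because the `batch` key does not exist.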

ModuleConfig.get_from_parameters(query_dict, parameter_key, **kwargs)

Retrieve a specific parameter from the parameters DataFrame.

Parameters:

query_dict (dict | Any)
parameter_key (str)

ModuleConfig.get_input_file(dataset, file_id, **kwargs)

Retrieve a specific input file for a dataset.

ModuleConfig.get_input_file_wildcards()

Retrieve input file wildcards for the module.

ModuleConfig.get_input_files()

Retrieve all input files for the module.

ModuleConfig.get_input_files_per_dataset(dataset)

Retrieve input files for a specific dataset.

ModuleConfig.get_output_files(pattern=None, extra_wildcards=None, allow_missing=False, as_dict=False, as_records=False, return_wildcards=False, verbose=False, **kwargs)

Get output files based on wildcards.

Parameters:

pattern (str | Any (default: None)) – output pattern, defaults to self.default_target
extra_wildcards (dict)
allow_missing (bool)
as_dict (bool)
as_records (bool)
return_wildcards (bool)
verbose (bool)
kwargs – arguments passed to WildcardParameters.get_wildcards

Return type:

list

ModuleConfig.get_parameters()

Retrieve the parameters DataFrame for the module.

ModuleConfig.get_paramspace(**kwargs)

Retrieve the parameter space for the module.

ModuleConfig.get_profile(wildcards)

Get the resource profile for the given wildcards.

Parameters: wildcards (dict | Any)

ModuleConfig.get_resource(resource_key, profile='cpu', attempt=1, attempt_to_cpu=1, factor=0.5, verbose=False)

Retrieve resource information from config['resources'].

Parameters:

resource_key (str) – resource key, key under config['resources'][profile]
profile (str (default: 'cpu')) – resource profile, key under config['resources']
attempt (int)
attempt_to_cpu (int)
factor (float)
verbose (bool)

Return type:

str | int | float
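The two-level lookup described by `profile` and `resource_key` can be sketched as follows. The config values, the resource names (`partition`, `mem_mb`), and the helper `lookup_resource` are assumptions made for illustration; the sketch also leaves out `attempt`, `attempt_to_cpu`, and `factor`, whose behaviour is not documented here.

```python
# Hypothetical resources section: one entry per profile, one value per resource key.
config = {
    "resources": {
        "cpu": {"partition": "cpu_p", "mem_mb": 8000},
        "gpu": {"partition": "gpu_p", "mem_mb": 16000},
    }
}


def lookup_resource(config: dict, resource_key: str, profile: str = "cpu"):
    """Return config['resources'][profile][resource_key] (illustrative helper)."""
    return config["resources"][profile][resource_key]


cpu_mem = lookup_resource(config, "mem_mb")
gpu_partition = lookup_resource(config, "partition", profile="gpu")
```

With the default profile the lookup reads from the `cpu` entry; passing `profile="gpu"` switches to the other entry without changing the resource key.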

ModuleConfig.get_wildcard_names()

Retrieve wildcard names for the module.

ModuleConfig.get_wildcards(**kwargs)

Retrieve wildcard instances as a dictionary.

Parameters: kwargs – arguments passed to WildcardParameters.get_wildcards
Return type: dict | pandas.DataFrame
Returns: dictionary of wildcards that can be applied directly for expanding target files
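To make "applied directly for expanding target files" concrete, the sketch below assumes the returned dictionary maps each wildcard name to a list of values of equal length, and fills a target pattern position by position. The wildcard names, values, and pattern are hypothetical, and the dictionary shape itself is an assumption rather than something this page confirms.

```python
# Assumed shape: wildcard name -> list of values, aligned by position.
wildcards = {"dataset": ["pbmc", "lung"], "file_id": ["raw", "filtered"]}
pattern = "results/{dataset}/{file_id}.zarr"  # hypothetical target pattern

# Pair the values position by position, comparable to calling
# snakemake's expand(pattern, zip, **wildcards).
names = list(wildcards)
targets = [
    pattern.format(**dict(zip(names, values)))
    for values in zip(*wildcards.values())
]
```

Pairing by position keeps each wildcard combination intact, instead of producing the full cross product of all values.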

ModuleConfig.set_datasets(warn=False)

Set dataset configs.

Parameters: warn (bool)

ModuleConfig.set_default_target(default_target=None, warn=False)

Set the default target for the module.

Parameters:

default_target (str | Any (default: None)) – default output pattern for the module; if None, config['output_map'] or the wildcard pattern is used
warn (bool (default: False)) – if True, warn if no default target is specified

ModuleConfig.set_defaults(warn=False)

Set default entries for the module in the config.

Parameters: warn (bool)

ModuleConfig.set_defaults_per_dataset(dataset, warn=False)

Set default entries for a specific dataset in the config.

Parameters:

dataset (str)
warn (bool)

ModuleConfig.update_inputs(dataset, input_files)

Update input files for a specific dataset.

Parameters:

dataset (str)
input_files (str | dict)

ModuleConfig.update_parameters(wildcards_df=None, parameters_df=None, wildcard_names=None, **paramspace_kwargs)

Update parameters and all dependent attributes.

Parameters:

wildcards_df (DataFrame (default: None)) – dataframe with updated wildcards
parameters_df (DataFrame (default: None)) – dataframe with updated parameters
wildcard_names (list (default: None)) – list of wildcard names to subset the paramspace by
paramspace_kwargs – additional arguments for snakemake.utils.Paramspace
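One plausible reading of "subset the paramspace by" a list of wildcard names is to keep only those columns and drop the resulting duplicate rows. The sketch below shows that reading on plain records rather than a DataFrame; the helper `subset_records` and all column names are hypothetical, and the actual method may behave differently.

```python
# Hypothetical parameter rows; only some columns act as wildcards.
parameters = [
    {"dataset": "pbmc", "file_id": "raw", "resolution": 0.5},
    {"dataset": "pbmc", "file_id": "raw", "resolution": 1.0},
    {"dataset": "lung", "file_id": "raw", "resolution": 0.5},
]
wildcard_names = ["dataset", "file_id"]


def subset_records(records: list, names: list) -> list:
    """Keep only `names` from each record and drop duplicates, preserving order."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(record[name] for name in names)
        if key not in seen:
            seen.add(key)
            unique.append(dict(zip(names, key)))
    return unique


wildcard_records = subset_records(parameters, wildcard_names)
```

The two `pbmc` rows collapse into one because they differ only in `resolution`, which is not among the selected wildcard names.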

ModuleConfig.write_output_files()

Write output files mapping to a TSV file in the output directory.
