scatlastb_utils.pipeline.ModuleConfig.InputFiles

class scatlastb_utils.pipeline.ModuleConfig.InputFiles(module_name, dataset_config, output_directory=None)

Class to handle input files for a specific module.

This class parses input files from a configuration dictionary, maps them to unique identifiers, and provides access to them across datasets. It can also write the file mapping to a specified output directory.

Parameters:
  • module_name (str)

  • dataset_config (dict)

  • output_directory (str or pathlib.Path, optional)

Methods table

get_file(dataset, file_id)

Get file path for a given dataset and file ID.

get_files([as_df])

Get the file name to file path mapping for all datasets.

get_files_per_dataset(dataset)

Get file name to file path mapping for a given dataset.

get_wildcards()

Get input filename wildcards for all datasets.

parse(input_files[, digest_size])

Parse input files.

set_file_per_dataset(dataset[, digest_size])

Set input files for a given dataset.

Methods

InputFiles.get_file(dataset, file_id)

Get file path for a given dataset and file ID.

Return type:

str or pathlib.Path

InputFiles.get_files(as_df=False)

Get the file name to file path mapping for all datasets.

Return type:

dict or pandas.DataFrame

InputFiles.get_files_per_dataset(dataset)

Get file name to file path mapping for a given dataset.

Return type:

dict
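
The accessors above can be read as lookups into a nested mapping. The sketch below assumes, hypothetically, that files are stored as `{dataset: {file_id: path}}`. The dataset names, file IDs, and paths are made up for illustration, and the two functions are stand-ins that mirror the documented signatures, not the class's actual code.

```python
# Hypothetical internal layout: dataset -> file ID -> file path.
file_map = {
    "dataset_a": {"raw": "data/a/raw.h5ad", "meta": "data/a/meta.tsv"},
    "dataset_b": {"raw": "data/b/raw.h5ad"},
}


def get_files_per_dataset(dataset):
    """Stand-in: return the file ID -> path mapping of one dataset."""
    return file_map[dataset]


def get_file(dataset, file_id):
    """Stand-in: return the path of one file in one dataset."""
    return file_map[dataset][file_id]


files_a = get_files_per_dataset("dataset_a")
raw_b = get_file("dataset_b", "raw")
```

Under this assumed layout, `get_files()` without `as_df` would correspond to returning the whole nested mapping.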

InputFiles.get_wildcards()

Get input filename wildcards for all datasets.

Return type:

dict
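
One way to picture the wildcards is as parallel lists of dataset names and file IDs, one entry per input file. The layout of the returned dict is not documented here, so the keys `dataset` and `file_id`, the function name `wildcards_sketch`, and the example mapping below are all assumptions made for illustration.

```python
def wildcards_sketch(file_map):
    """Hypothetical: flatten {dataset: {file_id: path}} into parallel lists."""
    wildcards = {"dataset": [], "file_id": []}
    for dataset, files in file_map.items():
        for file_id in files:
            # One (dataset, file_id) pair per input file.
            wildcards["dataset"].append(dataset)
            wildcards["file_id"].append(file_id)
    return wildcards


# Made-up example mapping; not taken from any real configuration.
example_map = {
    "dataset_a": {"raw": "data/a/raw.h5ad", "meta": "data/a/meta.tsv"},
    "dataset_b": {"raw": "data/b/raw.h5ad"},
}
wildcards = wildcards_sketch(example_map)
```

Keeping the lists aligned by position means each index identifies exactly one file, which suits workflow tools that pair wildcard values element-wise.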

static InputFiles.parse(input_files, digest_size=5)

Parse input files.

Convert the given input files into a mapping from file name to file path. If no file names are provided, a unique hash code is generated to serve as the name.

Return type:

dict

Parameters:
  • input_files (str, list, or dict)

  • digest_size (int)
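
The behaviour described for `parse` can be sketched as follows. This is a minimal illustration, not the library's implementation: the function name `parse_sketch`, the choice of `hashlib.blake2b` for the hash code, and the handling of each input type are assumptions. Only the accepted input types, the `digest_size` default, and the dict return type come from the documentation above.

```python
import hashlib


def parse_sketch(input_files, digest_size=5):
    """Hypothetical stand-in for InputFiles.parse: file name -> file path."""
    if isinstance(input_files, dict):
        # Names are already provided, so keep them unchanged.
        return dict(input_files)
    if isinstance(input_files, str):
        # Treat a single path like a one-element list.
        input_files = [input_files]
    mapping = {}
    for path in input_files:
        # Assumption: an unnamed file gets a short BLAKE2b hex digest
        # of its path as identifier.
        digest = hashlib.blake2b(str(path).encode(), digest_size=digest_size)
        mapping[digest.hexdigest()] = path
    return mapping


named = parse_sketch({"atlas": "data/atlas.h5ad"})
unnamed = parse_sketch(["data/a.h5ad", "data/b.h5ad"])
single = parse_sketch("data/a.h5ad")
```

In this sketch, `digest_size=5` yields 10-character hexadecimal identifiers, and the same path always maps to the same identifier.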

InputFiles.set_file_per_dataset(dataset, digest_size=5)

Set input files for a given dataset.

This function maps each input file to its unique identifier. If no file ID is specified, a unique hash code is generated, as in parse.

Parameters:
  • dataset (str) – dataset key in config['DATASETS']

  • digest_size (int)