datasetgen.utils

Module Contents

datasetgen.utils._FILE_SIZE_STEP = 100
datasetgen.utils._SIZE_PROB_DISTRIBUTION
datasetgen.utils.str2bool(v: str) → bool

Converts a string to the boolean value it represents.

Parameters

v (str) – input string

Raises

ArgumentTypeError – if the string is not a boolean value

Returns

the boolean value represented by the string

Return type

bool
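
The ArgumentTypeError above suggests the function is meant to be passed as an argparse `type=` converter. The following is a minimal sketch of that pattern, not the module's actual code: the name `str2bool_sketch` and the accepted spellings ("yes", "true", "1", and so on) are assumptions made for illustration.

```python
import argparse


def str2bool_sketch(v: str) -> bool:
    """Hypothetical stand-in for str2bool; the accepted spellings are assumed."""
    lowered = v.strip().lower()
    if lowered in ("yes", "true", "t", "y", "1"):
        return True
    if lowered in ("no", "false", "f", "n", "0"):
        return False
    # argparse turns this exception into a readable command-line error.
    raise argparse.ArgumentTypeError(f"Boolean value expected, got {v!r}")


# Usage: let argparse convert the raw string while parsing.
parser = argparse.ArgumentParser()
parser.add_argument("--verbose", type=str2bool_sketch, default=False)
args = parser.parse_args(["--verbose", "yes"])
```

Raising ArgumentTypeError rather than ValueError lets argparse report the custom message to the user instead of a generic "invalid value" error.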

datasetgen.utils.gen_random_sizes(num_files: int, min_file_size: int, max_file_size: int) → list

Generates a list of file sizes, one per file, using a random distribution.

Parameters
  • num_files (int) – total number of files

  • min_file_size (int) – minimum file size

  • max_file_size (int) – maximum file size

Returns

list of file sizes

Return type

list
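
The reference does not say which random distribution is used. As a sketch only, assuming a uniform draw of integer sizes with both bounds inclusive, the behaviour could look like this (`gen_random_sizes_sketch` and its `seed` parameter are hypothetical):

```python
import random
from typing import List, Optional


def gen_random_sizes_sketch(
    num_files: int,
    min_file_size: int,
    max_file_size: int,
    seed: Optional[int] = None,  # hypothetical extra, for reproducible runs
) -> List[int]:
    """Assumed behaviour: one uniformly drawn integer size per file."""
    rng = random.Random(seed)
    # randint includes both endpoints, so max_file_size can be returned.
    return [rng.randint(min_file_size, max_file_size) for _ in range(num_files)]


sizes = gen_random_sizes_sketch(5, 100, 1000, seed=0)
```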

datasetgen.utils.gen_in_range_random_sizes(num_files: int, min_file_size: int, max_file_size: int) → list

Generates a list of file sizes that follow the use-case distribution.

Parameters
  • num_files (int) – total number of files

  • min_file_size (int) – minimum file size

  • max_file_size (int) – maximum file size

Returns

list of file sizes

Return type

list
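
This page does not describe how _FILE_SIZE_STEP and _SIZE_PROB_DISTRIBUTION enter the computation. One plausible reading, shown here purely as an assumption, is that candidate sizes are spaced one step apart and drawn with unequal weights. The function name, the default weights, and the `seed` parameter below are all hypothetical.

```python
import random
from typing import List, Optional, Sequence

FILE_SIZE_STEP = 100  # same value as the documented _FILE_SIZE_STEP


def weighted_sizes_sketch(
    num_files: int,
    min_file_size: int,
    max_file_size: int,
    weights: Optional[Sequence[float]] = None,
    seed: Optional[int] = None,
) -> List[int]:
    """Assumed behaviour: weighted draw over sizes spaced one step apart."""
    rng = random.Random(seed)
    # Candidate sizes: min, min + step, ... up to at most max_file_size.
    candidates = list(range(min_file_size, max_file_size + 1, FILE_SIZE_STEP))
    if weights is None:
        # Hypothetical weights favouring small files; not the real distribution.
        weights = [1.0 / (i + 1) for i in range(len(candidates))]
    return rng.choices(candidates, weights=weights, k=num_files)


sizes = weighted_sizes_sketch(10, 100, 1000, seed=0)
```

With this layout, every returned size falls on a step boundary counted from min_file_size, and the weights sequence must have one entry per candidate size.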

datasetgen.utils.gen_random_files(num_files: int, min_file_size: int, max_file_size: int, size_generator_function: str = 'gen_in_range_random_sizes', start_from: int = 0) → dict

Generates a dict of files, each with a randomly generated size.

Parameters
  • num_files (int) – total number of files

  • min_file_size (int) – minimum file size

  • max_file_size (int) – maximum file size

  • size_generator_function (str, optional) – function to use to generate file sizes, defaults to ‘gen_in_range_random_sizes’

  • start_from (int, optional) – filename reference index, defaults to 0

Raises

Exception – if the size generator function does not exist

Returns

dictionary with filenames and their sizes

Return type

dict
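
Because size_generator_function is passed as a string, the generator has to be looked up by name. The sketch below shows one way to do that with an explicit registry. It is an illustration under assumptions: the registry, the `uniform_sizes` helper, and the `file_<index>` naming scheme are hypothetical, and the real module may resolve names and build filenames differently.

```python
import random
from typing import Callable, Dict, List


def uniform_sizes(num_files: int, min_file_size: int, max_file_size: int) -> List[int]:
    """Hypothetical size generator used only for this example."""
    return [random.randint(min_file_size, max_file_size) for _ in range(num_files)]


# Hypothetical registry mapping generator names to callables.
SIZE_GENERATORS: Dict[str, Callable[[int, int, int], List[int]]] = {
    "uniform_sizes": uniform_sizes,
}


def gen_random_files_sketch(
    num_files: int,
    min_file_size: int,
    max_file_size: int,
    size_generator_function: str = "uniform_sizes",
    start_from: int = 0,
) -> Dict[str, int]:
    try:
        generator = SIZE_GENERATORS[size_generator_function]
    except KeyError:
        # Mirrors the documented behaviour of raising on an unknown name.
        raise Exception(f"Unknown size generator: {size_generator_function}") from None
    sizes = generator(num_files, min_file_size, max_file_size)
    # Assumed naming scheme: the index in the filename starts at start_from.
    return {f"file_{start_from + i}": size for i, size in enumerate(sizes)}


files = gen_random_files_sketch(3, 100, 500, start_from=10)
```

A non-zero start_from lets a second batch continue the numbering of an earlier one, so the two dictionaries can be merged without key collisions.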

datasetgen.utils.gen_fake_cpu_work(num_cpus: int = 1) → tuple

Generates fake CPU times.

Parameters

num_cpus (int, optional) – number of CPUs to simulate, defaults to 1

Returns

work statistics: number of CPUs, wall time, CPU time, single-CPU time, and I/O time

Return type

tuple
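
The returned tuple has five fields in the order listed above. The sketch below shows how such a tuple might be produced and unpacked. The value ranges and the relationships between the fields (CPU time as single-CPU time multiplied by the CPU count, wall time as single-CPU time plus I/O time) are assumptions chosen for illustration, as are the function name and the `seed` parameter.

```python
import random
from typing import Optional, Tuple


def fake_cpu_work_sketch(
    num_cpus: int = 1,
    seed: Optional[int] = None,  # hypothetical extra, for reproducible runs
) -> Tuple[int, float, float, float, float]:
    """Assumed model of the five documented statistics."""
    rng = random.Random(seed)
    single_cpu_time = rng.uniform(1.0, 10.0)  # time spent on one CPU
    io_time = rng.uniform(0.1, 2.0)           # time spent waiting on I/O
    cpu_time = single_cpu_time * num_cpus     # total across all CPUs
    wall_time = single_cpu_time + io_time     # elapsed time, CPUs in parallel
    return num_cpus, wall_time, cpu_time, single_cpu_time, io_time


# Unpack in the documented field order.
num_cpus, wall_time, cpu_time, single_cpu_time, io_time = fake_cpu_work_sketch(4, seed=1)
```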

datasetgen.utils.COLUMNS