datasetgen.functions

Module Contents

class datasetgen.functions.GenFunction

Bases: object

property day_idx(self)
property num_req_x_day(self)
abstract gen_day_elements(self, max_num: int = -1)

Generates all the day’s entries.

Parameters

max_num (int, optional) – maximum number of requests, defaults to -1

Yield

the percentage of work done

Return type

Generator[int, None, None]

property name(self)
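The GenFunction interface above (three properties plus an abstract generator that yields progress) can be mirrored with Python's abc module. The sketch below is hypothetical: GenFunctionSketch, ToyGenerator, the backing fields _day_idx and _num_req_x_day, and the requests list are illustrative names, and treating max_num=-1 as "no cap" is an assumption rather than documented behaviour.

```python
from abc import ABC, abstractmethod
from typing import Generator, List, Tuple


class GenFunctionSketch(ABC):
    """Hypothetical stand-in that mirrors the documented GenFunction interface."""

    def __init__(self, num_req_x_day: int) -> None:
        self._day_idx = 0                      # assumed backing field for day_idx
        self._num_req_x_day = num_req_x_day    # assumed backing field for num_req_x_day
        self.requests: List[Tuple[int, int]] = []  # hypothetical store for the day's entries

    @property
    def day_idx(self) -> int:
        return self._day_idx

    @property
    def num_req_x_day(self) -> int:
        return self._num_req_x_day

    @property
    def name(self) -> str:
        return type(self).__name__

    @abstractmethod
    def gen_day_elements(self, max_num: int = -1) -> Generator[int, None, None]:
        """Generate the day's entries, yielding the percentage of work done."""


class ToyGenerator(GenFunctionSketch):
    """Hypothetical concrete subclass used only to exercise the interface."""

    def gen_day_elements(self, max_num: int = -1) -> Generator[int, None, None]:
        # Assumption: -1 means "no cap"; otherwise generate at most max_num entries.
        total = self.num_req_x_day if max_num == -1 else min(max_num, self.num_req_x_day)
        self.requests = []
        for i in range(total):
            self.requests.append((self.day_idx, i))  # placeholder (day, request) entry
            yield (i + 1) * 100 // total             # integer percentage of work done
        self._day_idx += 1                           # advance to the next day when finished


# Usage: drain the generator to report progress while the day is built.
demo = ToyGenerator(num_req_x_day=4)
for pct in demo.gen_day_elements():
    print(f"{demo.name} day {demo.day_idx}: {pct}%")
```

Yielding an int percentage, as the documented Generator[int, None, None] return type suggests, lets a caller drive a progress bar without the generator knowing anything about the user interface.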
class datasetgen.functions.RandomGenerator(num_files: int, min_file_size: int, max_file_size: int, size_generator_function: str)

Bases: datasetgen.functions.GenFunction

Initialize the random function parameters.

Parameters
  • num_files (int) – total number of files

  • min_file_size (int) – minimum size of the files

  • max_file_size (int) – maximum size of the files

  • size_generator_function (str) – name of the size generator function
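A minimal sketch of what these parameters could drive: a per-file size table in [min_file_size, max_file_size] produced by a size function looked up by name, plus one day of uniformly random requests. The registry SIZE_FUNCTIONS, the name "uniform", and the helpers make_file_sizes and random_day are hypothetical; the names the real module accepts for size_generator_function are not documented here.

```python
import random
from typing import Callable, Dict, List

# Hypothetical registry of size functions keyed by size_generator_function.
SizeFn = Callable[[random.Random, int, int], int]
SIZE_FUNCTIONS: Dict[str, SizeFn] = {
    "uniform": lambda rng, lo, hi: rng.randint(lo, hi),  # inclusive bounds
}


def make_file_sizes(num_files: int, min_file_size: int, max_file_size: int,
                    size_generator_function: str, seed: int = 0) -> List[int]:
    """Assign each file id in range(num_files) a size in [min, max]."""
    if min_file_size > max_file_size:
        raise ValueError("min_file_size must not exceed max_file_size")
    size_fn = SIZE_FUNCTIONS[size_generator_function]  # KeyError for unknown names
    rng = random.Random(seed)
    return [size_fn(rng, min_file_size, max_file_size) for _ in range(num_files)]


def random_day(num_files: int, num_requests: int, seed: int = 0) -> List[int]:
    """Pick file ids uniformly at random, with replacement, for one day."""
    rng = random.Random(seed)
    return [rng.randrange(num_files) for _ in range(num_requests)]
```

Passing an explicit seed to random.Random keeps generated datasets reproducible between runs.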

__repr__(self)
gen_day_elements(self, max_num: int = -1)

Generates all the day’s entries.

Parameters

max_num (int, optional) – maximum number of requests, defaults to -1

Yield

the percentage of work done

Return type

Generator[int, None, None]

class datasetgen.functions.HighFrequencyDataset(num_files: int, min_file_size: int, max_file_size: int, lambda_less_req_files: float, lambda_more_req_files: float, perc_more_req_files: float, perc_files_x_day: float, size_generator_function: str)

Bases: datasetgen.functions.GenFunction

Dataset to test the frequency aspect.

Initialize the frequency function parameters.

Parameters
  • num_files (int) – total number of files

  • min_file_size (int) – minimum size of the files

  • max_file_size (int) – maximum size of the files

  • lambda_less_req_files (float) – Poisson distribution lambda for less frequently requested files

  • lambda_more_req_files (float) – Poisson distribution lambda for more frequently requested files

  • perc_more_req_files (float) – percentage of files that are requested more frequently

  • perc_files_x_day (float) – percentage of files selected each day

  • size_generator_function (str) – name of the size generator function
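To illustrate how two Poisson lambdas can separate "hot" and "cold" files, here is a hypothetical sketch: each day selects perc_files_x_day percent of the files, marks perc_more_req_files percent of those as more frequently requested, and draws each file's request count from a Poisson distribution with the matching lambda. The helper names, the 0 to 100 percentage scale, and the use of Knuth's sampler (the standard library's random module has no Poisson sampler) are assumptions, not the module's documented implementation.

```python
import math
import random
from typing import Dict


def poisson(rng: random.Random, lam: float) -> int:
    """Knuth's Poisson sampler; adequate for small lambda values."""
    if lam < 0:
        raise ValueError("lambda must be non-negative")
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def high_frequency_day(num_files: int, lambda_less_req_files: float,
                       lambda_more_req_files: float, perc_more_req_files: float,
                       perc_files_x_day: float, seed: int = 0) -> Dict[int, int]:
    """Return {file_id: number_of_requests} for one hypothetical day."""
    rng = random.Random(seed)
    # Assumption: percentages are on a 0-100 scale.
    selected = rng.sample(range(num_files), round(num_files * perc_files_x_day / 100))
    num_more = round(len(selected) * perc_more_req_files / 100)
    more_req, less_req = selected[:num_more], selected[num_more:]
    counts: Dict[int, int] = {}
    for file_id in more_req:
        counts[file_id] = poisson(rng, lambda_more_req_files)
    for file_id in less_req:
        counts[file_id] = poisson(rng, lambda_less_req_files)
    return counts
```

Because rng.sample returns files in random order, slicing it splits the selected files into the two groups without extra shuffling.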

__repr__(self)
gen_day_elements(self, max_num: int = -1)

Generates all the day’s entries.

Parameters

max_num (int, optional) – maximum number of requests, defaults to -1

Yield

the percentage of work done

Return type

Generator[int, None, None]

class datasetgen.functions.RecencyFocusedDataset(num_files: int, min_file_size: int, max_file_size: int, perc_files_x_day: float, size_generator_function: str)

Bases: datasetgen.functions.GenFunction

Dataset to test the recency aspect.

Initialize the recency function parameters.

Parameters
  • num_files (int) – total number of files

  • min_file_size (int) – minimum size of the files

  • max_file_size (int) – maximum size of the files

  • perc_files_x_day (float) – percentage of files selected each day

  • size_generator_function (str) – name of the size generator function
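One way perc_files_x_day could support a recency test is a sliding window over the file catalogue: each day activates a different contiguous block of files, so recently activated files receive all requests. This is an assumed technique chosen for illustration, not necessarily what RecencyFocusedDataset does; recency_day and its parameters day_idx and num_requests are hypothetical, and percentages are assumed to be on a 0 to 100 scale.

```python
import random
from typing import List


def recency_day(num_files: int, perc_files_x_day: float, day_idx: int,
                num_requests: int, seed: int = 0) -> List[int]:
    """Request files from a window that slides forward one block per day."""
    window = max(1, round(num_files * perc_files_x_day / 100))
    start = (day_idx * window) % num_files           # window start for this day
    active = [(start + i) % num_files for i in range(window)]  # wraps around
    rng = random.Random(seed + day_idx)              # different draws per day
    return [rng.choice(active) for _ in range(num_requests)]
```

With 10 files and perc_files_x_day=30, day 0 requests only files 0 to 2, day 1 only files 3 to 5, and day 3 wraps to files 9, 0 and 1.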

__repr__(self)
gen_day_elements(self, max_num: int = -1)

Generates all the day’s entries.

Parameters

max_num (int, optional) – maximum number of requests, defaults to -1

Yield

the percentage of work done

Return type

Generator[int, None, None]

class datasetgen.functions.SizeFocusedDataset(num_files: int, min_file_size: int, max_file_size: int, noise_min_file_size: int, noise_max_file_size: int, perc_noise: float, perc_files_x_day: float, size_generator_function: str)

Bases: datasetgen.functions.GenFunction

Dataset to test different file-size distributions.

Initialize the size function parameters.

Parameters
  • num_files (int) – total number of files

  • min_file_size (int) – minimum size of the files

  • max_file_size (int) – maximum size of the files

  • noise_min_file_size (int) – minimum size of the noise files

  • noise_max_file_size (int) – maximum size of the noise files

  • perc_noise (float) – percentage of noise files

  • perc_files_x_day (float) – percentage of files selected each day

  • size_generator_function (str) – name of the size generator function
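The noise parameters suggest two size populations: most files sized in [min_file_size, max_file_size] and a perc_noise share of "noise" files sized in [noise_min_file_size, noise_max_file_size]. The sketch below assumes uniform sizes within each range and a 0 to 100 percentage scale; size_focused_sizes is a hypothetical helper, not part of the documented API.

```python
import random
from typing import List


def size_focused_sizes(num_files: int, min_file_size: int, max_file_size: int,
                       noise_min_file_size: int, noise_max_file_size: int,
                       perc_noise: float, seed: int = 0) -> List[int]:
    """Return a size per file id, drawing noise files from their own range."""
    rng = random.Random(seed)
    num_noise = round(num_files * perc_noise / 100)   # assumption: 0-100 scale
    noise_ids = set(rng.sample(range(num_files), num_noise))
    sizes = []
    for file_id in range(num_files):
        if file_id in noise_ids:
            sizes.append(rng.randint(noise_min_file_size, noise_max_file_size))
        else:
            sizes.append(rng.randint(min_file_size, max_file_size))
    return sizes
```

Choosing non-overlapping ranges (for example 10 to 20 for regular files and 1000 to 2000 for noise) makes it easy to check how a cache policy reacts to a few very large objects.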

__repr__(self)
gen_day_elements(self, max_num: int = -1)

Generates all the day’s entries.

Parameters

max_num (int, optional) – maximum number of requests, defaults to -1

Yield

the percentage of work done

Return type

Generator[int, None, None]