datasetgen.generator
Module Contents
-
datasetgen.generator._make_empty_df() → pd.DataFrame
Generates an empty DataFrame with the columns listed in the COLUMNS dict.
- Returns
a new DataFrame
- Return type
pd.DataFrame
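The following is a minimal sketch of building a typed, zero-row DataFrame from a column dictionary. It assumes COLUMNS maps column names to pandas dtypes. The dict contents and the helper name make_empty_df are hypothetical illustrations, not the module's actual definitions.

```python
import pandas as pd

# Hypothetical stand-in for the module's COLUMNS dict: column name -> dtype.
# The real names and dtypes used by datasetgen may differ.
COLUMNS = {
    "reqDay": "int64",
    "JobSuccess": "bool",
    "SiteName": "int64",
    "NumCPU": "int64",
    "CPUTime": "float64",
}


def make_empty_df(columns: dict = COLUMNS) -> pd.DataFrame:
    """Return a DataFrame with zero rows and one typed column per entry."""
    return pd.DataFrame(
        {name: pd.Series(dtype=dtype) for name, dtype in columns.items()}
    )


df = make_empty_df()
```

Creating each column as an empty typed Series keeps the dtypes stable when rows are appended later. A bare pd.DataFrame(columns=[...]) would start every column as object.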
-
class datasetgen.generator.Day(date: datetime.date, df: pd.DataFrame = None)
Bases: object
Initializes the basic information for the current day.
- Parameters
date (datetime.date) – The current date of the Day object
-
bulk_append(self, rows: List[dict])
Inserts a batch of rows into the day’s DataFrame.
- For each row, it sets these default values:
reqDay = int(time.mktime(self._date.timetuple()))
JobSuccess = True
SiteName = 0
DataType = 0
FileType = 0
Also, if a row carries no information about the job, the method uses gen_fake_cpu_work to generate random fake CPU-work values for these columns:
NumCPU
WrapWC
WrapCPU
CPUTime
IOTime
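The default-filling step described above can be sketched roughly as follows. Several parts are assumptions: the helper names fill_defaults and fake_cpu_work are hypothetical, fake_cpu_work only stands in for gen_fake_cpu_work (whose real signature and value ranges are not documented here), "no information about the job" is taken to mean that none of the CPU columns is present, and values already present in a row are assumed to take precedence over the defaults.

```python
import datetime
import random
import time
from typing import Dict, List

# CPU-work columns listed in the documentation above.
CPU_COLUMNS = ("NumCPU", "WrapWC", "WrapCPU", "CPUTime", "IOTime")


def fake_cpu_work(rng: random.Random) -> Dict[str, float]:
    """Hypothetical stand-in for gen_fake_cpu_work; the ranges are made up."""
    num_cpu = rng.randint(1, 8)
    wrap_wc = rng.uniform(60.0, 3600.0)
    wrap_cpu = wrap_wc * num_cpu * rng.uniform(0.5, 1.0)
    io_time = rng.uniform(0.0, 0.2 * wrap_wc)
    cpu_time = max(wrap_cpu - io_time, 0.0)
    return {"NumCPU": num_cpu, "WrapWC": wrap_wc, "WrapCPU": wrap_cpu,
            "CPUTime": cpu_time, "IOTime": io_time}


def fill_defaults(rows: List[dict], day: datetime.date,
                  rng: random.Random) -> List[dict]:
    """Return copies of rows with the documented default values applied."""
    req_day = int(time.mktime(day.timetuple()))
    filled = []
    for row in rows:
        new_row = {
            "reqDay": req_day,
            "JobSuccess": True,
            "SiteName": 0,
            "DataType": 0,
            "FileType": 0,
            **row,  # assumption: values already in the row win over defaults
        }
        # Assumption: a row without any CPU column has no job information.
        if not any(col in row for col in CPU_COLUMNS):
            new_row.update(fake_cpu_work(rng))
        filled.append(new_row)
    return filled
```

Passing an explicit random.Random instance keeps the fake values reproducible for a given seed without touching the global generator.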
- Parameters
rows (List[dict]) – List of rows
- Returns
self
- Return type
Day
-
append(self, row: dict)
Inserts a single row into the day’s DataFrame.
- It sets these default values:
reqDay = int(time.mktime(self._date.timetuple()))
JobSuccess = True
SiteName = 0
DataType = 0
FileType = 0
If the row carries no information about the job, the method uses gen_fake_cpu_work to generate random fake CPU-work values for these columns:
NumCPU
WrapWC
WrapCPU
CPUTime
IOTime
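The reqDay default, int(time.mktime(self._date.timetuple())), is the Unix timestamp of midnight at the start of the day. The sketch below isolates that expression. The helper name req_day_timestamp is hypothetical. Note that time.mktime interprets the tuple in the machine's local time zone, so the resulting integer depends on the environment's time zone setting.

```python
import datetime
import time


def req_day_timestamp(day: datetime.date) -> int:
    """Unix timestamp of local midnight at the start of `day`."""
    # date.timetuple() has hour, minute and second set to 0, and
    # time.mktime converts that tuple using the local time zone.
    return int(time.mktime(day.timetuple()))


stamp = req_day_timestamp(datetime.date(2020, 1, 1))
# Converting back with the same local time zone recovers local midnight.
back = datetime.datetime.fromtimestamp(stamp)
```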
- Parameters
row (dict) – the current row’s columns
- Returns
self
- Return type
Day
-
class datasetgen.generator.Generator(config: dict = {}, num_days: int = -1, num_req_x_day: int = -1, start_date: datetime.date = datetime.date(2020, 1, 1), seed: int = _DEFAULT_SEED, dest_folder: PurePath = Path('.'))
Bases: object
The main generator object that creates datasets.
Initializes the generator.
- Parameters
config (dict, optional) – A dictionary with the configuration to use, defaults to {}
num_days (int, optional) – number of days to generate, defaults to -1
num_req_x_day (int, optional) – number of requests per day, defaults to -1
start_date (datetime.date, optional) – the starting date of the generated data, defaults to datetime.date(2020, 1, 1)
seed (int, optional) – the random generator seed, defaults to _DEFAULT_SEED
dest_folder (PurePath, optional) – the folder where the dataset is stored, defaults to Path('.')
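As a rough illustration of how start_date, num_days and num_req_x_day relate, the sketch below assumes that the generator covers num_days consecutive calendar days beginning at start_date, with num_req_x_day requests on each day. The helper covered_dates is hypothetical. How the real class treats the -1 defaults (for example, whether it reads the values from config instead) is not documented here. With this sketch, a negative num_days simply yields no days.

```python
import datetime
from typing import List


def covered_dates(start_date: datetime.date, num_days: int) -> List[datetime.date]:
    """Consecutive calendar days starting at start_date (hypothetical helper)."""
    # range() of a negative number is empty, so num_days=-1 gives [].
    return [start_date + datetime.timedelta(days=i) for i in range(num_days)]


num_req_x_day = 100  # example value
days = covered_dates(datetime.date(2020, 1, 1), num_days=3)
total_requests = len(days) * num_req_x_day
```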
-
__update_seeds(self)
Initializes the random generator seeds: both the internal Python random generator seed and the NumPy random seed.
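Seeding both generators, as described above, can be sketched with the standard random.seed and np.random.seed calls. The standalone function update_seeds is a hypothetical illustration, not the class's private method. Reseeding with the same value replays the same sequence from both generators.

```python
import random

import numpy as np


def update_seeds(seed: int) -> None:
    """Seed Python's `random` module and NumPy's legacy global generator."""
    random.seed(seed)
    np.random.seed(seed)


update_seeds(42)
first = (random.random(), np.random.rand())
update_seeds(42)
second = (random.random(), np.random.rand())
```

np.random.seed only affects the legacy global NumPy generator. Code that draws from a separate np.random.default_rng() instance is not reseeded by it.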
-
property df
Returns a new DataFrame that concatenates all the days’ DataFrames.
- Returns
the concatenated dataframe
- Return type
pd.DataFrame
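Concatenating per-day frames into a single new DataFrame is typically done with pd.concat. The sketch below assumes that approach. The helper concat_days and the sample columns are hypothetical. Here, ignore_index=True replaces the per-day indexes (each starting at 0) with one continuous index.

```python
from typing import List

import pandas as pd


def concat_days(day_frames: List[pd.DataFrame]) -> pd.DataFrame:
    """Concatenate per-day frames into one new DataFrame with a fresh index."""
    return pd.concat(day_frames, ignore_index=True)


day1 = pd.DataFrame({"reqDay": [1, 1], "Filename": [10, 11]})
day2 = pd.DataFrame({"reqDay": [2], "Filename": [10]})
all_days = concat_days([day1, day2])
```

pd.concat raises ValueError when given an empty list, so an implementation built this way needs a separate path for a generator that holds no days.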
-
property df_stats
Returns the concatenated days’ DataFrames together with some useful statistics.
- Returns
a tuple with several DataFrames
- Return type
Tuple[pd.DataFrame]
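Which statistics df_stats actually computes is not stated above. The sketch below only shows the general shape of returning the data plus summary frames as a tuple. The two statistics (requests per day and distinct files per day), the Filename column, and the helper df_with_stats are all hypothetical.

```python
from typing import Tuple

import pandas as pd


def df_with_stats(df: pd.DataFrame) -> Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]:
    """Return the data plus two hypothetical per-day summary frames."""
    # Number of requests on each day.
    per_day = df.groupby("reqDay").size().rename("numRequests").reset_index()
    # Number of distinct files requested on each day.
    files_per_day = (
        df.groupby("reqDay")["Filename"].nunique().rename("numFiles").reset_index()
    )
    return df, per_day, files_per_day


sample = pd.DataFrame({"reqDay": [1, 1, 2], "Filename": [10, 10, 11]})
data, per_day, files_per_day = df_with_stats(sample)
```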
-
property days
Returns a list of the days’ DataFrames.
- Returns
a list with days’ DataFrames
- Return type
List[pd.DataFrame]
-
prepare(self, function_name: str, kwargs: dict, max_buf_len: int = 1024)
Prepares the dataset.
This method calls the generator function selected by function_name.
- Parameters
function_name (str) – The function to use during the preparation
kwargs (dict) – the arguments passed to the generator function
max_buf_len (int, optional) – the maximum length of the row buffer, defaults to 1024
- Yield
the completion percentage of the preparation
- Return type
int
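A generator that buffers rows, flushes them in batches of at most max_buf_len, and yields a completion percentage could look like the sketch below. It assumes that rows come from some iterable of known length and that a flush callback (in the real class perhaps Day.bulk_append, though that is an assumption) stores each batch. The function prepare_rows and its parameters other than max_buf_len are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List


def prepare_rows(
    rows: Iterable[dict],
    total: int,
    flush: Callable[[List[dict]], None],
    max_buf_len: int = 1024,
) -> Iterator[int]:
    """Buffer rows, flush them in batches, and yield the completion percentage."""
    buffer: List[dict] = []
    done = 0
    for row in rows:
        buffer.append(row)
        done += 1
        if len(buffer) >= max_buf_len:
            flush(buffer)
            buffer = []  # start a new list so the flushed batch is untouched
            yield done * 100 // total
    if buffer:
        flush(buffer)  # write the final partial batch
    yield 100


batches: List[List[dict]] = []
progress = list(
    prepare_rows(({"i": i} for i in range(10)), total=10,
                 flush=batches.append, max_buf_len=4)
)
```

Flushing in fixed-size batches bounds memory use and amortizes the cost of each DataFrame append, while the yielded percentages let a caller drive a progress bar.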
-
_open_dataset_file(self, filename: str)
Opens the file of a single dataset day.
- Returns
the current day data
- Return type