:mod:`datasetgen.generator`
===========================

.. py:module:: datasetgen.generator


Module Contents
---------------


.. data:: _DEFAULT_SEED
   :annotation: = 42

   
.. function:: _make_empty_df() -> 'pd.DataFrame'

   Generates an empy Dataframe with che columns indicated in COLUMNS dict.

   :return: a new DataFrame
   :rtype: pd.DataFrame


.. py:class:: Day(date: datetime.date, df: pd.DataFrame = None)

   Bases: :class:`object`

   Initialize current day basic information.

   :param date: The current date of the Day object
   :type date: datetime.date

   .. method:: __repr__(self)


   .. method:: df(self)
      :property:


   .. method:: reset_index(self)


      Reset the dataframe index inplace.

      :return: self
      :rtype: Day


   .. method:: bulk_append(self, rows: List[dict])


      Insert a bunch of rows into the day's dataframe.

      For each row it sets these default value:
          - reqDay = int(time.mktime(self._date.timetuple()))
          - JobSuccess = True
          - SiteName = 0
          - DataType = 0
          - FileType = 0

      Also, if there is no information about the job this function generates
      a random fake information on cpu work using `gen_fake_cpu_work`:
          - NumCPU
          - WrapWC
          - WrapCPU
          - CPUTime
          - IOTime

      :param rows: List of rows
      :type rows: List[dict]
      :return: self
      :rtype: Day


   .. method:: append(self, row: dict)


      Insert a single row into the day's dataframe.

      It sets these default value:
          - reqDay = int(time.mktime(self._date.timetuple()))
          - JobSuccess = True
          - SiteName = 0
          - DataType = 0
          - FileType = 0

      If there is no information about the job this function generates
      a random fake information on cpu work using `gen_fake_cpu_work`:
          - NumCPU
          - WrapWC
          - WrapCPU
          - CPUTime
          - IOTime

      :param row: the current row's columns
      :type row: dict
      :return: self
      :rtype: Day


   .. method:: save(self, dest_folder: PurePath = Path('.'))


      Export the current day dataframe in a zipped csv format.

      :param dest_folder: the destination directory, defaults to Path(".")
      :type dest_folder: PurePath, optional
      :return: self
      :rtype: Day


.. py:class:: Generator(config: dict = {}, num_days: int = -1, num_req_x_day: int = -1, start_date: datetime.date = datetime.date(2020, 1, 1), seed: int = _DEFAULT_SEED, dest_folder: PurePath = Path('.'))

   Bases: :class:`object`

   The main generatore object that creates datasets.

   Initialize the generator.

   :param config: A dictionary with the configuration to use, defaults to {}
   :type config: dict, optional
   :param num_days: number of days to generate, defaults to -1
   :type num_days: int, optional
   :param num_req_x_day: number of requests per day, defaults to -1
   :type num_req_x_day: int, optional
   :param start_date: the starting date of the generator data, defaults to datetime.date(2020, 1, 1)
   :type start_date: datetime, optional
   :param seed: the random generator seed, defaults to _DEFAULT_SEED
   :type seed: int, optional
   :param dest_folder: the folder where to store the dataset, defaults to Path(".")
   :type dest_folder: PurePath, optional

   .. method:: seed(self)
      :property:


   .. method:: __update_seeds(self)


      Initialize the random generator seeds.

      Internal Python random generator seed and NumPy random seed.


   .. method:: df(self)
      :property:


      Returns a new dataframes that contains all the days' dataframes.

      :return: the concatenated dataframe
      :rtype: pd.DataFrame


   .. method:: df_stats(self)
      :property:


      Returns the concat days' dataframes and some useful stats

      :return: a tuple with several DataFrames
      :rtype: Tuple[pd.DataFrame]


   .. method:: days(self)
      :property:


      Returns a list of days' DataFrames.

      :return: a list with days' DataFrames
      :rtype: List[pd.DataFrame]


   .. method:: num_req_x_day(self)
      :property:


   .. method:: num_days(self)
      :property:


   .. method:: tot_num_requests(self)
      :property:


   .. method:: dest_folder(self)
      :property:


   .. method:: clean(self)


      Delete all day dataframes.
              

   .. method:: prepare(self, function_name: str, kwargs: dict, max_buf_len: int = 1024)


      Prepare the dataset.

      This method recall the function generators.

      :param function_name: The function to use during the preparation
      :type function_name: str
      :param kwargs: arguments of generator function
      :type kwargs: dict
      :param max_buf_len: size of row buffer, defaults to 1024
      :type max_buf_len: int, optional
      :yield: status percentage of the preparation
      :rtype: int


   .. method:: _open_dataset_file(self, filename: str)


      Open a single dataset day.

      :return: the current day data
      :rtype: Day


   .. method:: open_data(self, folder: str)


      Open dataset from a folder.

      :param folder: the dataset folder
      :type folder: str


   .. method:: save(self)


      Exports all days' DataFrames in dest_folder.