sda.api.load
============

.. py:module:: sda.api.load


Functions
---------

.. autoapisummary::

   sda.api.load.select_parser
   sda.api.load.load_test
   sda.api.load.parse_files
   sda.api.load.load_tests


Module Contents
---------------

.. py:function:: select_parser(data_file, folder_path)

   Select the appropriate parser function for a given data file.

   Unified approach: ``parse_test_files`` is always selected. It supports
   ``ExcelFileHandler`` (XML discovery and Polars) for all files.

   .. seealso::

      :py:func:`~sda.api.parse_test_file.parse_test_files`
          Unified parser used across all years.


.. py:function:: load_test(test_name, datafilename_filter='*.xls*', data_sharepoint='auto', command=None, columns_to_keep=None, verbose=1, column_not_found='raise', table_not_found='raise', suppress_polars_warnings=False, **kwargs)

   Load data from a specific test by name or path.

   This is a high-level function that handles the complete workflow:
   path resolution, file discovery, and data parsing using the unified
   parser approach.

   :param test_name: Name of the test to load, e.g. ``T135``, or a direct path to a data file.
   :type test_name: :py:class:`str | Path`
   :param datafilename_filter: Filter to apply to the data files, by default ``"*.xls*"`` (matches .xlsx, .xlsm, .xls).
   :type datafilename_filter: :py:class:`str`, *optional*
   :param data_sharepoint: Name of data sharepoint where the test is located, by default "auto".
   :type data_sharepoint: :py:class:`str`, *optional*
   :param command: Legacy reading commands. Now largely ignored in favor of automatic Excel table detection.
   :type command: :py:class:`dict`, *optional*
   :param columns_to_keep: List of columns to keep (default: None).
   :type columns_to_keep: :py:class:`list`, *optional*
   :param verbose: Verbosity level.
   :type verbose: :py:class:`int`, *default* ``1``
   :param column_not_found: Action to take when a requested column is not found, e.g. ``'raise'`` or ``'warn'``.
   :type column_not_found: :py:class:`str`, *default* ``'raise'``
   :param table_not_found: What to do if no Excel tables are found in a file. Can be ``'raise'`` or ``'warn'``.
   :type table_not_found: :py:class:`str`, *default* ``'raise'``
   :param suppress_polars_warnings: If True, suppress Polars dtype warning messages during reading.
   :type suppress_polars_warnings: :py:class:`bool`, *default* :py:obj:`False`
   :param \*\*kwargs: Additional arguments passed to the parser function.

   :returns: The test data, read using automatic table detection.
   :rtype: :py:class:`pandas.DataFrame`

   .. rubric:: Examples

   Load by test name:

   >>> from sda.api.load import load_test
   >>> df = load_test("T183")

   Load with specific filter:

   >>> df = load_test("T183", datafilename_filter="*_processed.xlsx")
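
   The ``datafilename_filter`` value is a shell-style glob pattern. As a
   standalone illustration of which file names such patterns accept, the
   sketch below uses the standard-library ``fnmatch`` module. Whether
   ``load_test`` uses ``fnmatch`` internally is an assumption, and the file
   names and the ``matching`` helper are made up for the example.

   .. code-block:: python

      from fnmatch import fnmatchcase

      # Made-up file names, for illustration only.
      candidates = [
          "T183.xlsx",
          "T183.xlsm",
          "T183.xls",
          "T183.csv",
          "T183_processed.xlsx",
      ]


      def matching(pattern, names):
          """Return the names accepted by a glob pattern (case-sensitive)."""
          return [name for name in names if fnmatchcase(name, pattern)]


      # The default filter accepts every Excel variant but not the CSV file.
      excel_files = matching("*.xls*", candidates)

      # A narrower filter accepts only the processed workbook.
      processed_files = matching("*_processed.xlsx", candidates)

   A narrower pattern is useful when a test folder holds several workbooks
   and only one of them should be read.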

   Load from direct path:

   >>> df = load_test("/path/to/data/T183_experiment.xlsx")
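
   The ``column_not_found`` and ``table_not_found`` parameters choose between
   failing and warning. The sketch below shows one common way to write such a
   policy switch. ``handle_missing`` is a hypothetical helper, not part of
   ``sda``, and the exception types are assumptions.

   .. code-block:: python

      import warnings


      def handle_missing(kind, name, policy="raise"):
          """Hypothetical helper: react to a missing column or table."""
          message = f"{kind} not found: {name}"
          if policy == "raise":
              # Assumed exception type; the real library may raise another.
              raise KeyError(message)
          if policy == "warn":
              warnings.warn(message, stacklevel=2)
              return
          raise ValueError(f"unknown policy: {policy!r}")


      # With "warn", processing continues and the problem is only reported.
      with warnings.catch_warnings(record=True) as caught:
          warnings.simplefilter("always")
          handle_missing("column", "pressure", policy="warn")

   In this sketch ``'raise'`` stops at the first problem, while ``'warn'``
   reports it and lets the caller continue.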

   .. seealso::

      :py:func:`~sda.api.file_discovery.resolve_test_path`
          Resolve a test name or path to the canonical folder path.
      
      :py:func:`~sda.api.file_discovery.discover_data_file`
          Find a data file in the resolved test folder using a filename pattern.
      
      :py:func:`~sda.api.parse_test_file.parse_test_files`
          Unified parser for all supported years (2021+).
      
      :py:func:`~sda.api.load.parse_files`
          Parse one or many files directly when you already know the paths.


.. py:function:: parse_files(files, command=None, columns_to_keep=None, verbose=1, column_not_found='warn', table_not_found='raise', **kwargs)

   Parse multiple files using the unified parser approach.

   Since all files now use the same unified parser (parse_test_files with ExcelFileHandler),
   this function directly calls that parser without any grouping logic.

   :param files: Files to parse, in the same format accepted by ``parse_test_files``.
   :type files: :py:class:`str`, :py:class:`Path`, :py:class:`list`, or :py:class:`dict`
   :param command: Default command for reading files. If None, parser-specific defaults are used.
   :type command: :py:class:`dict`, *optional*
   :param columns_to_keep: List of columns to keep (only applies to 2023+ data).
   :type columns_to_keep: :py:class:`list`, *optional*
   :param verbose: Verbosity level.
   :type verbose: :py:class:`int`, *default* ``1``
   :param column_not_found: Action to take when a requested column is not found, e.g. ``'raise'`` or ``'warn'``.
   :type column_not_found: :py:class:`str`, *default* ``'warn'``
   :param table_not_found: What to do if no Excel tables are found in a file. Can be ``'raise'`` or ``'warn'``.
   :type table_not_found: :py:class:`str`, *default* ``'raise'``
   :param \*\*kwargs: Additional arguments passed to parser functions.

   :returns: Data combined from all files, read using automatic table detection.
   :rtype: :py:class:`pandas.DataFrame`

   .. rubric:: Examples

   Parse a single file:

   >>> from sda.api.load import parse_files
   >>> df = parse_files("T183.xlsx")

   Parse multiple files:

   >>> files = ["T183.xlsx", "T196.xlsx"]
   >>> df = parse_files(files)

   Parse with legacy command format:

   >>> files = {"data.xlsx": {}}  # Empty dict uses table detection
   >>> df = parse_files(files)
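
   ``files`` may be a single path, a list of paths, or a dict keyed by path.
   The sketch below shows how such inputs could be normalized into one
   mapping before parsing. ``normalize_files`` is hypothetical and does not
   describe how ``parse_test_files`` is implemented. The dict values are
   assumed to be per-file command dicts, following the legacy example above.

   .. code-block:: python

      from pathlib import Path


      def normalize_files(files, command=None):
          """Hypothetical helper: map every input form to {Path: command}."""
          default = {} if command is None else dict(command)
          if isinstance(files, (str, Path)):
              # A single file name or path.
              return {Path(files): dict(default)}
          if isinstance(files, dict):
              # Per-file commands; None falls back to the default command.
              return {
                  Path(name): dict(default) if cmd is None else dict(cmd)
                  for name, cmd in files.items()
              }
          # Any other iterable of file names or paths.
          return {Path(name): dict(default) for name in files}


      single = normalize_files("T183.xlsx")
      several = normalize_files(["T183.xlsx", "T196.xlsx"])
      legacy = normalize_files({"data.xlsx": {}})

   Normalizing up front keeps the parsing loop free of type checks.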

   .. seealso::

      :py:func:`~sda.api.parse_test_file.parse_test_files`
          Unified parser function used under the hood.
      
      :py:func:`~sda.api.file_discovery.list_all_files`
          Discover eligible files to parse inside a test folder.
      
      :py:func:`~sda.api.load.load_test`
          High-level convenience wrapper to load a single test by name.


.. py:function:: load_tests(test_names=None, include_2021_2022=True, include_2023_current=True)

   Load multiple tests by names or discover all available tests.

   This function provides a convenient way to load multiple test datasets
   at once, with automatic discovery if no specific test names are provided.

   :param test_names: List of test names to load. If None, discovers all available tests.
   :type test_names: :py:class:`list`, *optional*
   :param include_2021_2022: Whether to include 2021-2022 test data in discovery.
   :type include_2021_2022: :py:class:`bool`, *default* :py:obj:`True`
   :param include_2023_current: Whether to include 2023-current test data in discovery.
   :type include_2023_current: :py:class:`bool`, *default* :py:obj:`True`

   :returns: Data combined from all loaded tests, read using automatic table detection.
   :rtype: :py:class:`pandas.DataFrame`

   .. rubric:: Examples

   Load specific tests:

   >>> from sda.api.load import load_tests
   >>> df = load_tests(["T183", "T196"])

   Load all available tests:

   >>> df = load_tests()  # Discovers and loads all tests

   Load only recent tests:

   >>> df = load_tests(include_2021_2022=False)
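
   The interplay between ``test_names`` and the two ``include_*`` flags can
   be pictured with the sketch below. ``select_test_names`` and its
   ``catalog`` argument are hypothetical, and the test names are made up.
   The sketch also assumes that explicit names bypass the flags, which the
   documentation above does not state.

   .. code-block:: python

      def select_test_names(
          test_names=None,
          include_2021_2022=True,
          include_2023_current=True,
          catalog=None,
      ):
          """Hypothetical helper: decide which tests would be loaded."""
          if test_names is not None:
              # Assumption: explicit names are used as given.
              return list(test_names)
          catalog = catalog or {}
          selected = []
          if include_2021_2022:
              selected += catalog.get("2021_2022", [])
          if include_2023_current:
              selected += catalog.get("2023_current", [])
          return selected


      # Made-up catalog standing in for the discovery step.
      catalog = {
          "2021_2022": ["OLD_A", "OLD_B"],
          "2023_current": ["NEW_A"],
      }

      everything = select_test_names(catalog=catalog)
      recent_only = select_test_names(include_2021_2022=False, catalog=catalog)
      explicit = select_test_names(["OLD_B"], catalog=catalog)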

   .. seealso::

      :py:func:`~sda.api.file_discovery.list_all_tests`
          Discover available test identifiers.
      
      :py:func:`~sda.api.load.load_test`
          Load a single test.
      
      :py:func:`~sda.api.load.parse_files`
          Parse multiple files directly.