sda.api
=======

.. py:module:: sda.api

.. autoapi-nested-parse::

   SDA API module for loading and processing experimental data.

   This module provides functions for loading test data from various sources,
   including SharePoint sites, local files, and databases.

Submodules
----------

.. toctree::
   :maxdepth: 1

   /_api/sda/api/database_report/index
   /_api/sda/api/file_discovery/index
   /_api/sda/api/load/index
   /_api/sda/api/load_data_ovh/index
   /_api/sda/api/parse_test_file/index
   /_api/sda/api/parse_utils/index

Exceptions
----------

.. autoapisummary::

   sda.api.TableNotFoundError

Functions
---------

.. autoapisummary::

   sda.api.analyze_database_files
   sda.api.create_summary_dataframe
   sda.api.generate_database_report
   sda.api.print_quality_thermometer
   sda.api.discover_data_file
   sda.api.get_base_folders
   sda.api.get_data_folder
   sda.api.list_all_files
   sda.api.list_all_files_in_test
   sda.api.list_all_tests
   sda.api.resolve_test_path
   sda.api.load_test
   sda.api.load_tests
   sda.api.parse_files
   sda.api.load_data
   sda.api.parse_test_files
   sda.api.get_excel_core_properties

Package Contents
----------------

.. py:function:: analyze_database_files(file_filter = '*.xls*', max_files = None, verbose = True, include_details = False)

   Analyze database files across multiple data sources.

   This function discovers data files, parses them individually while measuring
   success rates, and returns a comprehensive analysis of table detection,
   worksheets, and data structure.

   :param file_filter: Filter pattern for files to analyze (e.g., ``"*.xls*"``, ``"*.csv"``, ``"*"``).
   :type file_filter: :py:class:`str`, *default* ``"*.xls*"``
   :param max_files: Maximum number of files to process (useful for quick testing).
   :type max_files: :py:class:`int`, *optional*
   :param verbose: Whether to print progress information.
   :type verbose: :py:class:`bool`, *default* :py:obj:`True`
   :param include_details: Whether to include detailed column information in the results.
   :type include_details: :py:class:`bool`, *default* :py:obj:`False`

   :returns: DataFrame with columns:

             - test_name: Name of the test
             - file_name: Name of the parsed file
             - file_path: Full path to the file
             - file_size_mb: Size of the file in megabytes
             - num_points: Number of data points (rows) in the parsed DataFrame
             - num_columns: Number of columns in the parsed DataFrame
             - parse_time_seconds: Time taken to parse the file
             - points_per_second: Parsing rate (points per second)
             - mb_per_second: File processing rate (MB per second)
             - status: Success/Error status
             - error_message: Error details if parsing failed
             - plume_url: URL to the test path
             - full_path: Full path to the file, if include_details is True
   :rtype: :py:class:`pd.DataFrame`

   .. rubric:: Examples

   >>> from sda.api.performance import analyze_database_files
   >>> # Analyze all Excel files
   >>> results = analyze_database_files(file_filter="*.xls*")
   >>> # Quick test with the first 5 files
   >>> results = analyze_database_files(max_files=5, verbose=True)
   >>> # Analyze specific file types
   >>> csv_results = analyze_database_files(file_filter="*.csv")

   .. seealso::

      :py:obj:`generate_database_report`
          Generate detailed reports from analysis results

      :py:obj:`print_quality_thermometer`
          Display visual quality assessment

.. py:function:: create_summary_dataframe(results_df, all_tests, all_files)

   Create a comprehensive summary DataFrame from analysis results.

   This function generates a structured summary of the analysis, including:

   - Overall discovery metrics (tests, files)
   - File analysis metrics (success/failure rates)
   - Data volume metrics
   - Performance metrics

   :param results_df: Results from analyze_database_files()
   :type results_df: :py:class:`pd.DataFrame`
   :param all_tests: List of all discovered tests
   :type all_tests: :py:class:`list`
   :param all_files: List of all discovered files
   :type all_files: :py:class:`list`

   :returns: Summary DataFrame with metrics organized by category
   :rtype: :py:class:`pd.DataFrame`

   .. rubric:: Examples

   >>> from sda.api.performance import (
   ...     analyze_database_files,
   ...     create_summary_dataframe,
   ... )
   >>> from sda.api import list_all_tests, list_all_files
   >>>
   >>> results = analyze_database_files(max_files=10)
   >>> all_tests = list_all_tests()
   >>> all_files = list_all_files()
   >>> summary_df = create_summary_dataframe(results, all_tests, all_files)

.. py:function:: generate_database_report(results_df, save_to_file = None)

   Generate a detailed database report from parsing results.

   :param results_df: Results from analyze_database_files()
   :type results_df: :py:class:`pd.DataFrame`
   :param save_to_file: Path to save the report (Excel format).
   :type save_to_file: :py:class:`str`, *optional*

   :returns: Dictionary containing summary statistics
   :rtype: :py:class:`dict`

   .. rubric:: Examples

   >>> from sda.api.performance import (
   ...     analyze_database_files,
   ...     generate_database_report,
   ... )
   >>> results = analyze_database_files(max_files=10)
   >>> summary = generate_database_report(results, save_to_file="report.xlsx")

   .. seealso::

      :py:obj:`analyze_database_files`
          Run performance analysis on data files

      :py:obj:`print_quality_thermometer`
          Display visual quality metrics

.. py:function:: print_quality_thermometer(results_df)

   Print a visual quality thermometer showing data health metrics.

   :param results_df: Results DataFrame from analyze_database_files()
   :type results_df: :py:class:`pd.DataFrame`

   .. rubric:: Examples

   >>> from sda.api.performance import (
   ...     analyze_database_files,
   ...     print_quality_thermometer,
   ... )
   >>> results = analyze_database_files(max_files=10)
   >>> print_quality_thermometer(results)

.. py:function:: discover_data_file(folder_path, test_name, datafilename_filter = '*.xls*')

   Find the actual data file to load in the specified folder.

   :param folder_path: Path to the folder to search in.
   :type folder_path: :py:class:`Path`
   :param test_name: Name of the test to search for.
   :type test_name: :py:class:`str | Path`
   :param datafilename_filter: Filter to apply to the data files, by default ``"*.xls*"`` (matches .xlsx, .xlsm, .xls).
   :type datafilename_filter: :py:class:`str`, *optional*

   :returns: Path to the data file.
   :rtype: :py:class:`Path`

   :raises FileNotFoundError: If no file, or multiple files, match the criteria.

   .. rubric:: Examples

   >>> from pathlib import Path
   >>> from sda.api.file_discovery import discover_data_file
   >>> folder = Path("~/data/T183")
   >>> file_path = discover_data_file(folder, "T183", "*.xls*")

   .. seealso::

      :py:func:`~sda.api.file_discovery.resolve_test_path`
          Resolve test name to folder path

.. py:function:: get_base_folders()

   Get the base folder paths for the data and Spark sites.

   :returns: Dictionary containing the paths to the Spark data sites, with the user key (``~/``) expanded.
   :rtype: :py:class:`dict[str, Path]`

   :raises ValueError: If the configuration file does not contain the required keys.
   :raises FileNotFoundError: If the data folder or Spark sites path does not exist.

   .. rubric:: Notes

   This function reads the configuration file to get the paths for the old data
   site and the Spark sites. It checks for the required keys in the
   configuration and raises appropriate errors if they are missing. The paths
   are expanded to their full absolute paths.

   In CI environments, or when the ``SDA_USE_TEST_DATA`` environment variable
   is set, test data fixtures are automatically included.

.. py:function:: get_data_folder(test_name)

   Get the data folder based on the test name.

   :param test_name: Name of the test file to load, e.g. ``T135``.
   :type test_name: :py:class:`str`

   :returns: Path to the data folder.
   :rtype: :py:class:`Path`

   .. rubric:: Notes

   The corresponding data SharePoint site is automatically determined from the
   test name:

   - if TXXX with 239<=XXX: returns "6. DATA 2024 S2/Gif/[test_name]"
   - if TXXX with 192<=XXX<239: returns "6. DATA 2025 S1/Gif/[test_name]"
   - if TXXX with 100<=XXX<192: returns "SPARK/6. Données/2024/[test_name]"
     (except T102 and T104, in "2023/231218", and T105, in "2023/231221")
   - if 22XX: returns "SPARK/6. Données/2022/[test_name]"
   - if 21XX: returns "SPARK/6. Données/2021/[test_name]"
   - if 20XX: returns "SPARK/6. Données/2020/[test_name]"
   - some test names also correspond to CH-5 campaigns, which are on the
     "7. DATA 2024-2025CH5" SharePoint. See the list of tests here:
     :py:data:`~sda.api.file_discovery.CH5_CAMPAIGN_TESTS`

   Test data fixtures (T097, T083) are supported for CI testing.

.. py:function:: list_all_files(filter='*.xls*', max_recursion_level=None, verbose=0)

   List all files in the data folders, using a filter. The data folder path is
   read from the configuration file ``~/sda.json``.

   :param filter: Filter to apply to the file names, by default ``"*.xls*"`` (matches .xlsx, .xlsm, .xls). Use ``*`` for all files.
   :type filter: :py:class:`str`, *optional*
   :param max_recursion_level: Maximum recursion depth for directory scanning. If None (default), uses unlimited recursion (the original behavior). A value of 2 provides roughly a 38x speedup with full test discovery accuracy; 3 provides roughly a 7x speedup. Higher values reduce the performance gain while maintaining compatibility with deeper file structures.
   :type max_recursion_level: :py:class:`int`, *optional*
   :param verbose: Verbosity level.
   :type verbose: :py:class:`int`

   :returns: List of Path objects pointing to discovered files.
   :rtype: :py:class:`list`

   .. rubric:: Examples

   Use no filter:

   >>> from sda.api.file_discovery import list_all_files
   >>> datafiles = list_all_files("*")

   Get only Excel files:

   >>> excel_files = list_all_files("*.xls*")
   >>> print(f"Found {len(excel_files)} Excel files")

   Use limited recursion for a large performance improvement:

   >>> # ~38x faster for test discovery, with full accuracy
   >>> fast_files = list_all_files("*.xls*", max_recursion_level=2)
   >>> # ~7x faster, with deeper file compatibility
   >>> files = list_all_files("*.xls*", max_recursion_level=3)

   Print files found with max_recursion_level=3 but not with max_recursion_level=2::

       file_paths_2 = list_all_files(filter="*.xls*", max_recursion_level=2)
       file_paths_3 = list_all_files(filter="*.xls*", max_recursion_level=3)
       print("Files in max_recursion_level=3 but not in max_recursion_level=2:")
       for path in file_paths_3:
           if path not in file_paths_2:
               print(path)

   .. seealso::

      :py:func:`~sda.api.file_discovery.list_all_tests`
          List available test names from files

      :py:func:`~sda.api.file_discovery.list_all_files_in_test`
          List all files for a specific test

.. py:function:: list_all_files_in_test(test_name, filter='*', max_recursion_level=2, verbose=0)

   List all files available for a specific test (T*** tests only).

   This function discovers all files within a test's data folder, providing
   users with a way to explore what data files are available for analysis.

   **Note**: This function only works with 2023+ test names that start with
   "T" (e.g., "T183", "T196"). For 2021-2022 tests (e.g., "21s16"), use
   :py:func:`~sda.api.file_discovery.list_all_files` with appropriate filters
   instead.

   :param test_name: Name of the test to search for files; must start with "T" (e.g., "T183", "T196").
   :type test_name: :py:class:`str`
   :param filter: Filter pattern for file names, by default ``"*"`` (all files). Examples: ``"*.xlsx"`` for Excel files, ``"*.csv"`` for CSV files, ``"*.txt"`` for text files.
   :type filter: :py:class:`str`, *optional*
   :param max_recursion_level: Maximum recursion depth for directory scanning, by default 2. Provides optimal performance while maintaining compatibility.
   :type max_recursion_level: :py:class:`int`, *optional*
   :param verbose: Verbosity level, by default 0.
   :type verbose: :py:class:`int`, *optional*

   :returns: List of Path objects pointing to all files found in the test folder.
   :rtype: :py:class:`list`

   :raises ValueError: If the test name doesn't start with "T" (not a 2023+ test format).
   :raises FileNotFoundError: If the test folder cannot be found or accessed.

   .. rubric:: Examples

   List all files for a T*** test:

   >>> from sda.api.file_discovery import list_all_files_in_test
   >>> files = list_all_files_in_test("T183")
   >>> print(f"Found {len(files)} files for T183")

   List only Excel files:

   >>> excel_files = list_all_files_in_test("T183", filter="*.xls*")
   >>> print(f"Excel files: {[f.name for f in excel_files]}")

   List other file types:

   >>> # Get all CSV files
   >>> csv_files = list_all_files_in_test("T183", filter="*.csv")
   >>> # Get all image files
   >>> images = list_all_files_in_test("T183", filter="*.png")
   >>> # Get all files (default)
   >>> all_files = list_all_files_in_test("T183")

   .. seealso::

      :py:func:`~sda.api.file_discovery.list_all_tests`
          List all available test names

      :py:func:`~sda.api.file_discovery.list_all_files`
          List all files across all tests

      :py:func:`~sda.api.file_discovery.get_data_folder`
          Get the data folder for a test name

.. py:function:: list_all_tests(include_2021_2022=True, include_2023_current=True, filter='*', max_recursion_level=2, verbose=0)

   List all available test names from the data folders.

   This function discovers test files and extracts their test names, providing
   a dynamic way to find available tests without hardcoding them.

   :param include_2021_2022: Whether to include 2021-2022 test data.
   :type include_2021_2022: :py:class:`bool`, *default* :py:obj:`True`
   :param include_2023_current: Whether to include 2023-current test data.
   :type include_2023_current: :py:class:`bool`, *default* :py:obj:`True`
   :param filter: Filter pattern for test names (supports wildcards). Examples: ``"T1*"`` for tests starting with T1, ``"*2021*"`` for tests containing 2021.
   :type filter: :py:class:`str`, *default* ``"*"``
   :param max_recursion_level: Maximum recursion depth for directory scanning, by default 2. If None, uses unlimited recursion. A value of 2 provides roughly a 38x speedup with full accuracy (provided the data file architecture is respected).
   :type max_recursion_level: :py:class:`int`, *optional*
   :param verbose: Verbosity level.
   :type verbose: :py:class:`int`, *default* ``0``

   :returns: List of available test names (e.g., ``["21s16", "T183", "T196"]``).
   :rtype: :py:class:`list`

   .. rubric:: Examples

   >>> from sda.api.file_discovery import list_all_tests
   >>> tests = list_all_tests()
   >>> print(f"Found {len(tests)} tests: {tests[:5]}...")  # Show first 5
   >>> # Get only 2023+ tests
   >>> recent_tests = list_all_tests(include_2021_2022=False)
   >>> # Get tests starting with T1
   >>> t1_tests = list_all_tests(filter="T1*")
   >>> # Get tests containing specific patterns
   >>> filtered_tests = list_all_tests(filter="*183*")

   .. seealso::

      :py:func:`~sda.api.file_discovery.list_all_files`
          List all data files in folders

      :py:func:`~sda.api.file_discovery.list_all_files_in_test`
          List all files for a specific test

.. py:function:: resolve_test_path(test_name, data_sharepoint = 'auto')

   Resolve a test name to its folder path.

   :param test_name: Name of the test file to load (e.g. ``T135``), or a direct path to a file.
   :type test_name: :py:class:`str | Path`
   :param data_sharepoint: Name of the data SharePoint where the test is located, by default ``"auto"``.
   :type data_sharepoint: :py:class:`str`, *optional*

   :returns: Path to the folder containing the test data.
   :rtype: :py:class:`Path`

   :raises ValueError: If data_sharepoint is specified together with a direct path, or if the SharePoint doesn't exist.

   .. rubric:: Examples

   >>> from sda.api.file_discovery import resolve_test_path
   >>> folder_path = resolve_test_path("T183")

   .. seealso::

      :py:func:`~sda.api.file_discovery.get_data_folder`
          Get data folder for test name

      :py:func:`~sda.api.file_discovery.discover_data_file`
          Find data file in resolved folder

.. py:function:: load_test(test_name, datafilename_filter = '*.xls*', data_sharepoint = 'auto', command=None, columns_to_keep=None, verbose=1, column_not_found='raise', table_not_found='raise', suppress_polars_warnings=False, **kwargs)

   Load data from a specific test by name or path.

   This is a high-level function that handles the complete workflow: path
   resolution, file discovery, and data parsing using the unified parser
   approach.

   :param test_name: Name of the test file to load (e.g. ``T135``), or a direct path to a file.
   :type test_name: :py:class:`str | Path`
   :param datafilename_filter: Filter to apply to the data files, by default ``"*.xls*"`` (matches .xlsx, .xlsm, .xls).
   :type datafilename_filter: :py:class:`str`, *optional*
   :param data_sharepoint: Name of the data SharePoint where the test is located, by default ``"auto"``.
   :type data_sharepoint: :py:class:`str`, *optional*
   :param command: Legacy reading commands. Now largely ignored in favor of automatic Excel table detection.
   :type command: :py:class:`dict`, *optional*
   :param columns_to_keep: List of columns to keep (default: None).
   :type columns_to_keep: :py:class:`list`, *optional*
   :param verbose: Verbosity level.
   :type verbose: :py:class:`int`, *default* ``1``
   :param column_not_found: What to do if a column is not found.
   :type column_not_found: :py:class:`str`, *default* ``'raise'``
   :param table_not_found: What to do if no Excel tables are found in a file. Can be ``'raise'`` or ``'warn'``.
   :type table_not_found: :py:class:`str`, *default* ``'raise'``
   :param suppress_polars_warnings: If True, suppress Polars dtype warning messages during reading.
   :type suppress_polars_warnings: :py:class:`bool`, *default* :py:obj:`False`
   :param \*\*kwargs: Additional arguments passed to the parser function.

   :returns: :class:`pandas.DataFrame` containing the test data, with automatic table detection.
   :rtype: :py:class:`pandas.DataFrame`

   .. rubric:: Examples

   Load by test name:

   >>> from sda.api.load import load_test
   >>> df = load_test("T183")

   Load with a specific filter:

   >>> df = load_test("T183", datafilename_filter="*_processed.xlsx")

   Load from a direct path:

   >>> df = load_test("/path/to/data/T183_experiment.xlsx")

   .. seealso::

      :py:func:`~sda.api.file_discovery.resolve_test_path`
          Resolve a test name or path to the canonical folder path.

      :py:func:`~sda.api.file_discovery.discover_data_file`
          Find a data file in the resolved test folder using a filename pattern.

      :py:func:`~sda.api.parse_test_file.parse_test_files`
          Unified parser for all supported years (2021+).

      :py:func:`~sda.api.load.parse_files`
          Parse one or many files directly when you already know the paths.

.. py:function:: load_tests(test_names=None, include_2021_2022=True, include_2023_current=True)

   Load multiple tests by name, or discover all available tests.

   This function provides a convenient way to load multiple test datasets at
   once, with automatic discovery if no specific test names are provided.

   :param test_names: List of test names to load. If None, discovers all available tests.
   :type test_names: :py:class:`list`, *optional*
   :param include_2021_2022: Whether to include 2021-2022 test data in discovery.
   :type include_2021_2022: :py:class:`bool`, *default* :py:obj:`True`
   :param include_2023_current: Whether to include 2023-current test data in discovery.
   :type include_2023_current: :py:class:`bool`, *default* :py:obj:`True`

   :returns: :class:`pandas.DataFrame` combined from all loaded tests, with automatic table detection.
   :rtype: :py:class:`pandas.DataFrame`

   .. rubric:: Examples

   Load specific tests:

   >>> from sda.api.load import load_tests
   >>> df = load_tests(["T183", "T196"])

   Load all available tests:

   >>> df = load_tests()  # Discovers and loads all tests

   Load only recent tests:

   >>> df = load_tests(include_2021_2022=False)

   .. seealso::

      :py:func:`~sda.api.file_discovery.list_all_tests`
          Discover available test identifiers.

      :py:func:`~sda.api.load.load_test`
          Load a single test.

      :py:func:`~sda.api.load.parse_files`
          Parse multiple files directly.

.. py:function:: parse_files(files, command=None, columns_to_keep=None, verbose=1, column_not_found='warn', table_not_found='raise', **kwargs)

   Parse multiple files using the unified parser approach.

   Since all files now use the same unified parser (parse_test_files with
   ExcelFileHandler), this function calls that parser directly, without any
   grouping logic.

   :param files: Files to parse. Same format as parse_test_files (unified approach).
   :type files: :py:class:`str`, :py:class:`Path`, :py:class:`list`, or :py:class:`dict`
   :param command: Default command for reading files. If None, parser-specific defaults are used.
   :type command: :py:class:`dict`, *optional*
   :param columns_to_keep: List of columns to keep (only applies to 2023+ data).
   :type columns_to_keep: :py:class:`list`, *optional*
   :param verbose: Verbosity level.
   :type verbose: :py:class:`int`, *default* ``1``
   :param column_not_found: What to do if a column is not found.
   :type column_not_found: :py:class:`str`, *default* ``'warn'``
   :param table_not_found: What to do if no Excel tables are found in a file. Can be ``'raise'`` or ``'warn'``.
   :type table_not_found: :py:class:`str`, *default* ``'raise'``
   :param \*\*kwargs: Additional arguments passed to parser functions.
   :returns: :class:`pandas.DataFrame` combined from all files, with automatic table detection.
   :rtype: :py:class:`pandas.DataFrame`

   .. rubric:: Examples

   Parse a single file:

   >>> from sda.api.load import parse_files
   >>> df = parse_files("T183.xlsx")

   Parse multiple files:

   >>> files = ["T183.xlsx", "T196.xlsx"]
   >>> df = parse_files(files)

   Parse with the legacy command format:

   >>> files = {"data.xlsx": {}}  # Empty dict uses table detection
   >>> df = parse_files(files)

   .. seealso::

      :py:func:`~sda.api.parse_test_file.parse_test_files`
          Unified parser function used under the hood.

      :py:func:`~sda.api.file_discovery.list_all_files`
          Discover eligible files to parse inside a test folder.

      :py:func:`~sda.api.load.load_test`
          High-level convenience wrapper to load a single test by name.

.. py:function:: load_data(instrument_code, start_time = None, end_time = None)

   Load data from a specific table in the database, based on the instrument
   code and time range.

   :param instrument_code: The code of the instrument, used to determine the table and columns to query.
   :type instrument_code: :py:class:`str`
   :param start_time: The start timestamp for filtering the data. Defaults to None.
   :type start_time: :py:class:`str`, *optional*
   :param end_time: The end timestamp for filtering the data. Defaults to None.
   :type end_time: :py:class:`str`, *optional*

   :returns: A DataFrame containing the selected data for the specified time range and instrument.
   :rtype: :py:class:`pandas.DataFrame`

   :raises ValueError: - If the instrument code is not found in the table mapping.
                       - If the database configuration is incomplete or contains invalid values.
   :raises ConnectionError: - If the connection to the database fails.
                            - If the SQL query fails to execute due to connection issues.

   .. rubric:: Notes

   - The function reads database configuration from ``~/sda.json``. It is
     essential that this file contains the following keys:

     - DB_USER: Database username.
     - DB_PASSWORD: Database password.
     - DB_HOST: Hostname or IP address of the database server.
     - DB_PORT: Port number the database server is listening on.
     - DB_NAME: Name of the database to connect to.

   - If neither `start_time` nor `end_time` is provided, the function returns
     all data from the specified table.
   - If only `start_time` is provided, the function returns data from
     `start_time` to the current time.
   - Ensure that the database server is accessible and that the credentials
     provided are correct.
   - For more information on setting up the configuration file, visit:
     https://github.com/spark-cleantech/sda/blob/main/README.md#postgres-credentials--connection

.. py:function:: parse_test_files(files, drop_unnamed_columns=True, command=None, dropna=True, columns_to_keep=None, verbose=1, column_not_found='raise', table_not_found='raise', suppress_polars_warnings=False)

   Parse experiment files using unified Excel table detection.

   Uses ExcelFileHandler with XML discovery and Polars reading for maximum
   performance and accuracy. Automatically detects Excel Table objects
   (Insert > Table in Excel) for robust data loading.

   :param files: Files to parse. Can be:

                 - str: filename
                 - Path: filename
                 - list: list of filenames
                 - dict: {filename: special_reading_commands} (legacy format; the reading commands are mostly ignored in favor of table detection)
   :type files: :py:class:`str | Path | list | dict[str, dict] | dict[Path, dict]`
   :param drop_unnamed_columns: If True, drop unnamed columns (columns with no name); defaults to True.
   :type drop_unnamed_columns: :py:class:`bool`, *optional*
   :param command: Legacy reading commands. Now largely ignored in favor of automatic Excel table detection. For CSV files, standard pandas parameters still apply.
   :type command: :py:class:`dict`, *optional*
   :param dropna: Drop rows where all values are NaN; defaults to True.
   :type dropna: :py:class:`bool`
   :param columns_to_keep: List of columns to keep for filtering (default: None).
   :type columns_to_keep: :py:class:`list`
   :param verbose: Verbosity level (default: 1). If 0, no output. If 1, print progress with basic information. If 2, print detailed file processing information.
   :type verbose: :py:class:`int`
   :param column_not_found: What to do if a column is not found in the file. Can be ``'raise'``, ``'warn'``, or ``'ignore'``.

                            - ``'raise'``: raise an error if a column is not found.
                            - ``'warn'``: print a warning if a column is not found.
                            - ``'ignore'``: do nothing if a column is not found.
   :type column_not_found: :py:class:`str`
   :param table_not_found: What to do if no Excel tables are found in a file. Can be ``'raise'`` or ``'warn'``.

                           - ``'raise'``: raise a TableNotFoundError if no tables are found (default).
                           - ``'warn'``: print a warning if no tables are found and skip the file.
   :type table_not_found: :py:class:`str`
   :param suppress_polars_warnings: If True, suppress Polars dtype warning messages during reading (default: False).
   :type suppress_polars_warnings: :py:class:`bool`

   :returns: Combined DataFrame from all processed files, with automatic table detection.
   :rtype: :py:class:`pd.DataFrame`

   .. rubric:: Examples

   >>> files = ["T137.xlsx", "T153.xlsx"]
   >>> df = parse_test_files(files)

   >>> files = {"test_data.xlsx": {}}  # Empty dict uses table detection
   >>> df = parse_test_files(files)

   .. rubric:: Notes

   Uses ExcelFileHandler with XML-based table discovery and the Polars reading
   engine for optimal performance. Legacy pandas sheet-based reading is
   deprecated.

.. py:exception:: TableNotFoundError(message, test_name=None, file_name=None)

   Bases: :py:obj:`Exception`

   Custom exception raised when no Excel tables are found in a file.

   This exception provides specific information about table detection failures
   and can be caught separately from other errors for custom handling.

   .. py:attribute:: test_name
      :value: None

   .. py:attribute:: file_name
      :value: None

.. py:function:: get_excel_core_properties(path)

   Extract core properties from an Excel file using direct XML parsing.

   This is much faster than loading the entire workbook with openpyxl
   (roughly 57x faster than ``openpyxl.load_workbook()``).

   :param path: Path to the Excel file.
   :type path: :py:class:`str | Path`

   :returns: Dictionary with core properties: title, subject, creator, description, keywords, lastModifiedBy.
   :rtype: :py:class:`dict[str, str | None]`

   .. rubric:: Examples

   >>> from sda.api.parse_utils import get_excel_core_properties
   >>> props = get_excel_core_properties("test_file.xlsx")
   >>> print(f"Keywords: {props['keywords']}")
   Keywords: STT0.1

   The ``STT`` keyword was used previously, when STT (Spark Test Template)
   files were used to identify test files.

   .. seealso::

      :py:func:`~sda.api.load.get_test_template_number`
          Extract metadata from Excel file properties.

      :py:func:`~sda.api.load.select_parser`
          Choose parser based on file metadata
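Several functions above (``load_test``, ``parse_files``, ``parse_test_files``) raise ``TableNotFoundError`` when ``table_not_found='raise'``, and the exception carries ``test_name`` and ``file_name`` attributes for custom handling. The sketch below illustrates the intended catch pattern; the stand-in class and the ``handle_parse``/``failing_parse`` helpers are hypothetical and only mirror the signature documented above (in real code you would import ``TableNotFoundError`` from ``sda.api``):

```python
# Stand-in mirroring the documented signature of sda.api.TableNotFoundError;
# in real code: `from sda.api import TableNotFoundError`.
class TableNotFoundError(Exception):
    def __init__(self, message, test_name=None, file_name=None):
        super().__init__(message)
        self.test_name = test_name
        self.file_name = file_name


def handle_parse(parse):
    """Call a parser callable, reporting table-detection failures separately."""
    try:
        return parse()
    except TableNotFoundError as exc:
        # The attributes let callers point users at the offending file.
        print(f"No Excel tables in {exc.file_name} (test {exc.test_name}): {exc}")
        return None


def failing_parse():
    # Simulates a file in which no Excel Table objects were detected.
    raise TableNotFoundError("no tables found", test_name="T183", file_name="T183.xlsx")


result = handle_parse(failing_parse)
```

Catching ``TableNotFoundError`` separately from generic ``Exception`` keeps table-detection failures (often fixable by adding an Insert > Table object to the workbook) distinct from genuine I/O or parsing errors.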