sda.api.parse_utils
===================

.. py:module:: sda.api.parse_utils

.. autoapi-nested-parse::

   Cleaned Excel file loading utilities - Pure Polars approach.

   This module provides simplified Excel table reading using only Polars native table support.
   No pandas fallbacks, no XML parsing, no sheet reading - just fast, simple table reading.


Exceptions
----------

.. autoapisummary::

   sda.api.parse_utils.TableNotFoundError


Classes
-------

.. autoapisummary::

   sda.api.parse_utils.ExcelFileHandler


Functions
---------

.. autoapisummary::

   sda.api.parse_utils.suppress_polars_output
   sda.api.parse_utils.get_excel_core_properties
   sda.api.parse_utils.normalize_files_input
   sda.api.parse_utils.validate_file_extension
   sda.api.parse_utils.postprocess_dataframe
   sda.api.parse_utils.get_url_in_path
   sda.api.parse_utils.get_plume_url
   sda.api.parse_utils.sharepoint_synced_to_onedrive_shared_syntax
   sda.api.parse_utils.onedrive_shared_to_sharepoint_synced_syntax
   sda.api.parse_utils.is_wsl_environment
   sda.api.parse_utils.convert_windows_path_to_wsl
   sda.api.parse_utils.convert_wsl_path_to_windows_if_needed


Module Contents
---------------

.. py:function:: suppress_polars_output()

   Context manager to suppress Polars stdout/stderr output.
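
   A context manager of this kind can be sketched with the standard library alone. The snippet below is an illustration, not the actual implementation: it assumes the output goes through Python's ``sys.stdout`` and ``sys.stderr``, and the helper name ``quiet`` is hypothetical.

   .. code-block:: python

      import contextlib
      import io
      import sys


      @contextlib.contextmanager
      def quiet():
          """Hypothetical stand-in: capture text written to sys.stdout/sys.stderr."""
          sink = io.StringIO()
          with contextlib.redirect_stdout(sink), contextlib.redirect_stderr(sink):
              yield sink


      with quiet() as captured:
          print("noisy message")                   # not shown on the console
          print("noisy warning", file=sys.stderr)  # not shown on the console

      swallowed = captured.getvalue()

   Output written by native code directly to the process file descriptors would not be caught by this Python-level redirection.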


.. py:exception:: TableNotFoundError(message, test_name=None, file_name=None)

   Bases: :py:obj:`Exception`


   Custom exception raised when no Excel tables are found in a file.

   This exception provides specific information about table detection failures
   and can be caught separately from other errors for custom handling.


   .. py:attribute:: test_name
      :value: None


   .. py:attribute:: file_name
      :value: None
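
   Because the exception carries ``test_name`` and ``file_name``, a caller can handle a missing table separately from other failures. The sketch below uses a stand-in class that only mirrors the documented signature, and a hypothetical ``load`` function that always fails; neither is the real implementation.

   .. code-block:: python

      class TableNotFoundError(Exception):
          """Stand-in mirroring the documented signature (not the real class)."""

          def __init__(self, message, test_name=None, file_name=None):
              super().__init__(message)
              self.test_name = test_name
              self.file_name = file_name


      def load(file_name):
          """Hypothetical loader that always fails, for illustration only."""
          raise TableNotFoundError("No Excel tables found", file_name=file_name)


      try:
          load("T234.xlsx")
      except TableNotFoundError as err:
          # Handle the missing-table case on its own, e.g. skip the file.
          handled = f"skipped {err.file_name}: {err}"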


.. py:class:: ExcelFileHandler(file_name, read_params=None, verbose=1, suppress_polars_warnings=False)

   Hybrid Excel table reader with XML discovery + Polars reading.

   Best of both worlds approach:

   - XML-based automatic table name discovery (no hardcoded fallbacks)
   - Polars native reading for 21x faster performance
   - Automatic file lock handling
   - Returns pandas DataFrames for compatibility

   Requirements:

   - Polars and fastexcel packages
   - Excel files with proper Table objects (Insert > Table)


   .. py:attribute:: VALID_DATA_TABLE_NAMES
      :value: ['TestData', 'Base_de_données']


      Valid names for data tables.

      Users must give their primary data table one of these names. To rename a data table
      in Excel, see https://github.com/user-attachments/assets/80d59a30-5a57-4a31-9fef-0cc45eaf5141


   .. py:attribute:: file_name


   .. py:attribute:: read_params


   .. py:attribute:: verbose
      :value: 1


   .. py:attribute:: suppress_polars_warnings
      :value: False


   .. py:attribute:: temp_path
      :value: None


   .. py:attribute:: is_using_temp
      :value: False


   .. py:method:: detect_excel_tables()

      Detect Excel Table objects using fast XML parsing.

      :returns: Names of the detected tables
      :rtype: :py:class:`list`
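
      An ``.xlsx`` file is a ZIP archive in which each Excel Table is stored as a part under ``xl/tables/``, so table names can be listed without loading any cell data. The sketch below shows the idea; the helper name ``list_table_names`` is hypothetical and the method's real implementation may differ.

      .. code-block:: python

         import io
         import xml.etree.ElementTree as ET
         import zipfile


         def list_table_names(xlsx):
             """Hypothetical helper: return the names of the Table parts in a workbook."""
             names = []
             with zipfile.ZipFile(xlsx) as archive:
                 for member in sorted(archive.namelist()):
                     if member.startswith("xl/tables/") and member.endswith(".xml"):
                         root = ET.fromstring(archive.read(member))
                         names.append(root.get("displayName") or root.get("name"))
             return names


         # Build a tiny in-memory archive holding only a table part, for demonstration.
         NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
         buffer = io.BytesIO()
         with zipfile.ZipFile(buffer, "w") as archive:
             archive.writestr(
                 "xl/tables/table1.xml",
                 f'<table xmlns="{NS}" id="1" name="TestData" displayName="TestData" ref="A1:B3"/>',
             )
         buffer.seek(0)
         tables = list_table_names(buffer)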


   .. py:method:: read_with_table_detection(table_name=None)

      Read Excel table using XML discovery + Polars reading.

      Hybrid approach combining the best of both worlds:

      1. XML-based table name discovery (automatic, no hardcoded fallbacks)
      2. Polars native reading for maximum performance

      Requires Excel Table objects (Insert > Table in Excel).

      :param table_name: Specific table name to use, or None for auto-discovery

      :returns: Loaded data converted from Polars
      :rtype: :py:class:`pd.DataFrame`

      :raises ImportError: If Polars is not installed
      :raises ValueError: If no Excel Table objects are found in the file
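
      The choice between an explicit ``table_name`` and auto-discovery could look like the sketch below. The selection rule is an assumption made for illustration (prefer the requested name, otherwise the first detected table listed in ``VALID_DATA_TABLE_NAMES``); the helper ``choose_table`` is hypothetical and the actual method may decide differently.

      .. code-block:: python

         VALID_DATA_TABLE_NAMES = ["TestData", "Base_de_données"]


         def choose_table(detected, table_name=None):
             """Hypothetical helper: pick which detected table to read."""
             if not detected:
                 raise ValueError("No Excel Table objects found in the file")
             if table_name is not None:
                 if table_name not in detected:
                     raise ValueError(f"Table {table_name!r} not found; detected: {detected}")
                 return table_name
             for name in detected:
                 if name in VALID_DATA_TABLE_NAMES:
                     return name
             raise ValueError(f"No table named one of {VALID_DATA_TABLE_NAMES}")


         picked = choose_table(["Summary", "TestData"])
         explicit = choose_table(["Summary", "TestData"], table_name="Summary")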


   .. py:method:: cleanup()

      Clean up temporary file if it exists.

      Note: Temp files are left for potential reuse by other handler instances.
      They will be cleaned up by the OS temp directory cleanup or when the process exits.


.. py:function:: get_excel_core_properties(path)

   Extract core properties from Excel file using direct XML parsing.

   This is much faster than loading the entire workbook with openpyxl.
   Performance improvement: ~57x faster than openpyxl.load_workbook().

   :param path: Path to the Excel file.
   :type path: :py:class:`str | Path`

   :returns: Dictionary with core properties: title, subject, creator, description, keywords, lastModifiedBy.
   :rtype: :py:class:`dict[str, str | None]`

   .. rubric:: Examples

   >>> from sda.api.parse_utils import get_excel_core_properties
   >>> props = get_excel_core_properties("test_file.xlsx")
   >>> print(f"Keywords: {props['keywords']}")
   Keywords: STT0.1

   This function was used previously, when test files were identified by their STT (Spark Test Template) keyword.

   .. seealso::

      :py:func:`~sda.api.load.get_test_template_number`
          Extract metadata from Excel file properties.
      
      :py:func:`~sda.api.load.select_parser`
          Choose parser based on file metadata
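
   The core properties of an ``.xlsx`` file are stored in the ``docProps/core.xml`` part of the ZIP archive, which is why they can be read without opening the workbook. The sketch below illustrates the approach; the helper name ``read_core_properties`` is hypothetical and is not the function's actual code.

   .. code-block:: python

      import io
      import xml.etree.ElementTree as ET
      import zipfile

      NAMESPACES = {
          "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
          "dc": "http://purl.org/dc/elements/1.1/",
      }
      FIELDS = {
          "title": "dc:title",
          "subject": "dc:subject",
          "creator": "dc:creator",
          "description": "dc:description",
          "keywords": "cp:keywords",
          "lastModifiedBy": "cp:lastModifiedBy",
      }


      def read_core_properties(xlsx):
          """Hypothetical helper: read core properties straight from the archive."""
          with zipfile.ZipFile(xlsx) as archive:
              root = ET.fromstring(archive.read("docProps/core.xml"))
          # Properties absent from the XML are reported as None.
          return {
              key: root.findtext(tag, default=None, namespaces=NAMESPACES)
              for key, tag in FIELDS.items()
          }


      # Minimal in-memory archive with only the core-properties part, for demonstration.
      core_xml = (
          f'<cp:coreProperties xmlns:cp="{NAMESPACES["cp"]}" xmlns:dc="{NAMESPACES["dc"]}">'
          "<dc:creator>someone</dc:creator><cp:keywords>STT0.1</cp:keywords>"
          "</cp:coreProperties>"
      )
      buffer = io.BytesIO()
      with zipfile.ZipFile(buffer, "w") as archive:
          archive.writestr("docProps/core.xml", core_xml)
      buffer.seek(0)
      props = read_core_properties(buffer)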


.. py:function:: normalize_files_input(files)

   Normalize the files input parameter to a consistent dict format.

   :param files: Files input in various formats
   :type files: :py:class:`str`, :py:class:`Path`, :py:class:`list`, or :py:class:`dict`

   :returns: Normalized dict with {filename: command_dict} format
   :rtype: :py:class:`dict`

   :raises ValueError: If files input is invalid or empty
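
   The normalization can be pictured as the sketch below. It is an assumption-laden illustration: the helper ``normalize`` is hypothetical, the content of a command dict is not documented here, and inputs without commands are mapped to an empty dict.

   .. code-block:: python

      from pathlib import Path


      def normalize(files):
          """Hypothetical sketch: map each accepted input form to {filename: command_dict}."""
          if isinstance(files, dict):
              normalized = {str(name): dict(commands or {}) for name, commands in files.items()}
          elif isinstance(files, (str, Path)):
              normalized = {str(files): {}}
          elif isinstance(files, (list, tuple)):
              normalized = {str(name): {} for name in files}
          else:
              raise ValueError(f"Unsupported files input: {files!r}")
          if not normalized:
              raise ValueError("No files given")
          return normalized


      single = normalize("a.xlsx")
      several = normalize(["a.xlsx", "b.csv"])
      mapping = normalize({"a.xlsx": {"key": "value"}})  # placeholder command dict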


.. py:function:: validate_file_extension(file_name)

   Validate and return the file extension.

   :param file_name: File name to validate
   :type file_name: :py:class:`str`

   :returns: File extension (e.g., '.xlsx', '.csv')
   :rtype: :py:class:`str`

   :raises ValueError: If the file extension fails validation
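
   A check of this kind can be sketched with ``pathlib``. The set of accepted extensions and the lowercasing below are assumptions taken from the examples in this page, and the helper name ``check_extension`` is hypothetical.

   .. code-block:: python

      from pathlib import Path

      SUPPORTED = {".xlsx", ".csv"}  # assumption: the real list may differ


      def check_extension(file_name):
          """Hypothetical helper: return the lowercased suffix or raise ValueError."""
          suffix = Path(file_name).suffix.lower()
          if suffix not in SUPPORTED:
              raise ValueError(f"Unsupported file extension {suffix!r} for {file_name!r}")
          return suffix


      extension = check_extension("T234_run.XLSX")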


.. py:function:: postprocess_dataframe(df, drop_unnamed_columns = True, dropna = True)

   Apply common post-processing to the concatenated DataFrame.

   :param df: Input DataFrame
   :type df: :py:class:`pd.DataFrame`
   :param drop_unnamed_columns: Whether to drop unnamed columns
   :type drop_unnamed_columns: :py:class:`bool`
   :param dropna: Whether to drop rows with all NaN values
   :type dropna: :py:class:`bool`

   :returns: Post-processed DataFrame
   :rtype: :py:class:`pd.DataFrame`
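
   The two options roughly correspond to the pandas operations sketched below. This is an illustration under the assumption that unnamed columns carry the ``Unnamed`` prefix pandas gives to header-less columns; the helper name ``tidy`` is hypothetical.

   .. code-block:: python

      import pandas as pd


      def tidy(df, drop_unnamed_columns=True, dropna=True):
          """Hypothetical sketch of the documented post-processing steps."""
          if drop_unnamed_columns:
              keep = [col for col in df.columns if not str(col).startswith("Unnamed")]
              df = df[keep]
          if dropna:
              df = df.dropna(how="all")  # drop only rows where every value is NaN
          return df


      raw = pd.DataFrame({"A": [1.0, None], "Unnamed: 1": [None, None]})
      clean = tidy(raw)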


.. py:function:: get_url_in_path(path)

   Scan a directory for .url files and extract the URL from the first one found.

   :param path: The path to the directory to scan.
   :type path: :py:class:`Path`

   :returns: The URL found in the .url file, or None if no .url file is found or if
             the file is malformed.
   :rtype: :py:class:`str | None`
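
   Windows ``.url`` shortcuts are INI-style files with an ``[InternetShortcut]`` section holding a ``URL`` key, so the lookup can be sketched with ``configparser``. The helper name ``first_url`` and the alphabetical choice of the "first" file are assumptions for illustration.

   .. code-block:: python

      import configparser
      import tempfile
      from pathlib import Path


      def first_url(directory):
          """Hypothetical helper: URL of the first .url file, or None."""
          for shortcut in sorted(Path(directory).glob("*.url")):
              # interpolation=None keeps '%' characters in URLs untouched.
              parser = configparser.ConfigParser(interpolation=None)
              try:
                  parser.read(shortcut, encoding="utf-8")
                  return parser["InternetShortcut"]["URL"]
              except (configparser.Error, KeyError):
                  return None  # malformed shortcut file
          return None


      with tempfile.TemporaryDirectory() as folder:
          missing = first_url(folder)  # no .url file yet
          Path(folder, "plume.url").write_text(
              "[InternetShortcut]\nURL=https://example.com/a%20b\n", encoding="utf-8"
          )
          found = first_url(folder)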


.. py:function:: get_plume_url(test_name)

   Get the Plume URL for a given test name.

   This function resolves the test name to its local path and then extracts
   the URL from a .url file within that path.

   :param test_name: The name of the test (e.g., "T234").
   :type test_name: :py:class:`str`

   :returns: The Plume URL if found, otherwise None.
   :rtype: :py:class:`str | None`


.. py:function:: sharepoint_synced_to_onedrive_shared_syntax(path)

   Convert a local path in a SharePoint-synced site to a OneDrive-shared path.

   SharePoint-synced local paths are obtained by using the "Sync" button in
   the online SharePoint interface. Typical names::

       "~/Spark Cleantech/6. DATA 2024 S2 - Documents"
       "~/Spark Cleantech/6. DATA 2025 S1 - Documents"
       "~/Spark Cleantech/7. 2024 DATA CH5 - Documents"

   OneDrive-shared local paths are obtained by using the "Add a shortcut to my Drive"
   button in the online SharePoint interface. Typical names::

       "~/OneDrive - Spark Cleantech/Documents partages - 6. DATA 2024 S2"
       "~/OneDrive - Spark Cleantech/Documents partages - 6. DATA 2025 S1"
       "~/OneDrive - Spark Cleantech/Documents partages - 7. 2024 DATA CH5"

   :param path: Input path that may be in SharePoint-synced format
   :type path: :py:class:`Path` or :py:class:`str`

   :returns: Converted path in OneDrive-shared format, or original path if no conversion needed
   :rtype: :py:class:`Path`
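
   The renaming rule implied by the typical names above can be sketched as follows. This is an illustration only: the helper name ``to_onedrive_shared`` is hypothetical, and ``PurePosixPath`` is used so the result does not depend on the operating system.

   .. code-block:: python

      from pathlib import PurePosixPath

      SUFFIX = " - Documents"


      def to_onedrive_shared(path):
          """Hypothetical sketch: rewrite the two folder names that differ."""
          parts = list(PurePosixPath(path).parts)
          for i in range(len(parts) - 1):
              if parts[i] == "Spark Cleantech" and parts[i + 1].endswith(SUFFIX):
                  site = parts[i + 1][: -len(SUFFIX)]
                  parts[i] = "OneDrive - Spark Cleantech"
                  parts[i + 1] = f"Documents partages - {site}"
                  break
          return PurePosixPath(*parts)  # unchanged when nothing matched


      shared = to_onedrive_shared("~/Spark Cleantech/6. DATA 2024 S2 - Documents/subfolder")
      untouched = to_onedrive_shared("~/Documents/other")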

   .. rubric:: Examples

   >>> from pathlib import Path
   >>> from sda.api.parse_utils import sharepoint_synced_to_onedrive_shared_syntax
   >>> path = Path("~/Spark Cleantech/6. DATA 2024 S2 - Documents/subfolder")
   >>> converted = sharepoint_synced_to_onedrive_shared_syntax(path)
   >>> str(converted)
   '~/OneDrive - Spark Cleantech/Documents partages - 6. DATA 2024 S2/subfolder'


.. py:function:: onedrive_shared_to_sharepoint_synced_syntax(path)

   Convert a local path in a OneDrive-shared site to a SharePoint-synced path.

   OneDrive-shared local paths are obtained by using the "Add a shortcut to my Drive"
   button in the online SharePoint interface. Typical names::

       "~/OneDrive - Spark Cleantech/Documents partages - 6. DATA 2024 S2"
       "~/OneDrive - Spark Cleantech/Documents partages - 6. DATA 2025 S1"
       "~/OneDrive - Spark Cleantech/Documents partages - 7. 2024 DATA CH5"

   SharePoint-synced local paths are obtained by using the "Sync" button in
   the online SharePoint interface. Typical names::

       "~/Spark Cleantech/6. DATA 2024 S2 - Documents"
       "~/Spark Cleantech/6. DATA 2025 S1 - Documents"
       "~/Spark Cleantech/7. 2024 DATA CH5 - Documents"

   :param path: Input path that may be in OneDrive-shared format
   :type path: :py:class:`Path` or :py:class:`str`

   :returns: Converted path in SharePoint-synced format, or original path if no conversion needed
   :rtype: :py:class:`Path`
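
   The reverse mapping strips the ``Documents partages -`` prefix and appends ``- Documents``. A minimal sketch, assuming the folder names follow the typical patterns above; the helper name ``to_sharepoint_synced`` is hypothetical.

   .. code-block:: python

      from pathlib import PurePosixPath

      PREFIX = "Documents partages - "


      def to_sharepoint_synced(path):
          """Hypothetical sketch: undo the OneDrive-shared folder naming."""
          parts = list(PurePosixPath(path).parts)
          for i in range(len(parts) - 1):
              if parts[i] == "OneDrive - Spark Cleantech" and parts[i + 1].startswith(PREFIX):
                  site = parts[i + 1][len(PREFIX):]
                  parts[i] = "Spark Cleantech"
                  parts[i + 1] = f"{site} - Documents"
                  break
          return PurePosixPath(*parts)  # unchanged when nothing matched


      synced = to_sharepoint_synced(
          "~/OneDrive - Spark Cleantech/Documents partages - 6. DATA 2024 S2/subfolder"
      )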

   .. rubric:: Examples

   >>> from pathlib import Path
   >>> from sda.api.parse_utils import onedrive_shared_to_sharepoint_synced_syntax
   >>> path = Path(
   ...     "~/OneDrive - Spark Cleantech/Documents partages - 6. DATA 2024 S2/subfolder"
   ... )
   >>> converted = onedrive_shared_to_sharepoint_synced_syntax(path)
   >>> str(converted)
   '~/Spark Cleantech/6. DATA 2024 S2 - Documents/subfolder'


.. py:function:: is_wsl_environment()

   Detect if we're running in a WSL (Windows Subsystem for Linux) environment.

   :returns: True if running in WSL, False otherwise
   :rtype: :py:class:`bool`
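
   Two common signals for WSL are the ``WSL_DISTRO_NAME`` environment variable and the word "microsoft" in ``/proc/version``. The sketch below combines them; it is not necessarily how this function detects WSL, and the helper name ``looks_like_wsl`` and its parameters are hypothetical (they exist only to make the check testable).

   .. code-block:: python

      import os
      from pathlib import Path


      def looks_like_wsl(environ=None, version_file="/proc/version"):
          """Hypothetical sketch: heuristic WSL detection."""
          environ = os.environ if environ is None else environ
          if "WSL_DISTRO_NAME" in environ:
              return True
          try:
              return "microsoft" in Path(version_file).read_text().lower()
          except OSError:
              return False  # no such file, e.g. on native Windows or macOS


      by_env = looks_like_wsl({"WSL_DISTRO_NAME": "Ubuntu"})
      no_signal = looks_like_wsl({}, version_file="/nonexistent/version")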


.. py:function:: convert_windows_path_to_wsl(path)

   Convert a Windows-style path to WSL mount path format.

   This function converts paths from Windows format (e.g., ~/Documents) to
   WSL mount format (e.g., /mnt/c/Users/username/Documents).

   :param path: Windows-style path to convert
   :type path: :py:class:`Path`

   :returns: WSL mount path, or original path if conversion not needed/possible
   :rtype: :py:class:`Path`
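
   WSL mounts Windows drives under ``/mnt/<drive letter>``, so the drive-letter part of the conversion can be sketched as below. The sketch does not expand ``~`` and is not the function's actual code; the helper name ``windows_to_wsl`` is hypothetical.

   .. code-block:: python

      from pathlib import PurePosixPath, PureWindowsPath


      def windows_to_wsl(path):
          """Hypothetical sketch: map 'C:\\...' to '/mnt/c/...'."""
          win = PureWindowsPath(path)
          drive = win.drive  # e.g. "C:"
          if len(drive) != 2 or drive[1] != ":":
              return path  # no drive letter: nothing to convert
          return PurePosixPath("/mnt", drive[0].lower(), *win.parts[1:])


      mounted = windows_to_wsl(r"C:\Users\username\Spark Cleantech")
      relative = windows_to_wsl("data/file.xlsx")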

   .. rubric:: Examples

   >>> from pathlib import Path
   >>> from sda.api.parse_utils import convert_windows_path_to_wsl
   >>> path = Path("~/Spark Cleantech/6. DATA 2025 S1 - Documents")
   >>> converted = convert_windows_path_to_wsl(path)
   >>> str(converted)
   '/mnt/c/Users/username/Spark Cleantech/6. DATA 2025 S1 - Documents'


.. py:function:: convert_wsl_path_to_windows_if_needed(path)

   Convert WSL paths back to Windows format if needed.

   This is a utility function that can be used to convert paths back
   from WSL format to Windows format when interfacing with Windows applications.

   :param path: Path to potentially convert
   :type path: :py:class:`Path`

   :returns: Converted path or original path if no conversion needed
   :rtype: :py:class:`Path`
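
   The reverse direction turns a ``/mnt/<drive letter>/...`` path into a drive-letter path. A minimal sketch, assuming the default WSL mount layout; the helper name ``wsl_to_windows`` is hypothetical and the real function may apply further checks.

   .. code-block:: python

      from pathlib import PurePosixPath, PureWindowsPath


      def wsl_to_windows(path):
          """Hypothetical sketch: map '/mnt/c/...' to 'C:\\...'."""
          parts = PurePosixPath(path).parts
          is_mount = (
              len(parts) >= 3
              and parts[:2] == ("/", "mnt")
              and len(parts[2]) == 1
              and parts[2].isalpha()
          )
          if not is_mount:
              return path  # not a WSL drive mount: leave untouched
          return PureWindowsPath(f"{parts[2].upper()}:\\", *parts[3:])


      windows = wsl_to_windows("/mnt/c/Users/username/Documents")
      linux_only = wsl_to_windows("/home/user/data")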