sda.api.parse_utils
===================

.. py:module:: sda.api.parse_utils

.. autoapi-nested-parse::

   Cleaned Excel file loading utilities - Pure Polars approach.

   This module provides simplified Excel table reading using only Polars native table support.
   No pandas fallbacks, no XML parsing, no sheet reading - just fast, simple table reading.


Exceptions
----------

.. autoapisummary::

   sda.api.parse_utils.TableNotFoundError


Classes
-------

.. autoapisummary::

   sda.api.parse_utils.ExcelFileHandler


Functions
---------

.. autoapisummary::

   sda.api.parse_utils.suppress_polars_output
   sda.api.parse_utils.get_excel_core_properties
   sda.api.parse_utils.normalize_files_input
   sda.api.parse_utils.validate_file_extension
   sda.api.parse_utils.postprocess_dataframe
   sda.api.parse_utils.get_url_in_path
   sda.api.parse_utils.get_plume_url
   sda.api.parse_utils.sharepoint_synced_to_onedrive_shared_syntax
   sda.api.parse_utils.onedrive_shared_to_sharepoint_synced_syntax
   sda.api.parse_utils.is_wsl_environment
   sda.api.parse_utils.convert_windows_path_to_wsl
   sda.api.parse_utils.convert_wsl_path_to_windows_if_needed


Module Contents
---------------

.. py:function:: suppress_polars_output()

   Context manager to suppress Polars stdout/stderr output.


.. py:exception:: TableNotFoundError(message, test_name=None, file_name=None)

   Bases: :py:obj:`Exception`

   Custom exception raised when no Excel tables are found in a file.

   This exception provides specific information about table detection failures
   and can be caught separately from other errors for custom handling.


   .. py:attribute:: test_name
      :value: None


   .. py:attribute:: file_name
      :value: None


.. py:class:: ExcelFileHandler(file_name, read_params=None, verbose=1, suppress_polars_warnings=False)

   Hybrid Excel table reader with XML discovery + Polars reading.

   Best of both worlds approach:

   - XML-based automatic table name discovery (no hardcoded fallbacks)
   - Polars native reading for 21x faster performance
   - Automatic file lock handling
   - Returns pandas DataFrames for compatibility

   Requirements:

   - Polars and fastexcel packages
   - Excel files with proper Table objects (Insert > Table)


   .. py:attribute:: VALID_DATA_TABLE_NAMES
      :value: ['TestData', 'Base_de_données']

      Valid names for data tables. Users must give their primary data table one of these names.
      To edit a data table name in Excel, see
      https://github.com/user-attachments/assets/80d59a30-5a57-4a31-9fef-0cc45eaf5141


   .. py:attribute:: file_name


   .. py:attribute:: read_params


   .. py:attribute:: verbose
      :value: 1


   .. py:attribute:: suppress_polars_warnings
      :value: False


   .. py:attribute:: temp_path
      :value: None


   .. py:attribute:: is_using_temp
      :value: False


   .. py:method:: detect_excel_tables()

      Detect Excel Table objects using fast XML parsing.

      :returns: Detected table names.
      :rtype: :py:class:`list`


   .. py:method:: read_with_table_detection(table_name=None)

      Read Excel table using XML discovery + Polars reading.

      Hybrid approach combining the best of both worlds:

      1. XML-based table name discovery (automatic, no hardcoded fallbacks)
      2. Polars native reading for maximum performance

      Requires Excel Table objects (Insert > Table in Excel).

      :param table_name: Specific table name to use, or None for auto-discovery

      :returns: Loaded data converted from Polars.
      :rtype: :py:class:`pd.DataFrame`

      :raises ImportError: If Polars is not installed
      :raises ValueError: If no Excel Table objects are found in the file


   .. py:method:: cleanup()

      Clean up temporary file if it exists.

      Note: Temp files are left for potential reuse by other handler instances.
      They will be cleaned up by the OS temp directory cleanup or when the process exits.


.. py:function:: get_excel_core_properties(path)

   Extract core properties from Excel file using direct XML parsing.

   This is much faster than loading the entire workbook with openpyxl.
   Performance improvement: ~57x faster than openpyxl.load_workbook().

   :param path: Path to the Excel file.
   :type path: :py:class:`str | Path`

   :returns: Dictionary with core properties: title, subject, creator,
             description, keywords, lastModifiedBy.
   :rtype: :py:class:`dict[str, str | None]`

   .. rubric:: Examples

   >>> from sda.api.parse_utils import get_excel_core_properties
   >>> props = get_excel_core_properties("test_file.xlsx")
   >>> print(f"Keywords: {props['keywords']}")
   Keywords: STT0.1

   This function was used previously, when the STT (Spark Test Template) keyword
   served to identify test files.

   .. seealso::

      :py:func:`~sda.api.load.get_test_template_number`
          Extract metadata from Excel file properties.

      :py:func:`~sda.api.load.select_parser`
          Choose parser based on file metadata


.. py:function:: normalize_files_input(files)

   Normalize the files input parameter to a consistent dict format.

   :param files: Files input in various formats
   :type files: :py:class:`str`, :py:class:`Path`, :py:class:`list`, or :py:class:`dict`

   :returns: Normalized dict with {filename: command_dict} format
   :rtype: :py:class:`dict`

   :raises ValueError: If files input is invalid or empty


.. py:function:: validate_file_extension(file_name)

   Validate and return the file extension.

   :param file_name: File name to validate
   :type file_name: :py:class:`str`

   :returns: File extension (e.g., '.xlsx', '.csv')
   :rtype: :py:class:`str`

   :raises ValueError:


.. py:function:: postprocess_dataframe(df, drop_unnamed_columns = True, dropna = True)

   Apply common post-processing to the concatenated DataFrame.

   :param df: Input DataFrame
   :type df: :py:class:`pd.DataFrame`
   :param drop_unnamed_columns: Whether to drop unnamed columns
   :type drop_unnamed_columns: :py:class:`bool`
   :param dropna: Whether to drop rows with all NaN values
   :type dropna: :py:class:`bool`

   :returns: Post-processed DataFrame
   :rtype: :py:class:`pd.DataFrame`


.. py:function:: get_url_in_path(path)

   Scan a directory for .url files and extract the URL from the first one found.

   :param path: The path to the directory to scan.
   :type path: :py:class:`Path`

   :returns: The URL found in the .url file, or None if no .url file is found
             or if the file is malformed.
   :rtype: :py:class:`str | None`


.. py:function:: get_plume_url(test_name)

   Get the Plume URL for a given test name.

   This function resolves the test name to its local path and then
   extracts the URL from a .url file within that path.

   :param test_name: The name of the test (e.g., "T234").
   :type test_name: :py:class:`str`

   :returns: The Plume URL if found, otherwise None.
   :rtype: :py:class:`str | None`


.. py:function:: sharepoint_synced_to_onedrive_shared_syntax(path)

   Convert a local path in a SharePoint-synced site to a OneDrive-shared path.

   SharePoint-synced local paths are obtained by using the "Sync" button
   in the online SharePoint interface. Typical names::

       "~/Spark Cleantech/6. DATA 2024 S2 - Documents"
       "~/Spark Cleantech/6. DATA 2025 S1 - Documents"
       "~/Spark Cleantech/7. 2024 DATA CH5 - Documents"

   OneDrive-shared local paths are obtained by using the "Add a shortcut to my Drive"
   button in the online SharePoint interface. Typical names::

       "~/OneDrive - Spark Cleantech/Documents partages - 6. DATA 2024 S2"
       "~/OneDrive - Spark Cleantech/Documents partages - 6. DATA 2025 S1"
       "~/OneDrive - Spark Cleantech/Documents partages - 7. 2024 DATA CH5"

   :param path: Input path that may be in SharePoint-synced format
   :type path: :py:class:`Path` or :py:class:`str`

   :returns: Converted path in OneDrive-shared format, or original path if no conversion needed
   :rtype: :py:class:`Path`

   .. rubric:: Examples

   >>> from pathlib import Path
   >>> from sda.api.parse_utils import sharepoint_synced_to_onedrive_shared_syntax
   >>> path = Path("~/Spark Cleantech/6. DATA 2024 S2 - Documents/subfolder")
   >>> converted = sharepoint_synced_to_onedrive_shared_syntax(path)
   >>> str(converted)
   "~/OneDrive - Spark Cleantech/Documents partages - 6. DATA 2024 S2/subfolder"


.. py:function:: onedrive_shared_to_sharepoint_synced_syntax(path)

   Convert a local path in a OneDrive-shared site to a SharePoint-synced path.

   OneDrive-shared local paths are obtained by using the "Add a shortcut to my Drive"
   button in the online SharePoint interface. Typical names::

       "~/OneDrive - Spark Cleantech/Documents partages - 6. DATA 2024 S2"
       "~/OneDrive - Spark Cleantech/Documents partages - 6. DATA 2025 S1"
       "~/OneDrive - Spark Cleantech/Documents partages - 7. 2024 DATA CH5"

   SharePoint-synced local paths are obtained by using the "Sync" button
   in the online SharePoint interface. Typical names::

       "~/Spark Cleantech/6. DATA 2024 S2 - Documents"
       "~/Spark Cleantech/6. DATA 2025 S1 - Documents"
       "~/Spark Cleantech/7. 2024 DATA CH5 - Documents"

   :param path: Input path that may be in OneDrive-shared format
   :type path: :py:class:`Path` or :py:class:`str`

   :returns: Converted path in SharePoint-synced format, or original path if no conversion needed
   :rtype: :py:class:`Path`

   .. rubric:: Examples

   >>> from pathlib import Path
   >>> from sda.api.parse_utils import onedrive_shared_to_sharepoint_synced_syntax
   >>> path = Path(
   ...     "~/OneDrive - Spark Cleantech/Documents partages - 6. DATA 2024 S2/subfolder"
   ... )
   >>> converted = onedrive_shared_to_sharepoint_synced_syntax(path)
   >>> str(converted)
   "~/Spark Cleantech/6. DATA 2024 S2 - Documents/subfolder"


.. py:function:: is_wsl_environment()

   Detect whether the code is running in a WSL (Windows Subsystem for Linux) environment.

   :returns: True if running in WSL, False otherwise
   :rtype: :py:class:`bool`


.. py:function:: convert_windows_path_to_wsl(path)

   Convert a Windows-style path to WSL mount path format.

   This function converts paths from Windows format (e.g., ~/Documents)
   to WSL mount format (e.g., /mnt/c/Users/username/Documents).

   :param path: Windows-style path to convert
   :type path: :py:class:`Path`

   :returns: WSL mount path, or original path if conversion not needed/possible
   :rtype: :py:class:`Path`

   .. rubric:: Examples

   >>> from pathlib import Path
   >>> from sda.api.parse_utils import convert_windows_path_to_wsl
   >>> path = Path("~/Spark Cleantech/6. DATA 2025 S1 - Documents")
   >>> converted = convert_windows_path_to_wsl(path)
   >>> str(converted)
   "/mnt/c/Users/username/Spark Cleantech/6. DATA 2025 S1 - Documents"


.. py:function:: convert_wsl_path_to_windows_if_needed(path)

   Convert WSL paths back to Windows format if needed.

   This is a utility function for converting paths back from WSL format
   to Windows format when interfacing with Windows applications.

   :param path: Path to potentially convert
   :type path: :py:class:`Path`

   :returns: Converted path or original path if no conversion needed
   :rtype: :py:class:`Path`
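
The two WSL helpers above describe a round trip between Windows paths and WSL mount paths. The sketch below illustrates that kind of mapping for drive-letter paths only. It is not the implementation used by ``sda.api.parse_utils``: the names ``windows_to_wsl_sketch`` and ``wsl_to_windows_sketch`` are hypothetical, and the code assumes the default WSL layout in which drive ``C:`` is mounted at ``/mnt/c``. Paths that do not match either pattern are returned unchanged, mirroring the "original path if no conversion needed" behaviour documented above.

```python
from pathlib import PurePosixPath, PureWindowsPath


def windows_to_wsl_sketch(path: str) -> str:
    """Map an absolute drive-letter Windows path to /mnt/<drive>/... (hypothetical helper)."""
    win = PureWindowsPath(path)
    drive = win.drive  # "C:" for drive-letter paths, "" or a UNC prefix otherwise
    # Only absolute drive-letter paths are handled; everything else is returned as-is.
    if len(drive) != 2 or drive[1] != ":" or not win.root:
        return path
    # win.parts[0] is the anchor ("C:\\"); the remaining parts are the folders.
    return str(PurePosixPath("/mnt", drive[0].lower(), *win.parts[1:]))


def wsl_to_windows_sketch(path: str) -> str:
    """Map /mnt/<drive>/... back to a drive-letter Windows path (hypothetical helper)."""
    parts = PurePosixPath(path).parts  # e.g. ("/", "mnt", "c", "Users", ...)
    is_mount = (
        len(parts) >= 3
        and parts[:2] == ("/", "mnt")
        and len(parts[2]) == 1
        and parts[2].isascii()
        and parts[2].isalpha()
    )
    if not is_mount:
        return path
    # Rebuild the path with an upper-case drive letter and Windows separators.
    return str(PureWindowsPath(parts[2].upper() + ":\\", *parts[3:]))


if __name__ == "__main__":
    print(windows_to_wsl_sketch(r"C:\Users\username\Documents"))
    print(wsl_to_windows_sketch("/mnt/c/Users/username/Documents"))
```

Using the pure path classes keeps the sketch independent of the operating system it runs on, so the same code behaves identically on Windows, Linux, and WSL. Home-relative inputs such as ``~/Documents`` are deliberately left untouched here, since resolving them requires knowing the Windows user profile location.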