sda.api.parse_utils#

Cleaned Excel file loading utilities - Pure Polars approach.

This module provides simplified Excel table reading built on Polars native table support: lightweight XML-based table discovery, fast Polars table reads, and no pandas read fallbacks or full-sheet scanning.

Exceptions#

TableNotFoundError

Custom exception raised when no Excel tables are found in a file.

Classes#

ExcelFileHandler

Hybrid Excel table reader with XML discovery + Polars reading.

Functions#

suppress_polars_output()

Context manager to suppress Polars stdout/stderr output.

get_excel_core_properties(path)

Extract core properties from Excel file using direct XML parsing.

normalize_files_input(files)

Normalize the files input parameter to a consistent dict format.

validate_file_extension(file_name)

Validate and return the file extension.

postprocess_dataframe(df[, drop_unnamed_columns, dropna])

Apply common post-processing to the concatenated DataFrame.

get_url_in_path(path)

Scan a directory for .url files and extract the URL from the first one found.

get_plume_url(test_name)

Get the Plume URL for a given test name.

sharepoint_synced_to_onedrive_shared_syntax(path)

Convert a local path in a SharePoint-synced site to a OneDrive-shared path.

onedrive_shared_to_sharepoint_synced_syntax(path)

Convert a local path in a OneDrive-shared site to a SharePoint-synced path.

is_wsl_environment()

Detect if we're running in a WSL (Windows Subsystem for Linux) environment.

convert_windows_path_to_wsl(path)

Convert a Windows-style path to WSL mount path format.

convert_wsl_path_to_windows_if_needed(path)

Convert WSL paths back to Windows format if needed.

Module Contents#

sda.api.parse_utils.suppress_polars_output()#

Context manager to suppress Polars stdout/stderr output.
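The implementation is internal, but a context manager like this is typically built from contextlib's redirect helpers. A minimal stand-alone sketch of the technique (not the actual sda code):

```python
import contextlib
import os


@contextlib.contextmanager
def suppress_output():
    # Send both stdout and stderr to the null device for the duration
    # of the with-block; the previous streams are restored on exit.
    with open(os.devnull, "w") as devnull:
        with contextlib.redirect_stdout(devnull), contextlib.redirect_stderr(devnull):
            yield
```

Anything printed inside `with suppress_output(): ...` is discarded; output resumes normally after the block.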

exception sda.api.parse_utils.TableNotFoundError(message, test_name=None, file_name=None)#

Bases: Exception

Custom exception raised when no Excel tables are found in a file.

This exception provides specific information about table detection failures and can be caught separately from other errors for custom handling.

test_name = None#
file_name = None#
class sda.api.parse_utils.ExcelFileHandler(file_name, read_params=None, verbose=1, suppress_polars_warnings=False)#

Hybrid Excel table reader with XML discovery + Polars reading.

Best-of-both-worlds approach:

  • XML-based automatic table name discovery (no hardcoded fallbacks)

  • Polars native reading for 21x faster performance

  • Automatic file lock handling

  • Returns pandas DataFrames for compatibility

Requirements:

  • Polars and fastexcel packages

  • Excel files with proper Table objects (Insert > Table)

VALID_DATA_TABLE_NAMES = ['TestData', 'Base_de_données']#

Valid names for data tables.

Users must name their primary data table one of these names. To edit a Data Table name in Excel, see user-attachments/assets

file_name#
read_params#
verbose = 1#
suppress_polars_warnings = False#
temp_path = None#
is_using_temp = False#
detect_excel_tables()#

Detect Excel Table objects using fast XML parsing.

Returns:

List of detected table names

Return type:

list
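The discovery step exploits the fact that .xlsx files are zip archives whose Table objects live in xl/tables/table*.xml parts. A stand-alone sketch of the technique (the actual method may differ in details):

```python
import zipfile
import xml.etree.ElementTree as ET


def discover_table_names(path):
    """Return Excel Table names by parsing xl/tables/table*.xml parts
    inside the .xlsx zip archive. Sketch of the XML-discovery technique;
    not the actual ExcelFileHandler implementation."""
    names = []
    with zipfile.ZipFile(path) as zf:
        for member in zf.namelist():
            if member.startswith("xl/tables/") and member.endswith(".xml"):
                root = ET.fromstring(zf.read(member))
                # The user-visible table name is the displayName attribute
                name = root.get("displayName") or root.get("name")
                if name:
                    names.append(name)
    return names
```

Because only the small table-definition parts are read, this stays fast even for large workbooks.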

read_with_table_detection(table_name=None)#

Read Excel table using XML discovery + Polars reading.

Hybrid approach combining the best of both worlds:

  1. XML-based table name discovery (automatic, no hardcoded fallbacks)

  2. Polars native reading for maximum performance

Requires Excel Table objects (Insert > Table in Excel).

Args:

table_name: Specific table name to use, or None for auto-discovery

Returns:

Loaded data converted from Polars

Return type:

pd.DataFrame

Raises:
  • ImportError – If Polars is not installed

  • ValueError – If no Excel Table objects are found in the file

cleanup()#

Release this handler's temporary-file state, if any.

Note: Temp files themselves are left on disk for potential reuse by other handler instances. They will be cleaned up by the OS temp directory cleanup or when the process exits.

sda.api.parse_utils.get_excel_core_properties(path)#

Extract core properties from Excel file using direct XML parsing.

This is much faster than loading the entire workbook with openpyxl: roughly 57x faster than openpyxl.load_workbook().

Parameters:

path (str | Path) – Path to the Excel file.

Returns:

Dictionary with core properties: title, subject, creator, description, keywords, lastModifiedBy.

Return type:

dict[str, str | None]

Examples

>>> from sda.api.parse_utils import get_excel_core_properties
>>> props = get_excel_core_properties("test_file.xlsx")
>>> print(f"Keywords: {props['keywords']}")
Keywords: STT0.1

This function was previously used when the STT (Spark Test Template) keyword identified test files.

See also

get_test_template_number()

Extract metadata from Excel file properties.

select_parser()

Choose a parser based on file metadata.
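The direct-XML technique reads docProps/core.xml straight out of the zip archive instead of loading the workbook. A stand-alone sketch (the real function may extract additional fields or handle missing parts differently):

```python
import zipfile
import xml.etree.ElementTree as ET

# OOXML core-properties namespaces used in docProps/core.xml
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_core_properties(path):
    """Sketch of direct core-property extraction from an .xlsx zip."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("docProps/core.xml"))

    def text(tag):
        el = root.find(tag, NS)
        return el.text if el is not None else None

    return {
        "title": text("dc:title"),
        "subject": text("dc:subject"),
        "creator": text("dc:creator"),
        "description": text("dc:description"),
        "keywords": text("cp:keywords"),
        "lastModifiedBy": text("cp:lastModifiedBy"),
    }
```

Only one small XML part is decompressed and parsed, which is where the large speedup over openpyxl.load_workbook() comes from.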

sda.api.parse_utils.normalize_files_input(files)#

Normalize the files input parameter to a consistent dict format.

Parameters:

files (str, Path, list, or dict) – Files input in various formats

Returns:

Normalized dict with {filename: command_dict} format

Return type:

dict

Raises:

ValueError – If files input is invalid or empty
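The exact normalization rules are internal; the sketch below illustrates the idea under the assumption that bare names map to an empty command dict (the real function's command dicts may carry parser options):

```python
from pathlib import Path


def normalize_files(files):
    """Coerce str / Path / list / dict inputs into {filename: command_dict}.
    Sketch only: normalize_files_input may apply richer defaults."""
    if isinstance(files, (str, Path)):
        files = [files]
    if isinstance(files, list):
        files = {str(f): {} for f in files}
    if not isinstance(files, dict) or not files:
        raise ValueError("files input is invalid or empty")
    return files
```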

sda.api.parse_utils.validate_file_extension(file_name)#

Validate and return the file extension.

Parameters:

file_name (str) – File name to validate

Returns:

File extension (e.g., ‘.xlsx’, ‘.csv’)

Return type:

str

Raises:

ValueError – If the file extension is not supported
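A sketch of the validation step; the supported-extension set here is an assumption, consult the source for the real list:

```python
from pathlib import Path

# Assumed set of supported extensions; the real module may accept more
SUPPORTED_EXTENSIONS = {".xlsx", ".csv"}


def validate_extension(file_name):
    """Return the lowercased extension, or raise ValueError if unsupported."""
    ext = Path(file_name).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported file extension: {ext!r}")
    return ext
```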

sda.api.parse_utils.postprocess_dataframe(df, drop_unnamed_columns=True, dropna=True)#

Apply common post-processing to the concatenated DataFrame.

Parameters:
  • df (pd.DataFrame) – Input DataFrame

  • drop_unnamed_columns (bool) – Whether to drop unnamed columns

  • dropna (bool) – Whether to drop rows with all NaN values

Returns:

Post-processed DataFrame

Return type:

pd.DataFrame
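The two options map naturally onto pandas operations. A sketch of the described behaviour (assumed, not the verbatim implementation):

```python
import pandas as pd


def postprocess(df, drop_unnamed_columns=True, dropna=True):
    """Sketch of the post-processing described above."""
    if drop_unnamed_columns:
        # pandas names headerless columns "Unnamed: N" on read
        df = df.loc[:, ~df.columns.astype(str).str.startswith("Unnamed")]
    if dropna:
        # Drop only rows where every value is NaN
        df = df.dropna(how="all")
    return df
```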

sda.api.parse_utils.get_url_in_path(path)#

Scan a directory for .url files and extract the URL from the first one found.

Parameters:

path (Path) – The path to the directory to scan.

Returns:

The URL found in the .url file, or None if no .url file is found or if the file is malformed.

Return type:

str | None
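.url files are plain-text Internet Shortcut files with a URL= line under an [InternetShortcut] section, so extraction is a small text scan. A sketch of the technique (the real function may differ):

```python
from pathlib import Path


def url_from_dir(path):
    """Return the URL from the first .url file in a directory, or None
    if no .url file exists or the first one is malformed. Sketch only."""
    for url_file in sorted(Path(path).glob("*.url")):
        for line in url_file.read_text(errors="ignore").splitlines():
            if line.startswith("URL="):
                return line[4:].strip()
        return None  # first .url file had no URL= line
    return None
```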

sda.api.parse_utils.get_plume_url(test_name)#

Get the Plume URL for a given test name.

This function resolves the test name to its local path and then extracts the URL from a .url file within that path.

Parameters:

test_name (str) – The name of the test (e.g., “T234”).

Returns:

The Plume URL if found, otherwise None.

Return type:

str | None

sda.api.parse_utils.sharepoint_synced_to_onedrive_shared_syntax(path)#

Convert a local path in a SharePoint-synced site to a OneDrive-shared path.

SharePoint-synced local paths are obtained by using the “Sync” button in the SharePoint web interface. Typical names:

"~/Spark Cleantech/6. DATA 2024 S2 - Documents"
"~/Spark Cleantech/6. DATA 2025 S1 - Documents"
"~/Spark Cleantech/7. 2024 DATA CH5 - Documents"

OneDrive-shared local paths are obtained by using the “Add a shortcut to my Drive” button in the SharePoint web interface. Typical names:

"~/OneDrive - Spark Cleantech/Documents partages - 6. DATA 2024 S2"
"~/OneDrive - Spark Cleantech/Documents partages - 6. DATA 2025 S1"
"~/OneDrive - Spark Cleantech/Documents partages - 7. 2024 DATA CH5"
Parameters:

path (Path or str) – Input path that may be in SharePoint-synced format

Returns:

Converted path in OneDrive-shared format, or original path if no conversion needed

Return type:

Path

Examples

>>> from pathlib import Path
>>> path = Path("~/Spark Cleantech/6. DATA 2024 S2 - Documents/subfolder")
>>> converted = sharepoint_synced_to_onedrive_shared_syntax(path)
>>> str(converted)
"~/OneDrive - Spark Cleantech/Documents partages - 6. DATA 2024 S2/subfolder"
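Given the naming patterns above, the conversion is a pure path rewrite. A sketch that assumes the site folder always ends in " - Documents", as the examples suggest (the real code may generalize further):

```python
from pathlib import Path


def to_onedrive_shared(path):
    """Rewrite .../Spark Cleantech/<NAME> - Documents/... into
    .../OneDrive - Spark Cleantech/Documents partages - <NAME>/...
    Sketch of the rule implied by the documented examples."""
    parts = list(Path(path).parts)
    try:
        i = parts.index("Spark Cleantech")
    except ValueError:
        return Path(path)  # not a SharePoint-synced path
    if i + 1 >= len(parts) or not parts[i + 1].endswith(" - Documents"):
        return Path(path)
    name = parts[i + 1][: -len(" - Documents")]
    new_parts = (
        parts[:i]
        + ["OneDrive - Spark Cleantech", f"Documents partages - {name}"]
        + parts[i + 2:]
    )
    return Path(*new_parts)
```

Paths that do not match the SharePoint-synced pattern pass through unchanged, mirroring the documented behaviour.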
sda.api.parse_utils.onedrive_shared_to_sharepoint_synced_syntax(path)#

Convert a local path in a OneDrive-shared site to a SharePoint-synced path.

OneDrive-shared local paths are obtained by using the “Add a shortcut to my Drive” button in the SharePoint web interface. Typical names:

"~/OneDrive - Spark Cleantech/Documents partages - 6. DATA 2024 S2"
"~/OneDrive - Spark Cleantech/Documents partages - 6. DATA 2025 S1"
"~/OneDrive - Spark Cleantech/Documents partages - 7. 2024 DATA CH5"

SharePoint-synced local paths are obtained by using the “Sync” button in the SharePoint web interface. Typical names:

"~/Spark Cleantech/6. DATA 2024 S2 - Documents"
"~/Spark Cleantech/6. DATA 2025 S1 - Documents"
"~/Spark Cleantech/7. 2024 DATA CH5 - Documents"
Parameters:

path (Path or str) – Input path that may be in OneDrive-shared format

Returns:

Converted path in SharePoint-synced format, or original path if no conversion needed

Return type:

Path

Examples

>>> from pathlib import Path
>>> path = Path(
...     "~/OneDrive - Spark Cleantech/Documents partages - 6. DATA 2024 S2/subfolder"
... )
>>> converted = onedrive_shared_to_sharepoint_synced_syntax(path)
>>> str(converted)
"~/Spark Cleantech/6. DATA 2024 S2 - Documents/subfolder"
sda.api.parse_utils.is_wsl_environment()#

Detect if we’re running in a WSL (Windows Subsystem for Linux) environment.

Returns:

True if running in WSL, False otherwise

Return type:

bool
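WSL detection usually combines two well-known signals: WSL sets the WSL_DISTRO_NAME environment variable, and WSL kernels report "microsoft" in their release string. A sketch (the real function's checks may differ):

```python
import os
import platform


def looks_like_wsl():
    """Heuristic WSL detection; sketch of the common technique."""
    if os.environ.get("WSL_DISTRO_NAME"):
        return True
    # WSL kernels include "microsoft" in the kernel release string
    return "microsoft" in platform.uname().release.lower()
```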

sda.api.parse_utils.convert_windows_path_to_wsl(path)#

Convert a Windows-style path to WSL mount path format.

This function converts paths from Windows format (e.g., ~/Documents, which resolves under C:\Users\username) to WSL mount format (e.g., /mnt/c/Users/username/Documents).

Parameters:

path (Path) – Windows-style path to convert

Returns:

WSL mount path, or original path if conversion not needed/possible

Return type:

Path

Examples

>>> from pathlib import Path
>>> path = Path("~/Spark Cleantech/6. DATA 2025 S1 - Documents")
>>> converted = convert_windows_path_to_wsl(path)
>>> str(converted)
"/mnt/c/Users/username/Spark Cleantech/6. DATA 2025 S1 - Documents"
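The core of the conversion is the drive-letter mapping: C:\... becomes /mnt/c/.... A sketch of that step (the real function presumably also resolves the Windows user profile for ~-style paths first):

```python
import re
from pathlib import PurePosixPath


def windows_to_wsl(path):
    """Map a drive-letter path like C:\\Users\\me to /mnt/c/Users/me.
    Sketch only; paths without a drive letter pass through unchanged."""
    s = str(path)
    m = re.match(r"^([A-Za-z]):[\\/](.*)$", s)
    if not m:
        return path  # no drive letter: nothing to convert
    drive, rest = m.groups()
    return PurePosixPath("/mnt", drive.lower(), *rest.replace("\\", "/").split("/"))
```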
sda.api.parse_utils.convert_wsl_path_to_windows_if_needed(path)#

Convert WSL paths back to Windows format if needed.

This is a utility function that can be used to convert paths back from WSL format to Windows format when interfacing with Windows applications.

Parameters:

path (Path) – Path to potentially convert

Returns:

Converted path or original path if no conversion needed

Return type:

Path
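The reverse mapping sends /mnt/c/... back to C:\... and leaves everything else untouched. A sketch of the technique (not the actual implementation):

```python
import re
from pathlib import PureWindowsPath


def wsl_to_windows_if_needed(path):
    """Map /mnt/c/Users/me back to C:\\Users\\me; other paths pass through.
    Sketch only."""
    m = re.match(r"^/mnt/([a-z])(/.*)?$", str(path))
    if not m:
        return path  # not a WSL mount path: no conversion needed
    drive, rest = m.groups()
    return PureWindowsPath(f"{drive.upper()}:" + (rest or "/").replace("/", "\\"))
```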