sda.api.parse_utils#

Cleaned Excel file loading utilities - Pure Polars approach.

This module provides simplified Excel table reading built on Polars native table support: lightweight XML-based table discovery, fast Polars table reads, and no pandas read fallbacks or full-sheet scanning.

Exceptions#

TableNotFoundError

Custom exception raised when no Excel tables are found in a file.

Classes#

ExcelFileHandler

Hybrid Excel table reader with XML discovery + Polars reading.

Functions#

suppress_polars_output()

Context manager to suppress Polars stdout/stderr output.

get_excel_core_properties(path)

Extract core properties from Excel file using direct XML parsing.

normalize_files_input(files)

Normalize the files input parameter to a consistent dict format.

validate_file_extension(file_name)

Validate and return the file extension.

postprocess_dataframe(df[, drop_unnamed_columns, dropna])

Apply common post-processing to the concatenated DataFrame.

get_url_in_path(path)

Scan a directory for .url files and extract the URL from the first one found.

get_plume_url(test_name)

Get the Plume URL for a given test name.

sharepoint_synced_to_onedrive_shared_syntax(path)

Convert a local path in a SharePoint-synced site to a OneDrive-shared path.

onedrive_shared_to_sharepoint_synced_syntax(path)

Convert a local path in a OneDrive-shared site to a SharePoint-synced path.

is_wsl_environment()

Detect if we're running in a WSL (Windows Subsystem for Linux) environment.

convert_windows_path_to_wsl(path)

Convert a Windows-style path to WSL mount path format.

convert_wsl_path_to_windows_if_needed(path)

Convert WSL paths back to Windows format if needed.

Module Contents#

sda.api.parse_utils.suppress_polars_output()#

Context manager to suppress Polars stdout/stderr output.
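The implementation is internal, but a context manager like this is typically built from contextlib's redirect helpers. A minimal stand-alone sketch of the technique (not the actual sda code):

```python
import contextlib
import os


@contextlib.contextmanager
def suppress_output():
    # Send both stdout and stderr to the null device for the duration
    # of the with-block; the previous streams are restored on exit.
    with open(os.devnull, "w") as devnull:
        with contextlib.redirect_stdout(devnull), contextlib.redirect_stderr(devnull):
            yield
```

Anything printed inside `with suppress_output(): ...` is discarded; output resumes normally after the block.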

exception sda.api.parse_utils.TableNotFoundError(message, test_name=None, file_name=None)#

Bases: Exception

Custom exception raised when no Excel tables are found in a file.

This exception provides specific information about table detection failures and can be caught separately from other errors for custom handling.

test_name = None#
file_name = None#
class sda.api.parse_utils.ExcelFileHandler(file_name, read_params=None, verbose=1, suppress_polars_warnings=False)#

Hybrid Excel table reader with XML discovery + Polars reading.

Best-of-both-worlds approach:

  • XML-based automatic table name discovery (no hardcoded fallbacks)

  • Polars native reading for 21x faster performance

  • Automatic file lock handling

  • Returns pandas DataFrames for compatibility

Requirements:

  • Polars and fastexcel packages

  • Excel files with proper Table objects (Insert > Table)

VALID_DATA_TABLE_NAMES = ['TestData', 'Base_de_données']#

Valid names for data tables.

Users must name their primary data table one of these names. To edit a Data Table name in Excel, see user-attachments/assets

file_name#
read_params#
verbose = 1#
suppress_polars_warnings = False#
temp_path = None#
is_using_temp = False#
detect_excel_tables()#

Detect Excel Table objects using fast XML parsing.

Returns:

List of detected table names

Return type:

list
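The discovery step exploits the fact that .xlsx files are zip archives whose Table objects live in xl/tables/table*.xml parts. A stand-alone sketch of the technique (the actual method may differ in details):

```python
import zipfile
import xml.etree.ElementTree as ET


def discover_table_names(path):
    """Return Excel Table names by parsing xl/tables/table*.xml parts
    inside the .xlsx zip archive. Sketch of the XML-discovery technique;
    not the actual ExcelFileHandler implementation."""
    names = []
    with zipfile.ZipFile(path) as zf:
        for member in zf.namelist():
            if member.startswith("xl/tables/") and member.endswith(".xml"):
                root = ET.fromstring(zf.read(member))
                # The user-visible table name is the displayName attribute
                name = root.get("displayName") or root.get("name")
                if name:
                    names.append(name)
    return names
```

Because only the small table-definition parts are read, this stays fast even for large workbooks.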

read_with_table_detection(table_name=None)#

Read Excel table using XML discovery + Polars reading.

Hybrid approach combining the best of both worlds:

  1. XML-based table name discovery (automatic, no hardcoded fallbacks)

  2. Polars native reading for maximum performance

Requires Excel Table objects (Insert > Table in Excel).

Args:

table_name: Specific table name to use, or None for auto-discovery

Returns:

Loaded data converted from Polars

Return type:

pd.DataFrame

Raises:
  • ImportError – If Polars is not installed

  • ValueError – If no Excel Table objects are found in the file

cleanup()#

Release this handler's temporary-file state, if any.

Note: Temp files themselves are left on disk for potential reuse by other handler instances. They will be cleaned up by the OS temp directory cleanup or when the process exits.

sda.api.parse_utils.get_excel_core_properties(path)#

Extract core properties from Excel file using direct XML parsing.

This is much faster than loading the entire workbook with openpyxl: roughly 57x faster than openpyxl.load_workbook().

Parameters:

path (str | Path) – Path to the Excel file.

Returns:

Dictionary with core properties: title, subject, creator, description, keywords, lastModifiedBy.

Return type:

dict[str, str | None]

Examples

>>> from sda.api.parse_utils import get_excel_core_properties
>>> props = get_excel_core_properties("test_file.xlsx")
>>> print(f"Keywords: {props['keywords']}")
Keywords: STT0.1

This function was previously used when the STT (Spark Test Template) keyword identified test files.

See also

get_test_template_number()

Extract metadata from Excel file properties.

select_parser()

Choose a parser based on file metadata.
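The direct-XML technique reads docProps/core.xml straight out of the zip archive instead of loading the workbook. A stand-alone sketch (the real function may extract additional fields or handle missing parts differently):

```python
import zipfile
import xml.etree.ElementTree as ET

# OOXML core-properties namespaces used in docProps/core.xml
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_core_properties(path):
    """Sketch of direct core-property extraction from an .xlsx zip."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("docProps/core.xml"))

    def text(tag):
        el = root.find(tag, NS)
        return el.text if el is not None else None

    return {
        "title": text("dc:title"),
        "subject": text("dc:subject"),
        "creator": text("dc:creator"),
        "description": text("dc:description"),
        "keywords": text("cp:keywords"),
        "lastModifiedBy": text("cp:lastModifiedBy"),
    }
```

Only one small XML part is decompressed and parsed, which is where the large speedup over openpyxl.load_workbook() comes from.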

sda.api.parse_utils.normalize_files_input(files)#

Normalize the files input parameter to a consistent dict format.

Parameters:

files (str, Path, list, or dict) – Files input in various formats

Returns:

Normalized dict with {filename: command_dict} format

Return type:

dict

Raises:

ValueError – If files input is invalid or empty
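The exact normalization rules are internal; the sketch below illustrates the idea under the assumption that bare names map to an empty command dict (the real function's command dicts may carry parser options):

```python
from pathlib import Path


def normalize_files(files):
    """Coerce str / Path / list / dict inputs into {filename: command_dict}.
    Sketch only: normalize_files_input may apply richer defaults."""
    if isinstance(files, (str, Path)):
        files = [files]
    if isinstance(files, list):
        files = {str(f): {} for f in files}
    if not isinstance(files, dict) or not files:
        raise ValueError("files input is invalid or empty")
    return files
```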

sda.api.parse_utils.validate_file_extension(file_name)#

Validate and return the file extension.

Parameters:

file_name (str) – File name to validate

Returns:

File extension (e.g., ‘.xlsx’, ‘.csv’)

Return type:

str

Raises:

ValueError – If the file extension is not supported
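A sketch of the validation step; the supported-extension set here is an assumption, consult the source for the real list:

```python
from pathlib import Path

# Assumed set of supported extensions; the real module may accept more
SUPPORTED_EXTENSIONS = {".xlsx", ".csv"}


def validate_extension(file_name):
    """Return the lowercased extension, or raise ValueError if unsupported."""
    ext = Path(file_name).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported file extension: {ext!r}")
    return ext
```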

sda.api.parse_utils.postprocess_dataframe(df, drop_unnamed_columns=True, dropna=True)#

Apply common post-processing to the concatenated DataFrame.

Parameters:
  • df (pd.DataFrame) – Input DataFrame

  • drop_unnamed_columns (bool) – Whether to drop unnamed columns

  • dropna (bool) – Whether to drop rows with all NaN values

Returns:

Post-processed DataFrame

Return type:

pd.DataFrame
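The two options map naturally onto pandas operations. A sketch of the described behaviour (assumed, not the verbatim implementation):

```python
import pandas as pd


def postprocess(df, drop_unnamed_columns=True, dropna=True):
    """Sketch of the post-processing described above."""
    if drop_unnamed_columns:
        # pandas names headerless columns "Unnamed: N" on read
        df = df.loc[:, ~df.columns.astype(str).str.startswith("Unnamed")]
    if dropna:
        # Drop only rows where every value is NaN
        df = df.dropna(how="all")
    return df
```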

sda.api.parse_utils.get_url_in_path(path)#

Scan a directory for .url files and extract the URL from the first one found.

Parameters:

path (Path) – The path to the directory to scan.

Returns:

The URL found in the .url file, or None if no .url file is found or if the file is malformed.

Return type:

str | None
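.url files are plain-text Internet Shortcut files with a URL= line under an [InternetShortcut] section, so extraction is a small text scan. A sketch of the technique (the real function may differ):

```python
from pathlib import Path


def url_from_dir(path):
    """Return the URL from the first .url file in a directory, or None
    if no .url file exists or the first one is malformed. Sketch only."""
    for url_file in sorted(Path(path).glob("*.url")):
        for line in url_file.read_text(errors="ignore").splitlines():
            if line.startswith("URL="):
                return line[4:].strip()
        return None  # first .url file had no URL= line
    return None
```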

sda.api.parse_utils.get_plume_url(test_name)#

Get the Plume URL for a given test name.

This function resolves the test name to its local path and then extracts the URL from a .url file within that path.

Parameters:

test_name (str) – The name of the test (e.g., “T234”).

Returns:

The Plume URL if found, otherwise None.

Return type:

str | None

sda.api.parse_utils.sharepoint_synced_to_onedrive_shared_syntax(path)#

Convert a local path in a SharePoint-synced site to a OneDrive-shared path.

SharePoint-synced local paths are obtained by using the “Sync” button in the SharePoint web interface. Typical names:

"~/Spark Cleantech/6. DATA 2024 S2 - Documents"
"~/Spark Cleantech/6. DATA 2025 S1 - Documents"
"~/Spark Cleantech/7. 2024 DATA CH5 - Documents"

OneDrive-shared local paths are obtained by using the “Add a shortcut to my Drive” button in the SharePoint web interface. Typical names:

"~/OneDrive - Spark Cleantech/Documents partages - 6. DATA 2024 S2"
"~/OneDrive - Spark Cleantech/Documents partages - 6. DATA 2025 S1"
"~/OneDrive - Spark Cleantech/Documents partages - 7. 2024 DATA CH5"
Parameters:

path (Path or str) – Input path that may be in SharePoint-synced format

Returns:

Converted path in OneDrive-shared format, or original path if no conversion needed

Return type:

Path

Examples

>>> from pathlib import Path
>>> path = Path("~/Spark Cleantech/6. DATA 2024 S2 - Documents/subfolder")
>>> converted = sharepoint_synced_to_onedrive_shared_syntax(path)
>>> str(converted)
"~/OneDrive - Spark Cleantech/Documents partages - 6. DATA 2024 S2/subfolder"
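Given the naming patterns above, the conversion is a pure path rewrite. A sketch that assumes the site folder always ends in " - Documents", as the examples suggest (the real code may generalize further):

```python
from pathlib import Path


def to_onedrive_shared(path):
    """Rewrite .../Spark Cleantech/<NAME> - Documents/... into
    .../OneDrive - Spark Cleantech/Documents partages - <NAME>/...
    Sketch of the rule implied by the documented examples."""
    parts = list(Path(path).parts)
    try:
        i = parts.index("Spark Cleantech")
    except ValueError:
        return Path(path)  # not a SharePoint-synced path
    if i + 1 >= len(parts) or not parts[i + 1].endswith(" - Documents"):
        return Path(path)
    name = parts[i + 1][: -len(" - Documents")]
    new_parts = (
        parts[:i]
        + ["OneDrive - Spark Cleantech", f"Documents partages - {name}"]
        + parts[i + 2:]
    )
    return Path(*new_parts)
```

Paths that do not match the SharePoint-synced pattern pass through unchanged, mirroring the documented behaviour.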
sda.api.parse_utils.onedrive_shared_to_sharepoint_synced_syntax(path)#

Convert a local path in a OneDrive-shared site to a SharePoint-synced path.

OneDrive-shared local paths are obtained by using the “Add a shortcut to my Drive” button in the SharePoint web interface. Typical names:

"~/OneDrive - Spark Cleantech/Documents partages - 6. DATA 2024 S2"
"~/OneDrive - Spark Cleantech/Documents partages - 6. DATA 2025 S1"
"~/OneDrive - Spark Cleantech/Documents partages - 7. 2024 DATA CH5"

SharePoint-synced local paths are obtained by using the “Sync” button in the SharePoint web interface. Typical names:

"~/Spark Cleantech/6. DATA 2024 S2 - Documents"
"~/Spark Cleantech/6. DATA 2025 S1 - Documents"
"~/Spark Cleantech/7. 2024 DATA CH5 - Documents"
Parameters:

path (Path or str) – Input path that may be in OneDrive-shared format

Returns:

Converted path in SharePoint-synced format, or original path if no conversion needed

Return type:

Path

Examples

>>> from pathlib import Path
>>> path = Path(
...     "~/OneDrive - Spark Cleantech/Documents partages - 6. DATA 2024 S2/subfolder"
... )
>>> converted = onedrive_shared_to_sharepoint_synced_syntax(path)
>>> str(converted)
"~/Spark Cleantech/6. DATA 2024 S2 - Documents/subfolder"
sda.api.parse_utils.is_wsl_environment()#

Detect if we’re running in a WSL (Windows Subsystem for Linux) environment.

Returns:

True if running in WSL, False otherwise

Return type:

bool
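WSL detection usually combines two well-known signals: WSL sets the WSL_DISTRO_NAME environment variable, and WSL kernels report "microsoft" in their release string. A sketch (the real function's checks may differ):

```python
import os
import platform


def looks_like_wsl():
    """Heuristic WSL detection; sketch of the common technique."""
    if os.environ.get("WSL_DISTRO_NAME"):
        return True
    # WSL kernels include "microsoft" in the kernel release string
    return "microsoft" in platform.uname().release.lower()
```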

sda.api.parse_utils.convert_windows_path_to_wsl(path)#

Convert a Windows-style path to WSL mount path format.

This function converts paths from Windows format (e.g., ~/Documents, which resolves under C:\Users\username) to WSL mount format (e.g., /mnt/c/Users/username/Documents).

Parameters:

path (Path) – Windows-style path to convert

Returns:

WSL mount path, or original path if conversion not needed/possible

Return type:

Path

Examples

>>> from pathlib import Path
>>> path = Path("~/Spark Cleantech/6. DATA 2025 S1 - Documents")
>>> converted = convert_windows_path_to_wsl(path)
>>> str(converted)
"/mnt/c/Users/username/Spark Cleantech/6. DATA 2025 S1 - Documents"
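The core of the conversion is the drive-letter mapping: C:\... becomes /mnt/c/.... A sketch of that step (the real function presumably also resolves the Windows user profile for ~-style paths first):

```python
import re
from pathlib import PurePosixPath


def windows_to_wsl(path):
    """Map a drive-letter path like C:\\Users\\me to /mnt/c/Users/me.
    Sketch only; paths without a drive letter pass through unchanged."""
    s = str(path)
    m = re.match(r"^([A-Za-z]):[\\/](.*)$", s)
    if not m:
        return path  # no drive letter: nothing to convert
    drive, rest = m.groups()
    return PurePosixPath("/mnt", drive.lower(), *rest.replace("\\", "/").split("/"))
```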
sda.api.parse_utils.convert_wsl_path_to_windows_if_needed(path)#

Convert WSL paths back to Windows format if needed.

This is a utility function that can be used to convert paths back from WSL format to Windows format when interfacing with Windows applications.

Parameters:

path (Path) – Path to potentially convert

Returns:

Converted path or original path if no conversion needed

Return type:

Path
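The reverse mapping sends /mnt/c/... back to C:\... and leaves everything else untouched. A sketch of the technique (not the actual implementation):

```python
import re
from pathlib import PureWindowsPath


def wsl_to_windows_if_needed(path):
    """Map /mnt/c/Users/me back to C:\\Users\\me; other paths pass through.
    Sketch only."""
    m = re.match(r"^/mnt/([a-z])(/.*)?$", str(path))
    if not m:
        return path  # not a WSL mount path: no conversion needed
    drive, rest = m.groups()
    return PureWindowsPath(f"{drive.upper()}:" + (rest or "/").replace("/", "\\"))
```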