sda.api.parse_utils#
Cleaned Excel file loading utilities - Pure Polars approach.
This module provides simplified Excel table reading using only Polars native table support. No pandas fallbacks, no XML parsing, no sheet reading - just fast, simple table reading.
Exceptions#
TableNotFoundError | Custom exception raised when no Excel tables are found in a file.
Classes#
ExcelFileHandler | Hybrid Excel table reader with XML discovery + Polars reading.
Functions#
suppress_polars_output() | Context manager to suppress Polars stdout/stderr output.
get_excel_core_properties() | Extract core properties from Excel file using direct XML parsing.
normalize_files_input() | Normalize the files input parameter to a consistent dict format.
validate_file_extension() | Validate and return the file extension.
postprocess_dataframe() | Apply common post-processing to the concatenated DataFrame.
get_url_in_path() | Scan a directory for .url files and extract the URL from the first one found.
get_plume_url() | Get the Plume URL for a given test name.
sharepoint_synced_to_onedrive_shared_syntax() | Convert a local path in a SharePoint-synced site to a OneDrive-shared path.
onedrive_shared_to_sharepoint_synced_syntax() | Convert a local path in a OneDrive-shared site to a SharePoint-synced path.
is_wsl_environment() | Detect if we're running in a WSL (Windows Subsystem for Linux) environment.
convert_windows_path_to_wsl() | Convert a Windows-style path to WSL mount path format.
convert_wsl_path_to_windows_if_needed() | Convert WSL paths back to Windows format if needed.
Module Contents#
- sda.api.parse_utils.suppress_polars_output()#
Context manager to suppress Polars stdout/stderr output.
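The general pattern behind such a context manager can be sketched with the standard library. This is a minimal illustration, not the module's implementation: `suppress_output_sketch` is a hypothetical name, and Python-level redirection does not capture text that native code writes directly to the process file descriptors, so the real helper may work differently.

```python
import contextlib
import io
import sys


@contextlib.contextmanager
def suppress_output_sketch():
    """Temporarily discard anything written to sys.stdout and sys.stderr."""
    sink = io.StringIO()  # throwaway buffer that receives the redirected text
    with contextlib.redirect_stdout(sink), contextlib.redirect_stderr(sink):
        yield


with suppress_output_sketch():
    print("this line is discarded")
    print("so is this one", file=sys.stderr)

print("output is visible again")
```

Both streams are restored when the `with` block exits, including when the body raises.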
- exception sda.api.parse_utils.TableNotFoundError(message, test_name=None, file_name=None)#
Bases: Exception
Custom exception raised when no Excel tables are found in a file.
This exception provides specific information about table detection failures and can be caught separately from other errors for custom handling.
- test_name = None#
- file_name = None#
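Catching this exception separately from other errors might look like the sketch below. `TableNotFoundErrorSketch` is a stand-in that only mirrors the documented signature, and `load_or_skip` is a hypothetical caller; neither is taken from the module's source.

```python
class TableNotFoundErrorSketch(Exception):
    """Stand-in mirroring the documented signature (message, test_name, file_name)."""

    def __init__(self, message, test_name=None, file_name=None):
        super().__init__(message)
        self.test_name = test_name
        self.file_name = file_name


def load_or_skip(file_name):
    """Hypothetical caller that treats a missing table as a skippable condition."""
    try:
        # A real reader would raise this only when no table is detected.
        raise TableNotFoundErrorSketch(
            "No Excel tables found", test_name="T234", file_name=file_name
        )
    except TableNotFoundErrorSketch as err:
        # The extra attributes identify which test and file triggered the failure.
        return f"skipped {err.file_name} ({err.test_name}): {err}"


result = load_or_skip("T234.xlsx")
```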
- class sda.api.parse_utils.ExcelFileHandler(file_name, read_params=None, verbose=1, suppress_polars_warnings=False)#
Hybrid Excel table reader with XML discovery + Polars reading.
Best of both worlds approach:
- XML-based automatic table name discovery (no hardcoded fallbacks)
- Polars native reading for 21x faster performance
- Automatic file lock handling
- Returns pandas DataFrames for compatibility
Requirements:
- Polars and fastexcel packages
- Excel files with proper Table objects (Insert > Table)
- VALID_DATA_TABLE_NAMES = ['TestData', 'Base_de_données']#
Valid names for data tables.
Users must give their primary data table one of these names. To edit a Data Table name in Excel see user-attachments/assets
- file_name#
- read_params#
- verbose = 1#
- suppress_polars_warnings = False#
- temp_path = None#
- is_using_temp = False#
- detect_excel_tables()#
Detect Excel Table objects using fast XML parsing.
- Returns:
List of detected table names
- Return type:
list
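XML-based table discovery can be sketched with the standard library alone, because an .xlsx file is a ZIP archive whose Table objects are stored as `xl/tables/table*.xml` parts. The function name `list_table_names` and the in-memory archive are illustrative assumptions; the handler's actual parsing code is not shown in this reference.

```python
import io
import xml.etree.ElementTree as ET
import zipfile


def list_table_names(xlsx_source):
    """Return the names of Table objects stored in an .xlsx archive."""
    names = []
    with zipfile.ZipFile(xlsx_source) as archive:
        for member in sorted(archive.namelist()):
            # Each Table object has its own part, e.g. xl/tables/table1.xml.
            if member.startswith("xl/tables/") and member.endswith(".xml"):
                root = ET.fromstring(archive.read(member))
                name = root.get("displayName") or root.get("name")
                if name:
                    names.append(name)
    return names


# Build a tiny stand-in archive that contains a single table part.
TABLE_XML = (
    '<table xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" '
    'id="1" name="TestData" displayName="TestData" ref="A1:B3"/>'
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("xl/tables/table1.xml", TABLE_XML)
buffer.seek(0)

detected = list_table_names(buffer)
```

Only the small table parts are parsed, which is why this kind of discovery avoids loading worksheet data.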
- read_with_table_detection(table_name=None)#
Read Excel table using XML discovery + Polars reading.
Hybrid approach combining the best of both worlds:
1. XML-based table name discovery (automatic, no hardcoded fallbacks)
2. Polars native reading for maximum performance
Requires Excel Table objects (Insert > Table in Excel).
- Parameters:
table_name – Specific table name to use, or None for auto-discovery
- Returns:
Loaded data converted from Polars
- Return type:
pd.DataFrame
- Raises:
ImportError – If Polars is not installed
ValueError – If no Excel Table objects are found in the file
- cleanup()#
Clean up temporary file if it exists.
Note: Temp files are left for potential reuse by other handler instances. They will be cleaned up by the OS temp directory cleanup or when the process exits.
- sda.api.parse_utils.get_excel_core_properties(path)#
Extract core properties from Excel file using direct XML parsing.
This is much faster than loading the entire workbook with openpyxl. Performance improvement: ~57x faster than openpyxl.load_workbook().
- Parameters:
path (str | Path) – Path to the Excel file.
- Returns:
Dictionary with core properties: title, subject, creator, description, keywords, lastModifiedBy.
- Return type:
dict[str, str | None]
Examples
>>> from sda.api.parse_utils import get_excel_core_properties
>>> props = get_excel_core_properties("test_file.xlsx")
>>> print(f"Keywords: {props['keywords']}")
Keywords: STT0.1
This function was used previously, when STT (Spark Test Template) was used to identify test files.
See also
get_test_template_number(): Extract metadata from Excel file properties.
select_parser(): Choose parser based on file metadata.
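Direct XML parsing of core properties can be sketched as follows. Core properties are stored in the `docProps/core.xml` part of the archive, using the Dublin Core and OPC core-properties namespaces. The name `read_core_properties` and the sample archive are assumptions made for illustration, and the real function may handle missing parts or extra fields differently.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

NAMESPACES = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}
# Keys documented above, mapped to their qualified element names.
FIELDS = {
    "title": "dc:title",
    "subject": "dc:subject",
    "creator": "dc:creator",
    "description": "dc:description",
    "keywords": "cp:keywords",
    "lastModifiedBy": "cp:lastModifiedBy",
}


def read_core_properties(xlsx_source):
    """Read docProps/core.xml without loading any worksheet data."""
    with zipfile.ZipFile(xlsx_source) as archive:
        root = ET.fromstring(archive.read("docProps/core.xml"))
    props = {}
    for key, tag in FIELDS.items():
        element = root.find(tag, NAMESPACES)
        props[key] = element.text if element is not None else None
    return props


# Minimal stand-in archive with two properties set.
CORE_XML = (
    "<cp:coreProperties "
    'xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:creator>someone</dc:creator><cp:keywords>STT0.1</cp:keywords>"
    "</cp:coreProperties>"
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("docProps/core.xml", CORE_XML)
buffer.seek(0)

props = read_core_properties(buffer)
```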
- sda.api.parse_utils.normalize_files_input(files)#
Normalize the files input parameter to a consistent dict format.
- Parameters:
files (str, Path, list, or dict) – Files input in various formats
- Returns:
Normalized dict with {filename: command_dict} format
- Return type:
dict
- Raises:
ValueError – If files input is invalid or empty
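One plausible shape for this normalization is sketched below. The contents of `command_dict` are not described in this reference, so the sketch assumes an empty dict for inputs that carry no commands; `normalize_files_sketch` is a hypothetical name.

```python
from pathlib import Path


def normalize_files_sketch(files):
    """Map str, Path, list, or dict input to a {filename: command_dict} dict."""
    if isinstance(files, dict):
        # Copy each command dict so the caller's input is not mutated later.
        normalized = {str(name): dict(cmd or {}) for name, cmd in files.items()}
    elif isinstance(files, (str, Path)):
        normalized = {str(files): {}}
    elif isinstance(files, (list, tuple)):
        normalized = {str(name): {} for name in files}
    else:
        raise ValueError(f"Unsupported files input: {type(files).__name__}")
    if not normalized:
        raise ValueError("files input is empty")
    return normalized


single = normalize_files_sketch("T234.xlsx")
several = normalize_files_sketch(["a.xlsx", Path("b.csv")])
```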
- sda.api.parse_utils.validate_file_extension(file_name)#
Validate and return the file extension.
- Parameters:
file_name (str) – File name to validate
- Returns:
File extension (e.g., '.xlsx', '.csv')
- Return type:
str
- Raises:
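A validation step of this kind can be sketched with `pathlib`. The accepted extensions and the exception type are not listed in this reference, so `ASSUMED_EXTENSIONS` and the use of `ValueError` are assumptions, and `validate_extension_sketch` is a hypothetical name.

```python
from pathlib import Path

# Hypothetical set: the extensions the module actually accepts are not listed here.
ASSUMED_EXTENSIONS = {".xlsx", ".csv"}


def validate_extension_sketch(file_name):
    """Return the lower-cased extension, or raise if it is not in the assumed set."""
    extension = Path(file_name).suffix.lower()
    if extension not in ASSUMED_EXTENSIONS:
        raise ValueError(
            f"Unsupported file extension {extension!r} for {file_name!r}"
        )
    return extension


extension = validate_extension_sketch("T234.XLSX")
```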
- sda.api.parse_utils.postprocess_dataframe(df, drop_unnamed_columns=True, dropna=True)#
Apply common post-processing to the concatenated DataFrame.
- sda.api.parse_utils.get_url_in_path(path)#
Scan a directory for .url files and extract the URL from the first one found.
- Parameters:
path (Path) – The path to the directory to scan.
- Returns:
The URL found in the .url file, or None if no .url file is found or if the file is malformed.
- Return type:
str | None
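Windows `.url` shortcuts are INI-style text files with an `[InternetShortcut]` section and a `URL=` key, so the scan can be sketched with `configparser`. The name `first_url_in_directory` and the sample file are illustrative; the real function may order or parse files differently.

```python
import configparser
import tempfile
from pathlib import Path


def first_url_in_directory(directory):
    """Return the URL stored in the first .url file found, or None."""
    for url_file in sorted(Path(directory).glob("*.url")):
        # Interpolation is disabled because URLs often contain '%' escapes.
        parser = configparser.ConfigParser(interpolation=None)
        try:
            parser.read(url_file, encoding="utf-8")
            return parser.get("InternetShortcut", "URL")
        except configparser.Error:
            return None  # malformed shortcut file
    return None  # no .url file present


with tempfile.TemporaryDirectory() as tmp:
    shortcut = Path(tmp) / "plume.url"
    shortcut.write_text(
        "[InternetShortcut]\nURL=https://example.com/test%20page\n",
        encoding="utf-8",
    )
    empty_dir = Path(tmp) / "empty"
    empty_dir.mkdir()

    found = first_url_in_directory(tmp)
    missing = first_url_in_directory(empty_dir)
```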
- sda.api.parse_utils.get_plume_url(test_name)#
Get the Plume URL for a given test name.
This function resolves the test name to its local path and then extracts the URL from a .url file within that path.
- Parameters:
test_name (str) – The name of the test (e.g., “T234”).
- Returns:
The Plume URL if found, otherwise None.
- Return type:
str | None
- sda.api.parse_utils.sharepoint_synced_to_onedrive_shared_syntax(path)#
Convert a local path in a SharePoint-synced site to a OneDrive-shared path.
SharePoint-synced local paths are obtained by using the “Sync” button in the online SharePoint interface. Typical names:
"~/Spark Cleantech/6. DATA 2024 S2 - Documents"
"~/Spark Cleantech/6. DATA 2025 S1 - Documents"
"~/Spark Cleantech/7. 2024 DATA CH5 - Documents"
OneDrive-shared local paths are obtained by using the “Add a shortcut to my Drive” button in the online SharePoint interface. Typical names:
"~/OneDrive - Spark Cleantech/Documents partages - 6. DATA 2024 S2"
"~/OneDrive - Spark Cleantech/Documents partages - 6. DATA 2025 S1"
"~/OneDrive - Spark Cleantech/Documents partages - 7. 2024 DATA CH5"
- Parameters:
path (Path or str) – Input path that may be in SharePoint-synced format
- Returns:
Converted path in OneDrive-shared format, or original path if no conversion needed
- Return type:
Path
Examples
>>> from pathlib import Path
>>> path = Path("~/Spark Cleantech/6. DATA 2024 S2 - Documents/subfolder")
>>> converted = sharepoint_synced_to_onedrive_shared_syntax(path)
>>> str(converted)
"~/OneDrive - Spark Cleantech/Documents partages - 6. DATA 2024 S2/subfolder"
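The naming rule implied by the typical names above can be sketched as a rewrite of two path components. This is an illustration inferred from those examples, not the module's code: `synced_to_shared_sketch` is a hypothetical name, and `PurePosixPath` is used so the result is the same on every platform.

```python
from pathlib import PurePosixPath

SYNC_ROOT = "Spark Cleantech"
ONEDRIVE_ROOT = "OneDrive - Spark Cleantech"
SYNC_SUFFIX = " - Documents"
SHARED_PREFIX = "Documents partages - "


def synced_to_shared_sketch(path):
    """Rewrite '<root>/<library> - Documents' as the OneDrive-shared form."""
    parts = list(PurePosixPath(path).parts)
    for i in range(len(parts) - 1):
        if parts[i] == SYNC_ROOT and parts[i + 1].endswith(SYNC_SUFFIX):
            library = parts[i + 1][: -len(SYNC_SUFFIX)]
            parts[i] = ONEDRIVE_ROOT
            parts[i + 1] = SHARED_PREFIX + library
            return PurePosixPath(*parts)
    # No SharePoint-synced component found: return the path unchanged.
    return PurePosixPath(path)


converted = synced_to_shared_sketch(
    "~/Spark Cleantech/6. DATA 2024 S2 - Documents/subfolder"
)
```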
- sda.api.parse_utils.onedrive_shared_to_sharepoint_synced_syntax(path)#
Convert a local path in a OneDrive-shared site to a SharePoint-synced path.
OneDrive-shared local paths are obtained by using the “Add a shortcut to my Drive” button in the online SharePoint interface. Typical names:
"~/OneDrive - Spark Cleantech/Documents partages - 6. DATA 2024 S2"
"~/OneDrive - Spark Cleantech/Documents partages - 6. DATA 2025 S1"
"~/OneDrive - Spark Cleantech/Documents partages - 7. 2024 DATA CH5"
SharePoint-synced local paths are obtained by using the “Sync” button in the online SharePoint interface. Typical names:
"~/Spark Cleantech/6. DATA 2024 S2 - Documents"
"~/Spark Cleantech/6. DATA 2025 S1 - Documents"
"~/Spark Cleantech/7. 2024 DATA CH5 - Documents"
- Parameters:
path (Path or str) – Input path that may be in OneDrive-shared format
- Returns:
Converted path in SharePoint-synced format, or original path if no conversion needed
- Return type:
Path
Examples
>>> from pathlib import Path
>>> path = Path(
...     "~/OneDrive - Spark Cleantech/Documents partages - 6. DATA 2024 S2/subfolder"
... )
>>> converted = onedrive_shared_to_sharepoint_synced_syntax(path)
>>> str(converted)
"~/Spark Cleantech/6. DATA 2024 S2 - Documents/subfolder"
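The reverse rewrite can also be expressed as a single regular-expression substitution on the forward-slash form of the path. The pattern is inferred from the typical names listed above, `shared_to_synced_sketch` is a hypothetical name, and unlike the documented function this sketch returns a plain string.

```python
import re

# Captures the library name that follows "Documents partages - ".
SHARED_PATTERN = re.compile(
    r"OneDrive - Spark Cleantech/Documents partages - ([^/]+)"
)


def shared_to_synced_sketch(path):
    """Rewrite the OneDrive-shared form as '<root>/<library> - Documents'."""
    # Paths that do not match the pattern are returned unchanged.
    return SHARED_PATTERN.sub(r"Spark Cleantech/\1 - Documents", str(path), count=1)


converted = shared_to_synced_sketch(
    "~/OneDrive - Spark Cleantech/Documents partages - 6. DATA 2024 S2/subfolder"
)
```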
- sda.api.parse_utils.is_wsl_environment()#
Detect if we’re running in a WSL (Windows Subsystem for Linux) environment.
- Returns:
True if running in WSL, False otherwise
- Return type:
bool
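A common way to detect WSL is to look for "microsoft" in the kernel release string or for the `WSL_DISTRO_NAME` environment variable. The sketch below uses those two heuristics as an assumption; this reference does not say which checks the module performs, and `looks_like_wsl` and `read_kernel_release` are hypothetical names.

```python
import os
from pathlib import Path


def looks_like_wsl(kernel_release, environ):
    """Heuristic check based on the kernel release string and environment."""
    if "WSL_DISTRO_NAME" in environ:
        return True
    return "microsoft" in kernel_release.lower()


def read_kernel_release():
    """Return the kernel release string, or '' where /proc is unavailable."""
    try:
        return Path("/proc/sys/kernel/osrelease").read_text(encoding="utf-8")
    except OSError:
        return ""


running_in_wsl = looks_like_wsl(read_kernel_release(), os.environ)
```

Passing the release string and environment as arguments keeps the heuristic testable on machines that are not running WSL.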
- sda.api.parse_utils.convert_windows_path_to_wsl(path)#
Convert a Windows-style path to WSL mount path format.
This function converts paths from Windows format (e.g., ~/Documents) to WSL mount format (e.g., /mnt/c/Users/username/Documents).
- Parameters:
path (Path) – Windows-style path to convert
- Returns:
WSL mount path, or original path if conversion not needed/possible
- Return type:
Path
Examples
>>> from pathlib import Path
>>> path = Path("~/Spark Cleantech/6. DATA 2025 S1 - Documents")
>>> converted = convert_windows_path_to_wsl(path)
>>> str(converted)
"/mnt/c/Users/username/Spark Cleantech/6. DATA 2025 S1 - Documents"
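The drive-letter part of this mapping can be sketched with `pathlib`'s pure path classes. This sketch covers only paths that start with a drive such as `C:`; expanding `~` to a Windows user directory, as in the example above, is not shown. `windows_to_wsl_sketch` is a hypothetical name.

```python
from pathlib import PurePosixPath, PureWindowsPath


def windows_to_wsl_sketch(path):
    """Map 'C:\\Users\\name' to '/mnt/c/Users/name'; leave other paths alone."""
    win = PureWindowsPath(path)
    drive = win.drive  # e.g. "C:"; empty for relative or POSIX-style paths
    if len(drive) != 2 or drive[1] != ":":
        return PurePosixPath(win.as_posix())
    # parts[0] is the drive anchor ("C:\\"); the rest are plain components.
    return PurePosixPath("/mnt", drive[0].lower(), *win.parts[1:])


converted = windows_to_wsl_sketch(r"C:\Users\username\Spark Cleantech")
```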
- sda.api.parse_utils.convert_wsl_path_to_windows_if_needed(path)#
Convert WSL paths back to Windows format if needed.
This is a utility function that can be used to convert paths back from WSL format to Windows format when interfacing with Windows applications.
- Parameters:
path (Path) – Path to potentially convert
- Returns:
Converted path or original path if no conversion needed
- Return type:
Path
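The reverse mapping can be sketched the same way, assuming the usual `/mnt/<drive letter>/...` mount layout. `wsl_to_windows_sketch` is a hypothetical name, and the conditions under which the real function decides that a conversion is needed are not described in this reference.

```python
from pathlib import PurePosixPath, PureWindowsPath


def wsl_to_windows_sketch(path):
    """Map '/mnt/c/Users/name' to 'C:\\Users\\name'; leave other paths alone."""
    parts = PurePosixPath(path).parts
    is_mount = (
        len(parts) >= 3
        and parts[:2] == ("/", "mnt")
        and len(parts[2]) == 1
        and parts[2].isalpha()
    )
    if is_mount:
        return PureWindowsPath(f"{parts[2].upper()}:\\", *parts[3:])
    # Not under a drive mount: return the path unchanged.
    return PurePosixPath(path)


converted = wsl_to_windows_sketch("/mnt/c/Users/username/Documents")
```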