sda.api.load

Functions

select_parser(data_file, folder_path)

Select the appropriate parser function for a given data file.

load_test(test_name[, datafilename_filter, ...])

Load data from a specific test by name or path.

parse_files(files[, command, columns_to_keep, ...])

Parse multiple files using the unified parser approach.

load_tests([test_names, include_2021_2022, ...])

Load multiple tests by names or discover all available tests.

Module Contents

sda.api.load.select_parser(data_file, folder_path)

Select the appropriate parser function for a given data file.

Unified approach: this function always selects parse_test_files, which handles all files through ExcelFileHandler (XML discovery plus Polars).
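As a rough illustration of what "unified" means here, a selector of this kind can ignore its inputs and always hand back the same callable. The names `unified_parser_stub` and `select_parser_sketch` below are hypothetical stand-ins, not part of `sda`, and the stub does no real parsing.

```python
from pathlib import Path


def unified_parser_stub(files):
    """Hypothetical stand-in for the real unified parser; it only echoes its input."""
    return f"parsed {files}"


def select_parser_sketch(data_file, folder_path):
    """Return the parser to use for a data file.

    In a unified design the choice does not depend on the file name,
    the folder, or the year, so both arguments are deliberately unused.
    """
    return unified_parser_stub


# The same parser comes back regardless of extension or location.
parser_new = select_parser_sketch(Path("T183.xlsx"), Path("/data/T183"))
parser_old = select_parser_sketch(Path("legacy.xls"), Path("/data/legacy"))
```

Keeping the selector in place, even though it has only one answer, lets callers written against the older multi-parser layout continue to work unchanged.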

See also

parse_test_files()

Unified parser used across all years.

sda.api.load.load_test(test_name, datafilename_filter='*.xls*', data_sharepoint='auto', command=None, columns_to_keep=None, verbose=1, column_not_found='raise', table_not_found='raise', suppress_polars_warnings=False, **kwargs)

Load data from a specific test by name or path.

This is a high-level function that handles the complete workflow: path resolution, file discovery, and data parsing using the unified parser approach.
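The three stages named above (path resolution, file discovery, parsing) can be sketched with the standard library alone. This is only an illustration under stated assumptions: `find_data_files_sketch` and `parse_sketch` are hypothetical helpers, the folder layout (one folder per test name under a data root) is assumed, and the "parser" merely reports file names instead of reading Excel data.

```python
import tempfile
from pathlib import Path


def find_data_files_sketch(test_name, data_root, pattern="*.xls*"):
    """Resolve a test name or path, then discover matching data files."""
    candidate = Path(test_name)
    if candidate.is_file():
        # A direct path to a data file needs no discovery step.
        return [candidate]
    # Assumed layout: <data_root>/<test_name>/ holds the data files.
    folder = candidate if candidate.is_dir() else Path(data_root) / str(test_name)
    return sorted(p for p in folder.glob(pattern) if p.is_file())


def parse_sketch(paths):
    """Hypothetical stand-in for the parsing stage: report what would be read."""
    return [p.name for p in paths]


with tempfile.TemporaryDirectory() as root:
    # Build a throwaway test folder with one spreadsheet and one unrelated file.
    test_dir = Path(root) / "T183"
    test_dir.mkdir()
    (test_dir / "T183_experiment.xlsx").touch()
    (test_dir / "notes.txt").touch()

    by_name = parse_sketch(find_data_files_sketch("T183", root))
    by_path = parse_sketch(
        find_data_files_sketch(test_dir / "T183_experiment.xlsx", root)
    )
```

Both calls end up at the same spreadsheet, which mirrors the way `test_name` accepts either a short name or a direct path.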

Parameters:
  • test_name (str | Path) – Name of the test to load, e.g. T135, or a direct path to the data file.

  • datafilename_filter (str, optional) – Filter to apply to the data files, by default “*.xls*” (matches .xlsx, .xlsm, .xls).

  • data_sharepoint (str, optional) – Name of data sharepoint where the test is located, by default “auto”.

  • command (dict, optional) – Legacy reading commands. Now largely ignored in favor of automatic Excel table detection.

  • columns_to_keep (list, optional) – List of columns to keep (default: None).

  • verbose (int, default 1) – Verbosity level.

  • column_not_found (str, default 'raise') – What to do if a column is not found.

  • table_not_found (str, default 'raise') – What to do if no Excel tables are found in a file. Can be ‘raise’ or ‘warn’.

  • suppress_polars_warnings (bool, default False) – If True, suppress Polars dtype warning messages during reading.

  • **kwargs – Additional arguments passed to the parser function.

Returns:

The test data, read using automatic table detection.

Return type:

pandas.DataFrame

Examples

Load by test name:

>>> from sda.api.load import load_test
>>> df = load_test("T183")

Load with specific filter:

>>> df = load_test("T183", datafilename_filter="*_processed.xlsx")

Load from direct path:

>>> df = load_test("/path/to/data/T183_experiment.xlsx")
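The `datafilename_filter` values in these examples look like shell-style glob patterns. Assuming that is how they are interpreted (the matching code inside `sda` is not shown here), the standard-library `fnmatch` module gives a quick way to check which file names a pattern would select. The file names below are made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical contents of a test folder.
names = [
    "T183.xlsx",
    "T183.xlsm",
    "T183.xls",
    "T183_processed.xlsx",
    "T183.csv",
]

# The default filter accepts every Excel variant and rejects the CSV.
default_matches = [n for n in names if fnmatchcase(n, "*.xls*")]

# A narrower filter picks out only the processed workbook.
processed_matches = [n for n in names if fnmatchcase(n, "*_processed.xlsx")]
```

`fnmatchcase` is used instead of `fnmatch` so that the result does not depend on the operating system's case-folding rules.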

See also

resolve_test_path()

Resolve a test name or path to the canonical folder path.

discover_data_file()

Find a data file in the resolved test folder using a filename pattern.

parse_test_files()

Unified parser for all supported years (2021+).

parse_files()

Parse one or many files directly when you already know the paths.

sda.api.load.parse_files(files, command=None, columns_to_keep=None, verbose=1, column_not_found='warn', table_not_found='raise', **kwargs)

Parse multiple files using the unified parser approach.

Since all files now use the same unified parser (parse_test_files with ExcelFileHandler), this function directly calls that parser without any grouping logic.

Parameters:
  • files (str, Path, list, or dict) – Files to parse. Same format as parse_test_files (unified approach).

  • command (dict, optional) – Default command for reading files. If None, will use parser-specific defaults.

  • columns_to_keep (list, optional) – List of columns to keep (only applies to 2023+ data).

  • verbose (int, default 1) – Verbosity level.

  • column_not_found (str, default 'warn') – What to do if a column is not found.

  • table_not_found (str, default 'raise') – What to do if no Excel tables are found in a file. Can be ‘raise’ or ‘warn’.

  • **kwargs – Additional arguments passed to parser functions.
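A 'raise' versus 'warn' switch such as `column_not_found` or `table_not_found` is commonly implemented as a small policy helper. The sketch below shows one way this could look. It is not the `sda` implementation: the helper names, the use of `ValueError`, and the assumption that `column_not_found` accepts the same two values as `table_not_found` are all illustrative.

```python
import warnings


def handle_missing_sketch(message, policy="raise"):
    """Apply a 'raise' or 'warn' policy to a missing column or table."""
    if policy == "raise":
        raise ValueError(message)  # exception type is an assumption
    if policy == "warn":
        warnings.warn(message, stacklevel=2)
        return
    raise ValueError(f"unknown policy: {policy!r}")


def keep_columns_sketch(row, columns_to_keep, column_not_found="warn"):
    """Keep the requested columns of a dict-like row, applying the policy to gaps."""
    kept = {}
    for name in columns_to_keep:
        if name in row:
            kept[name] = row[name]
        else:
            handle_missing_sketch(f"column not found: {name}", column_not_found)
    return kept


row = {"time": 0.0, "power": 12.5}

# With 'warn', the missing column is reported and the rest is returned.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    kept = keep_columns_sketch(row, ["time", "voltage"], column_not_found="warn")
```

With `column_not_found="raise"` the same call would stop at the first missing column, which is the stricter default that `load_test` uses.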

Returns:

Data combined from all files, read using automatic table detection.

Return type:

pandas.DataFrame

Examples

Parse a single file:

>>> from sda.api.load import parse_files
>>> df = parse_files("T183.xlsx")

Parse multiple files:

>>> files = ["T183.xlsx", "T196.xlsx"]
>>> df = parse_files(files)

Parse with legacy command format:

>>> files = {"data.xlsx": {}}  # Empty dict uses table detection
>>> df = parse_files(files)
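The three call styles above differ only in the shape of `files`. A function that accepts a string, a `Path`, a list, or a dict will typically normalize them to one internal form first. The following is a minimal sketch of such a step, assuming a mapping from path to command dict as the common form; `normalize_files_sketch` is a hypothetical helper and the real internals may differ.

```python
from pathlib import Path


def normalize_files_sketch(files, default_command=None):
    """Return {Path: command_dict} for a str, Path, list, or dict input."""
    default = dict(default_command or {})
    if isinstance(files, (str, Path)):
        # Single file: wrap it in a one-entry mapping.
        return {Path(files): dict(default)}
    if isinstance(files, dict):
        # Legacy form: per-file commands; an empty dict falls back to the default.
        return {
            Path(name): dict(cmd) if cmd else dict(default)
            for name, cmd in files.items()
        }
    # Any other iterable is treated as a list of paths.
    return {Path(name): dict(default) for name in files}


single = normalize_files_sketch("T183.xlsx")
several = normalize_files_sketch(["T183.xlsx", "T196.xlsx"])
legacy = normalize_files_sketch({"data.xlsx": {}})
```

After this step every input looks the same to the parser, so no per-format branching is needed further down.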

See also

parse_test_files()

Unified parser function used under the hood.

list_all_files()

Discover eligible files to parse inside a test folder.

load_test()

High-level convenience wrapper to load a single test by name.

sda.api.load.load_tests(test_names=None, include_2021_2022=True, include_2023_current=True)

Load multiple tests by names or discover all available tests.

This function provides a convenient way to load multiple test datasets at once, with automatic discovery if no specific test names are provided.

Parameters:
  • test_names (list, optional) – List of test names to load. If None, discovers all available tests.

  • include_2021_2022 (bool, default True) – Whether to include 2021-2022 test data in discovery.

  • include_2023_current (bool, default True) – Whether to include 2023-current test data in discovery.

Returns:

Data combined from all loaded tests, read using automatic table detection.

Return type:

pandas.DataFrame

Examples

Load specific tests:

>>> from sda.api.load import load_tests
>>> df = load_tests(["T183", "T196"])

Load all available tests:

>>> df = load_tests()  # Discovers and loads all tests

Load only recent tests:

>>> df = load_tests(include_2021_2022=False)
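The interaction between `test_names` and the two `include_*` flags can be pictured as a selection step that runs before any loading. The sketch below assumes that explicit names bypass discovery and that the flags filter by the year a test belongs to. The catalog, its entries, and `select_tests_sketch` are hypothetical and carry no information about real tests.

```python
def select_tests_sketch(
    catalog,
    test_names=None,
    include_2021_2022=True,
    include_2023_current=True,
):
    """Pick the tests to load from a {name: year} catalog."""
    if test_names is not None:
        # Explicit names take priority; the flags only affect discovery.
        return list(test_names)
    selected = []
    for name, year in catalog.items():
        if year <= 2022 and include_2021_2022:
            selected.append(name)
        elif year >= 2023 and include_2023_current:
            selected.append(name)
    return selected


# Made-up catalog used only to exercise the flags.
catalog = {"old_1": 2021, "old_2": 2022, "new_1": 2023, "new_2": 2024}

everything = select_tests_sketch(catalog)
recent_only = select_tests_sketch(catalog, include_2021_2022=False)
explicit = select_tests_sketch(catalog, test_names=["new_2"])
```

Each selected test would then be loaded individually and the results combined into a single DataFrame.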

See also

list_all_tests()

Discover available test identifiers.

load_test()

Load a single test.

parse_files()

Parse multiple files directly.