sda.api.load#

Functions#

select_parser(data_file, folder_path)

Select the appropriate parser function for a given data file.

load_test(test_name[, datafilename_filter, ...])

Load data from a specific test by name or path.

parse_files(files[, command, columns_to_keep, ...])

Parse multiple files using the unified parser approach.

load_tests([test_names, include_2021_2022, ...])

Load multiple tests by names or discover all available tests.

Module Contents#

sda.api.load.select_parser(data_file, folder_path)#

Select the appropriate parser function for a given data file.

Unified approach: always returns parse_test_files, which supports ExcelFileHandler with XML discovery and Polars for all files.

See also

parse_test_files()

Unified parser used across all years.

sda.api.load.load_test(test_name, datafilename_filter='*.xls*', data_sharepoint='auto', command=None, columns_to_keep=None, verbose=1, column_not_found='raise', table_not_found='raise', suppress_polars_warnings=False, source='local', force_refresh=False, **kwargs)#

Load data from a specific test by name or path.

This is a high-level function that handles the complete workflow: path resolution, file discovery, and data parsing using the unified parser approach.

Parameters:
  • test_name (str | Path) – Name of the test to load (e.g. "T135") or a direct path to a data file.

  • datafilename_filter (str, optional) – Filter to apply to the data files, by default "*.xls*" (matches .xlsx, .xlsm, .xls).

  • data_sharepoint (str, optional) – Name of the data SharePoint where the test is located, by default "auto".

  • command (dict, optional) – Legacy reading commands. Now largely ignored in favor of automatic Excel table detection.

  • columns_to_keep (list, optional) – List of columns to keep (default: None).

  • verbose (int, default 1) – Verbosity level.

  • column_not_found (str, default 'raise') – What to do if a column is not found.

  • table_not_found (str, default 'raise') – What to do if no Excel tables are found in a file. Can be 'raise' or 'warn'.

  • suppress_polars_warnings (bool, default False) – If True, suppress Polars dtype warning messages during reading.

  • source (str, default 'local') –

    Data access mode.

    • "local" (default) — resolve the test from synced local folders configured in DATA_SHAREPOINTS (~/sda.json), or from a direct file path. This is the standard desktop mode.

    • "cloud" — download the test file on demand from SharePoint via the Microsoft Graph API. Requires authentication (browser login on first use, or AZURE_* environment variables for CI). test_name must be a canonical test ID (e.g. "T297"); direct file paths and CUSTOM_TESTS entries are not supported and raise ValueError.

    • "default" — try local first; if the test is not found locally (raises FileNotFoundError, ValueError, or NotImplementedError), automatically fall back to the cloud via the Microsoft Graph API. Direct file paths are never forwarded to the cloud.

  • force_refresh (bool, default False) – Only meaningful when source="cloud" or source="default" (when the cloud fallback is triggered). If True, delete the local cache for test_name and re-download from SharePoint even if a cached copy exists.

  • **kwargs – Additional arguments passed to the parser function.
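The three access modes boil down to a local-first fallback. A minimal sketch of the source="default" behaviour described above, with load_local and load_cloud as hypothetical stand-ins for the library's internal loaders:

```python
def load_with_fallback(test_name, load_local, load_cloud):
    # source="default": try the synced local folders first; on the
    # documented local failures, fall back to the Graph API download.
    try:
        return load_local(test_name)
    except (FileNotFoundError, ValueError, NotImplementedError):
        return load_cloud(test_name)
```

Note that only these three exception types trigger the fallback; any other error propagates unchanged.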

Returns:

pandas.DataFrame containing the test data with automatic table detection.

Return type:

pandas.DataFrame

Examples

Load by test name (local, default):

>>> from sda.api.load import load_test
>>> df = load_test("T183")

Load via Microsoft Graph API (web app / CI):

>>> df = load_test("T297", source="cloud")

Load with automatic local-first, cloud-fallback:

>>> df = load_test("T297", source="default")

Force re-download from SharePoint:

>>> df = load_test("T297", source="cloud", force_refresh=True)

Load with specific filter:

>>> df = load_test("T183", datafilename_filter="*_processed.xlsx")

Load from direct path:

>>> df = load_test("/path/to/data/T183_experiment.xlsx")
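For illustration, the default datafilename_filter pattern behaves like a shell glob. A quick check with Python's fnmatch (an assumption about the matching semantics; the library's internal discovery may differ slightly):

```python
from fnmatch import fnmatch

pattern = "*.xls*"  # default datafilename_filter
candidates = ["T183_data.xlsx", "T183_macro.xlsm", "legacy.xls", "notes.csv"]

# The trailing "*" matches the empty string, so plain ".xls" files
# are included alongside ".xlsx" and ".xlsm".
matches = [name for name in candidates if fnmatch(name, pattern)]
```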

See also

resolve_local_test_path()

Resolve a test name or path to the canonical folder path.

discover_data_file()

Find a data file in the resolved test folder using a filename pattern.

download_test_file_via_graph()

Graph API download used when source="cloud".

parse_test_files()

Unified parser for all supported years (2021+).

parse_files()

Parse one or many files directly when you already know the paths.

sda.api.load.parse_files(files, command=None, columns_to_keep=None, verbose=1, column_not_found='warn', table_not_found='raise', **kwargs)#

Parse multiple files using the unified parser approach.

Since all files now use the same unified parser (parse_test_files with ExcelFileHandler), this function directly calls that parser without any grouping logic.

Parameters:
  • files (str, Path, list, or dict) – Files to parse. Same format as parse_test_files (unified approach).

  • command (dict, optional) – Default command for reading files. If None, will use parser-specific defaults.

  • columns_to_keep (list, optional) – List of columns to keep (only applies to 2023+ data).

  • verbose (int, default 1) – Verbosity level.

  • column_not_found (str, default 'warn') – What to do if a column is not found.

  • table_not_found (str, default 'raise') – What to do if no Excel tables are found in a file. Can be 'raise' or 'warn'.

  • **kwargs – Additional arguments passed to parser functions.

Returns:

pandas.DataFrame combined from all files with automatic table detection.

Return type:

pandas.DataFrame

Examples

Parse a single file:

>>> from sda.api.load import parse_files
>>> df = parse_files("T183.xlsx")

Parse multiple files:

>>> files = ["T183.xlsx", "T196.xlsx"]
>>> df = parse_files(files)

Parse with legacy command format:

>>> files = {"data.xlsx": {}}  # Empty dict uses table detection
>>> df = parse_files(files)
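The accepted files shapes (str, Path, list, or dict) can all be reduced to the dict form shown in the last example. A hypothetical normalization sketch, not the library's actual implementation:

```python
from pathlib import Path

def normalize_files(files):
    # Hypothetical normalization to the {path: command} dict form.
    # Empty command dicts trigger automatic Excel table detection.
    if isinstance(files, (str, Path)):
        return {str(files): {}}
    if isinstance(files, list):
        return {str(f): {} for f in files}
    return {str(k): v for k, v in files.items()}
```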

See also

parse_test_files()

Unified parser function used under the hood.

list_all_files()

Discover eligible files to parse inside a test folder.

load_test()

High-level convenience wrapper to load a single test by name.

sda.api.load.load_tests(test_names=None, include_2021_2022=True, include_2023_current=True, source='local', force_refresh=False)#

Load multiple tests by names or discover all available tests.

This function provides a convenient way to load multiple test datasets at once, with automatic discovery if no specific test names are provided.

Parameters:
  • test_names (str | list[str], optional) –

    Test name, wildcard pattern, or list thereof. Examples: ["T183", "T196"], "T3*", ["T1*", "T297"].

    Wildcard characters (*, ?, [) trigger automatic expansion:

    • source="cloud" — expanded by querying SharePoint directly via the Graph API (list_tests_via_graph()). No local filesystem access is required.

    • source="local" — expanded by scanning synced local folders (list_all_tests()).

    • source="default" — union of local and cloud expansions; tests present in both are deduplicated (local ordering preserved).

    If None, discovers all available tests from local synced folders (only valid when source is "local" or "default").

  • include_2021_2022 (bool, default True) – Whether to include 2021-2022 test data in discovery.

  • include_2023_current (bool, default True) – Whether to include 2023-current test data in discovery.

  • source (str, default 'local') –

    Data access mode, passed through to load_test() for each test.

    • "local" — load each test from synced local folders only.

    • "cloud" — download each test via the Microsoft Graph API; wildcard patterns are resolved against SharePoint directly.

    • "default" — try local first for each test, fall back to the cloud if not found locally. Wildcard expansion takes the union of local and cloud results.

  • force_refresh (bool, default False) – Only meaningful when source="cloud" or source="default" (when the cloud fallback is triggered). Re-download each test even if a cached copy exists.
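The source="default" wildcard expansion described above (union of local and cloud results, deduplicated, local ordering preserved) can be sketched as:

```python
def union_expansions(local_tests, cloud_tests):
    # Keep every locally discovered test in its original order, then
    # append any cloud-only tests; a test present in both lists
    # appears exactly once, at its local position.
    seen = set(local_tests)
    merged = list(local_tests)
    for test in cloud_tests:
        if test not in seen:
            seen.add(test)
            merged.append(test)
    return merged
```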

Returns:

pandas.DataFrame combined from all loaded tests with automatic table detection. When more than one test name is loaded, each row has a Test column with the source test name (same convention as the dashboard). A single test behaves like load_test() (no synthetic Test column). Legacy per-file test columns are renamed to Test only in that multi-test case.

Return type:

pandas.DataFrame
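The multi-test row-tagging convention described above can be illustrated with plain pandas; the per-test frames here are hypothetical stand-ins for individual load_test() results:

```python
import pandas as pd

# Hypothetical per-test frames standing in for load_test() results.
frames = {
    "T183": pd.DataFrame({"value": [1, 2]}),
    "T196": pd.DataFrame({"value": [3]}),
}

# When more than one test is loaded, each row carries its source test
# name in a "Test" column before the frames are concatenated.
combined = pd.concat(
    [df.assign(Test=name) for name, df in frames.items()],
    ignore_index=True,
)
```

With a single test, no such column is added and the result matches load_test() directly.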

Examples

Load specific tests:

>>> from sda.api.load import load_tests
>>> df = load_tests(["T183", "T196"])

Load tests matching a wildcard (local filesystem):

>>> df = load_tests("T3*")

Load tests matching a wildcard directly from SharePoint (no local sync needed):

>>> df = load_tests("T3*", source="cloud")

Load tests with local-first, cloud-fallback (union of local + cloud for wildcards):

>>> df = load_tests("T3*", source="default")

Load all available tests:

>>> df = load_tests()  # Discovers and loads all tests

Load only recent tests:

>>> df = load_tests(include_2021_2022=False)

See also

list_tests_via_graph()

Cloud-native test discovery via Graph API (used for wildcard expansion when source="cloud").

list_all_tests()

Discover available test identifiers from local synced folders.

load_test()

Load a single test.

parse_files()

Parse multiple files directly.