sda.api.load#
Functions#
- select_parser(data_file, folder_path) – Select the appropriate parser function for a given data file.
- load_test(test_name, ...) – Load data from a specific test by name or path.
- parse_files(files, ...) – Parse multiple files using the unified parser approach.
- load_tests(test_names, ...) – Load multiple tests by names or discover all available tests.
Module Contents#
- sda.api.load.select_parser(data_file, folder_path)#
Select the appropriate parser function for a given data file.
Unified approach: always use parse_test_files, which supports ExcelFileHandler with XML discovery plus Polars for all files.
See also
parse_test_files() – Unified parser used across all years.
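The unified selection rule can be sketched as follows. `parse_test_files` here is a stand-in stub (the real parser lives elsewhere in `sda`); the point is that every file, regardless of year or extension, is routed to the same parser:

```python
def parse_test_files(files, **kwargs):
    """Stand-in stub for the real unified parser (ExcelFileHandler with
    XML discovery + Polars); not the actual implementation."""
    raise NotImplementedError

def select_parser(data_file, folder_path):
    # Unified approach: no per-year or per-extension branching,
    # always return the same parser function.
    return parse_test_files
```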
- sda.api.load.load_test(test_name, datafilename_filter='*.xls*', data_sharepoint='auto', command=None, columns_to_keep=None, verbose=1, column_not_found='raise', table_not_found='raise', suppress_polars_warnings=False, source='local', force_refresh=False, **kwargs)#
Load data from a specific test by name or path.
This is a high-level function that handles the complete workflow: path resolution, file discovery, and data parsing using the unified parser approach.
- Parameters:
  - test_name (str | Path) – Name of the test to load, e.g. "T135", or a direct path to a file.
  - datafilename_filter (str, optional) – Filter to apply to the data files, by default "*.xls*" (matches .xlsx, .xlsm, .xls).
  - data_sharepoint (str, optional) – Name of the data sharepoint where the test is located, by default "auto".
  - command (dict, optional) – Legacy reading commands. Now largely ignored in favor of automatic Excel table detection.
  - columns_to_keep (list, optional) – List of columns to keep (default: None).
  - verbose (int, default 1) – Verbosity level.
  - column_not_found (str, default 'raise') – What to do if a column is not found.
  - table_not_found (str, default 'raise') – What to do if no Excel tables are found in a file. Can be 'raise' or 'warn'.
  - suppress_polars_warnings (bool, default False) – If True, suppress Polars dtype warning messages during reading.
  - source (str, default "local") – Data access mode.
    - "local" (default) – resolve the test from synced local folders configured in DATA_SHAREPOINTS (~/sda.json), or from a direct file path. This is the standard desktop mode.
    - "cloud" – download the test file on demand from SharePoint via the Microsoft Graph API. Requires authentication (browser login on first use, or AZURE_* environment variables for CI). test_name must be a canonical test ID (e.g. "T297"); direct file paths and CUSTOM_TESTS entries are not supported and raise ValueError.
    - "default" – try local first; if the test is not found locally (raises FileNotFoundError, ValueError, or NotImplementedError), automatically fall back to the cloud via the Microsoft Graph API. Direct file paths are never forwarded to the cloud.
  - force_refresh (bool, default False) – Only meaningful when source="cloud" or source="default" (when the cloud fallback is triggered). If True, delete the local cache for test_name and re-download from SharePoint even if a cached copy exists.
  - **kwargs – Additional arguments passed to the parser function.
- Returns:
  pandas.DataFrame containing the test data with automatic table detection.
- Return type:
  pandas.DataFrame
Examples
Load by test name (local, default):
>>> from sda.api.load import load_test
>>> df = load_test("T183")
Load via Microsoft Graph API (web app / CI):
>>> df = load_test("T297", source="cloud")
Load with automatic local-first, cloud-fallback:
>>> df = load_test("T297", source="default")
Force re-download from SharePoint:
>>> df = load_test("T297", source="cloud", force_refresh=True)
Load with specific filter:
>>> df = load_test("T183", datafilename_filter="*_processed.xlsx")
Load from direct path:
>>> df = load_test("/path/to/data/T183_experiment.xlsx")
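The local-first, cloud-fallback behavior of source="default" can be sketched as below. The `_load_local` and `_load_cloud` helpers are illustrative stand-ins, and the path check is a crude heuristic for demonstration only, not the library's actual logic:

```python
from pathlib import Path

def _load_local(test_name):
    # Stand-in: pretend no tests are synced locally.
    raise FileNotFoundError(test_name)

def _load_cloud(test_name):
    # Stand-in for the Graph API download + parsing step.
    return f"cloud:{test_name}"

def load_with_default_source(test_name):
    """Sketch of source="default": try local first, fall back to the
    cloud on the documented exception types, but never forward a
    direct file path to the cloud."""
    try:
        return _load_local(test_name)
    except (FileNotFoundError, ValueError, NotImplementedError):
        name = str(test_name)
        # Heuristic (illustration only): a path separator or file
        # suffix indicates a direct path, which must not go to cloud.
        if "/" in name or "\\" in name or Path(name).suffix:
            raise
        return _load_cloud(test_name)
```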
See also
resolve_local_test_path() – Resolve a test name or path to the canonical folder path.
discover_data_file() – Find a data file in the resolved test folder using a filename pattern.
download_test_file_via_graph() – Graph API download used when source="cloud".
parse_test_files() – Unified parser for all supported years (2021+).
parse_files() – Parse one or many files directly when you already know the paths.
- sda.api.load.parse_files(files, command=None, columns_to_keep=None, verbose=1, column_not_found='warn', table_not_found='raise', **kwargs)#
Parse multiple files using the unified parser approach.
Since all files now use the same unified parser (parse_test_files with ExcelFileHandler), this function directly calls that parser without any grouping logic.
- Parameters:
  - files (str, Path, list, or dict) – Files to parse. Same format as parse_test_files (unified approach).
  - command (dict, optional) – Default command for reading files. If None, parser-specific defaults are used.
  - columns_to_keep (list, optional) – List of columns to keep (only applies to 2023+ data).
  - verbose (int, default 1) – Verbosity level.
  - column_not_found (str, default 'warn') – What to do if a column is not found.
  - table_not_found (str, default 'raise') – What to do if no Excel tables are found in a file. Can be 'raise' or 'warn'.
  - **kwargs – Additional arguments passed to parser functions.
- Returns:
  pandas.DataFrame combined from all files with automatic table detection.
- Return type:
  pandas.DataFrame
Examples
Parse a single file:
>>> from sda.api.load import parse_files
>>> df = parse_files("T183.xlsx")
Parse multiple files:
>>> files = ["T183.xlsx", "T196.xlsx"]
>>> df = parse_files(files)
Parse with legacy command format:
>>> files = {"data.xlsx": {}}  # Empty dict uses table detection
>>> df = parse_files(files)
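Since `files` accepts a str, Path, list, or dict, one plausible normalization into a single {path: command} mapping looks like this. The internal shape is an assumption inferred from the dict example above, not the library's actual code:

```python
from pathlib import Path

def normalize_files(files):
    """Sketch: coerce the flexible `files` argument (str, Path, list,
    or dict) into one {path: command} mapping. An empty command dict
    means "use automatic table detection" (per the example above)."""
    if isinstance(files, (str, Path)):
        return {str(files): {}}
    if isinstance(files, list):
        return {str(f): {} for f in files}
    if isinstance(files, dict):
        # A None command also falls back to table detection.
        return {str(k): (v or {}) for k, v in files.items()}
    raise TypeError(f"Unsupported files type: {type(files).__name__}")
```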
See also
parse_test_files() – Unified parser function used under the hood.
list_all_files() – Discover eligible files to parse inside a test folder.
load_test() – High-level convenience wrapper to load a single test by name.
- sda.api.load.load_tests(test_names=None, include_2021_2022=True, include_2023_current=True, source='local', force_refresh=False)#
Load multiple tests by names or discover all available tests.
This function provides a convenient way to load multiple test datasets at once, with automatic discovery if no specific test names are provided.
- Parameters:
  - test_names (str | list[str], optional) – Test name, wildcard pattern, or list thereof. Examples: ["T183", "T196"], "T3*", ["T1*", "T297"]. Wildcard characters (*, ?, [) trigger automatic expansion:
    - source="cloud" – expanded by querying SharePoint directly via the Graph API (list_tests_via_graph()). No local filesystem access is required.
    - source="local" – expanded by scanning synced local folders (list_all_tests()).
    - source="default" – union of local and cloud expansions; tests present in both are deduplicated (local ordering preserved).
    If None, discovers all available tests from local synced folders (only valid when source is "local" or "default").
  - include_2021_2022 (bool, default True) – Whether to include 2021-2022 test data in discovery.
  - include_2023_current (bool, default True) – Whether to include 2023-current test data in discovery.
  - source (str, default "local") – Data access mode, passed through to load_test() for each test.
    - "local" – load each test from synced local folders only.
    - "cloud" – download each test via the Microsoft Graph API; wildcard patterns are resolved against SharePoint directly.
    - "default" – try local first for each test, fall back to the cloud if not found locally. Wildcard expansion takes the union of local and cloud results.
  - force_refresh (bool, default False) – Only meaningful when source="cloud" or source="default" (when the cloud fallback is triggered). Re-download each test even if a cached copy exists.
- Returns:
  pandas.DataFrame combined from all loaded tests with automatic table detection. When more than one test name is loaded, each row has a Test column with the source test name (same convention as the dashboard). A single test behaves like load_test() (no synthetic Test column). Legacy per-file test columns are renamed to Test only in that multi-test case.
- Return type:
  pandas.DataFrame
Examples
Load specific tests:
>>> from sda.api.load import load_tests
>>> df = load_tests(["T183", "T196"])
Load tests matching a wildcard (local filesystem):
>>> df = load_tests("T3*")
Load tests matching a wildcard directly from SharePoint (no local sync needed):
>>> df = load_tests("T3*", source="cloud")
Load tests with local-first, cloud-fallback (union of local + cloud for wildcards):
>>> df = load_tests("T3*", source="default")
Load all available tests:
>>> df = load_tests() # Discovers and loads all tests
Load only recent tests:
>>> df = load_tests(include_2021_2022=False)
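The documented wildcard behavior for source="default" (union of local and cloud expansions, deduplicated, local ordering preserved) can be sketched as below. The hard-coded test lists stand in for what list_all_tests() and list_tests_via_graph() would return:

```python
from fnmatch import fnmatch

# Hypothetical discovery results (stand-ins for list_all_tests()
# and list_tests_via_graph()).
local_tests = ["T183", "T196", "T297"]
cloud_tests = ["T297", "T301"]

def expand_pattern(pattern, tests):
    """Expand one wildcard pattern against a list of test IDs."""
    return [t for t in tests if fnmatch(t, pattern)]

def union_local_first(local, cloud):
    """Union of local and cloud expansions; duplicates removed,
    local ordering preserved (local entries come first)."""
    seen = set()
    out = []
    for t in local + cloud:
        if t not in seen:
            seen.add(t)
            out.append(t)
    return out

matches = union_local_first(
    expand_pattern("T*", local_tests),
    expand_pattern("T*", cloud_tests),
)
# T297 appears in both sources but is kept only once.
```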
See also
list_tests_via_graph() – Cloud-native test discovery via Graph API (used for wildcard expansion when source="cloud").
list_all_tests() – Discover available test identifiers from local synced folders.
load_test() – Load a single test.
parse_files() – Parse multiple files directly.
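The multi-test `Test`-column convention described in the Returns section can be sketched as follows; the per-test frames and their columns are hypothetical, and this is not the library's actual concatenation code:

```python
import pandas as pd

# Hypothetical per-test frames, as load_test() might return them.
frames = {
    "T183": pd.DataFrame({"pressure": [1.0, 2.0]}),
    "T196": pd.DataFrame({"pressure": [3.0]}),
}

def combine_tests(frames):
    """Sketch of the multi-test convention: tag every row with its
    source test name in a `Test` column, then concatenate."""
    tagged = []
    for name, df in frames.items():
        df = df.copy()
        df["Test"] = name
        tagged.append(df)
    return pd.concat(tagged, ignore_index=True)

combined = combine_tests(frames)
```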