sda.api.load
============

.. py:module:: sda.api.load


Functions
---------

.. autoapisummary::

   sda.api.load.select_parser
   sda.api.load.load_test
   sda.api.load.parse_files
   sda.api.load.load_tests


Module Contents
---------------

.. py:function:: select_parser(data_file, folder_path)

   Select the appropriate parser function for a given data file.

   Unified approach: always use ``parse_test_files``, which supports
   ``ExcelFileHandler`` with XML Discovery + Polars for all files.

   .. seealso::

      :py:func:`~sda.api.parse_test_file.parse_test_files`
          Unified parser used across all years.


.. py:function:: load_test(test_name, datafilename_filter='*.xls*', data_sharepoint='auto', command=None, columns_to_keep=None, verbose=1, column_not_found='raise', table_not_found='raise', suppress_polars_warnings=False, **kwargs)

   Load data from a specific test by name or path.

   This is a high-level function that handles the complete workflow:
   path resolution, file discovery, and data parsing using the unified parser approach.

   :param test_name: Name of the test file to load, e.g. ``T135``, or direct path to file.
   :type test_name: :py:class:`str | Path`
   :param datafilename_filter: Filter to apply to the data files, by default ``"*.xls*"``
                               (matches .xlsx, .xlsm, .xls).
   :type datafilename_filter: :py:class:`str`, *optional*
   :param data_sharepoint: Name of data sharepoint where the test is located, by default ``"auto"``.
   :type data_sharepoint: :py:class:`str`, *optional*
   :param command: Legacy reading commands. Now largely ignored in favor of automatic Excel table detection.
   :type command: :py:class:`dict`, *optional*
   :param columns_to_keep: List of columns to keep (default: None).
   :type columns_to_keep: :py:class:`list`, *optional*
   :param verbose: Verbosity level.
   :type verbose: :py:class:`int`, *default* ``1``
   :param column_not_found: What to do if a column is not found.
   :type column_not_found: :py:class:`str`, *default* ``'raise'``
   :param table_not_found: What to do if no Excel tables are found in a file.
                           Can be ``'raise'`` or ``'warn'``.
   :type table_not_found: :py:class:`str`, *default* ``'raise'``
   :param suppress_polars_warnings: If True, suppress Polars dtype warning messages during reading.
   :type suppress_polars_warnings: :py:class:`bool`, *default* :py:obj:`False`
   :param \*\*kwargs: Additional arguments passed to the parser function.

   :returns: :class:`pandas.DataFrame` containing the test data with automatic table detection.
   :rtype: :py:class:`pandas.DataFrame`

   .. rubric:: Examples

   Load by test name:

   >>> from sda.api.load import load_test
   >>> df = load_test("T183")

   Load with specific filter:

   >>> df = load_test("T183", datafilename_filter="*_processed.xlsx")

   Load from direct path:

   >>> df = load_test("/path/to/data/T183_experiment.xlsx")

   .. seealso::

      :py:func:`~sda.api.file_discovery.resolve_test_path`
          Resolve a test name or path to the canonical folder path.

      :py:func:`~sda.api.file_discovery.discover_data_file`
          Find a data file in the resolved test folder using a filename pattern.

      :py:func:`~sda.api.parse_test_file.parse_test_files`
          Unified parser for all supported years (2021+).

      :py:func:`~sda.api.load.parse_files`
          Parse one or many files directly when you already know the paths.


.. py:function:: parse_files(files, command=None, columns_to_keep=None, verbose=1, column_not_found='warn', table_not_found='raise', **kwargs)

   Parse multiple files using the unified parser approach.

   Since all files now use the same unified parser (``parse_test_files`` with
   ``ExcelFileHandler``), this function directly calls that parser without any
   grouping logic.

   :param files: Files to parse. Same format as ``parse_test_files`` (unified approach).
   :type files: :py:class:`str`, :py:class:`Path`, :py:class:`list`, or :py:class:`dict`
   :param command: Default command for reading files. If None, will use parser-specific defaults.
   :type command: :py:class:`dict`, *optional*
   :param columns_to_keep: List of columns to keep (only applies to 2023+ data).
   :type columns_to_keep: :py:class:`list`, *optional*
   :param verbose: Verbosity level.
   :type verbose: :py:class:`int`, *default* ``1``
   :param column_not_found: What to do if a column is not found.
   :type column_not_found: :py:class:`str`, *default* ``'warn'``
   :param table_not_found: What to do if no Excel tables are found in a file.
                           Can be ``'raise'`` or ``'warn'``.
   :type table_not_found: :py:class:`str`, *default* ``'raise'``
   :param \*\*kwargs: Additional arguments passed to parser functions.

   :returns: :class:`pandas.DataFrame` combined from all files with automatic table detection.
   :rtype: :py:class:`pandas.DataFrame`

   .. rubric:: Examples

   Parse a single file:

   >>> from sda.api.load import parse_files
   >>> df = parse_files("T183.xlsx")

   Parse multiple files:

   >>> files = ["T183.xlsx", "T196.xlsx"]
   >>> df = parse_files(files)

   Parse with legacy command format:

   >>> files = {"data.xlsx": {}}  # Empty dict uses table detection
   >>> df = parse_files(files)

   .. seealso::

      :py:func:`~sda.api.parse_test_file.parse_test_files`
          Unified parser function used under the hood.

      :py:func:`~sda.api.file_discovery.list_all_files`
          Discover eligible files to parse inside a test folder.

      :py:func:`~sda.api.load.load_test`
          High-level convenience wrapper to load a single test by name.


.. py:function:: load_tests(test_names=None, include_2021_2022=True, include_2023_current=True)

   Load multiple tests by names or discover all available tests.

   This function provides a convenient way to load multiple test datasets at once,
   with automatic discovery if no specific test names are provided.

   :param test_names: List of test names to load. If None, discovers all available tests.
   :type test_names: :py:class:`list`, *optional*
   :param include_2021_2022: Whether to include 2021-2022 test data in discovery.
   :type include_2021_2022: :py:class:`bool`, *default* :py:obj:`True`
   :param include_2023_current: Whether to include 2023-current test data in discovery.
   :type include_2023_current: :py:class:`bool`, *default* :py:obj:`True`

   :returns: :class:`pandas.DataFrame` combined from all loaded tests with automatic table detection.
   :rtype: :py:class:`pandas.DataFrame`

   .. rubric:: Examples

   Load specific tests:

   >>> from sda.api.load import load_tests
   >>> df = load_tests(["T183", "T196"])

   Load all available tests:

   >>> df = load_tests()  # Discovers and loads all tests

   Load only recent tests:

   >>> df = load_tests(include_2021_2022=False)

   .. seealso::

      :py:func:`~sda.api.file_discovery.list_all_tests`
          Discover available test identifiers.

      :py:func:`~sda.api.load.load_test`
          Load a single test.

      :py:func:`~sda.api.load.parse_files`
          Parse multiple files directly.
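
The ``datafilename_filter`` and ``table_not_found`` parameters documented above
can be illustrated with a small standalone sketch. It assumes the filter behaves
like a shell-style glob, as the ``"*.xls*"`` default suggests, and that the
``'raise'`` / ``'warn'`` policy maps to an exception or a warning. The helpers
``filter_data_files`` and ``report_missing`` are hypothetical names used only for
illustration; they are not part of ``sda``, and the exception type is an
assumption.

.. code-block:: python

   import fnmatch
   import warnings


   def filter_data_files(filenames, datafilename_filter="*.xls*"):
       """Keep the names matching a shell-style glob (hypothetical helper)."""
       return [name for name in filenames if fnmatch.fnmatch(name, datafilename_filter)]


   def report_missing(message, policy="raise"):
       """Apply a 'raise' or 'warn' policy to a not-found condition (hypothetical helper)."""
       if policy == "raise":
           # Assumption: the real library may raise a different exception type.
           raise ValueError(message)
       if policy == "warn":
           warnings.warn(message)
           return
       raise ValueError(f"Unknown policy: {policy!r}")


   # The default pattern keeps .xlsx, .xlsm and .xls files and drops the rest.
   candidates = ["T183.xlsx", "T183_macro.xlsm", "T183_old.xls", "T183_notes.txt"]
   matched = filter_data_files(candidates)

   # A narrower pattern selects only the processed workbook.
   processed = filter_data_files(
       ["T183_raw.xlsx", "T183_processed.xlsx"], "*_processed.xlsx"
   )

With these inputs, ``matched`` holds the three Excel file names and ``processed``
holds only ``T183_processed.xlsx``. Whether ``load_test`` applies the filter in
exactly this way is not stated here, so treat the sketch as a reading aid rather
than a description of the implementation.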