sda.api.load_sample_file
========================

.. py:module:: sda.api.load_sample_file

.. autoapi-nested-parse::

   Sample-based Excel file loader.

   This module handles a non-standard Excel format used for carbon analysis
   summaries (e.g. CAR008-Suivi analyses). These files have:

   - One sheet per diagnostic (STSA, OAN, HAP, IAN, MOISTURE, pH, DENSITY, XPS, ...)
   - One or more rows per sample per sheet (triplicates + average/Moyenne columns)
   - Heterogeneous headers across sheets

   The result is a **wide DataFrame** (one row per sample, one column per
   diagnostic metric) suitable for use with `load_test` and the SDA dashboard.

   Auto-detection
   --------------
   A file is identified as sample-based when it has no SDA data tables
   (``TestData`` / ``Base_de_données``) **and** at least one sheet whose first
   non-empty row contains a column matching any of: ``"sample"``,
   ``"code ech"``, ``"echantillon"`` (case-insensitive). Every sheet that meets
   this condition is included in the load.

   Per-sheet aggregation
   ---------------------
   Some sheets have multiple measurement rows per sample (triplicates). When a
   ``Moyenne`` or ``Average`` column is present, the value of that column is
   used directly as the representative value (first non-null occurrence per
   sample). Otherwise, numeric columns are mean-aggregated across rows sharing
   the same sample name.

   Column naming
   -------------
   Metric columns use the sheet header as-is when that header appears on
   **only one** sheet. If the same header string appears on **multiple** sheets
   (excluding the sample join key), names are prefixed with ``{sheet}_`` to
   avoid collisions (e.g. two sheets that both have an ``Average (%)`` column →
   ``HAP_Average (%)``, ``MOISTURE_Average (%)``).

   Sample name normalization
   -------------------------
   Leading/trailing whitespace and leading ``#`` characters are stripped
   (e.g. ``" CB5"`` → ``"CB5"``, ``"#CB1"`` → ``"CB1"``).


Functions
---------

.. autoapisummary::

   sda.api.load_sample_file.detect_sample_based_file
   sda.api.load_sample_file.load_sample_based_file


Module Contents
---------------

.. py:function:: detect_sample_based_file(path)

   Return True if *path* looks like a sample-based Excel analysis file.

   A file is considered sample-based when it has **at least one sheet** whose
   first non-empty row contains a column matching a known sample-identifier
   pattern (``"sample"``, ``"code ech"``, ``"echantillon"``).

   If the file is locked (open in Excel), a temporary copy is used.

   :param path: Path to an Excel file (``.xlsx`` / ``.xlsm`` / ``.xls``).
   :type path: :py:class:`Path`

   :returns: ``True`` if the file has at least one sample-sheet, ``False`` otherwise.
   :rtype: :py:class:`bool`


.. py:function:: load_sample_based_file(path, verbose = 1)

   Load a sample-based analysis Excel file into a wide DataFrame.

   Reads each sheet whose first non-empty row contains a sample-identifier
   column, aggregates each sheet to one row per sample (the ``Moyenne`` /
   ``Average`` value when such a column is present, otherwise the mean of
   numeric columns), and merges all sheets on the ``Sample`` column.

   :param path: Path to the Excel file (e.g. ``CAR008-Sample.xlsx``).
   :type path: :py:class:`Path | str`
   :param verbose: Verbosity level. 0 = silent, 1 = basic info, 2 = per-sheet details.
   :type verbose: :py:class:`int`, *default* ``1``

   :returns: Wide DataFrame with one row per sample and one column per diagnostic
             metric. Headers unique to a single sheet keep their Excel names;
             headers that appear on multiple sheets get ``{sheet}_`` prefixes.
             The first column is ``Sample``.
   :rtype: :py:class:`pd.DataFrame`

   :raises ValueError: If no sample-bearing sheets are found in the file.

   .. rubric:: Examples

   >>> from pathlib import Path
   >>> from sda.api.load_sample_file import load_sample_based_file
   >>> df = load_sample_based_file(Path("CAR008-Sample.xlsx"))
   >>> "Sample" in df.columns and "STSA (m²/g)" in df.columns
   True
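
The normalization, per-sheet aggregation, and column-naming rules described in the module docstring can be illustrated with a small standalone sketch. This is not the module's implementation: it uses plain dictionaries instead of pandas, and every name in it (``normalize_sample_name``, ``aggregate_sheet``, ``merge_sheets``, the sample rows) is hypothetical. It also assumes that average columns are recognized by a header starting with ``Moyenne`` or ``Average``, and that a sheet with such a column contributes only that column; the documentation above does not specify either detail.

```python
from collections import Counter
from statistics import mean

# Assumption: average columns are recognized by this case-insensitive prefix.
AVERAGE_PREFIXES = ("moyenne", "average")


def normalize_sample_name(name):
    """Strip surrounding whitespace and leading '#' characters."""
    return str(name).strip().lstrip("#").strip()


def aggregate_sheet(rows, sample_key):
    """Collapse a list of row dicts to {sample: {header: value}}."""
    grouped = {}
    for row in rows:
        grouped.setdefault(normalize_sample_name(row[sample_key]), []).append(row)

    result = {}
    for sample, group in grouped.items():
        headers = [h for h in group[0] if h != sample_key]
        avg_headers = [h for h in headers if h.lower().startswith(AVERAGE_PREFIXES)]
        metrics = {}
        if avg_headers:
            # Average column present: take its first non-null value per sample.
            for header in avg_headers:
                values = [r[header] for r in group if r.get(header) is not None]
                if values:
                    metrics[header] = values[0]
        else:
            # No average column: mean of the numeric values across replicates.
            for header in headers:
                values = [r[header] for r in group
                          if isinstance(r.get(header), (int, float))]
                if values:
                    metrics[header] = mean(values)
        result[sample] = metrics
    return result


def merge_sheets(sheets):
    """Merge {sheet: {sample: {header: value}}} into one wide row per sample."""
    # Count on how many sheets each header appears.
    counts = Counter()
    for per_sample in sheets.values():
        counts.update({h for metrics in per_sample.values() for h in metrics})

    wide = {}
    for sheet, per_sample in sheets.items():
        for sample, metrics in per_sample.items():
            row = wide.setdefault(sample, {"Sample": sample})
            for header, value in metrics.items():
                # Prefix only headers shared by more than one sheet.
                column = f"{sheet}_{header}" if counts[header] > 1 else header
                row[column] = value
    return list(wide.values())


# Made-up rows standing in for three Excel sheets.
hap_rows = [
    {"Sample": " CB5", "Average (%)": 1.5},
    {"Sample": "#CB1", "Average (%)": 2.0},
    {"Sample": "CB1", "Average (%)": None},
]
stsa_rows = [
    {"Code ech": "CB1", "STSA (m²/g)": 100.0},
    {"Code ech": "CB1", "STSA (m²/g)": 110.0},
    {"Code ech": "CB1", "STSA (m²/g)": 120.0},
]
moisture_rows = [{"Sample": "CB1", "Average (%)": 0.4}]

wide_rows = merge_sheets({
    "HAP": aggregate_sheet(hap_rows, "Sample"),
    "STSA": aggregate_sheet(stsa_rows, "Code ech"),
    "MOISTURE": aggregate_sheet(moisture_rows, "Sample"),
})
for row in wide_rows:
    print(row)
```

In this sketch the ``Average (%)`` header occurs on both the HAP and MOISTURE sheets, so it receives a sheet prefix, while ``STSA (m²/g)`` occurs on one sheet only and keeps its original name. The real loader returns a ``pd.DataFrame``; the list of dictionaries here only mirrors that wide layout.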