sda.api.load_sample_file
========================

.. py:module:: sda.api.load_sample_file

.. autoapi-nested-parse::

   Sample-based Excel file loader.

   This module handles a non-standard Excel format used for carbon analysis summaries
   (e.g. CAR008-Suivi analyses). These files have:

   - One sheet per diagnostic (STSA, OAN, HAP, IAN, MOISTURE, pH, DENSITY, XPS, ...)
   - One or more rows per sample per sheet (triplicates + average/Moyenne columns)
   - Heterogeneous headers across sheets

   The result is a **wide DataFrame** (one row per sample, one column per diagnostic
   metric) suitable for use with `load_test` and the SDA dashboard.

   Auto-detection
   --------------
   A file is identified as sample-based when it has no SDA data tables
   (``TestData`` / ``Base_de_données``) **and** at least one sheet whose first
   non-empty row contains a column matching any of:
   ``"sample"``, ``"code ech"``, ``"echantillon"`` (case-insensitive).

   When loading, a sheet is included only if its first non-empty row contains such
   a column; all other sheets are ignored.

   Per-sheet aggregation
   ---------------------
   Some sheets have multiple measurement rows per sample (triplicates). When a
   ``Moyenne`` or ``Average`` column is present, that column value is used directly
   as the representative value (first non-null occurrence per sample). Otherwise,
   numeric columns are mean-aggregated across rows sharing the same sample name.

   Column naming
   -------------
   Metric columns use the sheet header as-is when that header appears on **only one**
   sheet. If the same header string appears on **multiple** sheets (excluding the
   sample join key), names are prefixed with ``{sheet}_`` to avoid collisions
   (e.g. if the ``HAP`` and ``MOISTURE`` sheets both have an ``Average (%)`` column,
   the result contains ``HAP_Average (%)`` and ``MOISTURE_Average (%)``).

   Sample name normalization
   -------------------------
   Leading/trailing whitespace and leading ``#`` characters are stripped
   (e.g. ``" CB5"`` → ``"CB5"``, ``"#CB1"`` → ``"CB1"``).
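
The normalization, aggregation, and naming rules described above can be sketched in plain Python, without pandas, to make the intended behaviour concrete. This is a minimal illustration and not the module's actual code: the helper names (``normalize_sample_name``, ``aggregate_sheet``, ``resolve_column_names``) are hypothetical, rows are modelled as dictionaries, average columns are assumed to be recognised by a case-insensitive prefix match, and the sketch assumes that only the average columns are kept when one is present.

```python
from collections import Counter
from statistics import mean

# Assumption: average columns are recognised by a case-insensitive prefix.
AVERAGE_PREFIXES = ("moyenne", "average")


def normalize_sample_name(raw):
    """Strip surrounding whitespace, then any leading '#' characters."""
    return str(raw).strip().lstrip("#").strip()


def aggregate_sheet(rows, sample_key="Sample"):
    """Collapse replicate rows into one dict of metrics per sample."""
    grouped = {}
    for row in rows:
        name = normalize_sample_name(row[sample_key])
        grouped.setdefault(name, []).append(row)

    result = {}
    for name, group in grouped.items():
        headers = [h for h in group[0] if h != sample_key]
        avg_headers = [h for h in headers if h.lower().startswith(AVERAGE_PREFIXES)]
        metrics = {}
        if avg_headers:
            # Precomputed average present: take its first non-null value.
            for h in avg_headers:
                metrics[h] = next((r[h] for r in group if r.get(h) is not None), None)
        else:
            # No average column: mean of each numeric column across replicates.
            for h in headers:
                values = [r[h] for r in group if isinstance(r.get(h), (int, float))]
                if values:
                    metrics[h] = mean(values)
        result[name] = metrics
    return result


def resolve_column_names(sheets):
    """Prefix a header with '{sheet}_' only if it occurs on several sheets."""
    counts = Counter(h for headers in sheets.values() for h in set(headers))
    return {
        sheet: {h: (f"{sheet}_{h}" if counts[h] > 1 else h) for h in headers}
        for sheet, headers in sheets.items()
    }


# Usage with made-up values (not taken from a real file).
hap_rows = [
    {"Sample": " CB5", "Run": 1, "Average (%)": 2.5},
    {"Sample": "#CB5", "Run": 2, "Average (%)": None},
]
stsa_rows = [
    {"Sample": "CB1", "STSA (m²/g)": 100.0},
    {"Sample": "CB1 ", "STSA (m²/g)": 110.0},
]
hap = aggregate_sheet(hap_rows)
stsa = aggregate_sheet(stsa_rows)
names = resolve_column_names(
    {"HAP": ["Average (%)"], "MOISTURE": ["Average (%)"], "STSA": ["STSA (m²/g)"]}
)
```

In this sketch ``hap`` holds the precomputed average for ``CB5``, ``stsa`` holds the mean of the two ``CB1`` replicates, and ``names`` maps the shared ``Average (%)`` header to a sheet-prefixed name while leaving ``STSA (m²/g)`` unchanged.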


Functions
---------

.. autoapisummary::

   sda.api.load_sample_file.detect_sample_based_file
   sda.api.load_sample_file.load_sample_based_file


Module Contents
---------------

.. py:function:: detect_sample_based_file(path)

   Return True if *path* looks like a sample-based Excel analysis file.

   A file is considered sample-based when it has **at least one sheet** whose
   first non-empty row contains a column matching a known sample-identifier
   pattern (``"sample"``, ``"code ech"``, ``"echantillon"``).

   If the file is locked (open in Excel), a temporary copy is used.

   :param path: Path to an Excel file (``.xlsx`` / ``.xlsm`` / ``.xls``).
   :type path: :py:class:`Path`

   :returns: ``True`` if the file has at least one sample-sheet, ``False`` otherwise.
   :rtype: :py:class:`bool`
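
The per-sheet check behind this detection can be illustrated on rows that have already been read from a sheet. The sketch below is an assumption about how such a check might look, not the function's real implementation: the helper names are hypothetical, each row is a plain list of cell values, and "matching" is taken to mean a case-insensitive substring match with no accent folding.

```python
# Patterns quoted from the documentation above.
SAMPLE_PATTERNS = ("sample", "code ech", "echantillon")


def first_non_empty_row(rows):
    """Return the first row holding at least one non-blank cell, or None."""
    for row in rows:
        if any(cell is not None and str(cell).strip() for cell in row):
            return row
    return None


def find_sample_column(header_row):
    """Return the index of the first sample-identifier header, or None."""
    for index, cell in enumerate(header_row or []):
        text = str(cell).strip().lower() if cell is not None else ""
        # Assumption: a header matches when it contains one of the patterns.
        if any(pattern in text for pattern in SAMPLE_PATTERNS):
            return index
    return None


def is_sample_sheet(rows):
    """True if the sheet's first non-empty row has a sample-identifier column."""
    return find_sample_column(first_non_empty_row(rows)) is not None


# Usage with made-up sheet contents.
with_samples = [[None, None], ["Code Ech.", "STSA (m²/g)"], ["CB1", 105.0]]
without_samples = [["Date", "Operator"], ["2024-01-01", "AB"]]
```

A file-level check would then reduce to asking whether ``is_sample_sheet`` holds for at least one sheet of the workbook.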


.. py:function:: load_sample_based_file(path, verbose = 1)

   Load a sample-based analysis Excel file into a wide DataFrame.

   Reads each sheet whose first non-empty row contains a sample-identifier
   column, aggregates to one row per sample (using the ``Moyenne`` / ``Average``
   column when present, otherwise the mean of numeric columns), and merges all
   sheets on the ``Sample`` column.

   :param path: Path to the Excel file (e.g. ``CAR008-Sample.xlsx``).
   :type path: :py:class:`Path | str`
   :param verbose: Verbosity level. 0 = silent, 1 = basic info, 2 = per-sheet details.
   :type verbose: :py:class:`int`, *default* ``1``

   :returns: Wide DataFrame with one row per sample and one column per diagnostic
             metric. Headers unique to a single sheet keep their Excel names; headers
             that appear on multiple sheets get ``{sheet}_`` prefixes. The first column
             is ``Sample``.
   :rtype: :py:class:`pd.DataFrame`

   :raises ValueError: If no sample-bearing sheets are found in the file.

   .. rubric:: Examples

   >>> from pathlib import Path
   >>> from sda.api.load_sample_file import load_sample_based_file
   >>> df = load_sample_based_file(Path("CAR008-Sample.xlsx"))
   >>> "Sample" in df.columns and "STSA (m²/g)" in df.columns
   True
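
The final merge step, which combines per-sheet results into one wide row per sample, can be pictured with the following pure-Python sketch. It is illustrative only: ``merge_sheets`` is a hypothetical helper, the input is assumed to be already aggregated and to use already-disambiguated column names, and the sketch assumes an outer join (samples missing from a sheet get ``None`` for that sheet's metrics), which the documentation does not specify.

```python
def merge_sheets(per_sheet):
    """Outer-join per-sheet metrics into one wide row per sample."""
    columns = ["Sample"]
    wide = {}
    for metrics_by_sample in per_sheet.values():
        for sample, metrics in metrics_by_sample.items():
            row = wide.setdefault(sample, {"Sample": sample})
            for header, value in metrics.items():
                if header not in columns:
                    columns.append(header)
                row[header] = value
    # Fill gaps so every row exposes the same columns, with "Sample" first.
    return [{c: row.get(c) for c in columns} for row in wide.values()]


# Usage with made-up values.
per_sheet = {
    "STSA": {"CB1": {"STSA (m2/g)": 105.0}},
    "pH": {"CB1": {"pH": 7.2}, "CB5": {"pH": 6.9}},
}
table = merge_sheets(per_sheet)
```

Each element of ``table`` corresponds to one row of the wide result, with ``Sample`` as the first key.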