sda.api.load_sample_file#

Sample-based Excel file loader.

This module handles a non-standard Excel format used for carbon analysis summaries (e.g. CAR008-Suivi analyses). These files have: - One sheet per diagnostic (STSA, OAN, HAP, IAN, MOISTURE, pH, DENSITY, XPS, …) - One or more rows per sample per sheet (triplicates + average/Moyenne columns) - Heterogeneous headers across sheets

The result is a wide DataFrame — one row per sample, one column per diagnostic metric — suitable for use with load_test and the SDA dashboard.

Auto-detection#

A file is identified as sample-based when it has no SDA data tables (TestData / Base_de_données) and at least one sheet whose first non-empty row contains a column matching any of: "sample", "code ech", "echantillon" (case-insensitive).

A sheet is included if its first non-empty row contains such a column.

Per-sheet aggregation#

Some sheets have multiple measurement rows per sample (triplicates). When a Moyenne or Average column is present, that column value is used directly as the representative value (first non-null occurrence per sample). Otherwise, numeric columns are mean-aggregated across rows sharing the same sample name.

Column naming#

Metric columns use the sheet header as-is when that header appears on only one sheet. If the same header string appears on multiple sheets (excluding the sample join key), names are prefixed with {sheet}_ to avoid collisions (e.g. two sheets both named Average (%)HAP_Average (%), MOISTURE_Average (%)).

Sample name normalization#

Leading/trailing whitespace and leading # characters are stripped (e.g. " CB5""CB5", "#CB1""CB1").

Functions#

detect_sample_based_file(path)

Return True if path looks like a sample-based Excel analysis file.

load_sample_based_file(path[, verbose])

Load a sample-based analysis Excel file into a wide DataFrame.

Module Contents#

sda.api.load_sample_file.detect_sample_based_file(path)#

Return True if path looks like a sample-based Excel analysis file.

A file is considered sample-based when it has at least one sheet whose first non-empty row contains a column matching a known sample-identifier pattern ("sample", "code ech", "echantillon").

If the file is locked (open in Excel), a temporary copy is used.

Parameters:

path (Path) – Path to an Excel file (.xlsx / .xlsm / .xls).

Returns:

True if the file has at least one sample-sheet, False otherwise.

Return type:

bool

sda.api.load_sample_file.load_sample_based_file(path, verbose=1)#

Load a sample-based analysis Excel file into a wide DataFrame.

Reads each sheet whose first non-empty row contains a sample-identifier column, aggregates to one row per sample (mean of numeric columns), and merges all sheets on the Sample column.

Parameters:
  • path (Path | str) – Path to the Excel file (e.g. CAR008-Sample.xlsx).

  • verbose (int, default 1) – Verbosity level. 0 = silent, 1 = basic info, 2 = per-sheet details.

Returns:

Wide DataFrame with one row per sample and one column per diagnostic metric. Headers unique to a single sheet keep their Excel names; headers that appear on multiple sheets get {sheet}_ prefixes. The first column is Sample.

Return type:

pd.DataFrame

Raises:

ValueError – If no sample-bearing sheets are found in the file.

Examples

>>> from pathlib import Path
>>> from sda.api.load_sample_file import load_sample_based_file
>>> df = load_sample_based_file(Path("CAR008-Sample.xlsx"))
>>> "Sample" in df.columns and "STSA (m²/g)" in df.columns
True