sda.api.load_sample_file#
Sample-based Excel file loader.
This module handles a non-standard Excel format used for carbon analysis summaries (e.g. CAR008-Suivi analyses). These files have: - One sheet per diagnostic (STSA, OAN, HAP, IAN, MOISTURE, pH, DENSITY, XPS, …) - One or more rows per sample per sheet (triplicates + average/Moyenne columns) - Heterogeneous headers across sheets
The result is a wide DataFrame — one row per sample, one column per diagnostic metric — suitable for use with load_test and the SDA dashboard.
Auto-detection#
A file is identified as sample-based when it has no SDA data tables
(TestData / Base_de_données) and at least one sheet whose first
non-empty row contains a column matching any of:
"sample", "code ech", "echantillon" (case-insensitive).
A sheet is included if its first non-empty row contains such a column.
Per-sheet aggregation#
Some sheets have multiple measurement rows per sample (triplicates). When a
Moyenne or Average column is present, that column value is used directly
as the representative value (first non-null occurrence per sample). Otherwise,
numeric columns are mean-aggregated across rows sharing the same sample name.
Column naming#
Metric columns use the sheet header as-is when that header appears on only one
sheet. If the same header string appears on multiple sheets (excluding the
sample join key), names are prefixed with {sheet}_ to avoid collisions
(e.g. two sheets both named Average (%) → HAP_Average (%),
MOISTURE_Average (%)).
Sample name normalization#
Leading/trailing whitespace and leading # characters are stripped
(e.g. " CB5" → "CB5", "#CB1" → "CB1").
Functions#
|
Return True if path looks like a sample-based Excel analysis file. |
|
Load a sample-based analysis Excel file into a wide DataFrame. |
Module Contents#
- sda.api.load_sample_file.detect_sample_based_file(path)#
Return True if path looks like a sample-based Excel analysis file.
A file is considered sample-based when it has at least one sheet whose first non-empty row contains a column matching a known sample-identifier pattern (
"sample","code ech","echantillon").If the file is locked (open in Excel), a temporary copy is used.
- Parameters:
path (
Path) – Path to an Excel file (.xlsx/.xlsm/.xls).- Returns:
Trueif the file has at least one sample-sheet,Falseotherwise.- Return type:
- sda.api.load_sample_file.load_sample_based_file(path, verbose=1)#
Load a sample-based analysis Excel file into a wide DataFrame.
Reads each sheet whose first non-empty row contains a sample-identifier column, aggregates to one row per sample (mean of numeric columns), and merges all sheets on the
Samplecolumn.- Parameters:
path (
Path | str) – Path to the Excel file (e.g.CAR008-Sample.xlsx).verbose (
int, default1) – Verbosity level. 0 = silent, 1 = basic info, 2 = per-sheet details.
- Returns:
Wide DataFrame with one row per sample and one column per diagnostic metric. Headers unique to a single sheet keep their Excel names; headers that appear on multiple sheets get
{sheet}_prefixes. The first column isSample.- Return type:
pd.DataFrame- Raises:
ValueError – If no sample-bearing sheets are found in the file.
Examples
>>> from pathlib import Path >>> from sda.api.load_sample_file import load_sample_based_file >>> df = load_sample_based_file(Path("CAR008-Sample.xlsx")) >>> "Sample" in df.columns and "STSA (m²/g)" in df.columns True