Discharge Homogeneity Analysis: data pipeline and figure builder

Investigates plasma/discharge homogeneity across lab and pilot tests. Lensing (focussed arc) is the opposite of homogeneity.

Two data sources are combined in a unified main scatter (Energy × Frequency):

  1. Lab generator sweep tests (T324, T339, T341, T343, T344, T345): Standard SDA test files. Homogeneity score derived from:

    • “effet lensing” column (oui → 0, non → 0.75)

    • “zone préférentielle de claquage” column as fallback (arcs don’t cover all surface → 0; cover all surface → 1)

    Marker shape encodes the electrode gap (mm).

  2. T346 Pilot campaign log (T346_analyse_EP.xlsx, shown as “T346_sup”): Multi-reactor field log from January 2026. Homogeneity from “Plasma homogène” (Non → 0, Partiel → 0.5, Oui → 1). Energy inferred via E=k·V². Shape: ▲ (generator: “EP”). A second scatter (bottom) shows operating conditions: Flow × CH4%.
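
The energy inference used for the pilot rows can be sketched as follows. The constant 0.07 mJ/kV² is an assumption, back-computed from the T344 comment further down (12 kV giving about 10.08 mJ); the pipeline itself reads `K_DEFAULT` from `sda.analysis`. `infer_energy_mj` is a hypothetical helper name used only for this illustration.

```python
import math

# Assumed value, back-computed from "12 kV -> ~10.08 mJ";
# the real pipeline uses sda.analysis.K_DEFAULT instead.
K_ASSUMED = 0.07  # mJ/kV²


def infer_energy_mj(voltage_kv, k=K_ASSUMED):
    """Return E = k * V² in mJ, or NaN when the voltage is missing or not positive."""
    if voltage_kv is None or math.isnan(voltage_kv) or voltage_kv <= 0:
        return math.nan
    return k * voltage_kv**2


print(infer_energy_mj(12.0))  # close to 10.08 mJ
print(infer_energy_mj(0.0))   # nan: non-positive voltages are not converted
```

Rows whose energy comes from this relation are estimates, which is why the pipeline records the inference in `analysis_note`.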

Both charts use the RdYlGn colorscale:

red = 0 (lensing / non-homogeneous) → green = 1 (fully homogeneous)

0.75 = no lensing observed (good, but not confirmed homogeneous)

Public API

build_dataframes() -> tuple[pd.DataFrame, pd.DataFrame]

Run the full pipeline and return (df_all, df_ep).

df_all: main Energy×Frequency scatter data (lab rows plus the T346_sup and T344_sup rows).

df_ep: T346 pilot-campaign log (Flow×CH4% scatter).

build_lab_figure(df_all, selected_tests, selected_generators, selected_gaps)

Build the main Energy×Frequency Plotly figure.

build_ep_figure(df_ep, selected_reactors, selected_points)

Build the T346 pilot Flow×CH4% Plotly figure.

Standalone use

Run directly to build both DataFrames and display both Plotly charts:

python examples/lensing_analysis.py

For the full Dash dashboard with live filter controls:

python examples/lensing_analysis_dash.py

Imports

import warnings

import numpy as np
import pandas as pd

import sda
from sda.analysis import (
    ENERGY_COLS,
    FREQ_COLS,
    GAP_COLS,
    K_DEFAULT,
    VOLTAGE_COLS,
    background_heatmap_trace,
    concat_notes,
    first_value,
    get_row_generator,
    iso_power_traces,
    load_supplementary_xlsx,
    parse_plasma_homogene,
)

warnings.filterwarnings("ignore")

Configuration

TESTS = ["T324", "T339", "T341", "T343", "T344", "T345"]

LENSING_COLS = ["effet lensing"]

HOMOG_COLS = ["Homogénéité de la décharge"]

ZONE_COLS = [
    "zone préférentielle de claquage",
    "Zone préférentielle de claquage",
    "Zone préférentiel de claquage",
    "zone préférentiel de claquage",
]

# Gap symbol mapping: gap value (mm) → Plotly marker symbol
# "—" is used for EP/pilot rows that have no gap setting
_GAP_SYMBOL: dict[str, str] = {
    "2": "diamond",
    "3": "circle",
    "5": "square",
    "—": "triangle-up",
}
_GAP_SYMBOL_DEFAULT = "cross"

# EP reactor → Plotly marker symbol
_REACTOR_SYMBOL: dict[str, str] = {
    "R1": "circle",
    "R2": "square",
    "R4": "diamond",
    "R5": "triangle-up",
}
_REACTOR_SYMBOL_DEFAULT = "cross"
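
As a rough illustration of how these mappings are consumed, the sketch below formats a numeric gap into the string key used by the table and looks up the marker symbol with a fallback. The mapping is copied from the configuration above; `gap_key` and `symbol_for` are hypothetical helper names, not part of `sda`.

```python
GAP_SYMBOL = {"2": "diamond", "3": "circle", "5": "square", "—": "triangle-up"}
GAP_SYMBOL_DEFAULT = "cross"


def gap_key(gap_mm: float) -> str:
    """Format a numeric gap as a mapping key: 3.0 -> "3", 2.5 -> "2.5"."""
    return str(int(gap_mm)) if gap_mm == int(gap_mm) else str(gap_mm)


def symbol_for(key) -> str:
    """Look up the marker symbol; unknown or missing keys get the default."""
    return GAP_SYMBOL.get(str(key), GAP_SYMBOL_DEFAULT)


print(symbol_for(gap_key(3.0)))  # circle
print(symbol_for(gap_key(2.5)))  # cross (no dedicated symbol for 2.5 mm)
print(symbol_for("—"))           # triangle-up (pilot rows without a gap)
print(symbol_for(None))          # cross
```

Keeping the keys as strings lets the "—" placeholder live in the same column as the numeric gaps.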

Homogeneity scorers

Convention: 0 = lensing / non-homogeneous (red), 1 = fully homogeneous (green), 0.75 = “no lensing detected” (good, but not explicitly confirmed as homogeneous).

def _parse_lensing_to_homogeneity(value) -> float:
    """Convert raw 'effet lensing' cell to homogeneity score.

    oui (lensing present)  → 0.0   (bad — focused arc, non-homogeneous)
    non (no lensing)       → 0.75  (good — no lensing observed)
    """
    if value is None:
        return np.nan
    try:
        if pd.isna(value):
            return np.nan
    except (TypeError, ValueError):
        pass
    s = str(value).strip().strip("'\"").lower()
    if s in ("oui", "yes", "y"):
        return 0.0
    if s in ("non", "no", "n", "none"):
        return 0.75
    return np.nan


def _parse_zone_claquage(value) -> float:
    """Convert 'zone préférentielle de claquage' text to homogeneity score.

    Lensing pattern (arcs don't cover all electrode surface) → 0.0
    Homogeneous pattern (arcs cover all electrode surface)   → 1.0
    """
    if value is None:
        return np.nan
    try:
        if pd.isna(value):
            return np.nan
    except (TypeError, ValueError):
        pass
    s = str(value).strip().lower()
    if not s or s in ("nan", "none", ""):
        return np.nan
    # Non-homogeneous: arcs focused, don't cover full surface
    if "n'occupent pas" in s or "pas toute" in s or "focalis" in s or "lensing" in s:
        return 0.0
    # Homogeneous: arcs cover the full electrode surface
    if "occupent toute" in s or "toute la surface" in s or "homog" in s:
        return 1.0
    return np.nan
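
The way the two scorers combine (lensing column first, zone text as fallback) can be shown with a simplified, dependency-free sketch. The functions below are reduced stand-ins for the ones above, and the French sample sentences are made-up examples, not rows from the test files.

```python
import math


def score_lensing(text):
    """'oui' -> 0.0 (lensing), 'non' -> 0.75 (no lensing seen), otherwise NaN."""
    s = (text or "").strip().lower()
    if s in ("oui", "yes", "y"):
        return 0.0
    if s in ("non", "no", "n"):
        return 0.75
    return math.nan


def score_zone(text):
    """Negative phrasing is tested first because it also contains the positive one."""
    s = (text or "").strip().lower()
    if "n'occupent pas" in s or "pas toute" in s:
        return 0.0
    if "occupent toute" in s or "toute la surface" in s:
        return 1.0
    return math.nan


def score_row(lensing_cell, zone_cell):
    """Use the lensing column when it yields a score, else fall back to the zone text."""
    score = score_lensing(lensing_cell)
    if math.isnan(score):
        score = score_zone(zone_cell)
    return score


print(score_row("Non", "Les arcs occupent toute la surface"))       # 0.75
print(score_row(None, "Les arcs n'occupent pas toute la surface"))  # 0.0
print(score_row("", "Les arcs occupent toute la surface"))          # 1.0
```

Note that a row with "non" in the lensing column stays at 0.75 even when the zone text would give 1.0, because the fallback only runs when the primary source yields no score.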

Step 1 + 2: Main data loading pipeline
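
The loader resolves each logical column by trying a list of candidate header spellings and taking the first one present. A minimal sketch of that pattern is given here; `resolve_column` is a hypothetical helper name, and the header list in `columns` (including "Fréquence (kHz)") is invented for the example.

```python
ZONE_COLS = [
    "zone préférentielle de claquage",
    "Zone préférentielle de claquage",
    "Zone préférentiel de claquage",
    "zone préférentiel de claquage",
]


def resolve_column(candidates, available):
    """Return the first candidate header found in `available`, or None."""
    return next((c for c in candidates if c in available), None)


# Invented header row for illustration only.
columns = ["run", "Zone préférentiel de claquage", "Fréquence (kHz)"]

print(resolve_column(ZONE_COLS, columns))          # Zone préférentiel de claquage
print(resolve_column(["effet lensing"], columns))  # None
```

Returning `None` rather than raising lets the caller decide whether a missing column means "skip the test" or "plot position only".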

def build_dataframes() -> tuple[pd.DataFrame, pd.DataFrame]:
    """Run the full lensing pipeline and return (df_all, df_ep).

    Steps performed:
    1. Load each test in TESTS and score rows for discharge homogeneity.
    2. Load T346 pilot campaign log (T346_analyse_EP.xlsx) into df_ep.
    3. Merge T346_sup rows (those with energy or frequency) into df_all.
    4. Load the T344 supplementary log and merge its scored rows into
       df_all as T344_sup.

    Returns
    -------
    df_all : pd.DataFrame
        One row per run/condition.  Columns:
        test, run, generator, energy_per_pulse_mJ, frequency_kHz, gap_mm,
        source_raw, homogeneity, discharge_homogeneity, preferred_zone,
        description, analysis_note.
        Includes T346_sup and T344_sup rows appended from the supplementary logs.

    df_ep : pd.DataFrame
        T346 pilot campaign log.  Columns:
        date, point_test, reactor, flow_slm, ch4_pct, h2_pct,
        plasma_homogene_raw, homogeneity, pdc_actifs, notes,
        lensing_in_notes, energy_per_pulse_mJ, frequency_kHz, analysis_note.
    """
    # ── Step 1: Load standard lab tests ──────────────────────────────────────
    print("\nLoading lab tests…")
    all_rows: list[dict] = []

    for test_name in TESTS:
        print(f"  {test_name}…", end=" ", flush=True)
        try:
            df = sda.load_test(test_name)
        except Exception as exc:
            print(f"ERROR: {exc}")
            continue

        df_cols = list(df.columns)
        ecol = next((c for c in ENERGY_COLS if c in df_cols), None)
        vcol = next((c for c in VOLTAGE_COLS if c in df_cols), None)
        fcol = next((c for c in FREQ_COLS if c in df_cols), None)
        gcol = next((c for c in GAP_COLS if c in df_cols), None)
        lcol = next((c for c in LENSING_COLS if c in df_cols), None)
        zcol = next((c for c in ZONE_COLS if c in df_cols), None)
        run_col = next((c for c in ["run", "Run"] if c in df_cols), None)

        has_homog_source = lcol is not None or zcol is not None
        if not has_homog_source and vcol is None and ecol is None:
            print("no homogeneity/lensing or energy/voltage column — skipped")
            continue
        if not has_homog_source:
            print(
                "no homogeneity column (energy/voltage present — position only)…",
                end=" ",
                flush=True,
            )

        row_count = 0
        for _, row in df.iterrows():
            energy_val = np.nan
            analysis_note = ""
            if ecol:
                ev = pd.to_numeric(row.get(ecol), errors="coerce")
                energy_val = float(ev) if pd.notna(ev) else np.nan

            generator = get_row_generator(row, df_cols, test_name, df)

            # Infer energy from voltage when no direct energy column exists
            if np.isnan(energy_val) and vcol:
                vv = pd.to_numeric(row.get(vcol), errors="coerce")
                if pd.notna(vv) and float(vv) > 0:
                    v = float(vv)
                    energy_val = K_DEFAULT * v**2
                    analysis_note = (
                        f"Energy inferred: E=k·V² (k={K_DEFAULT} mJ/kV², V={v:.1f} kV)"
                    )

            freq_val = np.nan
            if fcol:
                fv = pd.to_numeric(row.get(fcol), errors="coerce")
                freq_val = float(fv) if pd.notna(fv) else np.nan

            gap_val = None
            if gcol:
                gv = pd.to_numeric(row.get(gcol), errors="coerce")
                if pd.notna(gv):
                    gap_val = str(int(gv)) if gv == int(gv) else str(gv)

            # Primary homogeneity source: "effet lensing" column
            raw_source = row.get(lcol) if lcol else None
            homogeneity = _parse_lensing_to_homogeneity(raw_source)

            # Fallback: "zone préférentielle de claquage" column
            zone_val = row.get(zcol) if zcol else None
            if np.isnan(homogeneity) and zcol:
                homogeneity = _parse_zone_claquage(zone_val)
                if not np.isnan(homogeneity):
                    raw_source = zone_val  # use zone text as the display source

            run_val = None
            if run_col:
                rv = row.get(run_col)
                if pd.notna(rv):
                    try:
                        run_val = str(int(float(rv)))
                    except (ValueError, TypeError):
                        run_val = str(rv)

            description = concat_notes(row, df_cols)
            discharge_hom = first_value(row, df_cols, HOMOG_COLS)
            preferred_zone = (
                zone_val if zone_val is not None and pd.notna(zone_val) else None
            )

            if np.isnan(energy_val) and np.isnan(freq_val) and np.isnan(homogeneity):
                continue

            all_rows.append(
                {
                    "test": test_name,
                    "run": run_val,
                    "generator": generator,
                    "energy_per_pulse_mJ": energy_val,
                    "frequency_kHz": freq_val,
                    "gap_mm": gap_val,
                    "source_raw": str(raw_source) if pd.notna(raw_source) else None,
                    "homogeneity": homogeneity,
                    "discharge_homogeneity": discharge_hom,
                    "preferred_zone": str(preferred_zone) if preferred_zone else None,
                    "description": description,
                    "analysis_note": analysis_note,
                }
            )
            row_count += 1

        print(f"{row_count} rows")

    df_all = pd.DataFrame(all_rows)
    if "analysis_note" not in df_all.columns:
        df_all["analysis_note"] = ""
    else:
        df_all["analysis_note"] = df_all["analysis_note"].fillna("")

    _t344_rows = (df_all["test"] == "T344").sum()
    _scored = df_all["homogeneity"].notna().sum()
    _lensing = (df_all["homogeneity"] == 0.0).sum()
    print(
        f"\nLab tests total: {len(df_all)} rows | scored: {_scored} | "
        f"lensing (homogeneity=0): {_lensing}"
    )
    print(f"  T344 rows: {_t344_rows} (position-only, no homogeneity score)")

    # ── Step 2: Load T346 pilot campaign log & T344 supplementary file ───────
    print("\nLoading T346 pilot campaign log…")
    _df_ep_raw = load_supplementary_xlsx("T346", label="T346-EP")

    ep_rows: list[dict] = []

    if not _df_ep_raw.empty:
        _EP_HOMOG_COL = "Plasma homogène"
        _EP_FLOW_COL = "Débit (SLM)"
        _EP_CH4_COL = "CH4 (%)"
        _EP_H2_COL = "H2 (%)"
        _EP_REACTOR_COL = "Réacteur"
        _EP_POINT_COL = "Point Test"
        _EP_PDC_COL = "PDC actifs"
        _EP_NOTES_COL = "Notes"
        _EP_DATE_COL = "Date"
        _EP_FREQ_COL = "Frequency (kHz)"
        _EP_VOLT_COL = "Voltage input (kV)"

        _ep_cols = list(_df_ep_raw.columns)

        for _, row in _df_ep_raw.iterrows():
            homog_raw = row.get(_EP_HOMOG_COL) if _EP_HOMOG_COL in _ep_cols else None
            score = parse_plasma_homogene(homog_raw)

            flow = (
                pd.to_numeric(row.get(_EP_FLOW_COL), errors="coerce")
                if _EP_FLOW_COL in _ep_cols
                else np.nan
            )
            ch4 = (
                pd.to_numeric(row.get(_EP_CH4_COL), errors="coerce")
                if _EP_CH4_COL in _ep_cols
                else np.nan
            )
            h2 = (
                pd.to_numeric(row.get(_EP_H2_COL), errors="coerce")
                if _EP_H2_COL in _ep_cols
                else np.nan
            )
            reactor = (
                str(row.get(_EP_REACTOR_COL, "")).strip()
                if _EP_REACTOR_COL in _ep_cols
                else ""
            )
            point = (
                str(row.get(_EP_POINT_COL, "")).strip()
                if _EP_POINT_COL in _ep_cols
                else ""
            )
            pdc = (
                str(row.get(_EP_PDC_COL, "")).strip() if _EP_PDC_COL in _ep_cols else ""
            )
            notes = (
                str(row.get(_EP_NOTES_COL, "")).strip()
                if _EP_NOTES_COL in _ep_cols
                else ""
            )
            date = (
                str(row.get(_EP_DATE_COL, "")).strip()
                if _EP_DATE_COL in _ep_cols
                else ""
            )

            freq_ep = (
                pd.to_numeric(row.get(_EP_FREQ_COL), errors="coerce")
                if _EP_FREQ_COL in _ep_cols
                else np.nan
            )
            volt_ep = (
                pd.to_numeric(row.get(_EP_VOLT_COL), errors="coerce")
                if _EP_VOLT_COL in _ep_cols
                else np.nan
            )
            energy_ep = (
                K_DEFAULT * float(volt_ep) ** 2
                if pd.notna(volt_ep) and float(volt_ep) > 0
                else np.nan
            )
            ep_analysis_note = (
                f"T346 pilot; E=k·V² (k={K_DEFAULT} mJ/kV², V={float(volt_ep):.1f} kV)"
                if pd.notna(volt_ep)
                else "T346 pilot"
            )

            lensing_in_notes = "lensing" in notes.lower()

            if pd.isna(flow) and pd.isna(ch4) and pd.isna(score):
                continue

            ep_rows.append(
                {
                    "date": date,
                    "point_test": point,
                    "reactor": reactor if reactor and reactor != "nan" else "—",
                    "flow_slm": float(flow) if pd.notna(flow) else np.nan,
                    "ch4_pct": float(ch4) if pd.notna(ch4) else np.nan,
                    "h2_pct": float(h2) if pd.notna(h2) else np.nan,
                    "plasma_homogene_raw": str(homog_raw)
                    if homog_raw is not None
                    else None,
                    "homogeneity": score,
                    "pdc_actifs": pdc if pdc and pdc != "nan" else "—",
                    "notes": notes if notes and notes != "nan" else None,
                    "lensing_in_notes": lensing_in_notes,
                    "energy_per_pulse_mJ": energy_ep,
                    "frequency_kHz": float(freq_ep) if pd.notna(freq_ep) else np.nan,
                    "analysis_note": ep_analysis_note,
                }
            )

    df_ep = pd.DataFrame(ep_rows)
    if not df_ep.empty:
        _ep_full = (df_ep["homogeneity"] == 1.0).sum()
        _ep_partial = (df_ep["homogeneity"] == 0.5).sum()
        _ep_none = (df_ep["homogeneity"] == 0.0).sum()
        _ep_lens_notes = df_ep["lensing_in_notes"].sum()
        print(
            f"  EP rows: {len(df_ep)} total | "
            f"homogène={_ep_full}, partiel={_ep_partial}, non-homogène={_ep_none} | "
            f"{_ep_lens_notes} notes mention lensing"
        )
    else:
        print("  T346-EP: no data loaded")

    # ── Step 2b: Merge T346_sup rows into df_all ──────────────────────────────
    if not df_ep.empty:
        ep_for_lab: list[dict] = []
        for _, row in df_ep.iterrows():
            e = row.get("energy_per_pulse_mJ")
            f = row.get("frequency_kHz")
            if pd.isna(e) and pd.isna(f):
                continue
            reactor = row.get("reactor", "—")
            flow = row.get("flow_slm")
            ch4 = row.get("ch4_pct")
            base_notes = row.get("notes") or ""
            ep_ctx = (
                f"Reactor: {reactor}"
                + (f" | Flow: {flow:.1f} SLM" if pd.notna(flow) else "")
                + (f" | CH4: {ch4:.1f}%" if pd.notna(ch4) else "")
                + (f" | {base_notes}" if base_notes else "")
            )
            ep_for_lab.append(
                {
                    "test": "T346_sup",
                    "run": row.get("point_test"),
                    "generator": "EP",
                    "energy_per_pulse_mJ": e,
                    "frequency_kHz": f,
                    "gap_mm": "—",
                    "source_raw": row.get("plasma_homogene_raw"),
                    "homogeneity": row.get("homogeneity"),
                    "discharge_homogeneity": row.get("plasma_homogene_raw"),
                    "preferred_zone": None,
                    "description": ep_ctx,
                    "analysis_note": row.get("analysis_note", "T346 pilot"),
                }
            )
        if ep_for_lab:
            df_all = pd.concat([df_all, pd.DataFrame(ep_for_lab)], ignore_index=True)
            print(f"  Merged {len(ep_for_lab)} T346_sup rows into main DataFrame")
        else:
            print("  T346-EP: energy/frequency missing — not merged into main scatter")

    # ── Step 2c: Load T344 supplementary file ────────────────────────────────
    # T344_analyse_EP.xlsx has a 'Lensing' column (Oui/Non/—) and 'Plasma homogène'.
    # T344 always ran at 12 kV / 40 kHz → E = K_DEFAULT * 12² ≈ 10.08 mJ.

    print("\nLoading T344 supplementary log…")
    _df_t344_ep = load_supplementary_xlsx("T344", label="T344-EP")

    if not _df_t344_ep.empty:
        _T344_LENSING_COL = "Lensing"
        _T344_HOMOG_COL = "Plasma homogène"
        _T344_RUN_COL = "Run N°"
        _T344_NOTES_COL = "Notes / Observations"
        _t344_ep_cols = list(_df_t344_ep.columns)

        # Fixed operating conditions for all T344 runs
        _T344_VOLT = 12.0
        _T344_FREQ = 40.0
        _T344_ENERGY = K_DEFAULT * _T344_VOLT**2

        t344_sup_rows: list[dict] = []
        for _, row in _df_t344_ep.iterrows():
            # Primary score: Plasma homogène (same logic as T346)
            homog_raw = (
                row.get(_T344_HOMOG_COL) if _T344_HOMOG_COL in _t344_ep_cols else None
            )
            homog_score = parse_plasma_homogene(homog_raw)

            # Secondary: Lensing column — if Lensing=Oui, override to 0.0
            lensing_raw = (
                str(row.get(_T344_LENSING_COL, "")).strip()
                if _T344_LENSING_COL in _t344_ep_cols
                else "—"
            )
            if lensing_raw.lower() in ("oui", "yes", "y"):
                homog_score = 0.0

            # If Plasma homogène gave no score, fall back to Lensing column
            if np.isnan(homog_score) and lensing_raw not in ("—", "", "nan"):
                homog_score = _parse_lensing_to_homogeneity(lensing_raw)

            if np.isnan(homog_score):
                continue

            # Build source label: prefer Plasma homogène, note Lensing if it overrode
            homog_str = str(homog_raw).strip() if homog_raw is not None else "—"
            if lensing_raw.lower() in ("oui", "yes", "y"):
                source_label = f"Lensing={lensing_raw} / Plasma={homog_str}"
            else:
                source_label = (
                    homog_str if homog_str not in ("—", "", "nan") else lensing_raw
                )

            run_label = (
                str(row.get(_T344_RUN_COL, "")).strip()
                if _T344_RUN_COL in _t344_ep_cols
                else ""
            )
            notes = (
                str(row.get(_T344_NOTES_COL, "")).strip()
                if _T344_NOTES_COL in _t344_ep_cols
                else ""
            )
            description = notes if notes and notes not in ("—", "nan", "") else None

            t344_sup_rows.append(
                {
                    "test": "T344_sup",
                    "run": run_label
                    if run_label and run_label not in ("nan", "")
                    else None,
                    "generator": "EP",
                    "energy_per_pulse_mJ": _T344_ENERGY,
                    "frequency_kHz": _T344_FREQ,
                    "gap_mm": "—",
                    "source_raw": source_label,
                    "homogeneity": homog_score,
                    "discharge_homogeneity": homog_str,
                    "preferred_zone": None,
                    "description": description,
                    "analysis_note": (
                        f"T344 pilot; E=k·V² (k={K_DEFAULT} mJ/kV², V={_T344_VOLT:.0f} kV, "
                        f"fixed)"
                    ),
                }
            )

        if t344_sup_rows:
            _t344s_lensing = sum(1 for r in t344_sup_rows if r["homogeneity"] == 0.0)
            _t344s_good = sum(1 for r in t344_sup_rows if r["homogeneity"] >= 0.75)
            df_all = pd.concat([df_all, pd.DataFrame(t344_sup_rows)], ignore_index=True)
            print(
                f"  Merged {len(t344_sup_rows)} T344_sup rows | "
                f"lensing={_t344s_lensing}, good homogeneity={_t344s_good}"
            )
        else:
            print("  T344-EP: no scorable rows")

    return df_all, df_ep
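
The score precedence used for the T344 supplementary rows (Step 2c) can be summarized in a small stand-alone sketch. `PLASMA_SCORES` is a stand-in for `sda.analysis.parse_plasma_homogene`, built from the mapping stated in the module description (Non → 0, Partiel → 0.5, Oui → 1); the real parser may accept more spellings. `score_t344` is a hypothetical name.

```python
import math

# Stand-in for sda.analysis.parse_plasma_homogene (assumed mapping).
PLASMA_SCORES = {"non": 0.0, "partiel": 0.5, "oui": 1.0}


def score_t344(plasma_cell, lensing_cell):
    """Combine 'Plasma homogène' and 'Lensing' cells into one homogeneity score."""
    plasma = str(plasma_cell or "").strip().lower()
    lensing = str(lensing_cell or "").strip().lower()
    score = PLASMA_SCORES.get(plasma, math.nan)
    if lensing in ("oui", "yes", "y"):
        return 0.0  # explicit lensing overrides the plasma score
    if math.isnan(score) and lensing in ("non", "no", "n"):
        return 0.75  # no plasma score: "no lensing" is good but unconfirmed
    return score


print(score_t344("Oui", "Oui"))      # 0.0
print(score_t344("Partiel", "Non"))  # 0.5
print(score_t344("—", "Non"))        # 0.75
```

Rows for which this returns NaN correspond to the rows the pipeline drops from `T344_sup`.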

Step 3: Figure helpers delegated to sda.analysis.viz

Step 4: Build lab figure (energy × frequency)

def build_lab_figure(
    df_all: pd.DataFrame,
    selected_tests: list[str],
    selected_generators: list[str],
    selected_gaps: list[str],
):
    """Build Plotly figure with heatmap + iso-power lines + scatter for lab tests.

    Parameters
    ----------
    df_all:
        DataFrame returned as the first element of :func:`build_dataframes`.
    selected_tests, selected_generators, selected_gaps:
        Active filter values (e.g. from Dash checklists).
    """
    import plotly.graph_objects as go

    sub = df_all[
        df_all["test"].isin(selected_tests)
        & df_all["generator"].isin(selected_generators)
        & df_all["gap_mm"].isin(selected_gaps)
    ].copy()

    plot = sub.dropna(
        subset=["energy_per_pulse_mJ", "frequency_kHz", "homogeneity"]
    ).copy()

    traces = []

    base = df_all.dropna(subset=["energy_per_pulse_mJ", "frequency_kHz"])
    x_all = base["energy_per_pulse_mJ"].values
    y_all = base["frequency_kHz"].values
    if len(x_all) == 0:
        return go.Figure()

    x_min, x_max = float(x_all.min()), float(x_all.max())
    y_min, y_max = float(y_all.min()), float(y_all.max())

    # --- Layer 1: Background heatmap ---
    heatmap = background_heatmap_trace(
        x_pts=plot["energy_per_pulse_mJ"].values,
        y_pts=plot["frequency_kHz"].values,
        z_pts=plot["homogeneity"].values,
        x_range=(x_min, x_max),
        y_range=(y_min, y_max),
        colorscale="RdYlGn",
        zmin=0,
        zmax=1,
    )
    if heatmap is not None:
        traces.append(heatmap)

    # --- Layer 2: Iso-power lines ---
    power_line_traces, power_annotations = iso_power_traces(x_min, x_max, y_min, y_max)
    traces.extend(power_line_traces)

    # --- Layer 3: Scatter ---
    def _fmt(val, unit="", fallback="—", decimals=2):
        if val is None or (isinstance(val, float) and np.isnan(val)):
            return fallback
        try:
            return f"{float(val):.{decimals}f} {unit}".strip()
        except (TypeError, ValueError):
            return f"{val} {unit}".strip()

    def _build_hover(row):
        e = row["energy_per_pulse_mJ"]
        f = row["frequency_kHz"]
        power_str = f"{e * f:.1f} W" if (pd.notna(e) and pd.notna(f)) else "—"
        h_score = row.get("homogeneity")
        h_str = f"{h_score:.2f}" if pd.notna(h_score) else "—"
        lines = [
            f"<b>Test:</b> {row['test']}",
            f"<b>Run:</b> {row['run'] if row['run'] is not None else '—'}",
            f"<b>Generator:</b> {row['generator']}",
            f"<b>Energy:</b> {_fmt(e, 'mJ')}",
            f"<b>Frequency:</b> {_fmt(f, 'kHz')}",
            f"<b>Gap:</b> {row['gap_mm'] if row['gap_mm'] not in (None, '—') else '—'} mm",
            f"<b>Power:</b> {power_str}",
            f"<b>Homogeneity score:</b> {h_str}",
            f"<b>Source:</b> {row.get('source_raw') or '—'}",
        ]
        if row.get("discharge_homogeneity"):
            lines.append(
                f"<b>Discharge homogeneity:</b> {row['discharge_homogeneity']}"
            )
        if row.get("preferred_zone"):
            lines.append(f"<b>Preferred zone:</b> {row['preferred_zone']}")
        if row.get("description") and not pd.isna(row["description"]):
            lines.append(f"<b>Notes:</b> {row['description']}")
        if row.get("analysis_note"):
            lines.append(f"<b>Note:</b> {row['analysis_note']}")
        return "<br>".join(lines)

    # Rows with a valid homogeneity score (colored)
    plot_scored = plot.copy()
    n = len(plot_scored)
    if n > 0:
        traces.append(
            go.Scatter(
                x=plot_scored["energy_per_pulse_mJ"],
                y=plot_scored["frequency_kHz"],
                mode="markers",
                marker=dict(
                    size=13,
                    symbol=[
                        _GAP_SYMBOL.get(str(g), _GAP_SYMBOL_DEFAULT)
                        for g in plot_scored["gap_mm"]
                    ],
                    color=plot_scored["homogeneity"],
                    colorscale="RdYlGn",
                    cmin=0,
                    cmax=1,
                    colorbar=dict(
                        title="Homogeneity<br>(0=lensing, 1=homog.)",
                        tickvals=[0, 0.5, 0.75, 1],
                        ticktext=[
                            "0 — lensing",
                            "0.5 — partial",
                            "0.75 — no lensing",
                            "1 — homogeneous",
                        ],
                        thickness=18,
                    ),
                    line=dict(width=0.8, color="black"),
                ),
                text=[_build_hover(r) for _, r in plot_scored.iterrows()],
                hovertemplate="%{text}<extra></extra>",
                showlegend=False,
                name="homogeneity scored",
            )
        )

    # Rows with energy+freq but NO homogeneity score (position-only — gray hollow)
    plot_unscored = sub.dropna(subset=["energy_per_pulse_mJ", "frequency_kHz"]).copy()
    plot_unscored = plot_unscored[plot_unscored["homogeneity"].isna()]
    if len(plot_unscored) > 0:
        traces.append(
            go.Scatter(
                x=plot_unscored["energy_per_pulse_mJ"],
                y=plot_unscored["frequency_kHz"],
                mode="markers",
                marker=dict(
                    size=13,
                    symbol=[
                        _GAP_SYMBOL.get(str(g), _GAP_SYMBOL_DEFAULT)
                        for g in plot_unscored["gap_mm"]
                    ],
                    color="rgba(0,0,0,0)",
                    line=dict(width=1.5, color="rgba(120,120,120,0.8)"),
                ),
                text=[_build_hover(r) for _, r in plot_unscored.iterrows()],
                hovertemplate="%{text}<extra></extra>",
                showlegend=True,
                name="no homogeneity data",
            )
        )

    n_unscored = len(plot_unscored)
    x_pad_ax = (x_max - x_min) * 0.10 or 1
    y_pad_ax = (y_max - y_min) * 0.10 or 0.5
    fig = go.Figure(traces)
    fig.update_layout(
        title=(
            f"Discharge Homogeneity — Lab tests + T346 pilot (T346_sup)"
            f" ({n} scored + {n_unscored} position-only)"
            "<br><sup>Color: red=lensing/non-homog., green=homogeneous | ○ hollow = no data"
            " | Shape: ● gap=3 mm, ◆ gap=2 mm, ■ gap=5 mm, ▲ T346_sup (no gap)"
            " | Diagonals = constant power (P = E × f)</sup>"
        ),
        xaxis=dict(
            title="Energy per pulse (mJ)",
            zeroline=False,
            range=[x_min - x_pad_ax, x_max + x_pad_ax],
        ),
        yaxis=dict(
            title="Frequency (kHz)",
            zeroline=False,
            range=[y_min - y_pad_ax, y_max + y_pad_ax],
        ),
        hoverlabel=dict(bgcolor="white", font_size=12),
        plot_bgcolor="white",
        height=520,
        margin=dict(l=60, r=20, t=80, b=50),
        legend=dict(x=0.01, y=0.99, bgcolor="rgba(255,255,255,0.8)", borderwidth=1),
        annotations=power_annotations,
    )
    return fig
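
The constant-power relation behind the iso-power lines and the hover "Power" value can be illustrated without Plotly. This sketch only shows the arithmetic P = E × f; how `iso_power_traces` in `sda.analysis` actually samples and draws the lines is not shown in this file, so `iso_power_curve` below is a hypothetical stand-in.

```python
def power_w(energy_mj, freq_khz):
    """mJ x kHz = (1e-3 J) x (1e3 1/s) = W, so no conversion factor is needed."""
    return energy_mj * freq_khz


def iso_power_curve(power_target_w, e_min, e_max, n=5):
    """Return n (energy_mJ, frequency_kHz) points on f = P / E; requires e_min > 0."""
    step = (e_max - e_min) / (n - 1)
    energies = [e_min + i * step for i in range(n)]
    return [(e, power_target_w / e) for e in energies]


print(power_w(10.0, 40.0))  # 400.0
for e, f in iso_power_curve(400.0, 5.0, 20.0):
    print(f"E = {e:5.2f} mJ  ->  f = {f:6.2f} kHz")
```

On linear axes each such curve is a hyperbola, since frequency falls as 1/E along a line of constant power.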

Step 5: Build EP pilot figure (flow rate × CH4%)

def build_ep_figure(
    df_ep: pd.DataFrame,
    selected_reactors: list[str],
    selected_points: list[str],
):
    """Build Plotly scatter for T346 pilot campaign: flow × CH4%, colored by homogeneity.

    Parameters
    ----------
    df_ep:
        DataFrame returned as the second element of :func:`build_dataframes`.
    selected_reactors, selected_points:
        Active filter values (e.g. from Dash checklists).
    """
    import plotly.graph_objects as go

    if df_ep.empty:
        return go.Figure().update_layout(
            title="T346 Pilot campaign — no data loaded",
            height=420,
        )

    sub = df_ep[
        df_ep["reactor"].isin(selected_reactors)
        & df_ep["point_test"].isin(selected_points)
    ].copy()
    plot = sub.dropna(subset=["flow_slm", "ch4_pct", "homogeneity"]).copy()

    n = len(plot)
    hover_texts = []
    for _, row in plot.iterrows():
        lines = [
            f"<b>Date:</b> {row['date']}",
            f"<b>Point:</b> {row['point_test']}",
            f"<b>Reactor:</b> {row['reactor']}",
            f"<b>Flow:</b> {row['flow_slm']:.1f} SLM",
            f"<b>CH4:</b> {row['ch4_pct']:.1f}%",
            (
                f"<b>H2:</b> {row['h2_pct']:.1f}%"
                if pd.notna(row["h2_pct"])
                else "<b>H2:</b> —"
            ),
            f"<b>Plasma homogène:</b> {row['plasma_homogene_raw'] or '—'}",
            f"<b>PDC actifs:</b> {row['pdc_actifs']}",
        ]
        if row["notes"]:
            lines.append(f"<b>Notes:</b> {row['notes']}")
        hover_texts.append("<br>".join(lines))

    symbols = [_REACTOR_SYMBOL.get(r, _REACTOR_SYMBOL_DEFAULT) for r in plot["reactor"]]
    lensing_noted = plot[plot["lensing_in_notes"]].copy()

    traces = [
        go.Scatter(
            x=plot["flow_slm"],
            y=plot["ch4_pct"],
            mode="markers",
            marker=dict(
                size=13,
                symbol=symbols,
                color=plot["homogeneity"],
                colorscale="RdYlGn",
                cmin=0,
                cmax=1,
                colorbar=dict(
                    title="Homogeneity<br>(0=lensing, 1=homog.)",
                    tickvals=[0, 0.5, 1],
                    ticktext=["0 — Non", "0.5 — Partiel", "1 — Oui"],
                    thickness=18,
                ),
                line=dict(width=0.8, color="black"),
            ),
            text=hover_texts,
            hovertemplate="%{text}<extra></extra>",
            name="measurements",
            showlegend=False,
        )
    ]

    if len(lensing_noted) > 0:
        traces.append(
            go.Scatter(
                x=lensing_noted["flow_slm"],
                y=lensing_noted["ch4_pct"],
                mode="markers",
                marker=dict(
                    size=18,
                    symbol="star",
                    color="rgba(0,0,0,0)",
                    line=dict(width=2, color="darkred"),
                ),
                hoverinfo="skip",
                showlegend=True,
                name="★ lensing in notes",
            )
        )

    fig = go.Figure(traces)
    fig.update_layout(
        title=(
            f"T346 Pilot campaign — Plasma homogeneity ({n} points)"
            "<br><sup>Color: red=non-homogeneous (lensing), green=homogeneous"
            " | Shape: ●R1 ■R2 ◆R4 ▲R5 | ★ = lensing explicitly noted</sup>"
        ),
        xaxis=dict(title="Flow rate (SLM)", zeroline=False),
        yaxis=dict(title="CH4 (%)", zeroline=False),
        hoverlabel=dict(bgcolor="white", font_size=12),
        plot_bgcolor="white",
        height=420,
        margin=dict(l=60, r=20, t=80, b=50),
        legend=dict(x=0.01, y=0.99, bgcolor="rgba(255,255,255,0.8)", borderwidth=1),
    )
    return fig

Standalone entry-point: static Plotly charts (no Dash)

if __name__ == "__main__":
    _df_all, _df_ep = build_dataframes()
    _all_tests = sorted(_df_all["test"].unique().tolist())
    _all_gens = sorted(_df_all["generator"].unique().tolist())
    _gaps_num = sorted(
        [g for g in _df_all["gap_mm"].dropna().unique().tolist() if g != "—"],
        key=lambda x: float(x),
    )
    _gaps_ep = ["—"] if "—" in _df_all["gap_mm"].values else []
    _all_gaps = _gaps_num + _gaps_ep

    print(f"\nOpening lab figure ({len(_all_tests)} tests)…")
    build_lab_figure(_df_all, _all_tests, _all_gens, _all_gaps).show()

    if not _df_ep.empty:
        _all_reactors = sorted(_df_ep["reactor"].dropna().unique().tolist())
        _all_points = sorted(_df_ep["point_test"].dropna().unique().tolist())
        print("Opening EP pilot figure…")
        build_ep_figure(_df_ep, _all_reactors, _all_points).show()
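
The gap filter list in the entry-point is sorted numerically with the "—" placeholder kept last. A small sketch of why the numeric key matters, using a hypothetical `order_gaps` helper and an invented label list:

```python
def order_gaps(gaps):
    """Sort numeric gap labels by value and keep the "—" placeholder last."""
    numeric = sorted((g for g in gaps if g != "—"), key=float)
    return numeric + (["—"] if "—" in gaps else [])


# Invented labels; a plain string sort would place "10" before "2".
labels = ["10", "—", "2", "5", "2.5"]
print(order_gaps(labels))  # ['2', '2.5', '5', '10', '—']
```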