.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/load_all_test_data.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        You can download :ref:`below <sphx_glr_download_auto_examples_load_all_test_data.py>`
        the full example code and run it online in `Codespaces <https://codespaces.new/spark-cleantech-l3/sda-copy?quickstart=1>`__

        .. image:: https://github.com/codespaces/badge.svg
            :target: https://codespaces.new/spark-cleantech-l3/sda-copy?quickstart=1

---

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_load_all_test_data.py:


Load all test data.
===================

This example shows how to load multiple test data files in one batch using the
library's centralized functions :py:func:`~sda.api.load.list_all_files`
and :py:func:`~sda.api.load.parse_files`.

.. GENERATED FROM PYTHON SOURCE LINES 11-13

First, we import the required libraries.
-----------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 13-25

.. code-block:: Python

    import warnings
    from pathlib import Path

    import matplotlib.pyplot as plt
    import pandas as pd

    from sda.api.load import list_all_files, parse_files

    # Suppress warnings for cleaner output.
    # See https://stackoverflow.com/questions/53965596/python-3-openpyxl-userwarning-data-validation-extension-not-supported
    warnings.simplefilter(action="ignore", category=UserWarning)


.. GENERATED FROM PYTHON SOURCE LINES 26-28

Discover all available test files.
-----------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 28-34

.. code-block:: Python


    # Use the library function to discover all available files
    file_paths = list_all_files(filter="*.xls*")
    print(f"Found {len(file_paths)} files. Here are the paths:")
    for i, path in enumerate(file_paths):
        print(f"{i}: {path}")
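
The ``"*.xls*"`` filter reads as a shell-style glob, which would cover legacy
``.xls`` workbooks as well as ``.xlsx`` and ``.xlsm`` ones. How
:py:func:`~sda.api.load.list_all_files` applies the pattern internally is not
shown in this example, so the sketch below only illustrates the glob semantics,
assuming standard ``fnmatch`` rules and using made-up file names.

.. code-block:: Python

    from fnmatch import fnmatch

    # Hypothetical file names, for illustration only.
    candidates = ["T101.xlsx", "T102.xls", "T103.xlsm", "notes.txt", "T104.csv"]

    # Same pattern as the one passed to list_all_files() above.
    pattern = "*.xls*"
    matched = [name for name in candidates if fnmatch(name, pattern)]
    print(matched)

With these names, only the three Excel workbooks are kept.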

.. GENERATED FROM PYTHON SOURCE LINES 35-37

Load test data using the universal parse_files() function.
-----------------------------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 37-72

.. code-block:: Python


    columns_to_keep = ["plasma_power", "CH4_conversion"]

    print(f"Found {len(file_paths)} files before exclusion")

    # Exclude some files (names that are commented out below are still parsed):
    file_paths = [
        path
        for path in file_paths
        if Path(path).name
        not in [
            "XP_001_Explication_par_jour.xlsx",
            "T110_DATA_validation_torch_V7C1.xlsx",
            # "T111_Generateur_pilote_reception.xlsx",
            # "T116_DATA_validation_torch_V6I.xlsx",
            # "T126_DATA.xlsx",
            "T127_validation_CA1B.xlsx",
            # "T132_Generateur_pilote_reception_avec_template_SOLO3.xlsx",
            # "T132_Generateur_pilote_reception_avec_template_SOLO4.xlsx",
            "T162.xlsx",
            "T097B_test_data_wrong_table_name.xlsx",  # Test file with invalid table names
        ]
    ]

    df = parse_files(
        file_paths,
        command={},  # Legacy parameters - now uses automatic Excel table detection
        columns_to_keep=columns_to_keep,
        verbose=2,
        column_not_found="warn",
        table_not_found="warn",  # don't raise an error if no Data Table is found in a file
    )
    # Reverse the order of the file paths (in place); df is already built and is not affected.
    file_paths.reverse()
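
The exclusion step above compares only the final path component, so a file is
skipped wherever it lives on disk. A minimal sketch of the same idea, using
hypothetical paths and a ``set`` of excluded names (``PurePosixPath`` is used
here only so that the result does not depend on the operating system):

.. code-block:: Python

    from pathlib import PurePosixPath

    # Hypothetical paths, for illustration only.
    paths = [
        "data/2024/T110_DATA_validation_torch_V7C1.xlsx",
        "data/2024/T126_DATA.xlsx",
        "archive/T162.xlsx",
    ]
    excluded_names = {"T110_DATA_validation_torch_V7C1.xlsx", "T162.xlsx"}

    # Keep a path only if its file name is not in the exclusion set.
    kept = [p for p in paths if PurePosixPath(p).name not in excluded_names]
    print(kept)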


.. GENERATED FROM PYTHON SOURCE LINES 73-75

Add generator type for analysis.
--------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 75-95

.. code-block:: Python


    # Add a new column for the generator type.
    # By default, we assume the generator type is "NRP".
    df["Generator Type"] = "NRP"
    # For some tests (depending on the file), it is "DC".
    DC_tests = [
        "T157",
        "T173",
        "T196",
        "T197",
        "T234",
        "T268",
        "T281",
    ]
    # Set the generator type to "DC" for the specified tests.
    for test in DC_tests:
        mask = df["file"].str.contains(test, case=False, na=False)
        if mask.any():
            df.loc[mask, "Generator Type"] = "DC"
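
The loop above can also be written as a single vectorized match. The sketch
below is one possible alternative, not the library's method: it uses a toy
DataFrame (``toy``) whose ``file`` column stands in for the one produced by
``parse_files()``, and joins the test names into one regular expression.

.. code-block:: Python

    import pandas as pd

    # Toy stand-in for the parsed DataFrame, for illustration only.
    toy = pd.DataFrame({"file": ["T157_run.xlsx", "T126_DATA.xlsx", "t173_b.xlsx", None]})
    dc_tests = ["T157", "T173"]

    toy["Generator Type"] = "NRP"
    # One regex alternation ("T157|T173") replaces the explicit loop.
    is_dc = toy["file"].str.contains("|".join(dc_tests), case=False, na=False)
    toy.loc[is_dc, "Generator Type"] = "DC"
    print(toy["Generator Type"].tolist())

Both versions match substrings, so a name such as ``T157`` would also match a
file called ``T1570``; a stricter pattern may be needed if such names exist.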


.. GENERATED FROM PYTHON SOURCE LINES 96-98

Filter and prepare data for plotting.
--------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 98-126

.. code-block:: Python


    # Print the number of points before filtering.
    print(f"Number of points before filtering: {len(df)}")

    # Extract the relevant columns for plotting.
    plasma_power = df["plasma_power value"]
    ch4_conversion = df["CH4_conversion value"]
    generator_type = df["Generator Type"]

    # Convert to numeric, errors='coerce' will convert non-numeric values to NaN.
    plasma_power = pd.to_numeric(plasma_power, errors="coerce")
    ch4_conversion = pd.to_numeric(ch4_conversion, errors="coerce")

    # Keep only points with plasma power > 0 W and CH4 conversion strictly between 0% and 100%.
    mask = (plasma_power > 0) & (ch4_conversion > 0) & (ch4_conversion < 1)
    plasma_power = plasma_power[mask]
    ch4_conversion = ch4_conversion[mask]
    generator_type = generator_type[mask]

    # Remove NaN values (the comparisons above already exclude them; kept as a safeguard).
    mask = plasma_power.notna() & ch4_conversion.notna() & generator_type.notna()
    plasma_power = plasma_power[mask]
    ch4_conversion = ch4_conversion[mask]
    generator_type = generator_type[mask]

    # Print the number of points after filtering.
    print(f"Number of points after filtering: {len(plasma_power)}")
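
To see why the range mask already discards non-numeric entries, the sketch
below applies the same two steps to toy values: ``errors="coerce"`` turns
anything unparsable into ``NaN``, and any comparison involving ``NaN``
evaluates to ``False``.

.. code-block:: Python

    import pandas as pd

    # Toy values, for illustration only.
    raw_power = pd.Series([120, "n/a", 0, 300, None])
    raw_conversion = pd.Series([0.4, 0.5, 0.2, 1.0, 0.3])

    power = pd.to_numeric(raw_power, errors="coerce")  # "n/a" and None become NaN
    conversion = pd.to_numeric(raw_conversion, errors="coerce")

    # Rows with NaN, zero power, or a conversion of exactly 1 are all dropped.
    keep = (power > 0) & (conversion > 0) & (conversion < 1)
    print(power[keep].tolist(), conversion[keep].tolist())

Only the first row survives: the others fail on a ``NaN`` power, a zero power,
or a conversion equal to 1.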


.. GENERATED FROM PYTHON SOURCE LINES 127-129

Plot the results.
-----------------

.. GENERATED FROM PYTHON SOURCE LINES 129-152

.. code-block:: Python


    fig, ax = plt.subplots(figsize=(10, 6))

    # Plot the data.
    # Change the color based on the generator type.
    colors = {"NRP": "blue", "DC": "orange"}
    for gen_type in generator_type.unique():
        mask = generator_type == gen_type
        ax.scatter(
            plasma_power[mask],
            ch4_conversion[mask] * 100,  # Convert to percentage
            label=gen_type,
            color=colors.get(gen_type, "gray"),
            alpha=0.5,
        )
    ax.set_xlabel("Plasma Power (W)")
    ax.set_ylabel("$CH_4$ Conversion (%)")
    ax.set_title("Plasma Power vs $CH_4$ Conversion")
    ax.legend(title="Generator Type")
    ax.set_ylim(0, 100)  # Set y-axis limits to [0, 100] for percentage

    plt.show()


.. _sphx_glr_download_auto_examples_load_all_test_data.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: load_all_test_data.ipynb <load_all_test_data.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: load_all_test_data.py <load_all_test_data.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: load_all_test_data.zip <load_all_test_data.zip>`