Spark Data Access/Analysis (SDA)#
Open all files from Spark’s research and industrial environments.
Secure, remote access to Spark Data Structure (SDS) data, with no additional software requirements.

Features & Specifications#
SDA (Spark Data Access/Analysis) is a comprehensive data library built around three core pillars for working with Spark Cleantech’s research and industrial data:
Three-Pillar Architecture#
1. 📥 Retrieve Data#
Multiple Data Sources: SQL databases, Excel files on SharePoint, local file systems
Flexible APIs: High-level functions like :py:func:`~sda.api.load.load_test` and :py:func:`~sda.api.load.load_tests` for batch operations
Auto-Discovery: Intelligent test file discovery with pattern matching and wildcards
Custom Mappings: Support for non-standard test files through configuration
2. 🔓 Open/Decode Data#
Format Support: Excel (.xlsx, .xls), TRC files, various proprietary formats
Decoding Utilities: Specialized readers like :py:class:`sda.io.read_trc.ReadTrc` for different file types
Data Validation: Built-in validation and error handling for data integrity
Template Compliance: STT template tracking and compliance metrics
3. 🔍 Explore/Visualize Data#
Interactive Dashboard: Web-based visualization with scatter plots, line plots, and filtering
Python APIs: Programmatic data analysis and visualization capabilities
Command-Line Interface: Terminal-based data operations and exploration tools
Export & Reproducibility: Generate Python scripts from interactive analysis workflows
Key Components#
Dashboard Component#
Web-based interface performing all three pillars in a unified experience:
Real-time visualization and dynamic filtering
Multi-test dataset support with intelligent column management
Dual processing: simultaneous analysis and Python script generation
Python APIs#
Programmatic access for custom analysis workflows:
Data loading: :py:func:`~sda.api.load.load_test`, :py:func:`~sda.api.load.parse_files`
File decoding: :py:class:`~sda.io.read_trc.ReadTrc`, format-specific readers
Analysis utilities: data quality tools and performance metrics
Command-Line Interface#
Terminal-based operations for efficient data workflows:
sda list - discover and filter available tests
sda print - terminal data exploration with intelligent formatting
sda dash - launch dashboard with specific tests
sda excel / sda open / sda plume - file and web operations
Technical Highlights#
Cross-Platform Support: Windows, macOS, Linux compatibility
Performance Optimization: Flask-Caching integration, efficient data processing
Flexible Configuration: Custom test mappings, environment variables
Quality Assurance: Comprehensive testing with unit tests and E2E validation
For complete technical specifications, see the SDA Specification Document.
Use Cases#
1. Retrieve Data#
In your analysis code, you can use retrieval functions such as :py:func:`~sda.api.load.load_test` and :py:func:`~sda.api.load.load_tests` to access test data:
from sda.api.load import load_test
df = load_test('22s37') # 2021, 2022 data
df = load_test('T183') # 2024+ data
2. Read Data#
In your analysis code, you can use decoding classes such as :py:class:`~sda.io.read_trc.ReadTrc`:
from sda.io import read_trc
trc_reader = read_trc.ReadTrc()
x, y, d = trc_reader.open("Path/to/my/file")
3. Explore Data#
Launch the interactive dashboard from the console:
sda dash
Or use the dashboard from Python:
from sda.dashboard import run_dashboard
# Launch dashboard for a specific test
run_dashboard(test_name="T183")
The dashboard provides:
Interactive scatter plots with filtering and color mapping
Line plots for trend analysis
Dynamic filtering by test parameters
4. Performance Analysis#
The SDA library provides comprehensive performance analysis for test data:
Analyze all available test data on your machine
Track STT template compliance and parsing success rates
Visual quality thermometer with progress bars
Performance metrics (parsing speed, file sizes)
Enhanced Excel reports with native tables
Usage:
python examples/database_report_analysis.py
Command Line Interface#
SDA provides a convenient command-line interface for common operations. After installation, you can use the sda command directly:
Usage#
sda <command> [arguments...]
Available Commands#
List Tests#
# List all available tests on the machine
sda list
# Filter tests by pattern (supports wildcards)
sda list --filter "T1*" # Tests starting with T1
sda list --filter "*183*" # Tests containing 183
# Verbose output for detailed information
sda list -v
sda list -vv # Even more verbose
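The `--filter` patterns behave like shell-style globs. As an illustration only (SDA’s internal matching may differ in detail), the same behaviour can be reproduced with Python’s standard `fnmatch`; `filter_tests` is a hypothetical helper, not part of the SDA API:

```python
from fnmatch import fnmatch

def filter_tests(test_ids, pattern):
    """Case-insensitive shell-style glob filter over test IDs (illustrative only)."""
    return [t for t in test_ids if fnmatch(t.lower(), pattern.lower())]

tests = ["T183", "T196", "T297", "22s37"]
print(filter_tests(tests, "T1*"))    # → ['T183', 'T196']
print(filter_tests(tests, "*183*"))  # → ['T183']
```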
Print DataFrame#
Explore and display test data directly in the terminal:
# Basic usage - uses pandas native display (recommended)
sda print T297
# Show ALL rows using to_string() method
sda print T297 --all
# Select specific columns for better readability
sda print T297 --columns run "Tension NRP (kV)" "Fréquence (kHz)"
# Transpose view - excellent for wide datasets
sda print T297 --transpose
Dashboard Commands#
# Start dashboard with no test preloaded
sda dash
# Start dashboard with a specific test
sda dash T183
# Start dashboard with multiple tests
sda dash T183 T196
# Start dashboard in debug mode (auto-reload)
sda dash --debug
File Operations#
# Open test folder in file explorer
sda open T183
# Open Excel data file with default application
sda excel T183
Web Integration#
# Open Plume webpage for the test
sda plume T183
For comprehensive CLI documentation with all commands and examples, see the Examples section.
For installation instructions, see the Installation page.
Direct Excel files and custom mappings (CUSTOM_FILES)#
Reading a workbook by filesystem path#
Pass a path to an Excel file (.xlsx, .xlsm, …) wherever you load data or open the CLI:
from sda.api.load import load_test
df = load_test("path/to/workbook.xlsx")
sda print path/to/workbook.xlsx
sda dash path/to/workbook.xlsx
SDA selects the appropriate parser (named tables, sample-based layout, etc.) from the file contents.
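SDA’s parser selection is internal, but one way such a layout check can work is worth sketching: an .xlsx file is a ZIP archive, and workbooks that use named tables carry parts under `xl/tables/`. The following is a simplified illustration under that assumption; `detect_layout` is a hypothetical helper, not SDA’s actual code:

```python
import zipfile

def detect_layout(path):
    """Guess a workbook layout: 'named-tables' if the .xlsx archive contains
    table parts under xl/tables/, else 'sample-based'. Illustrative only."""
    with zipfile.ZipFile(path) as zf:
        has_tables = any(name.startswith("xl/tables/") for name in zf.namelist())
    return "named-tables" if has_tables else "sample-based"
```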
Short names and cloud: CUSTOM_FILES#
Non-standard layouts or long paths can be mapped in ~/sda.json under CUSTOM_FILES
(legacy CUSTOM_TESTS is still accepted): keys are aliases or stems; values are folders or files.
{
"CUSTOM_FILES": {
"R226": "~/Spark Cleantech/SPARK - Documents/6. R&D/3 - Answer to REQUESTS/Requests/R226",
"CAR008": "~/Spark Cleantech/6. Carbone L2 - Documents/3. Spark Carbons Analysis/CAR008-Suivi analyses _WIP.xlsx",
"CAR012": "~/Spark Cleantech/6. Carbone L2 - Documents/1. Carbon Knowledge/4. Colloidal Properties/CAR012-Carbon Colloidal Properties_Database_WIP.xlsx"
}
}
Notes:
Keys can be the exact filename, its stem, or an alias (e.g. CAR008, CAR012). Matching is case-insensitive.
Values can be a folder path (preferred) or a direct file path; cloud downloads use Graph when configured.
Override without editing the file: set SDA_CUSTOM_FILES (JSON), or the legacy SDA_CUSTOM_TESTS.
set SDA_CUSTOM_FILES={"R226":"~/Spark Cleantech/.../Requests/R226"}
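The lookup order described above (environment override first, then `~/sda.json`, with case-insensitive matching on alias, filename, or stem) can be sketched as follows. This is a simplified illustration, not SDA’s actual code; `load_custom_files` and `resolve_custom` are hypothetical helpers:

```python
import json
import os
from pathlib import Path

def load_custom_files(env=os.environ, config_path="~/sda.json"):
    """Return the CUSTOM_FILES mapping. Environment overrides win over
    ~/sda.json; the legacy CUSTOM_TESTS names are still honoured.
    Simplified illustration only."""
    for var in ("SDA_CUSTOM_FILES", "SDA_CUSTOM_TESTS"):
        if env.get(var):
            return json.loads(env[var])
    cfg = Path(config_path).expanduser()
    if cfg.exists():
        data = json.loads(cfg.read_text())
        return data.get("CUSTOM_FILES", data.get("CUSTOM_TESTS", {}))
    return {}

def resolve_custom(name, mapping):
    """Case-insensitive match on alias, exact filename, or filename stem."""
    wanted = name.lower()
    for key, target in mapping.items():
        if wanted in (key.lower(), Path(key).stem.lower()):
            return target
    return None
```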
Example: CAR008 (sample-based carbon summaries) and CAR012 (colloidal properties reference)#
CAR008 — multi-sheet workbook (one sheet per diagnostic, one row per sample); --groupby Sample is typical in the dashboard.
Column headers match the sheet headers unless the same header text appears on multiple sheets; in that case SDA prefixes the header with the sheet name (e.g. HAP_Average (%) and MOISTURE_Average (%)).
CAR012 — reference database workbook (flat multi-column layout); load without --groupby.
from sda.api.load import load_test
df = load_test("tests/data/CAR008-Sample.xlsx") # direct Excel path (e.g. repo / CI fixture)
df = load_test("CAR008") # alias via CUSTOM_FILES
df = load_test("CAR012") # alias via CUSTOM_FILES
sda print tests/data/CAR008-Sample.xlsx
sda dash CAR008 --groupby Sample
sda dash CAR008 --groupby Sample --inc-cloud
sda dash CAR012
sda dash CAR012 --inc-cloud
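The sheet-prefixing rule for duplicated headers described above can be sketched like this. It is a simplified illustration, not SDA’s code; `prefix_duplicates` is a hypothetical helper, and the assumption that the grouping key (e.g. Sample) is exempt from prefixing is inferred from the `--groupby Sample` usage:

```python
from collections import Counter

def prefix_duplicates(sheet_columns, keep=("Sample",)):
    """Given {sheet_name: [headers]}, keep unique headers as-is and prefix
    headers that appear on several sheets with '<SHEET>_'. Headers in `keep`
    (assumed grouping keys) are never prefixed. Illustrative only."""
    counts = Counter(h for headers in sheet_columns.values() for h in headers)
    result = {}
    for sheet, headers in sheet_columns.items():
        result[sheet] = [
            f"{sheet}_{h}" if counts[h] > 1 and h not in keep else h
            for h in headers
        ]
    return result

cols = prefix_duplicates({
    "HAP": ["Sample", "Average (%)"],
    "MOISTURE": ["Sample", "Average (%)"],
})
# cols["HAP"] → ['Sample', 'HAP_Average (%)']
```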
Other custom IDs (e.g. R226):
CLI: sda excel R226, sda open R226, sda dash R226
from sda.api.load import load_test
df = load_test("R226.xlsm")
License#
Copyright (C) Spark Cleantech SAS (SIREN 909736068) - All Rights Reserved
Unauthorized copying of this file, via any medium is strictly prohibited
Proprietary and confidential