Spark Data Access/Analysis (SDA)#

  • Open all files from Spark’s research and industrial environments

  • Secure, remote access to Spark Data Structure (SDS) data with no additional software requirements.


Features & Specifications#

SDA (Spark Data Access/Analysis) is a comprehensive data library built around three core pillars for working with Spark Cleantech’s research and industrial data:

Three-Pillar Architecture#

1. 📥 Retrieve Data#

  • Multiple Data Sources: SQL databases, Excel files on SharePoint, local file systems

  • Flexible APIs: High-level functions like :py:func:~sda.api.load.load_test and :py:func:~sda.api.load.load_tests for batch operations

  • Auto-Discovery: Intelligent test file discovery with pattern matching and wildcards

  • Custom Mappings: Support for non-standard test files through configuration
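The wildcard-style discovery above can be illustrated with Python's fnmatch module (a sketch of the idea, not SDA's internals; filter_tests is a hypothetical helper):

```python
import fnmatch

def filter_tests(names, pattern):
    """Case-insensitively match test names against a glob pattern."""
    return [n for n in names if fnmatch.fnmatch(n.lower(), pattern.lower())]

tests = ["T183", "T196", "22s37"]
filter_tests(tests, "t1*")     # matches T183 and T196
filter_tests(tests, "*183*")   # matches T183
```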

2. 🔓 Open/Decode Data#

  • Format Support: Excel (.xlsx, .xls), TRC files, various proprietary formats

  • Decoding Utilities: Specialized readers like :py:class:sda.io.read_trc.ReadTrc for different file types

  • Data Validation: Built-in validation and error handling for data integrity

  • Template Compliance: STT template tracking and compliance metrics

3. 🔍 Explore/Visualize Data#

  • Interactive Dashboard: Web-based visualization with scatter plots, line plots, and filtering

  • Python APIs: Programmatic data analysis and visualization capabilities

  • Command-Line Interface: Terminal-based data operations and exploration tools

  • Export & Reproducibility: Generate Python scripts from interactive analysis workflows

Key Components#

Dashboard Component#

Web-based interface that combines all three pillars in a unified experience:

  • Real-time visualization and dynamic filtering

  • Multi-test dataset support with intelligent column management

  • Dual processing: simultaneous analysis and Python script generation
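The "dual processing" idea — analysing interactively while emitting an equivalent script — can be sketched as follows (purely illustrative; generate_script and its parameters are hypothetical, not the dashboard's actual generator):

```python
def generate_script(test_name: str, x: str, y: str) -> str:
    """Build a standalone Python script reproducing a dashboard scatter plot."""
    return (
        "from sda.api.load import load_test\n"
        f"df = load_test({test_name!r})\n"
        f"df.plot.scatter(x={x!r}, y={y!r})\n"
    )

print(generate_script("T183", "time", "pressure"))
```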

Python APIs#

Programmatic access for custom analysis workflows:

  • Data loading: :py:func:~sda.api.load.load_test, :py:func:~sda.api.load.parse_files

  • File decoding: :py:class:~sda.io.read_trc.ReadTrc, format-specific readers

  • Analysis utilities: data quality tools and performance metrics

Command-Line Interface#

Terminal-based operations for efficient data workflows:

  • sda list - discover and filter available tests

  • sda print - terminal data exploration with intelligent formatting

  • sda dash - launch dashboard with specific tests

  • sda excel/open/plume - file and web operations

Technical Highlights#

  • Cross-Platform Support: Windows, macOS, Linux compatibility

  • Performance Optimization: Flask-Caching integration, efficient data processing

  • Flexible Configuration: Custom test mappings, environment variables

  • Quality Assurance: Comprehensive testing with unit tests and E2E validation

For complete technical specifications, see the SDA Specification Document.

Use Cases#

1. Retrieve Data#

In your analysis code, you can use retrieval functions such as :py:func:~sda.api.load.load_test and :py:func:~sda.api.load.load_tests to access test data:

from sda.api.load import load_test

df = load_test('22s37')  # 2021-2022 naming convention
df = load_test('T183')   # 2024+ naming convention

2. Read Data#

In your analysis code, you can use decoding classes such as :py:class:~sda.io.read_trc.ReadTrc:

from sda.io import read_trc

trc_reader = read_trc.ReadTrc()
x, y, d = trc_reader.open("Path/to/my/file")

3. Explore Data#

Launch the interactive dashboard from the console:

sda dash

Or use the dashboard from Python:

from sda.dashboard import run_dashboard

# Launch dashboard for a specific test
run_dashboard(test_name="T183")

The dashboard provides:

  • Interactive scatter plots with filtering and color mapping

  • Line plots for trend analysis

  • Dynamic filtering by test parameters

4. Performance Analysis#

The SDA library provides comprehensive performance analysis for test data:

  • Analyze all available test data on your machine

  • Track STT template compliance and parsing success rates

  • Visual quality thermometer with progress bars

  • Performance metrics (parsing speed, file sizes)

  • Enhanced Excel reports with native tables

Usage:

python examples/database_report_analysis.py
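A "quality thermometer" of this kind could be rendered as simply as the following (a hypothetical sketch, not the report script's actual output; thermometer is an illustrative helper):

```python
def thermometer(passed: int, total: int, width: int = 20) -> str:
    """Render a text progress bar for a parsing/compliance success rate."""
    rate = passed / total
    filled = round(rate * width)
    return f"[{'#' * filled}{'-' * (width - filled)}] {rate:.0%}"

print(thermometer(42, 50))  # [#################---] 84%
```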

Command Line Interface#

SDA provides a convenient command-line interface for common operations. After installation, you can use the sda command directly:

Usage#

sda <command> [arguments...]

Available Commands#

List Tests#

# List all available tests on the machine
sda list

# Filter tests by pattern (supports wildcards)
sda list --filter "T1*"         # Tests starting with T1
sda list --filter "*183*"       # Tests containing 183

# Verbose output for detailed information
sda list -v
sda list -vv                     # Even more verbose

Dashboard Commands#

# Start dashboard with no test preloaded
sda dash

# Start dashboard with a specific test
sda dash T183

# Start dashboard with multiple tests
sda dash T183 T196

# Start dashboard in debug mode (auto-reload)
sda dash --debug

File Operations#

# Open test folder in file explorer
sda open T183

# Open Excel data file with default application
sda excel T183

Web Integration#

# Open Plume webpage for the test
sda plume T183

For comprehensive CLI documentation with all commands and examples, see the Examples section.

For installation instructions, see the Installation page.

Custom Tests#

SDA can load non-standard test files (e.g., not following Txxx naming) using a custom mapping declared in your ~/sda.json under the CUSTOM_TESTS key:

{
  "CUSTOM_TESTS": {
    "R226.xlm": "~/Spark Cleantech/SPARK - Documents/6. R&D/3 - Answer to REQUESTS/Requests/R226"
  }
}

Notes:

  • Keys may be either the exact filename (e.g., R226.xlsm) or its stem (R226). Matching is case-insensitive, and stem matching lets a request for R226.xlsm resolve from the R226.xlm key shown above.

  • Values can be a folder path (preferred) or a direct file path. If a file path is given, SDA uses its parent directory as the data folder.

  • You can temporarily override or inject custom mappings without editing the file by setting the SDA_CUSTOM_TESTS environment variable to a JSON mapping, e.g. in a Windows command prompt:

set SDA_CUSTOM_TESTS={"R226.xlm":"~/Spark Cleantech/SPARK - Documents/6. R&D/3 - Answer to REQUESTS/Requests/R226"}
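The matching rules above can be sketched in plain Python (an illustration of the described behaviour, not SDA's actual implementation; resolve_custom_test and the short mapping path are hypothetical):

```python
from pathlib import Path

def resolve_custom_test(name, mapping):
    """Resolve a test name against CUSTOM_TESTS: exact filename or stem, case-insensitive."""
    by_name = {k.lower(): v for k, v in mapping.items()}
    by_stem = {Path(k).stem.lower(): v for k, v in mapping.items()}
    target = by_name.get(name.lower()) or by_stem.get(Path(name).stem.lower())
    if target is None:
        return None
    path = Path(target).expanduser()
    # A direct file path falls back to its parent directory as the data folder.
    return path.parent if path.suffix else path

mapping = {"R226.xlm": "~/Requests/R226"}
resolve_custom_test("R226.xlsm", mapping)  # resolves via the shared stem "r226"
```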

Once configured, you can open or load your custom test via CLI or Python:

  • CLI

    • Open Excel: sda excel R226

    • Open folder: sda open R226

    • Dashboard: sda dash R226

  • Python

from sda.api.load import load_test
df = load_test("R226.xlsm")

License#

Copyright (C) Spark Cleantech SAS (SIREN 909736068) - All Rights Reserved
Unauthorized copying of this file, via any medium is strictly prohibited
Proprietary and confidential
