Spark Data Access/Analysis (SDA)#
Open all files from Spark’s research and industrial environments.
Secure and remote access to Spark Data Structure (SDS) data, with no software requirements.

Features & Specifications#
SDA (Spark Data Access/Analysis) is a comprehensive data library built around three core pillars for working with Spark Cleantech’s research and industrial data:
Three-Pillar Architecture#
1. 📥 Retrieve Data#
Multiple Data Sources: SQL databases, Excel files on SharePoint, local file systems
Flexible APIs: High-level functions like :py:func:`~sda.api.load.load_test` and :py:func:`~sda.api.load.load_tests` for batch operations
Auto-Discovery: Intelligent test-file discovery with pattern matching and wildcards
Custom Mappings: Support for non-standard test files through configuration
2. 🔓 Open/Decode Data#
Format Support: Excel (.xlsx, .xls), TRC files, various proprietary formats
Decoding Utilities: Specialized readers like :py:class:`~sda.io.read_trc.ReadTrc` for different file types
Data Validation: Built-in validation and error handling for data integrity
Template Compliance: STT template tracking and compliance metrics
3. 🔍 Explore/Visualize Data#
Interactive Dashboard: Web-based visualization with scatter plots, line plots, and filtering
Python APIs: Programmatic data analysis and visualization capabilities
Command-Line Interface: Terminal-based data operations and exploration tools
Export & Reproducibility: Generate Python scripts from interactive analysis workflows
Key Components#
Dashboard Component#
Web-based interface that brings all three pillars together in a unified experience:
Real-time visualization and dynamic filtering
Multi-test dataset support with intelligent column management
Dual processing: simultaneous analysis and Python script generation
Python APIs#
Programmatic access for custom analysis workflows:
Data loading: :py:func:`~sda.api.load.load_test`, :py:func:`~sda.api.load.parse_files`
File decoding: :py:class:`~sda.io.read_trc.ReadTrc`, format-specific readers
Analysis utilities: data quality tools and performance metrics
Command-Line Interface#
Terminal-based operations for efficient data workflows:
sda list - discover and filter available tests
sda print - terminal data exploration with intelligent formatting
sda dash - launch dashboard with specific tests
sda excel / sda open / sda plume - file and web operations
Technical Highlights#
Cross-Platform Support: Windows, macOS, Linux compatibility
Performance Optimization: Flask-Caching integration, efficient data processing
Flexible Configuration: Custom test mappings, environment variables
Quality Assurance: Comprehensive testing with unit tests and E2E validation
For complete technical specifications, see the SDA Specification Document.
Use Cases#
1. Retrieve Data#
In your analysis code, you can use retrieval functions to access data from Tests,
such as :py:func:`~sda.api.load.load_test` and :py:func:`~sda.api.load.load_tests`:
from sda.api.load import load_test
df = load_test('22s37')  # 2021–2022 data
df = load_test('T183')   # 2024+ data
2. Read Data#
In your analysis code, you can use decoding classes such as :py:class:`~sda.io.read_trc.ReadTrc`:
from sda.io import read_trc
trc_reader = read_trc.ReadTrc()
x, y, d = trc_reader.open("Path/to/my/file")
3. Explore Data#
Launch the interactive dashboard from the console:
sda dash
Or use the dashboard from Python:
from sda.dashboard import run_dashboard
# Launch dashboard for a specific test
run_dashboard(test_name="T183")
The dashboard provides:
Interactive scatter plots with filtering and color mapping
Line plots for trend analysis
Dynamic filtering by test parameters
4. Performance Analysis#
The SDA library provides comprehensive performance analysis for test data:
Analyze all available test data on your machine
Track STT template compliance and parsing success rates
Visual quality thermometer with progress bars
Performance metrics (parsing speed, file sizes)
Enhanced Excel reports with native tables
Usage:
python examples/database_report_analysis.py
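As a toy illustration of the kind of metric the report aggregates (the real script analyses the actual test files found on the machine; the test names and outcomes below are made up):

```python
# Hypothetical parse outcomes per test; the real report derives these
# from the files discovered on the machine.
parse_ok = {"T183": True, "T196": True, "T297": False}

# Parsing success rate: fraction of tests that decoded cleanly.
success_rate = sum(parse_ok.values()) / len(parse_ok)
print(f"Parsing success rate: {success_rate:.0%}")  # → Parsing success rate: 67%
```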
Command Line Interface#
SDA provides a convenient command-line interface for common operations. After installation, you can use the sda command directly:
Usage#
sda <command> [arguments...]
Available Commands#
List Tests#
# List all available tests on the machine
sda list
# Filter tests by pattern (supports wildcards)
sda list --filter "T1*" # Tests starting with T1
sda list --filter "*183*" # Tests containing 183
# Verbose output for detailed information
sda list -v
sda list -vv # Even more verbose
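The `--filter` patterns follow shell-style wildcard matching. As a rough illustration (not sda's actual implementation), Python's stdlib `fnmatch` behaves the same way:

```python
from fnmatch import fnmatch

# Hypothetical list of test names, for illustration only.
tests = ["T101", "T183", "T196", "22s37"]

# "T1*" keeps tests starting with T1; "*183*" keeps tests containing 183.
starts_t1 = [t for t in tests if fnmatch(t, "T1*")]
contains_183 = [t for t in tests if fnmatch(t, "*183*")]
print(starts_t1)      # → ['T101', 'T183', 'T196']
print(contains_183)   # → ['T183']
```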
Print DataFrame#
Explore and display test data directly in the terminal:
# Basic usage - uses pandas native display (recommended)
sda print T297
# Show ALL rows using to_string() method
sda print T297 --all
# Select specific columns for better readability
sda print T297 --columns run "Tension NRP (kV)" "Fréquence (kHz)"
# Transpose view - excellent for wide datasets
sda print T297 --transpose
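These options map onto standard pandas operations. The sketch below (with made-up sample data; the real column names come from the test's Excel file) shows the rough equivalents:

```python
import pandas as pd

# Made-up sample rows, for illustration only.
df = pd.DataFrame({
    "run": [1, 2, 3],
    "Tension NRP (kV)": [12.0, 12.5, 13.0],
    "Fréquence (kHz)": [30, 30, 35],
})

subset = df[["run", "Tension NRP (kV)"]]  # like --columns: keep selected columns
wide_view = df.T                          # like --transpose: flip rows and columns
print(subset.to_string())                 # like --all: print without row truncation
```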
Dashboard Commands#
# Start dashboard with no test preloaded
sda dash
# Start dashboard with a specific test
sda dash T183
# Start dashboard with multiple tests
sda dash T183 T196
# Start dashboard in debug mode (auto-reload)
sda dash --debug
File Operations#
# Open test folder in file explorer
sda open T183
# Open Excel data file with default application
sda excel T183
Web Integration#
# Open Plume webpage for the test
sda plume T183
For comprehensive CLI documentation with all commands and examples, see the Examples section.
For installation instructions, see the Installation page.
Custom Tests#
SDA can load non-standard test files (e.g., not following Txxx naming) using a custom mapping declared
in your ~/sda.json under the CUSTOM_TESTS key:
{
"CUSTOM_TESTS": {
"R226.xlm": "~/Spark Cleantech/SPARK - Documents/6. R&D/3 - Answer to REQUESTS/Requests/R226"
}
}
Notes:
Keys can be either the exact filename (e.g., R226.xlsm) or its stem (R226). Matching is case-insensitive, and stem matching allows a request like R226.xlsm to resolve from a R226.xlm key.
Values can be a folder path (preferred) or a direct file path. If a file path is given, SDA uses its parent directory as the data folder.
You can temporarily override or inject custom mappings without editing the file using the
SDA_CUSTOM_TESTS environment variable (JSON mapping):
set SDA_CUSTOM_TESTS={"R226.xlm":"~/Spark Cleantech/SPARK - Documents/6. R&D/3 - Answer to REQUESTS/Requests/R226"}
Once configured, you can open or load your custom test via CLI or Python:
CLI
Open Excel: sda excel R226
Open folder: sda open R226
Dashboard: sda dash R226
Python
from sda.api.load import load_test
df = load_test("R226.xlsm")
License#
Copyright (C) Spark Cleantech SAS (SIREN 909736068) - All Rights Reserved
Unauthorized copying of this file, via any medium, is strictly prohibited
Proprietary and confidential