Spark Data Access/Analysis (SDA)#
Open all files from Spark’s research and industrial environments.
Secure, remote access to Spark Data Structure (SDS) data, with no additional software requirements.

Features & Specifications#
SDA (Spark Data Access/Analysis) is a comprehensive data library built around three core pillars for working with Spark Cleantech’s research and industrial data:
Three-Pillar Architecture#
1. 📥 Retrieve Data#
Multiple Data Sources: SQL databases, Excel files on SharePoint, local file systems
Flexible APIs: High-level functions like :py:func:`~sda.api.load.load_test` and :py:func:`~sda.api.load.load_tests` for batch operations
Auto-Discovery: Intelligent test file discovery with pattern matching and wildcards
Custom Mappings: Support for non-standard test files through configuration
2. 🔓 Open/Decode Data#
Format Support: Excel (.xlsx, .xls), TRC files, various proprietary formats
Decoding Utilities: Specialized readers like :py:class:`sda.io.read_trc.ReadTrc` for different file types
Data Validation: Built-in validation and error handling for data integrity
Template Compliance: STT template tracking and compliance metrics
3. 🔍 Explore/Visualize Data#
Interactive Dashboard: Web-based visualization with scatter plots, line plots, and filtering
Python APIs: Programmatic data analysis and visualization capabilities
Command-Line Interface: Terminal-based data operations and exploration tools
Export & Reproducibility: Generate Python scripts from interactive analysis workflows
Key Components#
Dashboard Component#
Web-based interface performing all three pillars in a unified experience:
Real-time visualization and dynamic filtering
Multi-test dataset support with intelligent column management
Dual processing: simultaneous analysis and Python script generation
Python APIs#
Programmatic access for custom analysis workflows:
Data loading: :py:func:`~sda.api.load.load_test`, :py:func:`~sda.api.load.parse_files`
File decoding: :py:class:`~sda.io.read_trc.ReadTrc`, format-specific readers
Analysis utilities: data quality tools and performance metrics
Command-Line Interface#
Terminal-based operations for efficient data workflows:
sda list - discover and filter available tests
sda print - terminal data exploration with intelligent formatting
sda dash - launch dashboard with specific tests
sda excel / sda open / sda plume - file and web operations
Technical Highlights#
Cross-Platform Support: Windows, macOS, Linux compatibility
Performance Optimization: Flask-Caching integration, efficient data processing
Flexible Configuration: Custom test mappings, environment variables
Quality Assurance: Comprehensive testing with unit tests and E2E validation
For complete technical specifications, see the SDA Specification Document.
Use Cases#
1. Retrieve Data#
In your analysis code, you can use retrieval functions such as :py:func:`~sda.api.load.load_test` and :py:func:`~sda.api.load.load_tests` to access test data:
from sda.api.load import load_test
df = load_test('22s37') # 2021, 2022 data
df = load_test('T183') # 2024+ data
2. Read Data#
In your analysis code, you can use decoding classes such as :py:class:`~sda.io.read_trc.ReadTrc`:
from sda.io import read_trc
trc_reader = read_trc.ReadTrc()
x, y, d = trc_reader.open("Path/to/my/file")
3. Explore Data#
Launch the interactive dashboard from the console:
sda dash
Or use the dashboard from Python:
from sda.dashboard import run_dashboard
# Launch dashboard for a specific test
run_dashboard(test_name="T183")
The dashboard provides:
Interactive scatter plots with filtering and color mapping
Line plots for trend analysis
Dynamic filtering by test parameters
4. Performance Analysis#
The SDA library provides comprehensive performance analysis for test data:
Analyze all available test data on your machine
Track STT template compliance and parsing success rates
Visual quality thermometer with progress bars
Performance metrics (parsing speed, file sizes)
Enhanced Excel reports with native tables
Usage:
python examples/database_report_analysis.py
Command Line Interface#
SDA provides a convenient command-line interface for common operations. After installation, you can use the sda command directly:
Usage#
sda <command> [arguments...]
Available Commands#
List Tests#
# List all available tests on the machine
sda list
# Filter tests by pattern (supports wildcards)
sda list --filter "T1*" # Tests starting with T1
sda list --filter "*183*" # Tests containing 183
# Verbose output for detailed information
sda list -v
sda list -vv # Even more verbose
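The `--filter` patterns behave like shell-style globs. As an illustration only (SDA’s internal matching may differ in detail), the same behaviour can be reproduced with Python’s standard `fnmatch`; `filter_tests` is a hypothetical helper, not part of the SDA API:

```python
from fnmatch import fnmatch

def filter_tests(test_ids, pattern):
    """Case-insensitive shell-style glob filter over test IDs (illustrative only)."""
    return [t for t in test_ids if fnmatch(t.lower(), pattern.lower())]

tests = ["T183", "T196", "T297", "22s37"]
print(filter_tests(tests, "T1*"))    # → ['T183', 'T196']
print(filter_tests(tests, "*183*"))  # → ['T183']
```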
Print DataFrame#
Explore and display test data directly in the terminal:
# Basic usage - uses pandas native display (recommended)
sda print T297
# Show ALL rows using to_string() method
sda print T297 --all
# Select specific columns for better readability
sda print T297 --columns run "Tension NRP (kV)" "Fréquence (kHz)"
# Transpose view - excellent for wide datasets
sda print T297 --transpose
Dashboard Commands#
# Start dashboard with no test preloaded
sda dash
# Start dashboard with a specific test
sda dash T183
# Start dashboard with multiple tests
sda dash T183 T196
# Start dashboard in debug mode (auto-reload)
sda dash --debug
File Operations#
# Open test folder in file explorer
sda open T183
# Open Excel data file with default application
sda excel T183
Web Integration#
# Open Plume webpage for the test
sda plume T183
For comprehensive CLI documentation with all commands and examples, see the Examples section.
For installation instructions, see the Installation page.
Direct Excel files and custom mappings (CUSTOM_FILES)#
Reading a workbook by filesystem path#
Pass a path to an Excel file (.xlsx, .xlsm, …) wherever you load data or open the CLI:
from sda.api.load import load_test
df = load_test("path/to/workbook.xlsx")
sda print path/to/workbook.xlsx
sda dash path/to/workbook.xlsx
SDA selects the appropriate parser (named tables, sample-based layout, etc.) from the file contents.
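SDA’s parser selection is internal, but one way such a layout check can work is worth sketching: an .xlsx file is a ZIP archive, and workbooks that use named tables carry parts under `xl/tables/`. The following is a simplified illustration under that assumption; `detect_layout` is a hypothetical helper, not SDA’s actual code:

```python
import zipfile

def detect_layout(path):
    """Guess a workbook layout: 'named-tables' if the .xlsx archive contains
    table parts under xl/tables/, else 'sample-based'. Illustrative only."""
    with zipfile.ZipFile(path) as zf:
        has_tables = any(name.startswith("xl/tables/") for name in zf.namelist())
    return "named-tables" if has_tables else "sample-based"
```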
Short names and cloud: CUSTOM_FILES#
Non-standard layouts or long paths can be mapped in ~/sda.json under CUSTOM_FILES
(legacy CUSTOM_TESTS is still accepted): keys are aliases or stems; values are folders or files.
{
"CUSTOM_FILES": {
"R226": "~/Spark Cleantech/SPARK - Documents/6. R&D/3 - Answer to REQUESTS/Requests/R226",
"CAR008": "~/Spark Cleantech/6. Carbone L2 - Documents/3. Spark Carbons Analysis/CAR008-Suivi analyses _WIP.xlsx",
"CAR012": "~/Spark Cleantech/6. Carbone L2 - Documents/1. Carbon Knowledge/4. Colloidal Properties/CAR012-Carbon Colloidal Properties_Database_WIP.xlsx"
}
}
Notes:
Keys can be the exact filename, its stem, or an alias (e.g. CAR008, CAR012). Matching is case-insensitive.
Values can be a folder path (preferred) or a direct file path; cloud downloads use Graph when configured.
Override without editing the file: set SDA_CUSTOM_FILES (JSON), or the legacy SDA_CUSTOM_TESTS.
set SDA_CUSTOM_FILES={"R226":"~/Spark Cleantech/.../Requests/R226"}
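The lookup order described above (environment override first, then `~/sda.json`, with case-insensitive matching on alias, filename, or stem) can be sketched as follows. This is a simplified illustration, not SDA’s actual code; `load_custom_files` and `resolve_custom` are hypothetical helpers:

```python
import json
import os
from pathlib import Path

def load_custom_files(env=os.environ, config_path="~/sda.json"):
    """Return the CUSTOM_FILES mapping. Environment overrides win over
    ~/sda.json; the legacy CUSTOM_TESTS names are still honoured.
    Simplified illustration only."""
    for var in ("SDA_CUSTOM_FILES", "SDA_CUSTOM_TESTS"):
        if env.get(var):
            return json.loads(env[var])
    cfg = Path(config_path).expanduser()
    if cfg.exists():
        data = json.loads(cfg.read_text())
        return data.get("CUSTOM_FILES", data.get("CUSTOM_TESTS", {}))
    return {}

def resolve_custom(name, mapping):
    """Case-insensitive match on alias, exact filename, or filename stem."""
    wanted = name.lower()
    for key, target in mapping.items():
        if wanted in (key.lower(), Path(key).stem.lower()):
            return target
    return None
```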
Example: CAR008 (sample-based carbon summaries) and CAR012 (colloidal properties reference)#
CAR008 — multi-sheet workbook (one sheet per diagnostic, one row per sample); --groupby Sample is typical in the dashboard.
Column headers match the sheet headers unless the same header text appears on multiple sheets; in that case SDA prefixes the header with the sheet name (e.g. HAP_Average (%) and MOISTURE_Average (%)).
CAR012 — reference database workbook (flat multi-column layout); load without --groupby.
from sda.api.load import load_test
df = load_test("tests/data/CAR008-Sample.xlsx") # direct Excel path (e.g. repo / CI fixture)
df = load_test("CAR008") # alias via CUSTOM_FILES
df = load_test("CAR012") # alias via CUSTOM_FILES
sda print tests/data/CAR008-Sample.xlsx
sda dash CAR008 --groupby Sample
sda dash CAR008 --groupby Sample --inc-cloud
sda dash CAR012
sda dash CAR012 --inc-cloud
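The sheet-prefixing rule for duplicated headers described above can be sketched like this. It is a simplified illustration, not SDA’s code; `prefix_duplicates` is a hypothetical helper, and the assumption that the grouping key (e.g. Sample) is exempt from prefixing is inferred from the `--groupby Sample` usage:

```python
from collections import Counter

def prefix_duplicates(sheet_columns, keep=("Sample",)):
    """Given {sheet_name: [headers]}, keep unique headers as-is and prefix
    headers that appear on several sheets with '<SHEET>_'. Headers in `keep`
    (assumed grouping keys) are never prefixed. Illustrative only."""
    counts = Counter(h for headers in sheet_columns.values() for h in headers)
    result = {}
    for sheet, headers in sheet_columns.items():
        result[sheet] = [
            f"{sheet}_{h}" if counts[h] > 1 and h not in keep else h
            for h in headers
        ]
    return result

cols = prefix_duplicates({
    "HAP": ["Sample", "Average (%)"],
    "MOISTURE": ["Sample", "Average (%)"],
})
# cols["HAP"] → ['Sample', 'HAP_Average (%)']
```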
Other custom IDs (e.g. R226):
CLI: sda excel R226, sda open R226, sda dash R226
from sda.api.load import load_test
df = load_test("R226.xlsm")
License#
Copyright (C) Spark Cleantech SAS (SIREN 909736068) - All Rights Reserved
Unauthorized copying of this file, via any medium is strictly prohibited
Proprietary and confidential