Spark Data Access/Analysis (SDA)#

  • Open all files from Spark’s research and industrial environments

  • Secure, remote access to Spark Data Structure (SDS) data with no additional software requirements.


Features & Specifications#

SDA (Spark Data Access/Analysis) is a comprehensive data library built around three core pillars for working with Spark Cleantech’s research and industrial data:

Three-Pillar Architecture#

1. 📥 Retrieve Data#

  • Multiple Data Sources: SQL databases, Excel files on SharePoint, local file systems

  • Flexible APIs: High-level functions like :py:func:~sda.api.load.load_test and :py:func:~sda.api.load.load_tests for batch operations

  • Auto-Discovery: Intelligent test file discovery with pattern matching and wildcards

  • Custom Mappings: Support for non-standard test files through configuration
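The wildcard-style discovery above can be illustrated with Python's fnmatch module (a sketch of the idea, not SDA's internals; filter_tests is a hypothetical helper):

```python
import fnmatch

def filter_tests(names, pattern):
    """Case-insensitively match test names against a glob pattern."""
    return [n for n in names if fnmatch.fnmatch(n.lower(), pattern.lower())]

tests = ["T183", "T196", "22s37"]
filter_tests(tests, "t1*")     # matches T183 and T196
filter_tests(tests, "*183*")   # matches T183
```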

2. 🔓 Open/Decode Data#

  • Format Support: Excel (.xlsx, .xls), TRC files, various proprietary formats

  • Decoding Utilities: Specialized readers like :py:class:sda.io.read_trc.ReadTrc for different file types

  • Data Validation: Built-in validation and error handling for data integrity

  • Template Compliance: STT template tracking and compliance metrics

3. 🔍 Explore/Visualize Data#

  • Interactive Dashboard: Web-based visualization with scatter plots, line plots, and filtering

  • Python APIs: Programmatic data analysis and visualization capabilities

  • Command-Line Interface: Terminal-based data operations and exploration tools

  • Export & Reproducibility: Generate Python scripts from interactive analysis workflows

Key Components#

Dashboard Component#

Web-based interface that combines all three pillars in a unified experience:

  • Real-time visualization and dynamic filtering

  • Multi-test dataset support with intelligent column management

  • Dual processing: simultaneous analysis and Python script generation
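The "dual processing" idea — analysing interactively while emitting an equivalent script — can be sketched as follows (purely illustrative; generate_script and its parameters are hypothetical, not the dashboard's actual generator):

```python
def generate_script(test_name: str, x: str, y: str) -> str:
    """Build a standalone Python script reproducing a dashboard scatter plot."""
    return (
        "from sda.api.load import load_test\n"
        f"df = load_test({test_name!r})\n"
        f"df.plot.scatter(x={x!r}, y={y!r})\n"
    )

print(generate_script("T183", "time", "pressure"))
```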

Python APIs#

Programmatic access for custom analysis workflows:

  • Data loading: :py:func:~sda.api.load.load_test, :py:func:~sda.api.load.parse_files

  • File decoding: :py:class:~sda.io.read_trc.ReadTrc, format-specific readers

  • Analysis utilities: data quality tools and performance metrics

Command-Line Interface#

Terminal-based operations for efficient data workflows:

  • sda list - discover and filter available tests

  • sda print - terminal data exploration with intelligent formatting

  • sda dash - launch dashboard with specific tests

  • sda excel/open/plume - file and web operations

Technical Highlights#

  • Cross-Platform Support: Windows, macOS, Linux compatibility

  • Performance Optimization: Flask-Caching integration, efficient data processing

  • Flexible Configuration: Custom test mappings, environment variables

  • Quality Assurance: Comprehensive testing with unit tests and E2E validation

For complete technical specifications, see the SDA Specification Document.

Use Cases#

1. Retrieve Data#

In your analysis code, you can use retrieval functions such as :py:func:~sda.api.load.load_test and :py:func:~sda.api.load.load_tests to access test data:

from sda.api.load import load_test

df = load_test('22s37')  # 2021-2022 naming convention
df = load_test('T183')   # 2024+ naming convention

2. Read Data#

In your analysis code, you can use decoding classes such as :py:class:~sda.io.read_trc.ReadTrc:

from sda.io import read_trc

trc_reader = read_trc.ReadTrc()
x, y, d = trc_reader.open("Path/to/my/file")

3. Explore Data#

Launch the interactive dashboard from the console:

sda dash

Or use the dashboard from Python:

from sda.dashboard import run_dashboard

# Launch dashboard for a specific test
run_dashboard(test_name="T183")

The dashboard provides:

  • Interactive scatter plots with filtering and color mapping

  • Line plots for trend analysis

  • Dynamic filtering by test parameters

4. Performance Analysis#

The SDA library provides comprehensive performance analysis for test data:

  • Analyze all available test data on your machine

  • Track STT template compliance and parsing success rates

  • Visual quality thermometer with progress bars

  • Performance metrics (parsing speed, file sizes)

  • Enhanced Excel reports with native tables

Usage:

python examples/database_report_analysis.py
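A "quality thermometer" of this kind could be rendered as simply as the following (a hypothetical sketch, not the report script's actual output; thermometer is an illustrative helper):

```python
def thermometer(passed: int, total: int, width: int = 20) -> str:
    """Render a text progress bar for a parsing/compliance success rate."""
    rate = passed / total
    filled = round(rate * width)
    return f"[{'#' * filled}{'-' * (width - filled)}] {rate:.0%}"

print(thermometer(42, 50))  # [#################---] 84%
```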

Command Line Interface#

SDA provides a convenient command-line interface for common operations. After installation, you can use the sda command directly:

Usage#

sda <command> [arguments...]

Available Commands#

List Tests#

# List all available tests on the machine
sda list

# Filter tests by pattern (supports wildcards)
sda list --filter "T1*"         # Tests starting with T1
sda list --filter "*183*"       # Tests containing 183

# Verbose output for detailed information
sda list -v
sda list -vv                     # Even more verbose

Dashboard Commands#

# Start dashboard with no test preloaded
sda dash

# Start dashboard with a specific test
sda dash T183

# Start dashboard with multiple tests
sda dash T183 T196

# Start dashboard in debug mode (auto-reload)
sda dash --debug

File Operations#

# Open test folder in file explorer
sda open T183

# Open Excel data file with default application
sda excel T183

Web Integration#

# Open Plume webpage for the test
sda plume T183

For comprehensive CLI documentation with all commands and examples, see the Examples section.

For installation instructions, see the Installation page.

Custom Tests#

SDA can load non-standard test files (e.g., not following Txxx naming) using a custom mapping declared in your ~/sda.json under the CUSTOM_TESTS key:

{
  "CUSTOM_TESTS": {
    "R226.xlm": "~/Spark Cleantech/SPARK - Documents/6. R&D/3 - Answer to REQUESTS/Requests/R226"
  }
}

Notes:

  • Keys may be either the exact filename (e.g., R226.xlsm) or its stem (R226). Matching is case-insensitive, and stem matching lets a request for R226.xlsm resolve from the R226.xlm key shown above.

  • Values can be a folder path (preferred) or a direct file path. If a file path is given, SDA uses its parent directory as the data folder.

  • You can temporarily override or inject custom mappings without editing the file by setting the SDA_CUSTOM_TESTS environment variable to a JSON mapping, e.g. in a Windows command prompt:

set SDA_CUSTOM_TESTS={"R226.xlm":"~/Spark Cleantech/SPARK - Documents/6. R&D/3 - Answer to REQUESTS/Requests/R226"}
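The matching rules above can be sketched in plain Python (an illustration of the described behaviour, not SDA's actual implementation; resolve_custom_test and the short mapping path are hypothetical):

```python
from pathlib import Path

def resolve_custom_test(name, mapping):
    """Resolve a test name against CUSTOM_TESTS: exact filename or stem, case-insensitive."""
    by_name = {k.lower(): v for k, v in mapping.items()}
    by_stem = {Path(k).stem.lower(): v for k, v in mapping.items()}
    target = by_name.get(name.lower()) or by_stem.get(Path(name).stem.lower())
    if target is None:
        return None
    path = Path(target).expanduser()
    # A direct file path falls back to its parent directory as the data folder.
    return path.parent if path.suffix else path

mapping = {"R226.xlm": "~/Requests/R226"}
resolve_custom_test("R226.xlsm", mapping)  # resolves via the shared stem "r226"
```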

Once configured, you can open or load your custom test via CLI or Python:

  • CLI

    • Open Excel: sda excel R226

    • Open folder: sda open R226

    • Dashboard: sda dash R226

  • Python

from sda.api.load import load_test
df = load_test("R226.xlsm")

License#

Copyright (C) Spark Cleantech SAS (SIREN 909736068) - All Rights Reserved
Unauthorized copying of this file, via any medium is strictly prohibited
Proprietary and confidential
