# Spark Data Access/Analysis (SDA)

- Open all files from Spark's research and industrial environments
- Secure and remote access to Spark Data Structure (SDS) data, with no software requirements.

![image](sda_logo.png)

```{toctree}
:caption: 'Contents:'
:maxdepth: 2

install
auto_examples/index
API Reference <_api/sda/index>
```

## Features & Specifications

SDA (Spark Data Access/Analysis) is a comprehensive data library built around three core pillars for working with Spark Cleantech's research and industrial data:

### Three-Pillar Architecture

#### 1. 📥 **Retrieve Data**

- **Multiple Data Sources**: SQL databases, Excel files on SharePoint, local file systems
- **Flexible APIs**: High-level functions like {py:func}`~sda.api.load.load_test` and {py:func}`~sda.api.load.load_tests` for batch operations
- **Auto-Discovery**: Intelligent test file discovery with pattern matching and wildcards
- **Custom Mappings**: Support for non-standard test files through configuration

#### 2. 🔓 **Open/Decode Data**

- **Format Support**: Excel (.xlsx, .xls), TRC files, various proprietary formats
- **Decoding Utilities**: Specialized readers like {py:class}`sda.io.read_trc.ReadTrc` for different file types
- **Data Validation**: Built-in validation and error handling for data integrity
- **Template Compliance**: STT template tracking and compliance metrics

#### 3. 🔍 **Explore/Visualize Data**

- **Interactive Dashboard**: Web-based visualization with scatter plots, line plots, and filtering
- **Python APIs**: Programmatic data analysis and visualization capabilities
- **Command-Line Interface**: Terminal-based data operations and exploration tools
- **Export & Reproducibility**: Generate Python scripts from interactive analysis workflows

### Key Components

#### **Dashboard Component**

Web-based interface performing all three pillars in a unified experience:

- Real-time visualization and dynamic filtering
- Multi-test dataset support with intelligent column management
- Dual processing: simultaneous analysis and Python script generation

#### **Python APIs**

Programmatic access for custom analysis workflows:

- Data loading: {py:func}`~sda.api.load.load_test`, {py:func}`~sda.api.load.parse_files`
- File decoding: {py:class}`~sda.io.read_trc.ReadTrc`, format-specific readers
- Analysis utilities: data quality tools and performance metrics

#### **Command-Line Interface**

Terminal-based operations for efficient data workflows:

- `sda list` - discover and filter available tests
- `sda print` - terminal data exploration with intelligent formatting
- `sda dash` - launch dashboard with specific tests
- `sda excel/open/plume` - file and web operations

### Technical Highlights

- **Cross-Platform Support**: Windows, macOS, Linux compatibility
- **Performance Optimization**: Flask-Caching integration, efficient data processing
- **Flexible Configuration**: Custom test mappings, environment variables
- **Quality Assurance**: Comprehensive testing with unit tests and E2E validation

For complete technical specifications, see the [SDA Specification Document](https://github.com/spark-cleantech/sda/blob/main/SDA_SPECIFICATION.md).

## Use Cases

### 1. Retrieve Data

In your analysis code, you can use retrieval functions such as {py:func}`~sda.api.load.load_test` and {py:func}`~sda.api.load.load_tests` to access test data:

```python
from sda.api.load import load_test

df = load_test('22s37')  # 2021, 2022 data
```

```python
from sda.api.load import load_test

df = load_test('T183')  # 2024+ data
```
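For batch retrieval, the feature list above also mentions {py:func}`~sda.api.load.load_tests`. A minimal sketch, assuming it accepts several test names and returns one result per test (the exact signature and return type are not documented on this page):

```python
from sda.api.load import load_tests

# Assumed usage pattern: load several tests in one call.
# The argument style and return type are assumptions, not confirmed here.
frames = load_tests(['22s37', 'T183'])
```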
### 2. Read Data

In your analysis code, you can use decoding classes such as {py:class}`~sda.io.read_trc.ReadTrc`:

```python
from sda.io import read_trc

trc_reader = read_trc.ReadTrc()
x, y, d = trc_reader.open("Path/to/my/file")
```

### 3. Explore Data

Launch the interactive dashboard from the console:

```bash
sda dash
```

Or use the dashboard from Python:

```python
from sda.dashboard import run_dashboard

# Launch dashboard for a specific test
run_dashboard(test_name="T183")
```

The dashboard provides:

- **Interactive scatter plots** with filtering and color mapping
- **Line plots** for trend analysis
- **Dynamic filtering** by test parameters

### 4. Performance Analysis

The SDA library provides comprehensive performance analysis for test data:

- Analyze all available test data on your machine
- Track STT template compliance and parsing success rates
- Visual quality thermometer with progress bars
- Performance metrics (parsing speed, file sizes)
- Enhanced Excel reports with native tables

**Usage:**

```bash
python examples/database_report_analysis.py
```

## Command Line Interface

SDA provides a convenient command-line interface for common operations. After installation, you can use the `sda` command directly:

### Usage

```bash
sda <command> [arguments...]
```

### Available Commands

#### List Tests

```bash
# List all available tests on the machine
sda list

# Filter tests by pattern (supports wildcards)
sda list --filter "T1*"    # Tests starting with T1
sda list --filter "*183*"  # Tests containing 183

# Verbose output for detailed information
sda list -v
sda list -vv  # Even more verbose
```

#### Print DataFrame

Explore and display test data directly in the terminal:

```bash
# Basic usage - uses pandas' native display (recommended)
sda print T297

# Show ALL rows using to_string() method
sda print T297 --all

# Select specific columns for better readability
sda print T297 --columns run "Tension NRP (kV)" "Fréquence (kHz)"

# Transpose view - excellent for wide datasets
sda print T297 --transpose
```

#### Dashboard Commands

```bash
# Start dashboard with no test preloaded
sda dash

# Start dashboard with a specific test
sda dash T183

# Start dashboard with multiple tests
sda dash T183 T196

# Start dashboard in debug mode (auto-reload)
sda dash --debug
```

#### File Operations

```bash
# Open test folder in file explorer
sda open T183

# Open Excel data file with default application
sda excel T183
```

#### Web Integration

```bash
# Open Plume webpage for the test
sda plume T183
```

For comprehensive CLI documentation with all commands and examples, see the [Examples](auto_examples/index) section. For installation instructions, see the [Installation](install) page.

## Custom Tests

SDA can load non-standard test files (e.g., files that do not follow the Txxx naming convention) using a custom mapping declared in your `~/sda.json` under the `CUSTOM_TESTS` key:

```json
{
  "CUSTOM_TESTS": {
    "R226.xlm": "~/Spark Cleantech/SPARK - Documents/6. R&D/3 - Answer to REQUESTS/Requests/R226"
  }
}
```

Notes:

- Keys can be either the exact filename (e.g., `R226.xlsm`) or its stem (`R226`). Matching is case-insensitive, and stem matching allows a request like `R226.xlsm` to resolve from an `R226.xlm` key.
- Values can be a folder path (preferred) or a direct file path. If a file path is given, SDA uses its parent directory as the data folder.
- You can temporarily override or inject custom mappings without editing the file using the `SDA_CUSTOM_TESTS` environment variable (JSON mapping):

```bash
set SDA_CUSTOM_TESTS={"R226.xlm":"~/Spark Cleantech/SPARK - Documents/6. R&D/3 - Answer to REQUESTS/Requests/R226"}
```
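The same override can be applied from Python for the current process only; a minimal sketch, assuming SDA consults `SDA_CUSTOM_TESTS` in the environment when resolving test names:

```python
import json
import os

# Process-local equivalent of the shell command above (assumption:
# SDA reads SDA_CUSTOM_TESTS when the custom mapping is first needed).
mapping = {
    "R226.xlm": "~/Spark Cleantech/SPARK - Documents/6. R&D/3 - Answer to REQUESTS/Requests/R226"
}
os.environ["SDA_CUSTOM_TESTS"] = json.dumps(mapping)
```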
Once configured, you can open or load your custom test via CLI or Python:

- CLI:
  - Open Excel: `sda excel R226`
  - Open folder: `sda open R226`
  - Dashboard: `sda dash R226`
- Python:

  ```python
  from sda.api.load import load_test

  df = load_test("R226.xlsm")
  ```

## License

```
Copyright (C) Spark Cleantech SAS (SIREN 909736068) - All Rights Reserved
Unauthorized copying of this file, via any medium is strictly prohibited
Proprietary and confidential
```

## Indices and tables

- {ref}`genindex`
- {ref}`modindex`
- {ref}`search`