# Spark Data Access/Analysis (SDA)

- Open all files from Spark's research and industrial environments
- Secure and remote access to Spark Data Structure (SDS) data, with no software requirements.

![image](sda_logo.png)

```{toctree}
:caption: 'Contents:'
:maxdepth: 2

install
auto_examples/index
API Reference <_api/sda/index>
```

## Features & Specifications

SDA (Spark Data Access/Analysis) is a comprehensive data library built around three core pillars for working with Spark Cleantech's research and industrial data:

### Three-Pillar Architecture

#### 1. 📥 **Retrieve Data**

- **Multiple Data Sources**: SQL databases, Excel files on SharePoint, local file systems
- **Flexible APIs**: High-level functions like {py:func}`~sda.api.load.load_test` and {py:func}`~sda.api.load.load_tests` for batch operations
- **Auto-Discovery**: Intelligent test file discovery with pattern matching and wildcards
- **Custom Mappings**: Support for non-standard test files through configuration

#### 2. 🔓 **Open/Decode Data**

- **Format Support**: Excel (.xlsx, .xls), TRC files, various proprietary formats
- **Decoding Utilities**: Specialized readers like {py:class}`sda.io.read_trc.ReadTrc` for different file types
- **Data Validation**: Built-in validation and error handling for data integrity
- **Template Compliance**: STT template tracking and compliance metrics

#### 3. 🔍 **Explore/Visualize Data**

- **Interactive Dashboard**: Web-based visualization with scatter plots, line plots, and filtering
- **Python APIs**: Programmatic data analysis and visualization capabilities
- **Command-Line Interface**: Terminal-based data operations and exploration tools
- **Export & Reproducibility**: Generate Python scripts from interactive analysis workflows

### Key Components

#### **Dashboard Component**

Web-based interface performing all three pillars in a unified experience:

- Real-time visualization and dynamic filtering
- Multi-test dataset support with intelligent column management
- Dual processing: simultaneous analysis and Python script generation

#### **Python APIs**

Programmatic access for custom analysis workflows:

- Data loading: {py:func}`~sda.api.load.load_test`, {py:func}`~sda.api.load.parse_files`
- File decoding: {py:class}`~sda.io.read_trc.ReadTrc`, format-specific readers
- Analysis utilities: data quality tools and performance metrics

#### **Command-Line Interface**

Terminal-based operations for efficient data workflows:

- `sda list` - discover and filter available tests
- `sda print` - terminal data exploration with intelligent formatting
- `sda dash` - launch dashboard with specific tests
- `sda excel/open/plume` - file and web operations

### Technical Highlights

- **Cross-Platform Support**: Windows, macOS, Linux compatibility
- **Performance Optimization**: Flask-Caching integration, efficient data processing
- **Flexible Configuration**: Custom test mappings, environment variables
- **Quality Assurance**: Comprehensive testing with unit tests and E2E validation

For complete technical specifications, see the [SDA Specification Document](https://github.com/spark-cleantech/sda/blob/main/SDA_SPECIFICATION.md).

## Use Cases

### 1. Retrieve Data

In your analysis code, you can use retrieval functions such as {py:func}`~sda.api.load.load_test` and {py:func}`~sda.api.load.load_tests` to access test data:

```python
from sda.api.load import load_test

df = load_test('22s37')  # 2021, 2022 data
```

```python
from sda.api.load import load_test

df = load_test('T183')  # 2024+ data
```
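For batch retrieval, the feature list above also mentions {py:func}`~sda.api.load.load_tests`. A minimal sketch, assuming it accepts several test names and returns one result per test (the exact signature and return type are not documented on this page):

```python
from sda.api.load import load_tests

# Assumed usage pattern: load several tests in one call.
# The argument style and return type are assumptions, not confirmed here.
frames = load_tests(['22s37', 'T183'])
```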
### 2. Read Data

In your analysis code, you can use decoding classes such as {py:class}`~sda.io.read_trc.ReadTrc`:

```python
from sda.io import read_trc

trc_reader = read_trc.ReadTrc()
x, y, d = trc_reader.open("Path/to/my/file")
```

### 3. Explore Data

Launch the interactive dashboard from the console:

```bash
sda dash
```

Or use the dashboard from Python:

```python
from sda.dashboard import run_dashboard

# Launch dashboard for a specific test
run_dashboard(test_name="T183")
```

The dashboard provides:

- **Interactive scatter plots** with filtering and color mapping
- **Line plots** for trend analysis
- **Dynamic filtering** by test parameters

### 4. Performance Analysis

The SDA library provides comprehensive performance analysis for test data:

- Analyze all available test data on your machine
- Track STT template compliance and parsing success rates
- Visual quality thermometer with progress bars
- Performance metrics (parsing speed, file sizes)
- Enhanced Excel reports with native tables

**Usage:**

```bash
python examples/database_report_analysis.py
```

## Command Line Interface

SDA provides a convenient command-line interface for common operations. After installation, you can use the `sda` command directly:

### Usage

```bash
sda <command> [arguments...]
```

### Available Commands

#### List Tests

```bash
# List all available tests on the machine
sda list

# Filter tests by pattern (supports wildcards)
sda list --filter "T1*"    # Tests starting with T1
sda list --filter "*183*"  # Tests containing 183

# Verbose output for detailed information
sda list -v
sda list -vv  # Even more verbose
```

#### Print DataFrame

Explore and display test data directly in the terminal:

```bash
# Basic usage - uses pandas' native display (recommended)
sda print T297

# Show ALL rows using to_string() method
sda print T297 --all

# Select specific columns for better readability
sda print T297 --columns run "Tension NRP (kV)" "Fréquence (kHz)"

# Transpose view - excellent for wide datasets
sda print T297 --transpose
```

#### Dashboard Commands

```bash
# Start dashboard with no test preloaded
sda dash

# Start dashboard with a specific test
sda dash T183

# Start dashboard with multiple tests
sda dash T183 T196

# Start dashboard in debug mode (auto-reload)
sda dash --debug
```

#### File Operations

```bash
# Open test folder in file explorer
sda open T183

# Open Excel data file with default application
sda excel T183
```

#### Web Integration

```bash
# Open Plume webpage for the test
sda plume T183
```

For comprehensive CLI documentation with all commands and examples, see the [Examples](auto_examples/index) section. For installation instructions, see the [Installation](install) page.

## Custom Tests

SDA can load non-standard test files (e.g., files that do not follow the Txxx naming convention) using a custom mapping declared in your `~/sda.json` under the `CUSTOM_TESTS` key:

```json
{
  "CUSTOM_TESTS": {
    "R226.xlm": "~/Spark Cleantech/SPARK - Documents/6. R&D/3 - Answer to REQUESTS/Requests/R226"
  }
}
```

Notes:

- Keys can be either the exact filename (e.g., `R226.xlsm`) or its stem (`R226`). Matching is case-insensitive, and stem matching allows a request like `R226.xlsm` to resolve from an `R226.xlm` key.
- Values can be a folder path (preferred) or a direct file path. If a file path is given, SDA uses its parent directory as the data folder.
- You can temporarily override or inject custom mappings without editing the file using the `SDA_CUSTOM_TESTS` environment variable (JSON mapping):

```bash
set SDA_CUSTOM_TESTS={"R226.xlm":"~/Spark Cleantech/SPARK - Documents/6. R&D/3 - Answer to REQUESTS/Requests/R226"}
```
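The same override can be applied from Python for the current process only; a minimal sketch, assuming SDA consults `SDA_CUSTOM_TESTS` in the environment when resolving test names:

```python
import json
import os

# Process-local equivalent of the shell command above (assumption:
# SDA reads SDA_CUSTOM_TESTS when the custom mapping is first needed).
mapping = {
    "R226.xlm": "~/Spark Cleantech/SPARK - Documents/6. R&D/3 - Answer to REQUESTS/Requests/R226"
}
os.environ["SDA_CUSTOM_TESTS"] = json.dumps(mapping)
```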
Once configured, you can open or load your custom test via CLI or Python:

- CLI:
  - Open Excel: `sda excel R226`
  - Open folder: `sda open R226`
  - Dashboard: `sda dash R226`
- Python:

  ```python
  from sda.api.load import load_test

  df = load_test("R226.xlsm")
  ```

## License

```
Copyright (C) Spark Cleantech SAS (SIREN 909736068) - All Rights Reserved
Unauthorized copying of this file, via any medium is strictly prohibited
Proprietary and confidential
```

## Indices and tables

- {ref}`genindex`
- {ref}`modindex`
- {ref}`search`