Installation#

This page explains how to install, verify, and update SDA (Spark Data Access/Analysis), and how to set up a development environment and run the tests.

Installation Options#

Option 1: Dedicated Environment#

Install sda in a dedicated environment:

git clone https://github.com/spark-cleantech/sda.git
cd sda  # enter the repository root
conda env create --file environment.yml   # forward-slash-free path works on Windows, Linux and macOS
conda activate sda
pip install -e .         # install current folder (`.`) in editable mode

Option 2: Existing Environment#

To install sda into an existing conda environment (called spy in this example), clone the repository, activate the environment, and install sda:

git clone https://github.com/spark-cleantech/sda.git
cd sda
conda activate spy
conda env update --name spy --file environment.yml    # add sda's dependencies to the spy environment
pip install -e .         # install current folder (`.`) in editable mode

Verification#

To check that the installation succeeded, open a terminal and run:

conda activate sda     # or 'spy' if you installed in 'spy' environment
sda list

If SDA was installed properly, this will list all Data Files available on your local machine. You are now all set to start using SDA!
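The same check can be scripted from Python. The following is a minimal sketch, assuming the package is importable under the name `sda` (this page does not state the import name explicitly); `is_installed` is a hypothetical helper, not part of SDA.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the import system can locate a top-level module."""
    return importlib.util.find_spec(module_name) is not None


# Assumption: the editable install exposes a top-level module named "sda".
print("sda importable:", is_installed("sda"))
```

Run it inside the activated environment. A result of `False` usually means the wrong environment is active or `pip install -e .` was not run from the repository root.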

Update Package#

To update SDA, pull the latest version from GitHub, update the dependencies, and reinstall:

cd sda
git pull
conda env update --name spy --file environment.yml   # use --name sda if you installed in the dedicated environment
pip install -e .

Developer Setup#

For developers and contributors, follow these additional steps:

Prerequisites#

Clone the repository:

git clone https://github.com/spark-cleantech/sda.git
cd sda

For the best experience, create a new conda environment (e.g. sda-env) with Python 3.11:

conda create -n sda-env -c conda-forge python=3.11 -y
conda activate sda-env

Development Workflow#

Before pushing to GitHub, run the following commands:

Update conda environment: make conda-env-update
Install this package in editable mode: pip install -e .
(optional) Sync with the latest template: make template-update
(optional) Run quality assurance checks (code linting): make qa
(optional) Run tests: make unit-tests
(optional) Run the static type checker: make type-check
(optional) Build the documentation: make docs-build

Cross-Platform Notes#

On Windows, make is not available by default. Either install it (for instance with Chocolatey), or open the Makefile and run its commands manually.
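A quick way to find out which of the two routes applies is to look for `make` on the PATH. This is a small sketch using only the standard library; `describe_make_availability` is a hypothetical helper name introduced for illustration.

```python
import shutil


def describe_make_availability(which=shutil.which) -> str:
    """Report whether a `make` executable can be found on the PATH.

    `which` is injectable so the function can be exercised without
    depending on the local machine.
    """
    path = which("make")
    if path is None:
        return "make not found: install it or run the Makefile commands manually"
    return f"make found at {path}"


print(describe_make_availability())
```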

Testing#

SDA's tests are split into two suites, unit tests and end-to-end (E2E) tests, which can be run separately:

Test Categories#

Unit Tests#

Fast-running tests that don’t require browser automation:

# Run unit tests only (excludes E2E tests)
make unit-tests

# Run with coverage report
make unit-tests COV_REPORT=html

End-to-End (E2E) Tests#

Browser-based tests using Playwright for complete workflow validation:

# Run E2E tests only
make e2e-tests

# Run E2E tests with detailed output
python -m pytest -vv -m "e2e" --tb=short

Running All Tests#

# Run both unit and E2E tests
python -m pytest tests/

# Run with specific markers
python -m pytest -m "not slow"  # Skip slow tests

Test Structure#

Unit Tests: tests/ directory, any file not ending with _e2e.py
E2E Tests: tests/dashboard/*_e2e.py files, automatically marked with @pytest.mark.e2e
Test Configuration: tests/conftest.py and tests/dashboard/conftest.py
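Automatic marking of this kind is typically done with a pytest collection hook in a `conftest.py`. The sketch below shows one way it could work, assuming pytest 7 or later (for `item.path`); it is not taken from the project's actual `tests/dashboard/conftest.py`, and `is_e2e_path` is a hypothetical helper.

```python
def is_e2e_path(path: str) -> bool:
    """Return True for test files that follow the *_e2e.py naming rule."""
    return path.endswith("_e2e.py")


def pytest_collection_modifyitems(config, items):
    """pytest hook: tag every collected test from an *_e2e.py file as e2e."""
    for item in items:
        # item.path is the test's file path (pytest >= 7).
        if is_e2e_path(str(item.path)):
            # add_marker accepts a marker name; equivalent to @pytest.mark.e2e.
            item.add_marker("e2e")
```

With a hook like this in place, `-m "e2e"` selects only the browser tests and `-m "not e2e"` excludes them, without decorating each test by hand.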

CI/CD Testing#

The CI pipeline runs tests in two separate jobs:

Unit Tests Job: Fast feedback (no browser dependencies)
E2E Tests Job: Comprehensive testing with browser automation (runs after unit tests pass)

This approach provides:

Faster feedback for most common development scenarios
Efficient resource utilization in CI
Clear separation of concerns between test types
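The same ordering can be reproduced locally before pushing. The following sketch runs the stages in sequence and stops at the first failure, mirroring "E2E runs after unit tests pass". The `-m "not e2e"` selection for the unit stage is an assumption (the real command lives in the Makefile), and `run_stages` is a hypothetical helper, not part of the project.

```python
import subprocess
import sys

# Assumption: unit tests are everything not marked e2e.
UNIT_CMD = [sys.executable, "-m", "pytest", "-m", "not e2e"]
E2E_CMD = [sys.executable, "-m", "pytest", "-m", "e2e"]


def run_command(cmd) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(cmd).returncode


def run_stages(stages, runner=run_command):
    """Run (name, command) stages in order, stopping at the first failure.

    Returns the list of (name, exit_code) pairs for the stages that ran.
    """
    results = []
    for name, cmd in stages:
        code = runner(cmd)
        results.append((name, code))
        if code != 0:
            break  # later stages are skipped, like a dependent CI job
    return results


# Usage (not executed here):
# run_stages([("unit", UNIT_CMD), ("e2e", E2E_CMD)])
```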

Development Testing#

For development, you can run specific test files:

# Test specific functionality
python -m pytest tests/dashboard/test_pipeline_pure.py -v

# Test with browser automation
python -m pytest tests/dashboard/test_scatter_filter_integration_e2e.py -v

# Run with a visible browser window (for debugging); --headed is a pytest-playwright option
python -m pytest tests/dashboard/test_dashboard_e2e.py -v --headed