sda.dashboard.api.pipeline#

API Pipeline – cachetools-based replacement for FlaskCachePipeline.

Implements the same business logic as FlaskCachePipeline but caches results with cachetools.TTLCache instead of Flask-Caching, so it does not require a Flask/Dash application.
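The time-to-live behaviour this module relies on can be illustrated with a small, dependency-free sketch. It is a stand-in written for this page, not the cachetools implementation and not the pipeline's own code: the class `TinyTTLCache`, its `get_or_compute` method, and the key layout `("load_data", test_name)` are all hypothetical. The timer is injectable so that expiry can be demonstrated without sleeping.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TinyTTLCache:
    """Hypothetical stand-in that mimics TTL expiry; not cachetools itself."""

    def __init__(self, ttl: float, timer: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.timer = timer
        # key -> (expiry timestamp, cached value)
        self._store: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        now = self.timer()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # entry is still fresh, skip the computation
        value = compute()
        self._store[key] = (now + self.ttl, value)
        return value


# Usage with a fake clock: a 10-minute TTL, as documented for load_data.
clock = [0.0]
cache = TinyTTLCache(ttl=600, timer=lambda: clock[0])
calls = []


def expensive_load() -> str:
    calls.append("loaded")
    return "data"


cache.get_or_compute(("load_data", "test_a"), expensive_load)  # computed
cache.get_or_compute(("load_data", "test_a"), expensive_load)  # served from cache
clock[0] = 601.0                                                # TTL has elapsed
cache.get_or_compute(("load_data", "test_a"), expensive_load)  # computed again
```

Because nothing here depends on a Flask application context, the same pattern works from a FastAPI handler or a plain script.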

Attributes#

Classes#

ApiPipeline

Cachetools-based data pipeline for the SDA FastAPI backend.

Module Contents#

sda.dashboard.api.pipeline.logger#

class sda.dashboard.api.pipeline.ApiPipeline(verbose=False)#

Cachetools-based data pipeline for the SDA FastAPI backend.

Mirrors FlaskCachePipeline stage by stage; only the caching mechanism differs.

verbose = False#

load_data(test_name)#

Load and validate test data (cached for 10 min).

select_columns(test_name, user_selection=None)#

Apply column selection logic (cached).

configure_filters(test_name, user_selection=None)#

Configure filter allocation and options (cached).

Filter options are built from the full dataset so that columns outside the user’s axis selection remain available as row filters. Slot allocation follows three strategies (in priority order):

  1. User-selected columns that have filterable values.

  2. Well-known priority columns (_FILTER_PRIORITY_COLUMNS).

  3. Any remaining filterable column.

At most 10 slots are allocated (DYNAMIC_FILTER_SLOTS).
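The priority order above can be sketched as a single pass over three candidate lists. This is an illustration under assumptions, not the pipeline's code: the function name `allocate_slots` and its parameters are hypothetical, and the sketch assumes that priority columns are only allocated when they are also filterable.

```python
from typing import List, Sequence

DYNAMIC_FILTER_SLOTS = 10  # slot limit stated in the documentation above


def allocate_slots(
    user_selection: Sequence[str],
    priority_columns: Sequence[str],
    filterable: Sequence[str],
    max_slots: int = DYNAMIC_FILTER_SLOTS,
) -> List[str]:
    """Fill filter slots in priority order without duplicates (hypothetical helper)."""
    filterable_set = set(filterable)
    slots: List[str] = []
    # 1. user-selected columns, 2. priority columns, 3. any remaining filterable column
    for candidates in (user_selection, priority_columns, filterable):
        for col in candidates:
            if len(slots) >= max_slots:
                return slots
            if col in filterable_set and col not in slots:
                slots.append(col)
    return slots


example = allocate_slots(
    user_selection=["speed", "notes"],          # "notes" has no filterable values
    priority_columns=["Générateur", "date"],
    filterable=["date", "speed", "Générateur", "temp"],
)
# example == ["speed", "Générateur", "date", "temp"]
```

Passing a smaller `max_slots` truncates the result from the low-priority end, so user-selected columns are the last to be dropped.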

classify_columns(test_name)#

Classify every column in the full dataset for filter-panel rendering.

Returns lightweight metadata only (no options list). Options are loaded lazily via get_column_options() when the user expands a slot.

Classification rules#

  • float64 / int64, >20 unique → “numeric” (min/max provided)

  • float64 / int64, ≤20 unique → “categorical” (min/max provided for slider fallback)

  • bool → “boolean”

  • object → “categorical”

  • datetime64, ≤20 unique → “categorical”

  • datetime64, >20 unique → “datetime” (info-only in UI)
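The rules above amount to a small decision function over a column's dtype and its number of unique values. The sketch below is illustrative only: `classify_column` and `CARDINALITY_THRESHOLD` are hypothetical names, the dtype is passed as a plain string rather than a real dtype object, and returning `None` for dtypes the rules do not mention is an assumption.

```python
from typing import Optional

CARDINALITY_THRESHOLD = 20  # the ">20 unique" boundary from the rules above


def classify_column(dtype: str, n_unique: int) -> Optional[str]:
    """Map (dtype name, unique count) to a filter-panel kind (hypothetical helper)."""
    if dtype in ("float64", "int64"):
        return "numeric" if n_unique > CARDINALITY_THRESHOLD else "categorical"
    if dtype == "bool":
        return "boolean"
    if dtype == "object":
        return "categorical"
    if dtype.startswith("datetime64"):
        return "datetime" if n_unique > CARDINALITY_THRESHOLD else "categorical"
    return None  # dtype not covered by the documented rules


kinds = {
    "many_floats": classify_column("float64", 500),       # numeric
    "few_ints": classify_column("int64", 20),             # categorical (exactly 20)
    "flag": classify_column("bool", 2),                   # boolean
    "label": classify_column("object", 1000),             # categorical
    "timestamps": classify_column("datetime64[ns]", 21),  # datetime
}
```

Note that the boundary is exclusive: a numeric column with exactly 20 unique values is still treated as categorical.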

get_column_options(test_name, column)#

Return unique non-null values for one column (lazy load for filter UI).

Intended for categorical or low-cardinality columns only. Numeric columns with more than 20 unique values use the slider instead, so this method is not called for them. All unique values are returned, with no arbitrary cap, because the user explicitly opened this accordion item and may need any value.
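As a rough illustration of "unique non-null values", the sketch below deduplicates a plain list while skipping `None` and NaN. The helper name `unique_non_null` is hypothetical, and keeping values in order of first appearance is an assumption; the documentation does not state how the real method orders its result.

```python
import math
from typing import Any, Iterable, List


def unique_non_null(values: Iterable[Any]) -> List[Any]:
    """Return distinct values, skipping None and NaN (hypothetical helper)."""
    seen = set()
    result: List[Any] = []
    for value in values:
        if value is None:
            continue
        if isinstance(value, float) and math.isnan(value):
            continue
        if value not in seen:
            seen.add(value)
            result.append(value)  # no cap: every distinct value is kept
    return result


options = unique_non_null(["A", None, "B", "A", float("nan"), "C"])
# options == ["A", "B", "C"]
```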

apply_filters(test_name, user_selection=None, applied_filters=None, range_filters=None)#

Apply row filters, then column selection (cached, 5-min TTL).

Row filters are applied on the full dataset first so that a filter on a column like Générateur works even when the user’s axis selection does not include that column. Column selection is applied afterwards so the returned DataFrame only contains the columns the caller asked for.

Uses a single boolean mask across all filter operations to avoid creating intermediate DataFrame copies for each column filter.
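The ordering and the shared mask can be sketched without any DataFrame library by treating a table as a dict of equal-length columns. Everything in this sketch is an assumption made for illustration: the name `filter_then_select`, the shape of `applied_filters` (column to allowed values), the shape of `range_filters` (column to inclusive `(low, high)` bounds), and the choice to drop `None` values in range filters. The real method operates on a DataFrame and may differ.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple

Table = Dict[str, List[Any]]  # column name -> values; a stand-in for a DataFrame


def filter_then_select(
    table: Table,
    columns: Sequence[str],
    applied_filters: Optional[Dict[str, Sequence[Any]]] = None,
    range_filters: Optional[Dict[str, Tuple[float, float]]] = None,
) -> Table:
    """Filter rows on the full table, then keep only `columns` (hypothetical helper)."""
    n_rows = len(next(iter(table.values()), []))
    mask = [True] * n_rows  # one mask shared by every filter

    # Value filters may reference columns that are not in `columns`.
    for col, allowed in (applied_filters or {}).items():
        allowed_set = set(allowed)
        for i, value in enumerate(table[col]):
            if mask[i] and value not in allowed_set:
                mask[i] = False

    # Range filters update the same mask; bounds are assumed inclusive.
    for col, (low, high) in (range_filters or {}).items():
        for i, value in enumerate(table[col]):
            if mask[i] and (value is None or not (low <= value <= high)):
                mask[i] = False

    # Rows are materialised once, and only for the requested columns.
    return {
        col: [v for v, keep in zip(table[col], mask) if keep]
        for col in columns
    }


table = {"Générateur": ["A", "B", "A"], "x": [1, 2, 3], "y": [10, 20, 30]}
result = filter_then_select(
    table,
    columns=["x", "y"],
    applied_filters={"Générateur": ["A"]},
    range_filters={"y": (15, 40)},
)
# result == {"x": [3], "y": [30]}
```

The filter on Générateur takes effect even though that column is absent from the output, which is the reason filtering precedes column selection.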

create_scatter_plot(test_name, user_selection=None, applied_filters=None, range_filters=None, x=None, y=None, color=None, connect_points=False, connect_sort_by=None)#

Create scatter plot figure (cached, 5-min TTL).

clear_cache(test_name=None)#