sda.column_whitelist

Column whitelist for the SDA data-demon.

Provides get_column_whitelist(), which returns the union of the defaults in default_sda.json and an optional ~/sda.json overlay, and the one-line helper both_whitelisted(), which the demon uses to skip false-positive suggestions.

The canonical list of verified column names lives in sda/default_sda.json under the key "COLUMN_WHITELIST". Users can add site-specific names under the same key in ~/sda.json. Those entries are merged with the defaults (union) and never replace them.
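To illustrate the merge rule, the sketch below parses two JSON documents shaped like default_sda.json and ~/sda.json and takes the union of their "COLUMN_WHITELIST" entries. It assumes the key holds a JSON array of strings. The column names (wavelength, pressure, site_temp) are invented for illustration, and this is not the module's actual code.

```python
import json

# Hypothetical file contents; the real default_sda.json holds other keys and names.
DEFAULT_JSON = '{"COLUMN_WHITELIST": ["wavelength", "pressure"]}'
USER_JSON = '{"COLUMN_WHITELIST": ["site_temp", "pressure"]}'

defaults = set(json.loads(DEFAULT_JSON)["COLUMN_WHITELIST"])
user = set(json.loads(USER_JSON)["COLUMN_WHITELIST"])

# Union: user entries extend the defaults and cannot remove any of them.
merged = frozenset(defaults | user)
```

Because the merge is a union, a name listed in both files appears once, and leaving a default name out of the user file does not drop it.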

Functions

get_column_whitelist([user_config_path])

Return the union of default and user-defined whitelisted column names.

both_whitelisted(col_a, col_b)

Return True when col_a and col_b are both verified column names.

Module Contents

sda.column_whitelist.get_column_whitelist(user_config_path=None)

Return the union of default and user-defined whitelisted column names.

Parameters:

user_config_path – Path to the user’s sda.json. Defaults to ~/sda.json. Pass a non-existent path to skip the user overlay (useful in tests).

Returns:

Immutable set of column name strings. Never raises: if the user config is missing or malformed, only the defaults are returned.

Return type:

frozenset[str]
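The never-raises contract can be sketched as follows. This is a minimal stand-in written under assumptions, not the function's real source: load_whitelist_sketch and DEFAULTS are hypothetical names, the defaults are hard-coded rather than read from sda/default_sda.json, and treating a non-list "COLUMN_WHITELIST" value as malformed is a guess.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical stand-in for the names shipped in sda/default_sda.json.
DEFAULTS = frozenset({"wavelength", "pressure"})


def load_whitelist_sketch(user_config_path, defaults=DEFAULTS):
    """Return defaults plus any user entries; never raise on a bad user file."""
    try:
        data = json.loads(Path(user_config_path).read_text(encoding="utf-8"))
        extra = data.get("COLUMN_WHITELIST", [])
        if not isinstance(extra, list):
            return defaults
        user_names = frozenset(name for name in extra if isinstance(name, str))
    except (OSError, ValueError, AttributeError):
        # Missing file, invalid JSON, or a top-level value that is not an object.
        return defaults
    return defaults | user_names


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "sda.json"
    good.write_text('{"COLUMN_WHITELIST": ["site_temp"]}', encoding="utf-8")
    bad = Path(tmp) / "broken.json"
    bad.write_text("{not valid json", encoding="utf-8")
    missing = Path(tmp) / "absent.json"  # never created

    merged = load_whitelist_sketch(good)
    from_bad = load_whitelist_sketch(bad)
    from_missing = load_whitelist_sketch(missing)
```

Passing a path that does not exist, as with missing here, mirrors the test-time trick described for user_config_path: the user overlay is skipped and only the defaults come back.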

sda.column_whitelist.both_whitelisted(col_a, col_b)

Return True when col_a and col_b are both verified column names.

When this returns True, the demon treats the pair as two genuinely different measurements and must not suggest merging them, however high their similarity score.

Parameters:
  • col_a – First column name to test (exact, case-sensitive string match).

  • col_b – Second column name to test (exact, case-sensitive string match).
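As a rough picture of how such a check might gate merge suggestions, the sketch below filters a list of scored column pairs. Everything in it is assumed for illustration: the names both_whitelisted_sketch and filter_suggestions, the (col_a, col_b, score) tuple shape, and the sample column names. The real helper takes only the two column names and consults the whitelist itself.

```python
# Hypothetical whitelist; the real one comes from get_column_whitelist().
WHITELIST = frozenset({"T_rot", "T_vib", "pressure"})


def both_whitelisted_sketch(col_a, col_b, whitelist=WHITELIST):
    """True only when both names appear verbatim in the whitelist."""
    return col_a in whitelist and col_b in whitelist


def filter_suggestions(pairs, whitelist=WHITELIST):
    """Drop merge suggestions whose two columns are both verified."""
    return [
        (a, b, score)
        for a, b, score in pairs
        if not both_whitelisted_sketch(a, b, whitelist)
    ]


suggestions = [
    ("T_rot", "T_vib", 0.93),       # both verified: suppressed despite the high score
    ("T_rot", "t_rot", 0.97),       # "t_rot" differs in case, so it is not whitelisted
    ("presure", "pressure", 0.90),  # only one side verified: still suggested
]
kept = filter_suggestions(suggestions)
```

Only the first pair is dropped. The match is exact and case-sensitive, so a near-duplicate such as "t_rot" still reaches the user as a merge candidate.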