sda.scripts.demon
😈 Demon - Maxwell’s Demon for Data Column Name Entropy Reduction.
This script analyzes all test data files using the SDA API and compares column names across files to identify likely duplicates or near-duplicates that could be standardized. It generates a markdown report with renaming suggestions.
Maxwell’s Demon is a thought-experiment entity that lowers entropy by sorting molecules according to their speed. Similarly, this “😈 Demon” reduces the entropy in data column naming by identifying similar column names across files and suggesting a standard name for each.
- Usage:
  python -m sda.scripts.demon
  python -m sda.scripts.demon --output demon_report.md
  python -m sda.scripts.demon --verbose --threshold 0.8
Classes
| DemonConfig | Configuration for the 😈 Demon script. |
| ColumnAnalyzer | Analyzes column names for similarity and standardization opportunities. |
| DemonReportGenerator | Generates markdown reports with column renaming suggestions. |
| SDADemon | Main 😈 Demon class for column name analysis and standardization. |
Functions
| main() | Run the 😈 Demon script. |
Module Contents
- class sda.scripts.demon.DemonConfig
Configuration for the 😈 Demon script.
- similarity_threshold = 0.8
- output_file = 'sda_demon_report.md'
- dismissed_file = 'sda_demon_dismissed.json'
- verbose = True
- max_files = None
- file_filter = '*'
- column_filter = None
- classmethod from_args(args)
Create config from command line arguments.
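The documented defaults and the `from_args` entry point can be pictured with a small dataclass. This is a minimal sketch, not the module's actual code: `DemonConfigSketch` and `_pick` are hypothetical names, and the namespace attributes `threshold`, `output`, and `verbose` are assumed from the flags shown in the usage lines.

```python
import argparse
from dataclasses import dataclass
from typing import Any, Optional


def _pick(args: argparse.Namespace, name: str, default: Any) -> Any:
    """Return args.<name> when it is present and not None, else the default."""
    value = getattr(args, name, None)
    return default if value is None else value


@dataclass
class DemonConfigSketch:
    """Hypothetical stand-in for DemonConfig; defaults copied from the reference."""

    similarity_threshold: float = 0.8
    output_file: str = "sda_demon_report.md"
    dismissed_file: str = "sda_demon_dismissed.json"
    verbose: bool = True
    max_files: Optional[int] = None
    file_filter: str = "*"
    column_filter: Optional[str] = None

    @classmethod
    def from_args(cls, args: argparse.Namespace) -> "DemonConfigSketch":
        # Assumption: attribute names follow the CLI flags (--threshold,
        # --output, --verbose). Fields without a known flag keep their defaults.
        base = cls()
        return cls(
            similarity_threshold=_pick(args, "threshold", base.similarity_threshold),
            output_file=_pick(args, "output", base.output_file),
            verbose=_pick(args, "verbose", base.verbose),
        )
```

With this sketch, `DemonConfigSketch.from_args(argparse.Namespace(threshold=0.9))` overrides only the threshold and leaves every other field at its documented default.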
- class sda.scripts.demon.ColumnAnalyzer(similarity_threshold=0.8, column_filter=None)
Analyzes column names for similarity and standardization opportunities.
- similarity_threshold = 0.8
- column_filter = None
- add_columns(file_path, columns)
Add columns from a file to the registry.
- calculate_similarity(col1, col2)
Calculate similarity between two column names using multiple metrics.
- find_similar_columns(progress_callback=None)
Find groups of similar column names across files.
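The reference does not say which metrics `calculate_similarity` combines or how `find_similar_columns` forms its groups. As an illustration only, the sketch below averages a character-level ratio from `difflib.SequenceMatcher` with a token-level Jaccard index, then groups names greedily against the 0.8 threshold. The function names `tokenize`, `similarity`, and `group_similar` are hypothetical, and the registry and progress callback of the real class are left out.

```python
import re
from difflib import SequenceMatcher
from typing import Iterable, List, Set


def tokenize(name: str) -> Set[str]:
    """Split a column name into lowercase tokens on camelCase and non-alphanumerics."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)
    return {t for t in re.split(r"[^a-z0-9]+", spaced.lower()) if t}


def similarity(col1: str, col2: str) -> float:
    """Average of a character-level ratio and a token-level Jaccard index (assumed metrics)."""
    a, b = col1.strip().lower(), col2.strip().lower()
    if a == b:
        return 1.0
    char_score = SequenceMatcher(None, a, b).ratio()
    t1, t2 = tokenize(col1), tokenize(col2)
    union = t1 | t2
    token_score = len(t1 & t2) / len(union) if union else 0.0
    return (char_score + token_score) / 2


def group_similar(columns: Iterable[str], threshold: float = 0.8) -> List[List[str]]:
    """Greedy grouping: a name joins the first group that holds a close enough match."""
    groups: List[List[str]] = []
    for name in sorted(set(columns)):
        for group in groups:
            if any(similarity(name, member) >= threshold for member in group):
                group.append(name)
                break
        else:
            # No existing group matched, so this name starts a new one.
            groups.append([name])
    # Only groups with at least two names are candidates for standardization.
    return [g for g in groups if len(g) > 1]
```

Averaging two metrics keeps names such as `strain_rate` and `Strain Rate` together, since they share every token, while names with no tokens in common can score at most 0.5 and stay apart at the default threshold.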
- class sda.scripts.demon.DemonReportGenerator(config)
Generates markdown reports with column renaming suggestions.
- config
- generate_report(suggestions, file_analysis)
Generate the complete markdown report.
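The shapes of `suggestions` and `file_analysis` are not documented here. The sketch below assumes, purely for illustration, that each suggestion is a dict with `suggested`, `columns`, and `similarity` keys and that `file_analysis` maps a file path to its column names. `render_report` is a hypothetical name, and the report layout is invented for the example.

```python
from typing import Dict, List


def render_report(suggestions: List[dict], file_analysis: Dict[str, List[str]]) -> str:
    """Render a markdown report as a string; the input shapes are assumptions."""
    lines = [
        "# Demon Report",
        "",
        f"Files analyzed: {len(file_analysis)}",
        "",
        "## Renaming suggestions",
        "",
    ]
    if not suggestions:
        lines.append("No similar column names found.")
    else:
        lines.append("| Current name | Suggested name | Similarity |")
        lines.append("| --- | --- | --- |")
        for s in suggestions:
            for current in s["columns"]:
                # Skip the row where the column already carries the suggested name.
                if current != s["suggested"]:
                    lines.append(
                        f"| `{current}` | `{s['suggested']}` | {s['similarity']:.2f} |"
                    )
    return "\n".join(lines) + "\n"
```

Returning a string rather than writing a file keeps the rendering step easy to test; saving it to `output_file` would then be a separate, one-line step.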
- class sda.scripts.demon.SDADemon(config)
Main 😈 Demon class for column name analysis and standardization.
- config
- analyzer
- report_generator
- load_dismissed_suggestions()
Load previously dismissed suggestions.
- save_dismissed_suggestions(dismissed)
Save dismissed suggestions to file.
- analyze_all_files()
Analyze all data files and find column similarities.
- generate_report(suggestions, file_analysis)
Generate and save the markdown report.
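Persisting dismissed suggestions to a JSON file could look like the sketch below. It assumes the file holds a flat JSON list of string keys, one per dismissed suggestion; the real on-disk format is not documented here, and `load_dismissed` and `save_dismissed` are hypothetical names rather than the methods of `SDADemon`.

```python
import json
from pathlib import Path
from typing import Iterable, Set

DEFAULT_DISMISSED_FILE = "sda_demon_dismissed.json"  # default from DemonConfig


def load_dismissed(path: str = DEFAULT_DISMISSED_FILE) -> Set[str]:
    """Return the dismissed keys, or an empty set if the file is missing or invalid."""
    file = Path(path)
    if not file.exists():
        return set()
    try:
        data = json.loads(file.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return set()
    # Assumption: the file stores a flat list of strings.
    return set(data) if isinstance(data, list) else set()


def save_dismissed(dismissed: Iterable[str], path: str = DEFAULT_DISMISSED_FILE) -> None:
    """Write the dismissed keys as a sorted, de-duplicated JSON list."""
    payload = json.dumps(sorted(set(dismissed)), indent=2)
    Path(path).write_text(payload, encoding="utf-8")
```

Sorting before writing keeps the file stable between runs, which makes it friendlier to version control.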
- sda.scripts.demon.main()
Run the 😈 Demon script.