sda.scripts.demon
=================

.. py:module:: sda.scripts.demon

.. autoapi-nested-parse::

   😈 Demon - Maxwell's Demon for Data Column Name Entropy Reduction.

   This script analyzes all test data files using the SDA API and intelligently
   compares column names to identify potential duplicates or similar names that
   could be standardized. It generates a markdown report with renaming suggestions.

   Maxwell's Demon is a theoretical entity that reduces entropy by sorting
   molecular motion. Similarly, this "😈 Demon" reduces the entropy in data column
   naming by identifying and suggesting standardization of similar column names
   across files.

   Usage::

       python -m sda.scripts.demon
       python -m sda.scripts.demon --output demon_report.md
       python -m sda.scripts.demon --verbose --threshold 0.8



Classes
-------

.. autoapisummary::

   sda.scripts.demon.DemonConfig
   sda.scripts.demon.ColumnAnalyzer
   sda.scripts.demon.DemonReportGenerator
   sda.scripts.demon.SDADemon


Functions
---------

.. autoapisummary::

   sda.scripts.demon.main


Module Contents
---------------

.. py:class:: DemonConfig

   Configuration for the 😈 Demon script.


   .. py:attribute:: similarity_threshold
      :value: 0.8


   .. py:attribute:: output_file
      :value: 'sda_demon_report.md'


   .. py:attribute:: dismissed_file
      :value: 'sda_demon_dismissed.json'


   .. py:attribute:: verbose
      :value: True


   .. py:attribute:: max_files
      :value: None


   .. py:attribute:: file_filter
      :value: '*'


   .. py:attribute:: column_filter
      :value: None


   .. py:method:: from_args(args)
      :classmethod:


      Create config from command line arguments.



.. py:class:: ColumnAnalyzer(similarity_threshold = 0.8, column_filter = None)

   Analyzes column names for similarity and standardization opportunities.


   .. py:attribute:: similarity_threshold
      :value: 0.8


   .. py:attribute:: column_filter
      :value: None


   .. py:attribute:: column_registry
      :type:  Dict[str, List[Tuple[str, str]]]


   .. py:attribute:: similarity_cache
      :type:  Dict[Tuple[str, str], float]


   .. py:method:: add_columns(file_path, columns)

      Add columns from a file to the registry.



   .. py:method:: calculate_similarity(col1, col2)

      Calculate similarity between two column names using multiple metrics.



   .. py:method:: find_similar_columns(progress_callback=None)

      Find groups of similar column names across files.



.. py:class:: DemonReportGenerator(config)

   Generates markdown reports with column renaming suggestions.


   .. py:attribute:: config


   .. py:method:: generate_report(suggestions, file_analysis)

      Generate the complete markdown report.



.. py:class:: SDADemon(config)

   Main 😈 Demon class for column name analysis and standardization.


   .. py:attribute:: config


   .. py:attribute:: analyzer


   .. py:attribute:: report_generator


   .. py:method:: load_dismissed_suggestions()

      Load previously dismissed suggestions.



   .. py:method:: save_dismissed_suggestions(dismissed)

      Save dismissed suggestions to file.



   .. py:method:: analyze_all_files()

      Analyze all data files and find column similarities.



   .. py:method:: generate_report(suggestions, file_analysis)

      Generate and save the markdown report.



.. py:function:: main()

   Run the 😈 Demon script.
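
The reference above says that ``ColumnAnalyzer.calculate_similarity`` combines multiple metrics and that ``find_similar_columns`` groups similar names, but it does not say which metrics are used or how groups are formed. The following is a minimal, standalone sketch of one plausible approach, not the module's actual implementation. It assumes a character-level ratio from ``difflib.SequenceMatcher`` combined with a token-overlap (Jaccard) score, and groups names with a small union-find. The helper names ``normalize``, ``similarity`` and ``group_similar`` are hypothetical and do not exist in ``sda.scripts.demon``; only the default threshold of ``0.8`` is taken from the documentation.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name):
    """Lowercase a column name and collapse separators into single underscores."""
    return re.sub(r"[\s_\-.]+", "_", name.strip().lower()).strip("_")


def similarity(col1, col2):
    """Hypothetical stand-in for calculate_similarity: the best of two metrics."""
    a, b = normalize(col1), normalize(col2)
    if a == b:
        return 1.0
    # Metric 1 (assumed): character-level similarity ratio.
    char_ratio = SequenceMatcher(None, a, b).ratio()
    # Metric 2 (assumed): Jaccard overlap of underscore-separated tokens.
    tokens_a, tokens_b = set(a.split("_")), set(b.split("_"))
    token_overlap = len(tokens_a & tokens_b) / len(tokens_a | tokens_b)
    return max(char_ratio, token_overlap)


def group_similar(names, threshold=0.8):
    """Hypothetical stand-in for find_similar_columns: union-find grouping."""
    unique = sorted(set(names))
    parent = {n: n for n in unique}

    def find(n):
        # Follow parent links to the group root, compressing the path as we go.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in combinations(unique, 2):
        if similarity(a, b) >= threshold:
            parent[find(b)] = find(a)

    groups = {}
    for n in unique:
        groups.setdefault(find(n), []).append(n)
    # Only groups with at least two distinct spellings are rename candidates.
    return [g for g in groups.values() if len(g) > 1]


columns = ["Pressure_mbar", "pressure mbar", "Temperature", "temperature_K", "flow"]
for group in group_similar(columns):
    print(group)
```

Taking the maximum of the two scores lets either a near-identical spelling or a shared set of tokens trigger a match. A real implementation might weight the metrics instead, and would likely cache pairwise scores, as the ``similarity_cache`` attribute suggests.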