sda.scripts.demon
=================

.. py:module:: sda.scripts.demon

.. autoapi-nested-parse::

   😈 Demon - Maxwell's Demon for Data Column Name Entropy Reduction.

   This script analyzes every test data file through the SDA API and compares
   column names across files to flag likely duplicates and near-duplicates that
   could be standardized. It writes a Markdown report with renaming suggestions.

   Maxwell's demon is a thought-experiment being that appears to lower the entropy
   of a gas by sorting fast molecules from slow ones. Similarly, this "😈 Demon"
   reduces the entropy of data column naming by identifying similar column names
   across files and suggesting how to standardize them.

   Usage::

       python -m sda.scripts.demon
       python -m sda.scripts.demon --output demon_report.md
       python -m sda.scripts.demon --verbose --threshold 0.8


Classes
-------

.. autoapisummary::

   sda.scripts.demon.DemonConfig
   sda.scripts.demon.ColumnAnalyzer
   sda.scripts.demon.DemonReportGenerator
   sda.scripts.demon.SDADemon


Functions
---------

.. autoapisummary::

   sda.scripts.demon.main


Module Contents
---------------

.. py:class:: DemonConfig

   Configuration for the 😈 Demon script.


   .. py:attribute:: similarity_threshold
      :value: 0.8


   .. py:attribute:: output_file
      :value: 'sda_demon_report.md'


   .. py:attribute:: dismissed_file
      :value: 'sda_demon_dismissed.json'


   .. py:attribute:: verbose
      :value: True


   .. py:attribute:: max_files
      :value: None


   .. py:attribute:: file_filter
      :value: '*'


   .. py:attribute:: column_filter
      :value: None


   .. py:method:: from_args(args)
      :classmethod:


      Create a config from parsed command-line arguments.
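
      A minimal sketch of how a classmethod like this might map an ``argparse``
      namespace onto a config object. ``SketchConfig`` is a hypothetical stand-in,
      not the real ``DemonConfig``. The ``--output`` and ``--threshold`` flags come
      from the usage examples in the module docstring; the argparse destinations
      and defaults shown here are assumptions.

      .. code-block:: python

         import argparse
         from dataclasses import dataclass


         @dataclass
         class SketchConfig:
             """Hypothetical stand-in for DemonConfig (defaults copied from above)."""

             similarity_threshold: float = 0.8
             output_file: str = "sda_demon_report.md"

             @classmethod
             def from_args(cls, args: argparse.Namespace) -> "SketchConfig":
                 # Assumed mapping: argparse destinations -> config fields.
                 return cls(
                     similarity_threshold=args.threshold,
                     output_file=args.output,
                 )


         parser = argparse.ArgumentParser(prog="demon-sketch")
         parser.add_argument("--threshold", type=float, default=0.8)
         parser.add_argument("--output", default="sda_demon_report.md")

         # Only --threshold is given, so --output falls back to its default.
         config = SketchConfig.from_args(parser.parse_args(["--threshold", "0.9"]))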


.. py:class:: ColumnAnalyzer(similarity_threshold = 0.8, column_filter = None)

   Analyzes column names for similarity and standardization opportunities.


   .. py:attribute:: similarity_threshold
      :value: 0.8


   .. py:attribute:: column_filter
      :value: None


   .. py:attribute:: column_registry
      :type:  Dict[str, List[Tuple[str, str]]]


   .. py:attribute:: similarity_cache
      :type:  Dict[Tuple[str, str], float]


   .. py:method:: add_columns(file_path, columns)

      Add columns from a file to the registry.
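
      The ``column_registry`` type, ``Dict[str, List[Tuple[str, str]]]``, does not
      say what the key and the tuple fields mean. The sketch below assumes one
      plausible layout: a normalized column name mapped to
      ``(file_path, original_name)`` pairs. Both that layout and the
      ``normalize`` helper are hypothetical.

      .. code-block:: python

         from collections import defaultdict
         from typing import Dict, List, Tuple

         # Assumed layout: normalized name -> [(file_path, original_name), ...]
         registry: Dict[str, List[Tuple[str, str]]] = defaultdict(list)


         def normalize(name: str) -> str:
             """Hypothetical normalization: trim, lowercase, unify separators."""
             return name.strip().lower().replace(" ", "_").replace("-", "_")


         def add_columns(file_path: str, columns: List[str]) -> None:
             # Record where each column came from, keyed by its normalized form.
             for column in columns:
                 registry[normalize(column)].append((file_path, column))


         add_columns("a.csv", ["Sample ID", "Temp"])
         add_columns("b.csv", ["sample_id"])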


   .. py:method:: calculate_similarity(col1, col2)

      Calculate similarity between two column names using multiple metrics.
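
      The metrics used are not listed here. As an illustration only, the sketch
      below averages two common ones: a character-level ratio from
      ``difflib.SequenceMatcher`` and a token-level Jaccard index. It also caches
      results under an order-independent key, in the spirit of
      ``similarity_cache``. The choice of metrics and their equal weighting are
      assumptions, not the module's actual formula.

      .. code-block:: python

         import re
         from difflib import SequenceMatcher
         from typing import Dict, Set, Tuple

         _cache: Dict[Tuple[str, str], float] = {}


         def _tokens(name: str) -> Set[str]:
             # Split on anything that is not a lowercase letter or digit.
             return {t for t in re.split(r"[^a-z0-9]+", name.lower()) if t}


         def similarity(col1: str, col2: str) -> float:
             """Average of a character-level ratio and a token-level Jaccard index."""
             # Sort the pair so (a, b) and (b, a) share one cache entry and
             # always compute the same value.
             a, b = sorted((col1, col2))
             if (a, b) not in _cache:
                 char_score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
                 t1, t2 = _tokens(a), _tokens(b)
                 token_score = len(t1 & t2) / len(t1 | t2) if (t1 | t2) else 0.0
                 _cache[(a, b)] = (char_score + token_score) / 2
             return _cache[(a, b)]

      With these two metrics, names that differ only in case score 1.0, while
      names that share no tokens always score below 0.5.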


   .. py:method:: find_similar_columns(progress_callback=None)

      Find groups of similar column names across files.
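
      One way to turn pairwise scores into groups is to link every pair at or
      above the threshold and collect the connected components. The sketch below
      does this with a small union-find structure. The grouping strategy, the
      use of ``SequenceMatcher`` as the score, and the ``(done, total)``
      signature of the callback are all assumptions made for illustration.

      .. code-block:: python

         from difflib import SequenceMatcher
         from itertools import combinations
         from typing import Callable, Dict, List, Optional


         def find_groups(
             names: List[str],
             threshold: float = 0.8,
             progress_callback: Optional[Callable[[int, int], None]] = None,
         ) -> List[List[str]]:
             """Group unique names whose pairwise score reaches the threshold."""
             parent: Dict[str, str] = {n: n for n in names}

             def find(n: str) -> str:
                 while parent[n] != n:
                     parent[n] = parent[parent[n]]  # path halving
                     n = parent[n]
                 return n

             pairs = list(combinations(names, 2))
             for done, (a, b) in enumerate(pairs, start=1):
                 if SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold:
                     parent[find(a)] = find(b)  # merge the two groups
                 if progress_callback is not None:
                     progress_callback(done, len(pairs))

             groups: Dict[str, List[str]] = {}
             for n in names:
                 groups.setdefault(find(n), []).append(n)
             # Names with no similar partner are not reported.
             return [sorted(g) for g in groups.values() if len(g) > 1]

      Comparing every pair is quadratic in the number of distinct names, which
      is one reason a progress callback and a similarity cache are useful.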


.. py:class:: DemonReportGenerator(config)

   Generates markdown reports with column renaming suggestions.


   .. py:attribute:: config


   .. py:method:: generate_report(suggestions, file_analysis)

      Generate the complete markdown report.


.. py:class:: SDADemon(config)

   Main 😈 Demon class for column name analysis and standardization.


   .. py:attribute:: config


   .. py:attribute:: analyzer


   .. py:attribute:: report_generator


   .. py:method:: load_dismissed_suggestions()

      Load previously dismissed suggestions.


   .. py:method:: save_dismissed_suggestions(dismissed)

      Save dismissed suggestions to file.
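
      The default ``dismissed_file`` ends in ``.json``, so a JSON round trip is a
      natural fit. The sketch below assumes dismissed suggestions are stored as
      a set of string keys; the real shape of ``dismissed`` is not documented
      here, and the ``load_dismissed`` and ``save_dismissed`` helpers are
      hypothetical.

      .. code-block:: python

         import json
         import tempfile
         from pathlib import Path
         from typing import Set


         def load_dismissed(path: Path) -> Set[str]:
             # A missing file simply means nothing has been dismissed yet.
             if not path.exists():
                 return set()
             return set(json.loads(path.read_text(encoding="utf-8")))


         def save_dismissed(path: Path, dismissed: Set[str]) -> None:
             # Sets are not JSON-serializable; a sorted list keeps diffs stable.
             path.write_text(json.dumps(sorted(dismissed), indent=2), encoding="utf-8")


         with tempfile.TemporaryDirectory() as tmp:
             target = Path(tmp) / "sda_demon_dismissed.json"
             first_load = load_dismissed(target)
             save_dismissed(target, {"temp|temperature", "sample_id|sampleid"})
             round_trip = load_dismissed(target)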


   .. py:method:: analyze_all_files()

      Analyze all data files and find column similarities.


   .. py:method:: generate_report(suggestions, file_analysis)

      Generate and save the markdown report.


.. py:function:: main()

   Run the 😈 Demon script.