sda.api.database_report ======================= .. py:module:: sda.api.database_report .. autoapi-nested-parse:: Database reporting and analysis tools for SDA. This module provides comprehensive database analysis capabilities for data parsing operations, helping users understand file processing success rates, data volumes, table detection, worksheet structure, and database quality assessment. Functions --------- .. autoapisummary:: sda.api.database_report.get_worksheet_names sda.api.database_report.format_excel_file sda.api.database_report.save_excel_with_formatting sda.api.database_report.print_quality_thermometer sda.api.database_report.analyze_database_files sda.api.database_report.generate_database_report sda.api.database_report.create_summary_dataframe Module Contents --------------- .. py:function:: get_worksheet_names(file_path) Get list of worksheet names from an Excel file using openpyxl. :param file_path: Path to the Excel file :type file_path: :py:class:`str` :returns: List of worksheet names, or empty list if file cannot be read :rtype: :py:class:`List[str]` .. rubric:: Examples >>> worksheets = get_worksheet_names("data.xlsx") >>> print(f"Found {len(worksheets)} worksheets: {worksheets}") .. py:function:: format_excel_file(file_path) Apply professional formatting to an Excel file with native tables and auto-width columns. This function enhances an existing Excel file by: - Adding native Excel tables with professional styling - Auto-adjusting column widths based on content - Making headers bold - Applying consistent formatting across all sheets :param file_path: Path to the Excel file to format :type file_path: :py:class:`str` .. rubric:: Examples >>> from sda.api.performance import format_excel_file >>> format_excel_file("my_report.xlsx") .. py:function:: save_excel_with_formatting(dataframes, file_path, max_retries = 3) Save multiple DataFrames to Excel with professional formatting. :param dataframes: Dictionary with sheet names as keys and DataFrames as values :type dataframes: :py:class:`dict` :param file_path: Path where to save the Excel file :type file_path: :py:class:`str` :param max_retries: Number of retry attempts if file is locked :type max_retries: :py:class:`int`, *default* ``3`` :returns: True if successful, False otherwise :rtype: :py:class:`bool` .. rubric:: Examples >>> from sda.api.performance import save_excel_with_formatting >>> data = {"Summary": summary_df, "Results": results_df} >>> success = save_excel_with_formatting(data, "report.xlsx") .. py:function:: print_quality_thermometer(results_df) Print a visual quality thermometer showing data health metrics. :param results_df: Results DataFrame from analyze_database_files() :type results_df: :py:class:`pd.DataFrame` .. rubric:: Examples >>> from sda.api.performance import ( ... analyze_database_files, ... print_quality_thermometer, ... ) >>> results = analyze_database_files(max_files=10) >>> print_quality_thermometer(results) .. py:function:: analyze_database_files(file_filter = '*.xls*', max_files = None, verbose = True, include_details = False) Analyze database files across multiple data sources. This function discovers data files, parses them individually while measuring success rates, and returns a comprehensive analysis of table detection, worksheets, and data structure. :param file_filter: Filter pattern for files to analyze (e.g., "*.xls*", "*.csv", "*"). :type file_filter: :py:class:`str`, *default* ``"*.xls*"`` :param max_files: Maximum number of files to process (useful for quick testing). :type max_files: :py:class:`int`, *optional* :param verbose: Whether to print progress information. :type verbose: :py:class:`bool`, *default* :py:obj:`True` :param include_details: Whether to include detailed column information in the results. :type include_details: :py:class:`bool`, *default* :py:obj:`False` :returns: DataFrame with columns: - test_name: Name of the test - file_name: Name of the parsed file - file_path: Full path to the file - file_size_mb: Size of the file in megabytes - num_points: Number of data points (rows) in the parsed DataFrame - num_columns: Number of columns in the parsed DataFrame - parse_time_seconds: Time taken to parse the file - points_per_second: Parsing rate (points per second) - mb_per_second: File processing rate (MB per second) - status: Success/Error status - error_message: Error details if parsing failed - plume_url: URL to the test path - full_path: Full path to the file if include_details is True :rtype: :py:class:`pd.DataFrame` .. rubric:: Examples >>> from sda.api.performance import analyze_database_files # Analyze all Excel files >>> results = analyze_database_files(file_filter="*.xls*") # Quick test with first 5 files >>> results = analyze_database_files(max_files=5, verbose=True) # Analyze specific file types >>> csv_results = analyze_database_files(file_filter="*.csv") .. seealso:: :py:obj:`generate_database_report` Generate detailed reports from analysis results :py:obj:`print_quality_thermometer` Display visual quality assessment .. py:function:: generate_database_report(results_df, save_to_file = None) Generate a detailed database report from parsing results. :param results_df: Results from analyze_database_files() :type results_df: :py:class:`pd.DataFrame` :param save_to_file: Path to save the report (Excel format) :type save_to_file: :py:class:`str`, *optional* :returns: Dictionary containing summary statistics :rtype: :py:class:`dict` .. rubric:: Examples >>> from sda.api.performance import ( ... analyze_database_files, ... generate_database_report, ... ) >>> results = analyze_database_files(max_files=10) >>> summary = generate_database_report(results, save_to_file="report.xlsx") .. seealso:: :py:obj:`analyze_database_files` Run performance analysis on data files :py:obj:`print_quality_thermometer` Display visual quality metrics .. py:function:: create_summary_dataframe(results_df, all_tests, all_files) Create a comprehensive summary DataFrame from analysis results. This function generates a structured summary of the analysis including: - Overall discovery metrics (tests, files) - File analysis metrics (success/failure rates) - Data volume metrics - Performance metrics :param results_df: Results from analyze_database_files() :type results_df: :py:class:`pd.DataFrame` :param all_tests: List of all discovered tests :type all_tests: :py:class:`list` :param all_files: List of all discovered files :type all_files: :py:class:`list` :returns: Summary DataFrame with metrics organized by category :rtype: :py:class:`pd.DataFrame` .. rubric:: Examples >>> from sda.api.performance import ( ... analyze_database_files, ... create_summary_dataframe, ... ) >>> from sda.api import list_all_tests, list_all_files >>> >>> results = analyze_database_files(max_files=10) >>> all_tests = list_all_tests() >>> all_files = list_all_files() >>> summary_df = create_summary_dataframe(results, all_tests, all_files)