sda.api.database_report#

Database reporting and analysis tools for SDA.

This module provides comprehensive database analysis capabilities for data parsing operations, helping users understand file processing success rates, data volumes, table detection, worksheet structure, and database quality assessment.

Functions#

`get_worksheet_names`(file_path)	Get list of worksheet names from an Excel file using openpyxl.
`format_excel_file`(file_path)	Apply professional formatting to an Excel file with native tables and auto-width columns.
`save_excel_with_formatting`(dataframes, file_path[, ...])	Save multiple DataFrames to Excel with professional formatting.
`print_quality_thermometer`(results_df)	Print a visual quality thermometer showing data health metrics.
`analyze_database_files`([file_filter, max_files, ...])	Analyze database files across multiple data sources.
`generate_database_report`(results_df[, save_to_file])	Generate a detailed database report from parsing results.
`create_summary_dataframe`(results_df, all_tests, all_files)	Create a comprehensive summary DataFrame from analysis results.

Module Contents#

sda.api.database_report.get_worksheet_names(file_path)#

Get list of worksheet names from an Excel file using openpyxl.

Parameters:: file_path (str) – Path to the Excel file
Returns:: List of worksheet names, or empty list if file cannot be read
Return type:: List[str]

Examples

>>> worksheets = get_worksheet_names("data.xlsx")
>>> print(f"Found {len(worksheets)} worksheets: {worksheets}")

sda.api.database_report.format_excel_file(file_path)#

Apply professional formatting to an Excel file with native tables and auto-width columns.

This function enhances an existing Excel file by: - Adding native Excel tables with professional styling - Auto-adjusting column widths based on content - Making headers bold - Applying consistent formatting across all sheets

Parameters:: file_path (str) – Path to the Excel file to format

Examples

>>> from sda.api.performance import format_excel_file
>>> format_excel_file("my_report.xlsx")

sda.api.database_report.save_excel_with_formatting(dataframes, file_path, max_retries=3)#

Save multiple DataFrames to Excel with professional formatting.

Parameters:

dataframes (dict) – Dictionary with sheet names as keys and DataFrames as values
file_path (str) – Path where to save the Excel file
max_retries (int, default 3) – Number of retry attempts if file is locked

Returns:

True if successful, False otherwise

Return type:

bool

Examples

>>> from sda.api.performance import save_excel_with_formatting
>>> data = {"Summary": summary_df, "Results": results_df}
>>> success = save_excel_with_formatting(data, "report.xlsx")

sda.api.database_report.print_quality_thermometer(results_df)#

Print a visual quality thermometer showing data health metrics.

Parameters:: results_df (pd.DataFrame) – Results DataFrame from analyze_database_files()

Examples

>>> from sda.api.performance import (
...     analyze_database_files,
...     print_quality_thermometer,
... )
>>> results = analyze_database_files(max_files=10)
>>> print_quality_thermometer(results)

sda.api.database_report.analyze_database_files(file_filter='*.xls*', max_files=None, verbose=True, include_details=False)#

Analyze database files across multiple data sources.

This function discovers data files, parses them individually while measuring success rates, and returns a comprehensive analysis of table detection, worksheets, and data structure.

Parameters:

file_filter (str, default "*.xls*") – Filter pattern for files to analyze (e.g., “.xls”, “.csv”, “”).
max_files (int, optional) – Maximum number of files to process (useful for quick testing).
verbose (bool, default True) – Whether to print progress information.
include_details (bool, default False) – Whether to include detailed column information in the results.

Returns:

DataFrame with columns: - test_name: Name of the test - file_name: Name of the parsed file - file_path: Full path to the file - file_size_mb: Size of the file in megabytes - num_points: Number of data points (rows) in the parsed DataFrame - num_columns: Number of columns in the parsed DataFrame - parse_time_seconds: Time taken to parse the file - points_per_second: Parsing rate (points per second) - mb_per_second: File processing rate (MB per second) - status: Success/Error status - error_message: Error details if parsing failed - plume_url: URL to the test path - full_path: Full path to the file if include_details is True

Return type:

pd.DataFrame

Examples

>>> from sda.api.performance import analyze_database_files

# Analyze all Excel files >>> results = analyze_database_files(file_filter=”.xls”)

# Quick test with first 5 files >>> results = analyze_database_files(max_files=5, verbose=True)

# Analyze specific file types >>> csv_results = analyze_database_files(file_filter=”*.csv”)

sda.api.database_report#

Functions#

Module Contents#

This Page