sda.api.database_report#

Database reporting and analysis tools for SDA.

This module provides comprehensive database analysis capabilities for data parsing operations, helping users understand file processing success rates, data volumes, table detection, worksheet structure, and database quality assessment.

Functions#

get_worksheet_names(file_path)

Get list of worksheet names from an Excel file using openpyxl.

format_excel_file(file_path)

Apply professional formatting to an Excel file with native tables and auto-width columns.

save_excel_with_formatting(dataframes, file_path[, ...])

Save multiple DataFrames to Excel with professional formatting.

print_quality_thermometer(results_df)

Print a visual quality thermometer showing data health metrics.

analyze_database_files([file_filter, max_files, ...])

Analyze database files across multiple data sources.

generate_database_report(results_df[, save_to_file])

Generate a detailed database report from parsing results.

create_summary_dataframe(results_df, all_tests, all_files)

Create a comprehensive summary DataFrame from analysis results.

Module Contents#

sda.api.database_report.get_worksheet_names(file_path)#

Get list of worksheet names from an Excel file using openpyxl.

Parameters:

file_path (str) – Path to the Excel file

Returns:

List of worksheet names, or empty list if file cannot be read

Return type:

List[str]

Examples

>>> worksheets = get_worksheet_names("data.xlsx")
>>> print(f"Found {len(worksheets)} worksheets: {worksheets}")
sda.api.database_report.format_excel_file(file_path)#

Apply professional formatting to an Excel file with native tables and auto-width columns.

This function enhances an existing Excel file by: - Adding native Excel tables with professional styling - Auto-adjusting column widths based on content - Making headers bold - Applying consistent formatting across all sheets

Parameters:

file_path (str) – Path to the Excel file to format

Examples

>>> from sda.api.performance import format_excel_file
>>> format_excel_file("my_report.xlsx")
sda.api.database_report.save_excel_with_formatting(dataframes, file_path, max_retries=3)#

Save multiple DataFrames to Excel with professional formatting.

Parameters:
  • dataframes (dict) – Dictionary with sheet names as keys and DataFrames as values

  • file_path (str) – Path where to save the Excel file

  • max_retries (int, default 3) – Number of retry attempts if file is locked

Returns:

True if successful, False otherwise

Return type:

bool

Examples

>>> from sda.api.performance import save_excel_with_formatting
>>> data = {"Summary": summary_df, "Results": results_df}
>>> success = save_excel_with_formatting(data, "report.xlsx")
sda.api.database_report.print_quality_thermometer(results_df)#

Print a visual quality thermometer showing data health metrics.

Parameters:

results_df (pd.DataFrame) – Results DataFrame from analyze_database_files()

Examples

>>> from sda.api.performance import (
...     analyze_database_files,
...     print_quality_thermometer,
... )
>>> results = analyze_database_files(max_files=10)
>>> print_quality_thermometer(results)
sda.api.database_report.analyze_database_files(file_filter='*.xls*', max_files=None, verbose=True, include_details=False)#

Analyze database files across multiple data sources.

This function discovers data files, parses them individually while measuring success rates, and returns a comprehensive analysis of table detection, worksheets, and data structure.

Parameters:
  • file_filter (str, default "*.xls*") – Filter pattern for files to analyze (e.g., “.xls”, “.csv”, “”).

  • max_files (int, optional) – Maximum number of files to process (useful for quick testing).

  • verbose (bool, default True) – Whether to print progress information.

  • include_details (bool, default False) – Whether to include detailed column information in the results.

Returns:

DataFrame with columns: - test_name: Name of the test - file_name: Name of the parsed file - file_path: Full path to the file - file_size_mb: Size of the file in megabytes - num_points: Number of data points (rows) in the parsed DataFrame - num_columns: Number of columns in the parsed DataFrame - parse_time_seconds: Time taken to parse the file - points_per_second: Parsing rate (points per second) - mb_per_second: File processing rate (MB per second) - status: Success/Error status - error_message: Error details if parsing failed - plume_url: URL to the test path - full_path: Full path to the file if include_details is True

Return type:

pd.DataFrame

Examples

>>> from sda.api.performance import analyze_database_files

# Analyze all Excel files >>> results = analyze_database_files(file_filter=”.xls”)

# Quick test with first 5 files >>> results = analyze_database_files(max_files=5, verbose=True)

# Analyze specific file types >>> csv_results = analyze_database_files(file_filter=”*.csv”)

See also

generate_database_report

Generate detailed reports from analysis results

print_quality_thermometer

Display visual quality assessment

sda.api.database_report.generate_database_report(results_df, save_to_file=None)#

Generate a detailed database report from parsing results.

Parameters:
  • results_df (pd.DataFrame) – Results from analyze_database_files()

  • save_to_file (str, optional) – Path to save the report (Excel format)

Returns:

Dictionary containing summary statistics

Return type:

dict

Examples

>>> from sda.api.performance import (
...     analyze_database_files,
...     generate_database_report,
... )
>>> results = analyze_database_files(max_files=10)
>>> summary = generate_database_report(results, save_to_file="report.xlsx")

See also

analyze_database_files

Run performance analysis on data files

print_quality_thermometer

Display visual quality metrics

sda.api.database_report.create_summary_dataframe(results_df, all_tests, all_files)#

Create a comprehensive summary DataFrame from analysis results.

This function generates a structured summary of the analysis including: - Overall discovery metrics (tests, files) - File analysis metrics (success/failure rates) - Data volume metrics - Performance metrics

Parameters:
  • results_df (pd.DataFrame) – Results from analyze_database_files()

  • all_tests (list) – List of all discovered tests

  • all_files (list) – List of all discovered files

Returns:

Summary DataFrame with metrics organized by category

Return type:

pd.DataFrame

Examples

>>> from sda.api.performance import (
...     analyze_database_files,
...     create_summary_dataframe,
... )
>>> from sda.api import list_all_tests, list_all_files
>>>
>>> results = analyze_database_files(max_files=10)
>>> all_tests = list_all_tests()
>>> all_files = list_all_files()
>>> summary_df = create_summary_dataframe(results, all_tests, all_files)