sda.api.database_report#
Database reporting and analysis tools for SDA.
This module provides comprehensive database analysis capabilities for data parsing operations, helping users understand file processing success rates, data volumes, table detection, worksheet structure, and database quality assessment.
Functions#
|
Get list of worksheet names from an Excel file using openpyxl. |
|
Apply professional formatting to an Excel file with native tables and auto-width columns. |
|
Save multiple DataFrames to Excel with professional formatting. |
|
Print a visual quality thermometer showing data health metrics. |
|
Analyze database files across multiple data sources. |
|
Generate a detailed database report from parsing results. |
|
Create a comprehensive summary DataFrame from analysis results. |
Module Contents#
- sda.api.database_report.get_worksheet_names(file_path)#
Get list of worksheet names from an Excel file using openpyxl.
- Parameters:
file_path (
str) – Path to the Excel file- Returns:
List of worksheet names, or empty list if file cannot be read
- Return type:
List[str]
Examples
>>> worksheets = get_worksheet_names("data.xlsx") >>> print(f"Found {len(worksheets)} worksheets: {worksheets}")
- sda.api.database_report.format_excel_file(file_path)#
Apply professional formatting to an Excel file with native tables and auto-width columns.
This function enhances an existing Excel file by: - Adding native Excel tables with professional styling - Auto-adjusting column widths based on content - Making headers bold - Applying consistent formatting across all sheets
- Parameters:
file_path (
str) – Path to the Excel file to format
Examples
>>> from sda.api.performance import format_excel_file >>> format_excel_file("my_report.xlsx")
- sda.api.database_report.save_excel_with_formatting(dataframes, file_path, max_retries=3)#
Save multiple DataFrames to Excel with professional formatting.
- Parameters:
- Returns:
True if successful, False otherwise
- Return type:
Examples
>>> from sda.api.performance import save_excel_with_formatting >>> data = {"Summary": summary_df, "Results": results_df} >>> success = save_excel_with_formatting(data, "report.xlsx")
- sda.api.database_report.print_quality_thermometer(results_df)#
Print a visual quality thermometer showing data health metrics.
- Parameters:
results_df (
pd.DataFrame) – Results DataFrame from analyze_database_files()
Examples
>>> from sda.api.performance import ( ... analyze_database_files, ... print_quality_thermometer, ... ) >>> results = analyze_database_files(max_files=10) >>> print_quality_thermometer(results)
- sda.api.database_report.analyze_database_files(file_filter='*.xls*', max_files=None, verbose=True, include_details=False)#
Analyze database files across multiple data sources.
This function discovers data files, parses them individually while measuring success rates, and returns a comprehensive analysis of table detection, worksheets, and data structure.
- Parameters:
file_filter (
str, default"*.xls*") – Filter pattern for files to analyze (e.g., “.xls”, “.csv”, “”).max_files (
int, optional) – Maximum number of files to process (useful for quick testing).verbose (
bool, defaultTrue) – Whether to print progress information.include_details (
bool, defaultFalse) – Whether to include detailed column information in the results.
- Returns:
DataFrame with columns: - test_name: Name of the test - file_name: Name of the parsed file - file_path: Full path to the file - file_size_mb: Size of the file in megabytes - num_points: Number of data points (rows) in the parsed DataFrame - num_columns: Number of columns in the parsed DataFrame - parse_time_seconds: Time taken to parse the file - points_per_second: Parsing rate (points per second) - mb_per_second: File processing rate (MB per second) - status: Success/Error status - error_message: Error details if parsing failed - plume_url: URL to the test path - full_path: Full path to the file if include_details is True
- Return type:
pd.DataFrame
Examples
>>> from sda.api.performance import analyze_database_files
# Analyze all Excel files >>> results = analyze_database_files(file_filter=”.xls”)
# Quick test with first 5 files >>> results = analyze_database_files(max_files=5, verbose=True)
# Analyze specific file types >>> csv_results = analyze_database_files(file_filter=”*.csv”)
See also
generate_database_reportGenerate detailed reports from analysis results
print_quality_thermometerDisplay visual quality assessment
- sda.api.database_report.generate_database_report(results_df, save_to_file=None)#
Generate a detailed database report from parsing results.
- Parameters:
results_df (
pd.DataFrame) – Results from analyze_database_files()save_to_file (
str, optional) – Path to save the report (Excel format)
- Returns:
Dictionary containing summary statistics
- Return type:
Examples
>>> from sda.api.performance import ( ... analyze_database_files, ... generate_database_report, ... ) >>> results = analyze_database_files(max_files=10) >>> summary = generate_database_report(results, save_to_file="report.xlsx")
See also
analyze_database_filesRun performance analysis on data files
print_quality_thermometerDisplay visual quality metrics
- sda.api.database_report.create_summary_dataframe(results_df, all_tests, all_files)#
Create a comprehensive summary DataFrame from analysis results.
This function generates a structured summary of the analysis including: - Overall discovery metrics (tests, files) - File analysis metrics (success/failure rates) - Data volume metrics - Performance metrics
- Parameters:
- Returns:
Summary DataFrame with metrics organized by category
- Return type:
pd.DataFrame
Examples
>>> from sda.api.performance import ( ... analyze_database_files, ... create_summary_dataframe, ... ) >>> from sda.api import list_all_tests, list_all_files >>> >>> results = analyze_database_files(max_files=10) >>> all_tests = list_all_tests() >>> all_files = list_all_files() >>> summary_df = create_summary_dataframe(results, all_tests, all_files)