
Data Analyzer Guide

This guide explains how to use the DataAnalyzer class and related functions to load and analyze performance data from multi-tenant GPU experiments.

Quick Start
Loading Data
Computing Statistics
Visualization
Multi-Tenant Analysis
Fairness & Statistical Significance
Resource Hogging Detection
Complete Example Workflow

Quick Start

The analyzer is implemented in the Jupyter notebook at notebooks/analysis.ipynb. To get started:

# Run the notebook to load the DataAnalyzer class
# Then create an instance
analyzer = DataAnalyzer()

# Load data
analyzer.load_files('path/to/data.csv', dataset_name='experiment1')

# Compute statistics
stats = analyzer.compute_statistics(column='latency_ms')

# Visualize
analyzer.plot_distributions(column='latency_ms')

Loading Data

Basic File Loading

The DataAnalyzer class supports CSV, JSON, and JSONL file formats.

Load a Single File

analyzer = DataAnalyzer()
analyzer.load_files('data/experiment/events.jsonl', dataset_name='experiment1')

Load Multiple Files

analyzer.load_files(
    ['data/exp1/events.jsonl', 'data/exp2/events.jsonl'],
    label_extractor=lambda path: path.parent.name
)

Load Files Matching a Pattern

# Load all JSONL files under a directory tree
analyzer.load_pattern(
    '../data/3_distilgpt2_mps/**/events.jsonl',
    label_extractor=lambda path: path.parent.name  # label = parent directory name
)

Label Extractors

Label extractors organize datasets by deriving a meaningful dataset name from each file path. Two helper functions are provided, extract_batch_size and extract_model_and_batch:

# Extract batch size from filename like 'model_b8_L128_latencies_ms.csv'
analyzer.load_pattern('*.csv', label_extractor=extract_batch_size)

# Extract model name and batch size
analyzer.load_pattern('*.csv', label_extractor=extract_model_and_batch)
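A label extractor is just a callable that maps a file path to a label. The sketch below shows one plausible way to write the two helpers for filenames such as 'model_b8_L128_latencies_ms.csv'. The function names ending in _sketch, the regular expressions, and the returned label format are assumptions for illustration; the notebook's own helpers may behave differently.

```python
import re
from pathlib import Path


def extract_batch_size_sketch(path):
    """Return a label such as 'b8' from a name like 'model_b8_L128_latencies_ms.csv'."""
    name = Path(path).name
    match = re.search(r"_b(\d+)(?:_|\.)", name)
    # Assumption: fall back to the file stem when no batch-size token is found.
    return f"b{match.group(1)}" if match else Path(path).stem


def extract_model_and_batch_sketch(path):
    """Return a label such as 'model_b8' (model name plus batch size)."""
    name = Path(path).name
    match = re.match(r"(?P<model>.+?)_b(?P<batch>\d+)(?:_|\.)", name)
    return f"{match['model']}_b{match['batch']}" if match else Path(path).stem


print(extract_batch_size_sketch("model_b8_L128_latencies_ms.csv"))
print(extract_model_and_batch_sketch("model_b8_L128_latencies_ms.csv"))
```

Any function with the same shape (path in, string out) could be passed as label_extractor in the same way.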

Supported File Formats

CSV: Standard comma-separated values
JSON: Single JSON object or array
JSONL: Newline-delimited JSON (one object per line)
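To make the three formats concrete, here is a minimal standard-library sketch of how a loader might dispatch on the file extension. The function name load_records is hypothetical, and this is not the DataAnalyzer implementation, which returns DataFrames and may handle edge cases differently.

```python
import csv
import json
from pathlib import Path


def load_records(path):
    """Load a CSV, JSON, or JSONL file into a list of dictionaries."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        # csv.DictReader yields every value as a string; convert as needed.
        with path.open(newline="") as handle:
            return list(csv.DictReader(handle))
    if suffix == ".jsonl":
        # One JSON object per line; blank lines are skipped.
        with path.open() as handle:
            return [json.loads(line) for line in handle if line.strip()]
    if suffix == ".json":
        with path.open() as handle:
            data = json.load(handle)
        # A single object is wrapped so the return type is always a list.
        return data if isinstance(data, list) else [data]
    raise ValueError(f"Unsupported file format: {suffix}")
```

Returning a list in every case keeps downstream code uniform regardless of the input format.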

Computing Statistics

Basic Statistics

# Compute statistics for all loaded datasets
stats = analyzer.compute_statistics(
    column='latency_ms',
    percentiles=[50, 90, 95, 99]
)
print(stats.to_string())

Returns a DataFrame with:

count: Number of samples
mean: Average value
std: Standard deviation
min / max: Minimum and maximum values
p50, p90, p95, p99: Percentiles
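The fields above can be reproduced for a single list of measurements with a short standard-library sketch. The names percentile and summarize are hypothetical, and two choices here are assumptions rather than documented behavior of compute_statistics: the sample standard deviation (n - 1 in the denominator) and linear interpolation between ranks for percentiles.

```python
import statistics


def percentile(sorted_values, q):
    """Percentile q (0 to 100) of pre-sorted values, using linear interpolation."""
    if not sorted_values:
        raise ValueError("percentile() needs at least one value")
    rank = (len(sorted_values) - 1) * q / 100
    lower = int(rank)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = rank - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction


def summarize(values, percentiles=(50, 90, 95, 99)):
    """Return count, mean, std, min, max, and the requested percentiles."""
    ordered = sorted(values)
    summary = {
        "count": len(ordered),
        "mean": statistics.mean(ordered),
        # Assumption: sample standard deviation; 0.0 for a single sample.
        "std": statistics.stdev(ordered) if len(ordered) > 1 else 0.0,
        "min": ordered[0],
        "max": ordered[-1],
    }
    for q in percentiles:
        summary[f"p{q}"] = percentile(ordered, q)
    return summary


print(summarize([10.0, 20.0, 30.0, 40.0, 50.0]))
```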

Compute Statistics for Specific Datasets

stats = analyzer.compute_statistics(
    column='latency_ms',
    datasets=['experiment1', 'experiment2']
)

Adding Throughput Column

Convert latency measurements to throughput (tokens/second):

analyzer.add_throughput_column(
    latency_column='latency_ms',
    batch_size=8,
    seq_len=128,
    throughput_column='throughput'
)

Formula: throughput = (batch_size * seq_len) / (latency_ms / 1000)
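As a worked instance of the formula: with batch_size=8 and seq_len=128, each iteration processes 8 * 128 = 1024 tokens, so a 50 ms iteration corresponds to roughly 1024 / 0.05 = 20480 tokens/second. The helper name latency_to_throughput below is hypothetical and only restates the formula; the guard against non-positive latencies is an added assumption.

```python
def latency_to_throughput(latency_ms, batch_size, seq_len):
    """Convert one latency sample in milliseconds to tokens per second."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    tokens_per_iteration = batch_size * seq_len
    return tokens_per_iteration / (latency_ms / 1000)


# 1024 tokens processed in 50 ms
print(latency_to_throughput(50.0, batch_size=8, seq_len=128))
```

Because throughput is inversely proportional to latency, halving the latency doubles the throughput for a fixed batch size and sequence length.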

Visualization

Distribution Plots

Histogram and KDE (Kernel Density Estimation)

analyzer.plot_distributions(
    column='latency_ms',
    plot_type='both',  # Options: 'hist', 'kde', 'both'
    bins=50
)

Histogram Only

analyzer.plot_distributions(
    column='latency_ms',
    plot_type='hist',
    bins=30
)

Box Plots

Compare distributions across datasets:

analyzer.plot_boxplot(column='latency_ms')

Metric Comparison Bar Charts

Compare multiple metrics for a single dataset:

analyzer.plot_comparison(
    dataset='experiment1',
    column='latency_ms',
    metrics=['mean', 'median', 'p95', 'p99', 'max']
)

Export Summary to CSV

analyzer.export_summary(
    column='latency_ms',
    output_file='results/summary.csv',
    percentiles=[50, 90, 95, 99]
)

Multi-Tenant Analysis

Computing Total Throughput

For multi-tenant scenarios where multiple models share one GPU:

total_throughput_df = compute_total_throughput(
    analyzer,
    latency_column='latency_ms',
    batch_size=8,
    seq_len=128,
    datasets=None  # None = all datasets
)

This function:

Computes throughput for each model
Sums throughputs across all models per iteration
Returns a DataFrame with total throughput statistics
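The three steps can be sketched on plain lists as follows. The function name total_throughput_per_iteration is hypothetical, and the sketch assumes that iterations are aligned by position and that longer runs are truncated to the shortest one; compute_total_throughput may align iterations differently.

```python
def total_throughput_per_iteration(latencies_by_model, batch_size, seq_len):
    """Sum per-model throughput (tokens/second) at each iteration index."""
    tokens = batch_size * seq_len
    # Assumption: align runs by index and truncate to the shortest run.
    n_iterations = min(len(values) for values in latencies_by_model.values())
    totals = []
    for i in range(n_iterations):
        totals.append(
            sum(tokens / (values[i] / 1000) for values in latencies_by_model.values())
        )
    return totals


latencies = {"model_a": [100.0, 50.0], "model_b": [200.0, 100.0]}
print(total_throughput_per_iteration(latencies, batch_size=1, seq_len=100))
```

The resulting list of totals can then be summarized with the same statistics used for a single dataset.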

Comparing Multi-Tenant vs Single-Tenant

# Load multi-tenant data
multi_analyzer = DataAnalyzer()
multi_analyzer.load_pattern('../data/3_distilgpt2_mps/**/events.jsonl',
                             label_extractor=lambda path: path.parent.name)

# Load single-tenant baseline data
single_analyzer = DataAnalyzer()
single_analyzer.load_files('../data/solo_distilgpt2_b8_128/distilgpt2/events.jsonl',
                            dataset_name='solo_distilgpt2')

# Compare
comparison = compare_multi_vs_single_tenant(
    multi_tenant_analyzer=multi_analyzer,
    single_tenant_analyzer=single_analyzer,
    batch_size=8,
    seq_len=128
)

Returns:

Mean throughput for multi-tenant and single-tenant
Improvement percentage
Visualizations comparing distributions and means
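One plausible definition of the improvement percentage is the relative change in mean throughput against the single-tenant baseline. The sketch below assumes that definition; the function name improvement_percent is hypothetical, and compare_multi_vs_single_tenant may compute the figure differently.

```python
import statistics


def improvement_percent(multi_throughputs, single_throughputs):
    """Relative change of mean multi-tenant throughput versus the solo baseline, in percent."""
    multi_mean = statistics.mean(multi_throughputs)
    single_mean = statistics.mean(single_throughputs)
    return (multi_mean - single_mean) / single_mean * 100


# Total multi-tenant throughput averaging 1500 tokens/s against a 1000 tokens/s baseline
print(improvement_percent([1400.0, 1600.0], [1000.0, 1000.0]))
```

A positive value means sharing the GPU yields more total throughput than running one model alone; a negative value means contention outweighs the benefit.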

Fairness & Statistical Significance

Fairness Metrics

Analyze whether GPU resources are shared fairly among models:

fairness = compute_fairness_metrics(
    analyzer,
    column='latency_ms',
    datasets=None  # None = all datasets
)

Key Metrics:

Coefficient of Variation (CV): std/mean
- CV < 0.05: Excellent fairness
- 0.05 ≤ CV < 0.10: Good fairness
- 0.10 ≤ CV < 0.20: Moderate fairness
- CV ≥ 0.20: Poor fairness
Gini Coefficient: 0 (perfect equality) to 1 (perfect inequality)
Max/Min Ratio: How many times slower the slowest model is than the fastest
Max Deviation %: Worst-case deviation of any model from the average
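The metrics above can be illustrated on a dictionary of per-model mean latencies. This is a sketch under stated assumptions, not the body of compute_fairness_metrics: the names fairness_metrics and classify_cv are hypothetical, the standard deviation is the population form, and the Gini coefficient uses the common sorted-rank formula.

```python
import statistics


def classify_cv(cv):
    """Map a coefficient of variation to the fairness bands listed above."""
    if cv < 0.05:
        return "Excellent"
    if cv < 0.10:
        return "Good"
    if cv < 0.20:
        return "Moderate"
    return "Poor"


def fairness_metrics(mean_by_model):
    """Compute CV, Gini, max/min ratio, and max deviation % across models."""
    values = sorted(mean_by_model.values())
    n = len(values)
    mean = statistics.mean(values)
    # Assumption: population standard deviation across models.
    cv = statistics.pstdev(values) / mean
    # Gini via ranks of the ascending-sorted values (1-indexed).
    gini = sum((2 * i - n - 1) * x for i, x in enumerate(values, start=1)) / (n * sum(values))
    return {
        "cv": cv,
        "fairness": classify_cv(cv),
        "gini": gini,
        "max_min_ratio": values[-1] / values[0],
        "max_deviation_pct": max(abs(x - mean) for x in values) / mean * 100,
    }


print(fairness_metrics({"model_a": 10.0, "model_b": 30.0}))
```

With identical means for every model, all four metrics take their ideal values: CV 0, Gini 0, ratio 1, and deviation 0%.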

Statistical Significance Testing

Test whether performance differences are statistically significant:

sig_results = test_statistical_significance(
    analyzer,
    column='latency_ms',
    datasets=None,
    alpha=0.05  # Significance level
)

Tests Performed:

Levene’s Test: Tests for equal variances (homoscedasticity)
One-way ANOVA (or Kruskal-Wallis if variances are unequal): Tests whether any group differs from the others
Pairwise t-tests with Bonferroni correction: Identifies which specific pairs differ

Returns:

Test statistics and p-values
Significant pairs of models
Effect sizes (Cohen’s d)
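Two of the quantities involved are simple enough to show with the standard library alone: the Bonferroni-adjusted significance level for the pairwise tests, and Cohen's d with a pooled standard deviation. The function names below are hypothetical, and the sketch does not compute p-values; those would typically come from a statistics library such as SciPy (for example scipy.stats.levene, f_oneway, kruskal, and ttest_ind), which is an assumption about tooling rather than a statement of what test_statistical_significance calls.

```python
import statistics
from itertools import combinations


def bonferroni_alpha(alpha, n_groups):
    """Per-comparison significance level when testing every pair of groups."""
    n_pairs = n_groups * (n_groups - 1) // 2
    return alpha / n_pairs


def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    n_a, n_b = len(a), len(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    pooled_std = (((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / pooled_std


def pairwise_effect_sizes(samples_by_model):
    """Cohen's d for every unordered pair of models."""
    return {
        (left, right): cohens_d(samples_by_model[left], samples_by_model[right])
        for left, right in combinations(samples_by_model, 2)
    }


samples = {"a": [1.0, 2.0, 3.0], "b": [2.0, 3.0, 4.0], "c": [1.0, 2.0, 3.0]}
print(bonferroni_alpha(0.05, n_groups=len(samples)))
print(pairwise_effect_sizes(samples))
```

With three models there are three pairs, so each pairwise test is compared against alpha / 3 instead of alpha.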

Combined Visualization

visualize_fairness_and_significance(
    analyzer,
    column='latency_ms',
    datasets=None,
    fairness_metrics=None,  # Auto-computed if None
    sig_results=None,       # Auto-computed if None
    figsize=(16, 10)
)

Creates a comprehensive 6-panel visualization showing:

Mean ± standard deviation comparison
Box plot distributions
Within-model variability (coefficient of variation)
Fairness metrics summary
Pairwise p-values heatmap
Statistical test summary

Resource Hogging Detection

Detect if one model is monopolizing GPU resources at the expense of others.

Normalized (Baseline-Aware) Detection

This method accounts for different model architectures having different baseline speeds:

hogging = detect_resource_hogging_normalized(
    analyzer=multi_analyzer,        # Multi-tenant data
    baseline_analyzer=single_analyzer,  # Solo baseline data
    column='latency_ms',
    hogging_threshold=0.15  # 15% difference threshold
)

How It Works:

Compares each model’s slowdown from its solo baseline
Detects models with abnormal slowdown patterns
Identifies both hogging (a model slowed far less than its peers, suggesting it is getting priority) and starvation (a model slowed far more than its peers)
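The idea can be sketched with per-model mean latencies. Each model's slowdown is its multi-tenant mean divided by its solo mean, and a model is flagged when its slowdown deviates from the median slowdown by more than the threshold. The function name detect_hogging_sketch, the use of the median as the reference point, and the status labels are assumptions; detect_resource_hogging_normalized may apply different rules.

```python
import statistics


def detect_hogging_sketch(multi_means, solo_means, hogging_threshold=0.15):
    """Flag models whose slowdown from baseline is unusually small or large."""
    slowdowns = {model: multi_means[model] / solo_means[model] for model in multi_means}
    # Assumption: the median slowdown represents the typical cost of sharing.
    typical = statistics.median(slowdowns.values())
    report = {}
    for model, slowdown in slowdowns.items():
        deviation = (slowdown - typical) / typical
        if deviation < -hogging_threshold:
            status = "hogging"   # slowed much less than its peers
        elif deviation > hogging_threshold:
            status = "starved"   # slowed much more than its peers
        else:
            status = "normal"
        report[model] = {"slowdown": slowdown, "deviation": deviation, "status": status}
    return report


multi = {"model_a": 20.0, "model_b": 30.0, "model_c": 45.0}
solo = {"model_a": 10.0, "model_b": 10.0, "model_c": 10.0}
for model, row in detect_hogging_sketch(multi, solo).items():
    print(model, row["status"])
```

Normalizing by each model's own baseline keeps an inherently slower architecture from being mistaken for a starved one.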