Skip to content

Quick Start

Get started with shap-monitor in just a few minutes.

Basic Example

Here's a minimal example using a RandomForest classifier:

from shapmonitor import SHAPMonitor
import shap
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

# Train a simple model
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
model = RandomForestClassifier(random_state=42)
model.fit(X, y)

# Create SHAP explainer
explainer = shap.TreeExplainer(model)

# Initialize the monitor
monitor = SHAPMonitor(
    explainer=explainer,
    data_dir="./shap_logs",
    sample_rate=0.1,  # Log 10% of predictions
    model_version="v1.0",
    feature_names=[f"feature_{{i}}" for i in range(10)]
)

# Log a batch of predictions
monitor.log_batch(X[:100])

What Just Happened?

  1. Created a SHAP explainer - TreeExplainer for tree-based models
  2. Initialized SHAPMonitor - Configured to log 10% of predictions to ./shap_logs
  3. Logged explanations - Computed and stored SHAP values for 100 predictions

The monitor automatically:

  • Samples predictions based on sample_rate (10% in this example)
  • Computes SHAP values using the explainer
  • Stores explanations to Parquet files organized by date

Computing SHAP Values Directly

You can also compute SHAP values without logging:

# Compute SHAP values for analysis
explanation = monitor.compute(X[:10])

# Access the values
shap_values = explanation.values  # SHAP values
base_values = explanation.base_values  # Base values (expected value)

Analyzing Logged Data

After logging data, you can analyze it using SHAPAnalyzer:

from datetime import datetime, timedelta
from shapmonitor.analysis import SHAPAnalyzer
from shapmonitor.backends import ParquetBackend

# Create backend and analyzer
backend = ParquetBackend("./shap_logs")
analyzer = SHAPAnalyzer(backend)

# Get summary statistics for the past 7 days
today = datetime.now()
week_ago = today - timedelta(days=7)
summary = analyzer.summary(week_ago, today)

print(summary)
# Output shows feature importance (mean_abs), direction (mean), and variability (std)

Comparing Time Periods

Detect changes in feature importance over time:

# Compare last week vs this week
last_week = today - timedelta(days=14)
two_weeks_ago = today - timedelta(days=21)

comparison = analyzer.compare_time_periods(
    (two_weeks_ago, last_week),  # reference period
    (last_week, today),          # current period
)

print(comparison)
# Shows PSI (drift), feature importance changes, and ranking shifts

Complete Working Example

See the LightGBM example for a complete working example with the California Housing dataset.

Next Steps