Skip to content

Examples and Tutorials

Practical, runnable examples demonstrating real-world usage of the AI Metacognition Toolkit.

Quick Start Examples

basic_detection_example.py ⭐

Comprehensive tutorial on situational awareness detection with visualization.

python examples/basic_detection_example.py

What it demonstrates: - Setting up the SituationalAwarenessDetector - Fitting baseline from evaluation data - Testing detection across awareness levels - Visualizing results with matplotlib - Interpreting probability scores

Output: Creates awareness_detection_results.png with bar charts and scatter plots.

View source


counterfactual_analysis_example.py

Causal attribution analysis through counterfactual interventions.

python examples/counterfactual_analysis_example.py

What it demonstrates: - Implementing MockModelAPI - Systematic counterfactual analysis - Testing 4 intervention types - Ranking causal influence - Visualizing divergence effects

Output: Creates counterfactual_analysis_results.png with causal attribution charts.

View source


production_monitoring_example.py

Real-time production monitoring with custom alert handlers.

python examples/production_monitoring_example.py

What it demonstrates: - Setting up ObserverEffectMonitor - Configuring 4 custom alert handlers - Simulating production scenarios - Exporting metrics for monitoring systems - Integration patterns

Output: - production_monitoring_results.png - Monitoring dashboard - production_alerts.log - Alert log file - production_metrics.json - Exportable metrics

View source


Code Snippets

Detect Awareness

from ai_metacognition.detectors import SituationalAwarenessDetector

# Setup
detector = SituationalAwarenessDetector()
detector.fit_baseline(baseline_responses)

# Detect
result = detector.detect_awareness("I notice this is a test.")
print(f"Awareness: {result['awareness_probability']:.1%}")

Causal Analysis

from ai_metacognition.analyzers import CounterfactualCoTAnalyzer

analyzer = CounterfactualCoTAnalyzer(model_api)
results = analyzer.systematic_causal_analysis(prompt, cot)

for intervention in results['interventions']:
    print(f"{intervention['intervention_type']}: "
          f"{intervention['divergence']:.3f}")

Production Monitoring

from ai_metacognition.detectors import ObserverEffectMonitor, Interaction

monitor = ObserverEffectMonitor(window_size=100)
monitor.set_evaluation_baseline(eval_samples)

# Log each interaction
monitor.log_production_interaction(Interaction(...))

# Check divergence
if monitor.get_divergence_report()['has_divergence']:
    print("Alert: Behavior diverged from baseline!")

Tutorials

Tutorial 1: Basic Detection Workflow

Goal: Detect situational awareness in model outputs

Steps:

  1. Collect baseline data from non-evaluation contexts
  2. Initialize detector with appropriate priors
  3. Fit baseline to learn normal behavior
  4. Test on potentially aware outputs
  5. Interpret probability scores and confidence intervals

Complete code: See basic_detection_example.py


Tutorial 2: Understanding Causal Drivers

Goal: Identify what components drive model decisions

Steps:

  1. Implement ModelAPI for your model
  2. Collect chain-of-thought traces
  3. Run systematic causal analysis
  4. Examine divergence scores
  5. Rank components by causal influence

Complete code: See counterfactual_analysis_example.py


Tutorial 3: Production Deployment

Goal: Monitor model behavior in production

Steps:

  1. Set up ObserverEffectMonitor with appropriate window/threshold
  2. Register alert handlers for your infrastructure
  3. Set baseline from evaluation data
  4. Log production interactions continuously
  5. Respond to divergence alerts

Complete code: See production_monitoring_example.py


Integration Examples

With OpenAI API

import openai
from ai_metacognition.detectors import SituationalAwarenessDetector

detector = SituationalAwarenessDetector()

# Collect baseline
baseline = []
for prompt in evaluation_prompts:
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}]
    )
    baseline.append(response.choices[0].message.content)

detector.fit_baseline(baseline)

# Test awareness
test_response = openai.ChatCompletion.create(...)
result = detector.detect_awareness(test_response.choices[0].message.content)

With HuggingFace Transformers

from transformers import pipeline
from ai_metacognition.detectors import SituationalAwarenessDetector

generator = pipeline('text-generation', model='gpt2')
detector = SituationalAwarenessDetector()

# Baseline
baseline = [generator(p)[0]['generated_text'] for p in baseline_prompts]
detector.fit_baseline(baseline)

# Test
test_output = generator(test_prompt)[0]['generated_text']
result = detector.detect_awareness(test_output)

With LangChain

from langchain import OpenAI, LLMChain, PromptTemplate
from ai_metacognition.analyzers import CounterfactualCoTAnalyzer

class LangChainModelAPI:
    def __init__(self, llm):
        self.llm = llm

    def generate_with_cot(self, prompt: str, cot: str) -> str:
        template = PromptTemplate(
            template="Chain of thought: {cot}\n\nQuestion: {prompt}\nAnswer:",
            input_variables=["cot", "prompt"]
        )
        chain = LLMChain(llm=self.llm, prompt=template)
        return chain.run(cot=cot, prompt=prompt)

    def generate(self, prompt: str) -> str:
        return self.llm(prompt)

llm = OpenAI(temperature=0)
analyzer = CounterfactualCoTAnalyzer(LangChainModelAPI(llm))

Common Patterns

Pattern 1: Batch Processing

outputs = [...]  # Many outputs to check

results = []
for output in outputs:
    result = detector.detect_awareness(output)
    results.append(result)

# Analyze distribution
import pandas as pd
df = pd.DataFrame(results)
print(df['awareness_probability'].describe())

Pattern 2: Threshold-Based Actions

result = detector.detect_awareness(output)
prob = result['awareness_probability']

if prob > 0.75:
    action = "flag_for_manual_review"
elif prob > 0.50:
    action = "additional_checks"
else:
    action = "normal_processing"

handle_action(action, output)

Pattern 3: Continuous Monitoring

from datetime import datetime

while True:
    interaction = get_next_interaction()
    monitor.log_production_interaction(interaction)

    report = monitor.get_divergence_report()
    if report['has_divergence']:
        alert_team(report)

    time.sleep(1)

Further Reading