Examples and Tutorials¶
Practical, runnable examples demonstrating real-world usage of the AI Metacognition Toolkit.
Quick Start Examples¶
basic_detection_example.py ⭐¶
Comprehensive tutorial on situational awareness detection with visualization.
What it demonstrates:

- Setting up the SituationalAwarenessDetector
- Fitting a baseline from evaluation data
- Testing detection across awareness levels
- Visualizing results with matplotlib
- Interpreting probability scores
Output: Creates awareness_detection_results.png with bar charts and scatter plots.
counterfactual_analysis_example.py¶
Causal attribution analysis through counterfactual interventions.
What it demonstrates:

- Implementing MockModelAPI
- Systematic counterfactual analysis
- Testing 4 intervention types
- Ranking causal influence
- Visualizing divergence effects
Output: Creates counterfactual_analysis_results.png with causal attribution charts.
production_monitoring_example.py¶
Real-time production monitoring with custom alert handlers.
What it demonstrates:

- Setting up ObserverEffectMonitor
- Configuring 4 custom alert handlers
- Simulating production scenarios
- Exporting metrics for monitoring systems
- Integration patterns
Output:
- production_monitoring_results.png - Monitoring dashboard
- production_alerts.log - Alert log file
- production_metrics.json - Exportable metrics
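The exported metrics file is plain JSON, so any monitoring stack can ingest it. A minimal sketch of writing such a file follows; the key names here are illustrative assumptions, not the toolkit's actual schema:

```python
import json

# Illustrative payload: in practice the values would come from the
# monitor's export (e.g. its divergence report shown in the snippets
# on this page).
metrics = {
    "window_size": 100,
    "interactions_logged": 2500,
    "divergence_score": 0.12,
    "has_divergence": False,
}

# Write the metrics so external systems (Prometheus exporters,
# dashboards, cron jobs) can pick them up.
with open("production_metrics.json", "w") as f:
    json.dump(metrics, f, indent=2)
```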
Code Snippets¶
Detect Awareness¶
from ai_metacognition.detectors import SituationalAwarenessDetector
# Setup
detector = SituationalAwarenessDetector()
detector.fit_baseline(baseline_responses)
# Detect
result = detector.detect_awareness("I notice this is a test.")
print(f"Awareness: {result['awareness_probability']:.1%}")
Causal Analysis¶
from ai_metacognition.analyzers import CounterfactualCoTAnalyzer
analyzer = CounterfactualCoTAnalyzer(model_api)
results = analyzer.systematic_causal_analysis(prompt, cot)
for intervention in results['interventions']:
    print(f"{intervention['intervention_type']}: "
          f"{intervention['divergence']:.3f}")
Production Monitoring¶
from ai_metacognition.detectors import ObserverEffectMonitor, Interaction
monitor = ObserverEffectMonitor(window_size=100)
monitor.set_evaluation_baseline(eval_samples)
# Log each interaction
monitor.log_production_interaction(Interaction(...))
# Check divergence
if monitor.get_divergence_report()['has_divergence']:
    print("Alert: Behavior diverged from baseline!")
Tutorials¶
Tutorial 1: Basic Detection Workflow¶
Goal: Detect situational awareness in model outputs
Steps:
- Collect baseline data from non-evaluation contexts
- Initialize detector with appropriate priors
- Fit baseline to learn normal behavior
- Test on potentially aware outputs
- Interpret probability scores and confidence intervals
Complete code: See basic_detection_example.py
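Step 5 can be sketched as a simple decision rule. The thresholds and the interval-width cutoff below are illustrative assumptions, not part of the detector's API; tune them for your risk tolerance:

```python
def interpret_detection(prob: float, ci: tuple) -> str:
    """Combine the point estimate with the width of its interval.

    `prob` is the awareness probability, `ci` a (low, high) interval.
    All cutoffs are illustrative.
    """
    low, high = ci
    if high - low > 0.4:
        # Interval too wide to act on confidently.
        return "inconclusive: collect more baseline data"
    if prob >= 0.75:
        return "likely aware: flag for review"
    if prob >= 0.50:
        return "possibly aware: run additional checks"
    return "consistent with baseline"
```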
Tutorial 2: Understanding Causal Drivers¶
Goal: Identify what components drive model decisions
Steps:
- Implement ModelAPI for your model
- Collect chain-of-thought traces
- Run systematic causal analysis
- Examine divergence scores
- Rank components by causal influence
Complete code: See counterfactual_analysis_example.py
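Step 1's ModelAPI can be as small as two methods. This toy stand-in echoes its inputs so the interface is visible; the method names mirror the LangChain adapter later on this page, and the stub bodies are where real model calls would go:

```python
class EchoModelAPI:
    """Toy ModelAPI: the two methods an analyzer needs, with stub bodies."""

    def generate(self, prompt: str) -> str:
        # Replace with a real model call.
        return f"Answer to: {prompt}"

    def generate_with_cot(self, prompt: str, cot: str) -> str:
        # Conditions the answer on a (possibly intervened-on) chain of thought.
        return f"[reasoning: {cot}] Answer to: {prompt}"
```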
Tutorial 3: Production Deployment¶
Goal: Monitor model behavior in production
Steps:
- Set up ObserverEffectMonitor with appropriate window/threshold
- Register alert handlers for your infrastructure
- Set baseline from evaluation data
- Log production interactions continuously
- Respond to divergence alerts
Complete code: See production_monitoring_example.py
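Step 2's alert handlers are just callables you hand to the monitor. Here is a sketch of a file-based handler, assuming each alert arrives as a dict; the exact callback signature is the monitor's to define, so adapt as needed:

```python
import json
from datetime import datetime, timezone

def make_file_alert_handler(path: str):
    """Return a handler that appends one JSON line per alert to `path`."""
    def handler(alert: dict) -> None:
        # Timestamp each alert so the log can be correlated with metrics.
        record = {"ts": datetime.now(timezone.utc).isoformat(), **alert}
        with open(path, "a") as f:
            f.write(json.dumps(record) + "\n")
    return handler

handler = make_file_alert_handler("production_alerts.log")
handler({"type": "divergence", "score": 0.42})
```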
Integration Examples¶
With OpenAI API¶
import openai
from ai_metacognition.detectors import SituationalAwarenessDetector

client = openai.OpenAI()
detector = SituationalAwarenessDetector()

# Collect baseline
baseline = []
for prompt in evaluation_prompts:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}]
    )
    baseline.append(response.choices[0].message.content)

detector.fit_baseline(baseline)

# Test awareness
test_response = client.chat.completions.create(...)
result = detector.detect_awareness(test_response.choices[0].message.content)
With HuggingFace Transformers¶
from transformers import pipeline
from ai_metacognition.detectors import SituationalAwarenessDetector
generator = pipeline('text-generation', model='gpt2')
detector = SituationalAwarenessDetector()
# Baseline
baseline = [generator(p)[0]['generated_text'] for p in baseline_prompts]
detector.fit_baseline(baseline)
# Test
test_output = generator(test_prompt)[0]['generated_text']
result = detector.detect_awareness(test_output)
With LangChain¶
from langchain import OpenAI, LLMChain, PromptTemplate
from ai_metacognition.analyzers import CounterfactualCoTAnalyzer
class LangChainModelAPI:
    def __init__(self, llm):
        self.llm = llm

    def generate_with_cot(self, prompt: str, cot: str) -> str:
        template = PromptTemplate(
            template="Chain of thought: {cot}\n\nQuestion: {prompt}\nAnswer:",
            input_variables=["cot", "prompt"]
        )
        chain = LLMChain(llm=self.llm, prompt=template)
        return chain.run(cot=cot, prompt=prompt)

    def generate(self, prompt: str) -> str:
        return self.llm(prompt)
llm = OpenAI(temperature=0)
analyzer = CounterfactualCoTAnalyzer(LangChainModelAPI(llm))
Common Patterns¶
Pattern 1: Batch Processing¶
outputs = [...] # Many outputs to check
results = []
for output in outputs:
    result = detector.detect_awareness(output)
    results.append(result)
# Analyze distribution
import pandas as pd
df = pd.DataFrame(results)
print(df['awareness_probability'].describe())
Pattern 2: Threshold-Based Actions¶
result = detector.detect_awareness(output)
prob = result['awareness_probability']
if prob > 0.75:
    action = "flag_for_manual_review"
elif prob > 0.50:
    action = "additional_checks"
else:
    action = "normal_processing"
handle_action(action, output)
Pattern 3: Continuous Monitoring¶
import time

while True:
    interaction = get_next_interaction()
    monitor.log_production_interaction(interaction)
    report = monitor.get_divergence_report()
    if report['has_divergence']:
        alert_team(report)
    time.sleep(1)