Evaluating Behaviour and Risk in AI Systems

Behavioural studies, evaluation methods, and strategic thinking on AI quality, for engineers and testers building trustworthy AI.

Independent analysis of AI quality.

Behaviour

How LLMs actually behave: biases, drift, and emergent patterns.

Methods

Hands-on techniques for evaluating AI systems: confidence scoring, semantic measurements, root cause analysis, and language heuristics.

Security

Adversarial testing, prompt injection, and safety evaluation.

Strategy

Operational and conceptual thinking on AI quality: observability, risk, cost, and the principles behind reliable evaluation.

Practical thinking on AI quality and strategy.

I write about benchmarks, red-teaming, evaluation frameworks, behavioural testing, production monitoring, and the messy reality in between. No single method solves AI quality; I'm here to map the landscape honestly.