Accuracy scores and leaderboard rankings tell only a small part of the story. In real environments, LLMs can be influenced, manipulated, and pushed into failure modes that standard evaluation doesn't capture. This session focuses on how LLMs behave once deployed, especially from a risk and security perspective. Drawing on practical examples such as prompt injection, RAG poisoning, and everyday failure patterns, we'll explore where current evaluation approaches fall short and what organisations should be testing instead. Rather than asking "is the model accurate?", the session asks: "how does the model behave under pressure, and where does it break?"
• Understanding how LLMs can be influenced and where real risks emerge.
• Identifying failure modes such as prompt injection, RAG poisoning, and hidden vulnerabilities.
• Rethinking evaluation to reflect real usage, adversarial conditions, and enterprise risk.
As organisations rapidly adopt advanced AI tools, success depends on how well they are applied across different business functions, not just how they are built. This panel explores practical strategies for driving value in context. Panellists will share what works, where functions struggle, and how organisations are building shared AI fluency to enable effective and responsible use. The session highlights the cultural, operational, and learning approaches that make AI relevant, usable, and impactful across the business.