Accuracy scores and leaderboard rankings only tell a small part of the story. In real environments, LLMs can be influenced, manipulated, and pushed into failure modes that standard evaluation simply doesn't capture. This session focuses on how LLMs behave once deployed, especially from a risk and security perspective. Drawing on practical examples such as prompt injection, RAG poisoning, and everyday failure patterns, we'll explore where current evaluation approaches fall short and what organisations should be testing instead. Rather than asking "is the model accurate?", the session reframes the question to: "how does the model behave under pressure, and where does it break?"
• Understanding how LLMs can be influenced and where real risks emerge.
• Identifying failure modes such as prompt injection, RAG poisoning, and hidden vulnerabilities.
• Rethinking evaluation to reflect real usage, adversarial conditions, and enterprise risk.
As organisations rapidly adopt advanced AI tools, many are discovering that successful implementation hinges not only on engineering skill but on bringing non technical teams along with the journey. This panel explores practical strategies for building confidence, competence, and collaboration across business functions that may lack technical backgrounds. Panellists will discuss what works, where teams commonly struggle, and how companies are creating a shared baseline of AI fluency that enables responsible and effective adoption. The session will highlight cultural, operational, and learning approaches that make AI accessible, and actionable, for everyone.
Check out the incredible speaker line-up to see who will be joining Shay.
Download The Latest Agenda