
All Knowledge

Articles, tutorials & news on AI Quality, Security & Compliance

Recent content

Tutorials

How to implement LLM as a Judge to test AI Agents? (Part 2)

Testing AI agents effectively requires automated systems that can evaluate responses across several scenarios. In this second part of our tutorial, we'll explore how to automate test execution and implement continuous red teaming for LLM agents. Learn to systematically evaluate agentic AI systems, interpret results, and maintain security through ongoing testing as your AI application evolves.

Jean-Marie John-Mathews, Ph.D.
Tutorials

How to implement LLM as a Judge to test AI Agents? (Part 1)

Testing AI agents effectively requires automated systems that can evaluate responses across several scenarios. In this first part of our tutorial, we introduce a systematic approach using LLM as a judge to detect hallucinations and security vulnerabilities before deployment. Learn how to generate synthetic test data and implement business annotation processes for exhaustive AI agent testing.

Jean-Marie John-Mathews, Ph.D.
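The Part 1 teaser above mentions using an LLM as a judge to flag hallucinations before deployment. As a rough, standalone illustration of that idea only (not the tutorial's or Giskard's implementation), the sketch below has a second model grade an agent's answer against a reference and return a pass/fail verdict; the call_llm placeholder, the judge prompt wording, and the PASS/FAIL convention are all assumptions made for this example.

```python
# Minimal "LLM as a judge" sketch (illustrative only): a second model grades
# an agent's answer against a reference answer and returns PASS or FAIL.
# call_llm, the prompt wording, and the verdict format are assumptions for
# this example, not Giskard's API.

def call_llm(prompt: str) -> str:
    """Placeholder for any chat-completion call (OpenAI, a local model, ...)."""
    raise NotImplementedError("Wire this to your LLM provider.")

JUDGE_PROMPT = """You are evaluating an AI agent's answer.
Question: {question}
Reference answer: {reference}
Agent answer: {answer}
Reply with exactly one word: PASS if the answer is faithful to the reference,
FAIL if it hallucinates or contradicts it."""

def judge(question: str, reference: str, answer: str) -> bool:
    """Return True when the judge model considers the agent's answer faithful."""
    verdict = call_llm(JUDGE_PROMPT.format(
        question=question, reference=reference, answer=answer))
    return verdict.strip().upper().startswith("PASS")

# Example synthetic test case; in practice a whole suite would be generated
# and each agent response scored with judge(...).
test_case = {"question": "What is the refund window?", "reference": "30 days"}
# passed = judge(test_case["question"], test_case["reference"],
#                my_agent(test_case["question"]))  # my_agent is hypothetical
```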
Blog

Secure AI Agents: Exhaustive testing with continuous LLM Red Teaming

Testing AI agents presents significant challenges as vulnerabilities continuously emerge, exposing organizations to reputational and financial risks when systems fail in production. Giskard's LLM Evaluation Hub addresses these challenges through adversarial LLM agents that automate exhaustive testing, annotation tools that integrate domain expertise, and continuous red teaming that adapts to evolving threats.

Blanca Rivera Campos