All Knowledge

The Giskard hub

LLM Observability vs LLM Evaluation: Building Comprehensive Enterprise AI Testing Strategies

Enterprise AI teams often treat observability and evaluation as competing priorities, leading to gaps in either technical monitoring or quality assurance.

David Berenstein

View post

Blog

Real-Time Guardrails vs Batch LLM Evaluations: A Comprehensive AI Testing Strategy

Enterprise AI teams need both immediate protection and deep quality insights but often treat guardrails and batch evaluations as competing priorities.

David Berenstein

View post

Understanding Hallucination and Misinformation in LLMs

Blog

A Practical Guide to LLM Hallucinations and Misinformation Detection

Explore how false content is generated by AI and why it's critical to understand LLM vulnerabilities for safer, more ethical AI use.

David Berenstein

View post

Illustration of AI vulnerabilities and risk mitigation in Large Language Models (LLMs) for secure and responsible deployment.

Blog

A Practical Guide on AI Security and LLM Vulnerabilities

Discover the key vulnerabilities in Large Language Models (LLMs) and learn how to mitigate AI risks with clear overviews and practical examples. Stay ahead in safe and responsible AI deployment.

David Berenstein

View post

Blog

Secure AI Agents: Exhaustive testing with continuous LLM Red Teaming

Testing AI agents presents significant challenges as vulnerabilities continuously emerge, exposing organizations to reputational and financial risks when systems fail in production. Giskard's LLM Evaluation Hub addresses these challenges through adversarial LLM agents that automate exhaustive testing, annotation tools that integrate domain expertise, and continuous red teaming that adapts to evolving threats.

Blanca Rivera Campos

View post

Blog

LLMOps: MLOps for Large Language Models

This article explores LLMOps, detailing its challenges and best practices for managing Large Language Models (LLMs) in production. It compared LLMOps with traditional MLOps, covering hardware needs, performance metrics, and handling non-deterministic outputs. The guide outlines steps for deploying LLMs, including model selection, fine-tuning, and continuous monitoring, while emphasizing quality and security management.

Najia Gul

View post

Blog

Defending LLMs against Jailbreaking: Definition, examples and prevention

Jailbreaking refers to maliciously manipulating Large Language Models (LLMs) to bypass their ethical constraints and produce unauthorized outputs. This emerging threat arises from combining the models' high adaptability with inherent vulnerabilities that attackers can exploit through techniques like prompt injection. Mitigating jailbreaking risks requires a holistic approach involving robust security measures, adversarial testing, red teaming, and ongoing vigilance to safeguard the integrity and reliability of AI systems.

Matteo A. D'Alessandro

View post

Blog

Data Poisoning attacks on Enterprise LLM applications: AI risks, detection, and prevention

Data poisoning is a real threat to enterprise AI systems like Large Language Models (LLMs), where malicious data tampering can skew outputs and decision-making processes unnoticed. This article explores the mechanics of data poisoning attacks, real-world examples across industries, and best practices to mitigate risks through red teaming, and automated evaluation tools.

Matteo A. D'Alessandro

View post

Blog

Guide to LLM evaluation and its critical impact for businesses

As businesses increasingly integrate LLMs into several applications, ensuring the reliability of AI systems is key. LLMs can generate biased, inaccurate, or even harmful outputs if not properly evaluated. This article explains the importance of LLM evaluation, and how to do it (methods and tools). It also present Giskard's comprehensive solutions for evaluating LLMs, combining automated testing, customizable test cases, and human-in-the-loop.

Blanca Rivera Campos

View post

Blog

AI Safety at DEFCON 31: Red Teaming for Large Language Models (LLMs)

DEFCON, one of the world's premier hacker conventions, this year saw a unique focus at the AI Village: red teaming of Large Language Models (LLMs). Instead of conventional hacking, participants were challenged to use words to uncover AI vulnerabilities. The Giskard team was fortunate to attend, witnessing firsthand the event's emphasis on understanding and addressing potential AI risks.

Blanca Rivera Campos

View post

Blog

FOSDEM 2023: Presentation on CI/CD for ML and How to test ML models?

In this talk, we explain why testing ML models is an important and difficult problem. Then we explain, using concrete examples, how Giskard helps ML Engineers deploy their AI systems into production safely by (1) designing fairness & robustness tests and (2) integrating them in a CI/CD pipeline.

Alex Combessie

View post

Blog

Where do biases in ML come from? #7 📚 Presentation

We explain presentation bias, a negative effect present in almost all ML systems with User Interfaces (UI)

The ML Test Score include verification tests among 4 categories: Features and Data, Model Development, Infrastructure and Monitoring Tests

Alex Combessie

View post