
All Knowledge

Articles, tutorials & news on AI Quality, Security & Compliance

Recent content

News

Giskard announces Phare, a new open & multi-lingual LLM Benchmark

At the Paris AI Summit, Giskard launches Phare, a new open and independent LLM benchmark that evaluates key AI security dimensions, including hallucination, factual accuracy, bias, and potential for harm, across several languages, with Google DeepMind as research partner. The initiative aims to provide open measurements of the trustworthiness of generative AI models in real applications.

Matteo Dora - Machine Learning Researcher
News

DeepSeek R1: Complete analysis of capabilities and limitations

In this article, we provide a detailed analysis of DeepSeek R1, comparing its performance against leading AI models like GPT-4o and o1. Our testing reveals both impressive knowledge capabilities and significant concerns, particularly the model's tendency to hallucinate. Through concrete examples, we examine how R1 handles politically sensitive topics.

Matteo Dora - Machine Learning Researcher
News

[Release notes] Giskard integrates with LiteLLM: Simplifying LLM agent testing across foundation models

Giskard's integration with LiteLLM lets developers test their LLM agents across multiple foundation models. The integration extends Giskard's core features, LLM Scan for vulnerability assessment and RAGET for RAG evaluation, to any supported LLM provider: major cloud providers like OpenAI and Anthropic, local deployments through Ollama, or open-source models like Mistral. A minimal configuration sketch follows.
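As a rough illustration of what this looks like in practice, the sketch below points Giskard's evaluation LLM at LiteLLM-style model strings and runs LLM Scan on a wrapped agent. The model identifiers shown are examples that depend on your provider setup, and `my_agent` is a hypothetical stand-in for the agent under test.

```python
import giskard

# Point Giskard's evaluation LLM at any LiteLLM-supported provider.
# These model strings are illustrative; pick one matching your setup.
giskard.llm.set_llm_model("gpt-4o-mini")                     # OpenAI (API key from env)
# giskard.llm.set_llm_model("ollama/llama3")                 # local deployment via Ollama
# giskard.llm.set_llm_model("mistral/mistral-large-latest")  # Mistral API

# Wrap the agent under test. `my_agent` is a hypothetical function
# that takes a question string and returns the agent's answer.
def predict(df):
    return [my_agent(q) for q in df["question"]]

model = giskard.Model(
    model=predict,
    model_type="text_generation",
    name="Support agent",
    description="Answers customer questions about billing and accounts.",
    feature_names=["question"],
)

# Run LLM Scan to probe the agent for vulnerabilities
# such as prompt injection, hallucination, and harmful output.
report = giskard.scan(model)
```

Because the provider is set globally, swapping the foundation model behind a scan is a one-line change rather than a rewrite of the test harness.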

Blanca Rivera Campos