The Giskard hub
This article explores LLMOps, detailing its challenges and best practices for managing Large Language Models (LLMs) in production. It compared LLMOps with traditional MLOps, covering hardware needs, performance metrics, and handling non-deterministic outputs. The guide outlines steps for deploying LLMs, including model selection, fine-tuning, and continuous monitoring, while emphasizing quality and security management.
Jailbreaking refers to maliciously manipulating Large Language Models (LLMs) to bypass their ethical constraints and produce unauthorized outputs. This emerging threat arises from combining the models' high adaptability with inherent vulnerabilities that attackers can exploit through techniques like prompt injection. Mitigating jailbreaking risks requires a holistic approach involving robust security measures, adversarial testing, red teaming, and ongoing vigilance to safeguard the integrity and reliability of AI systems.
Data poisoning is a real threat to enterprise AI systems like Large Language Models (LLMs), where malicious data tampering can skew outputs and decision-making processes unnoticed. This article explores the mechanics of data poisoning attacks, real-world examples across industries, and best practices to mitigate risks through red teaming, and automated evaluation tools.
As businesses increasingly integrate LLMs into several applications, ensuring the reliability of AI systems is key. LLMs can generate biased, inaccurate, or even harmful outputs if not properly evaluated. This article explains the importance of LLM evaluation, and how to do it (methods and tools). It also present Giskard's comprehensive solutions for evaluating LLMs, combining automated testing, customizable test cases, and human-in-the-loop.
DEFCON, one of the world's premier hacker conventions, this year saw a unique focus at the AI Village: red teaming of Large Language Models (LLMs). Instead of conventional hacking, participants were challenged to use words to uncover AI vulnerabilities. The Giskard team was fortunate to attend, witnessing firsthand the event's emphasis on understanding and addressing potential AI risks.
In this talk, we explain why testing ML models is an important and difficult problem. Then we explain, using concrete examples, how Giskard helps ML Engineers deploy their AI systems into production safely by (1) designing fairness & robustness tests and (2) integrating them in a CI/CD pipeline.
We explain presentation bias, a negative effect present in almost all ML systems with User Interfaces (UI)
Emergent biases result from the use of AI / ML across unanticipated contexts. It introduces risk when the context shifts.
Social, political, economic, and post-colonial asymmetries introduce risk to AI / ML development
Selection bias happens when your data is not representative of the situation to analyze, introducing risk to AI / ML systems
Machine Learning systems are particularly sensitive to measurement bias. Calibrate your AI / ML models to avoid that risk.
What happens when your AI / ML model is missing important variables? The risks of endogenous and exogenous exclusion bias.
Research Literature review: A Survey on Bias and Fairness in Machine Learning
Understand why Quality Assurance for AI is the need of the hour. Gain competitive advantage from your technological investments in ML systems.
We look into the latest research to understand what is the future of AI / ML Testing
Monitoring is just a tool: necessary but not sufficient. You need people committed to AI maintenance, processes & tools in case things break down.
Biases in AI / ML algorithms are avoidable. Regulation will push companies to invest in mitigation strategies.
Find out more about Giskard founders story
Technological innovation such as AI / ML comes with risks. Giskard aims to reduce it.
Giskard supports quality standards for AI / ML models. Now is the time to adopt them!
AI used in recommender systems is posing a serious issue for the media industry and our society
It is difficult to create interfaces to AI models Even AIs made by tech giants have bugs. With Giskard AI, we want to make it easy to create interfaces for humans to inspect AI models. 🕵️ Do you think interfaces are valuable? If so, what kinds of interfaces do you like?
The ML Test Score include verification tests among 4 categories: Features and Data, Model Development, Infrastructure and Monitoring Tests