Giskard Vision: Enhance Computer Vision models for classification, object & landmark detection

We’re pleased to announce the release of giskard-vision, the latest module in our open-source AI testing library, built specifically for computer vision tasks. The Giskard open-source library assesses the reliability and safety of machine learning models and identifies critical issues such as biases, hallucinations, toxicity, lack of robustness, and misinformation across different AI model types.

After its success with NLP, LLM, and tabular models, we’re now extending it to computer vision. The new module analyzes the robustness of vision models, helping data scientists spot weaknesses, evaluate performance, and check that their models are ethically sound in real-world applications such as facial landmark detection or medical image classification.

In this guide, we’ll walk you through how to use giskard-vision to test a Computer Vision model for cancer detection by running a complete scan.

🎯 Giskard-Vision: Key goals and principles for Computer Vision AI

Why Giskard-Vision?

Even the most advanced vision models can have hidden flaws that impact their performance and fairness in the real world. These flaws can take the form of performance issues, biases, inconsistencies, or ethical issues that are tough to catch with standard metrics.

For example, a vision model might struggle with images exhibiting specific attributes like contrast, color, or brightness. It may also underperform on sensitive subsets of data, such as roads from specific regions in autonomous driving or medical images of particular groups of people, like elderly or young individuals in healthcare applications.

Identifying these weaknesses is crucial for taking countermeasures to ensure the model's predictions remain safe, accurate, and reliable in all circumstances. giskard-vision offers an automated and systematic way to spot these problems early on, helping you fine-tune your models and avoid costly challenges in production.

Key features: Enhancing Computer Vision models' reliability

giskard-vision is designed to automatically detect and report:

Performance Degradation: Identifies specific conditions or subsets of data where your model's performance may drop, allowing targeted improvements. For example, performance may degrade under certain weather conditions.
Fairness and Bias Detection: Scans your model for biases linked to sensitive attributes so you can check that its decisions are fair across groups. It also highlights potential ethical issues arising from model predictions, guiding you toward responsible AI practices. For example, the model may perform poorly on images of elderly people.
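
To make the idea of a performance drop on a data subset concrete, here is a minimal sketch in plain Python. It is not giskard-vision code: the function names, the `age_group` field, the 0.1 tolerance, and the toy records are all assumptions chosen for illustration.

```python
from collections import defaultdict


def accuracy_by_group(records, key):
    """Return the accuracy for each value of the metadata field `key`."""
    hits, totals = defaultdict(int), defaultdict(int)
    for rec in records:
        group = rec[key]
        totals[group] += 1
        hits[group] += int(rec["label"] == rec["prediction"])
    return {group: hits[group] / totals[group] for group in totals}


def underperforming_groups(records, key, tolerance=0.1):
    """Return groups whose accuracy is more than `tolerance` below overall accuracy."""
    overall = sum(rec["label"] == rec["prediction"] for rec in records) / len(records)
    return {
        group: acc
        for group, acc in accuracy_by_group(records, key).items()
        if acc < overall - tolerance
    }


# Made-up records: 4 "adult" samples all correct, 4 "elderly" samples with 1 correct.
records = (
    [{"age_group": "adult", "label": 1, "prediction": 1} for _ in range(4)]
    + [{"age_group": "elderly", "label": 1, "prediction": 1}]
    + [{"age_group": "elderly", "label": 1, "prediction": 0} for _ in range(3)]
)

print(accuracy_by_group(records, "age_group"))       # {'adult': 1.0, 'elderly': 0.25}
print(underperforming_groups(records, "age_group"))  # {'elderly': 0.25}
```

An aggregate accuracy of 62.5% hides the fact that one group sits at 25%, which is the kind of gap a slice-based analysis is meant to surface.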

How Giskard-Vision works

Giskard-Vision works as a three-step process that integrates with your existing ML pipeline:

Wrap Your Dataset: To begin, you wrap your dataset using custom data iterators. This step allows giskard-vision to access your images, labels, and any associated metadata.
Wrap Your Model: Next, wrap your model to enable giskard-vision to perform scans using its predictions. This involves defining a simple interface that the library can use to interpret your model’s outputs.
Scan Your Model: Finally, run the automated scan, which analyzes your model’s behavior across various data points. The scan produces a comprehensive report that highlights vulnerabilities, biases, and performance issues, offering you a clear path toward improving your model.

After scanning your model, giskard-vision generates a detailed report with metrics. These results are presented in easy-to-understand visualizations that guide you through the areas that need attention, making the process of refining your vision models both efficient and effective.

Supported Computer Vision tasks: From object recognition to healthcare applications

giskard-vision is designed to support a wide range of vision tasks, making it a versatile tool for any computer vision model. Whether you are working with image classification, object detection, or landmark detection, giskard-vision provides tailored scanning capabilities to ensure your model meets the highest standards of performance, fairness, and ethical responsibility. Below are the main types of tasks that giskard-vision can help you with:

Image Classification: For models that classify images into predefined categories, such as identifying types of animals, detecting medical conditions from scans, or sorting products in an inventory. giskard-vision can scan your model for biases that affect specific classes, identify performance issues across different image conditions, and provide insights into potential ethical concerns tied to model predictions. For now, the scan works on single label classification.
Object Detection: Used for tasks where models must identify and locate objects within an image, such as detecting cars and pedestrians in autonomous driving, defects on a production line, or faces in camera images. giskard-vision checks that your object detection models perform reliably across diverse scenes, detects biases that might cause uneven detection rates, and flags areas where model predictions could raise ethical questions. For now, the scan works on single object detection.
Landmark Detection: Ideal for applications requiring precise localization of specific points in images, such as facial landmark detection for AR applications, keypoint detection for pose estimation, or identifying specific anatomical points in medical imaging. giskard-vision helps check that these models perform consistently under varying lighting, angles, and demographics, while also scanning for potential ethical implications. For now, the scan works on single face detection.

giskard-vision offers the flexibility to adapt its scanning procedures to each of these tasks, providing insights that are crucial for fine-tuning and improving your vision models.
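
Each task type needs its own notion of "performance". The sketch below shows one common metric per task: accuracy for single-label classification, intersection over union (IoU) for a single bounding box, and mean point-to-point error for landmarks. These are standard choices picked for illustration; the article does not say which metrics giskard-vision computes, and all function names here are hypothetical.

```python
import math


def accuracy(predictions, labels):
    """Fraction of single-label predictions that match the ground truth."""
    return sum(p == t for p, t in zip(predictions, labels)) / len(labels)


def box_iou(box_a, box_b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def mean_landmark_error(predicted, expected, norm=1.0):
    """Mean Euclidean distance between matching points, divided by `norm`."""
    distances = [math.dist(p, e) for p, e in zip(predicted, expected)]
    return sum(distances) / len(distances) / norm


print(accuracy(["cat", "dog", "cat"], ["cat", "dog", "dog"]))
print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))
print(mean_landmark_error([(0, 0), (3, 4)], [(0, 0), (0, 0)], norm=10))
```

The `norm` argument stands in for a reference length (for faces, often a distance between two fixed landmarks) so that errors are comparable across image sizes.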

Real-world Computer Vision examples

Use-case	Model	Dataset
Skin cancer detection	Hugging Face skin cancer classification model	Hugging Face skin cancer dataset

In this example, we load the demo wrapper for a Hugging Face skin cancer detection model and the demo dataloader for the Hugging Face skin cancer image classification dataset.

Giskard’s scan allows you to detect vulnerabilities in your model automatically. For image classification, these include performance biases, lack of robustness, and ethical issues.

The report is presented in an HTML tab that organizes different groups of issues and provides detailed information about each problematic slice identified by the scan algorithm. You can also view sample images that correspond to each identified slice.

In this example, the model underperforms on groups of elderly people when it comes to detecting skin cancer, which is an ethical issue for a medical algorithm.

📚 Tutorial: Applying Giskard-Vision to your Computer Vision models

Before you start, make sure that both the base Giskard library and the giskard-vision library are installed.
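
The libraries are typically installed with pip, for example `pip install giskard giskard-vision`, assuming those are the published package names; the quickstart guide has the authoritative command. The short check below confirms that both can be found from your Python environment. The module names `giskard` and `giskard_vision` are assumptions and may differ for your installed version.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from `module_names` that cannot be found for import."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Assumed top-level module names; adjust if your installation differs.
REQUIRED = ["giskard", "giskard_vision"]

missing = missing_modules(REQUIRED)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("Both Giskard libraries are importable.")
```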

Step 1: Wrapping your Computer Vision datasets

To scan your model, the first step is to wrap your dataset using Giskard's DataIteratorBase class. This will allow you to define how data is loaded and labeled, setting up the structure needed for Giskard to understand your dataset.

For image classification tasks, your custom dataloader should extend DataIteratorBase and override its key methods: loading images, fetching labels, and fetching metadata.
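
The shape of such a dataloader can be sketched as follows. To keep the example runnable without the library, it defines a local stand-in base class rather than importing DataIteratorBase. The class names, the method names (`get_image`, `get_label`, `get_meta`), and the toy samples are hypothetical, so check the giskard-vision documentation for the methods the real base class expects.

```python
from abc import ABC, abstractmethod


class SketchDataIterator(ABC):
    """Local stand-in for the role DataIteratorBase plays; NOT the library class."""

    @abstractmethod
    def __len__(self): ...

    @abstractmethod
    def get_image(self, idx): ...

    @abstractmethod
    def get_label(self, idx): ...

    @abstractmethod
    def get_meta(self, idx): ...

    def __getitem__(self, idx):
        # Raising IndexError past the end lets plain `for` loops iterate the dataset.
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        return self.get_image(idx), self.get_label(idx), self.get_meta(idx)


class InMemorySkinDataset(SketchDataIterator):
    """Toy dataset: images are tiny grayscale grids stored as nested lists."""

    def __init__(self, samples):
        self._samples = samples

    def __len__(self):
        return len(self._samples)

    def get_image(self, idx):
        return self._samples[idx]["image"]

    def get_label(self, idx):
        return self._samples[idx]["label"]

    def get_meta(self, idx):
        return {"age_group": self._samples[idx]["age_group"]}


# Made-up samples, for illustration only.
dataset = InMemorySkinDataset([
    {"image": [[200, 210], [190, 205]], "label": "benign", "age_group": "adult"},
    {"image": [[30, 40], [50, 60]], "label": "malignant", "age_group": "elderly"},
])

for image, label, meta in dataset:
    print(label, meta)
```

Keeping image, label, and metadata access in separate methods is what lets a scan slice results by metadata without knowing how the data is stored.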

Step 2: Wrapping your model

Once your dataset is wrapped, the next step is to wrap your model using Giskard's ModelBase class. This wrapper defines how your model interacts with the dataset during the scan.
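
A model wrapper mainly needs to turn an image into a prediction in a fixed format. The sketch below illustrates that idea with a local stand-in class, not the library's ModelBase. The class name, its methods, and the toy scoring function are assumptions made for illustration.

```python
class SketchModelWrapper:
    """Local stand-in for the role ModelBase plays; NOT the library class."""

    def __init__(self, predict_fn, class_names):
        self._predict_fn = predict_fn
        self.class_names = list(class_names)

    def predict_scores(self, image):
        """Return one score per class, in the order of `class_names`."""
        scores = list(self._predict_fn(image))
        if len(scores) != len(self.class_names):
            raise ValueError("predict_fn must return one score per class")
        return scores

    def predict_label(self, image):
        """Return the name of the highest-scoring class."""
        scores = self.predict_scores(image)
        best = max(range(len(scores)), key=scores.__getitem__)
        return self.class_names[best]


def toy_brightness_model(image):
    """Toy scorer with no medical meaning: brighter images score higher for 'benign'."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels) / 255.0
    return [mean, 1.0 - mean]  # scores for [benign, malignant]


model = SketchModelWrapper(toy_brightness_model, ["benign", "malignant"])
print(model.predict_label([[200, 200], [200, 200]]))  # benign
print(model.predict_label([[10, 10], [10, 10]]))      # malignant
```

In practice `predict_fn` would call your real model, including any preprocessing it needs.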

Step 3: Scanning your Computer Vision system

With your dataset and model wrapped, you can now run a scan to identify vulnerabilities. The demo for this step uses a demo dataloader and an OpenCV model; substitute these with your custom dataloader and model wrapper as needed.
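
The library's own scan call is not reproduced here. As a rough mental model of what a metadata-based scan does, the self-contained sketch below runs a prediction function over samples, slices the results by every metadata field, and reports slices that fall well below overall accuracy. The function `sketch_scan`, its parameters, the issue format, and the toy data are all hypothetical and much simpler than the real scan.

```python
from collections import defaultdict


def sketch_scan(samples, predict, sensitive_keys=(), tolerance=0.1):
    """Return issues for metadata slices scoring below overall accuracy - tolerance.

    `samples` is an iterable of (image, label, metadata dict) tuples.
    """
    results = [(meta, predict(image) == label) for image, label, meta in samples]
    if not results:
        return []
    overall = sum(ok for _, ok in results) / len(results)

    # Group the correct/incorrect outcomes by every (field, value) pair.
    slices = defaultdict(list)
    for meta, ok in results:
        for key, value in meta.items():
            slices[(key, value)].append(ok)

    issues = []
    for (key, value), outcomes in slices.items():
        acc = sum(outcomes) / len(outcomes)
        if acc < overall - tolerance:
            issues.append({
                "group": "ethical" if key in sensitive_keys else "performance",
                "slice": f"{key}={value}",
                "accuracy": acc,
                "overall": overall,
            })
    return issues


def toy_predict(image):
    """Toy rule on a 1x1 'image': dark pixels are called malignant."""
    return "malignant" if image[0][0] < 100 else "benign"


# Made-up samples: (image, label, metadata).
samples = [
    ([[50]], "malignant", {"age_group": "adult", "device": "A"}),
    ([[200]], "benign", {"age_group": "adult", "device": "A"}),
    ([[60]], "malignant", {"age_group": "adult", "device": "B"}),
    ([[210]], "benign", {"age_group": "adult", "device": "B"}),
    ([[150]], "malignant", {"age_group": "elderly", "device": "A"}),
    ([[160]], "malignant", {"age_group": "elderly", "device": "B"}),
    ([[40]], "malignant", {"age_group": "elderly", "device": "A"}),
    ([[170]], "malignant", {"age_group": "elderly", "device": "B"}),
]

for issue in sketch_scan(samples, toy_predict, sensitive_keys={"age_group"}):
    print(issue)
```

On this toy data the sketch reports `device=B` as a performance issue and `age_group=elderly` as an ethical issue, because `age_group` was declared sensitive.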

After completing these steps, you'll receive an HTML report detailing various types of issues based on the dataset's metadata. In this example, the report highlights:

Performance issues: where the model underperforms on non-sensitive metadata subgroups.
Ethical issues: related to sensitive subgroups where the model shows poor performance.
Robustness issues: where the model struggles when images are degraded (e.g., blur, noise).
Attribute issues: where the model performs poorly on images with specific physical traits, such as contrast, brightness, or color.
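
The last two categories can be illustrated without any metadata, since they depend only on the pixels. The sketch below computes a brightness attribute and measures how often predictions change when noise is added. It is an illustration under stated assumptions, not the library's implementation: the function names are hypothetical, images are nested lists of 0-255 grayscale values, and noise stands in for the other degradations (blur would follow the same pattern).

```python
import random


def mean_brightness(image):
    """Average pixel value of a grayscale image given as nested lists."""
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels)


def add_noise(image, amplitude, seed=0):
    """Add uniform integer noise in [-amplitude, amplitude], clipped to 0-255."""
    rng = random.Random(seed)
    return [
        [min(255, max(0, p + rng.randint(-amplitude, amplitude))) for p in row]
        for row in image
    ]


def prediction_flip_rate(images, predict, perturb):
    """Fraction of images whose prediction changes after applying `perturb`."""
    flips = sum(predict(img) != predict(perturb(img)) for img in images)
    return flips / len(images)


def toy_predict(image):
    """Toy rule: label an image by whether it is mostly bright or dark."""
    return "bright" if mean_brightness(image) >= 128 else "dark"


images = [[[200, 220], [210, 230]], [[10, 20], [30, 40]], [[120, 130], [125, 135]]]

print([mean_brightness(img) for img in images])
print(prediction_flip_rate(images, toy_predict, lambda img: add_noise(img, 40)))
```

A high flip rate under mild perturbation points to a robustness issue, while grouping accuracy by ranges of `mean_brightness` would reveal attribute issues.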

This report helps you identify your model's flaws and weaknesses, so you can fine-tune it or handle specific subgroups differently to improve overall performance and reliability across your use cases.

Future perspectives

In this article, we explored how to use the new giskard-vision scan to detect flaws in computer vision models. You can now wrap a vision model and a dataset, then run the scan on tasks like image classification, object detection, and landmark detection to check that your model is reliable, fair, and robust.

In the future, we plan to integrate tools for automatic slice detection based solely on images, eliminating the need to manually input metadata and expanding the potential applications of the scan. We'll also be expanding its capabilities to include monitoring of image generation models and multimodal models.

To learn more about giskard-vision, visit our quickstart guide and our GitHub repo. Don’t hesitate to give us feedback!

Reach out to us today to learn more about how we can help you ensure your models are safe and reliable.
