Blog
October 7, 2021
2 min read

How did the idea of Giskard emerge? #8 👁‍🗨 Monitoring

Monitoring is just a tool: necessary but not sufficient. You need people committed to AI maintenance, processes & tools in case things break down.

Alex Combessie

[Image: Quality Monitoring Dashboard]

It is also about the need for monitoring. 👁

Four years ago, I worked on my first end-to-end AI pipeline. It involved a year of development to:
- speak to business experts to craft good features
- select & tune the best algorithm
- meet with IT engineers to understand how to integrate with the real-time systems
- create an online feedback loop with automated model retraining

At the end of the project, I added monitoring to this production system. It was no easy task! 😤

I built everything from scratch: input quality checks, model performance thresholds, and drift scores. All metrics were available in a dashboard and sent automatically to stakeholders by email.
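For illustration, here is a minimal sketch of one such drift score, the Population Stability Index (PSI). The exact metrics of the original system aren't documented here, so treat this as one plausible from-scratch implementation:

```python
import numpy as np

def psi(reference: np.ndarray, live: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between a reference sample and live inputs.

    Common rule of thumb: < 0.1 stable, 0.1-0.25 moderate drift, > 0.25 major drift.
    """
    # Bin edges come from the reference (training-time) distribution
    edges = np.histogram_bin_edges(reference, bins=bins)
    # Clip live values into the reference range so nothing falls outside the bins
    live = np.clip(live, edges[0], edges[-1])
    ref_pct = np.histogram(reference, bins=edges)[0] / len(reference)
    live_pct = np.histogram(live, bins=edges)[0] / len(live)
    # A small floor avoids division by zero and log(0) on empty bins
    ref_pct = np.clip(ref_pct, 1e-6, None)
    live_pct = np.clip(live_pct, 1e-6, None)
    return float(np.sum((live_pct - ref_pct) * np.log(live_pct / ref_pct)))
```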

Lastly, I onboarded other data scientists to maintain the system and moved on to other projects.

Four years later, I learned that the system I had built had somewhat failed. Some input changes in the IT system went undetected. Online performance metrics didn’t match offline metrics. Few people used the monitoring dashboard. 😢

Why?

1. Monitoring is about finding the right Key Performance Indicators (KPIs)

We data scientists rely too much on statistical KPIs such as concept drift. They are hard to interpret, even for data scientists, and all the more so for business users. If I were to do the project again, I would add more simple input checks and business-oriented KPIs.
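For instance, a few plain checks like the following would have caught the silent IT input changes much earlier than any drift statistic. This is a minimal sketch in pandas; the column names and thresholds are hypothetical:

```python
import pandas as pd

def check_inputs(batch: pd.DataFrame) -> list[str]:
    """Plain, interpretable input checks that return human-readable findings."""
    findings = []
    if batch["customer_age"].isna().mean() > 0.05:
        findings.append("more than 5% of customer_age values are missing")
    if not batch["country_code"].isin({"FR", "DE", "ES"}).all():
        findings.append("unexpected country_code values in this batch")
    if (batch["order_amount"] < 0).any():
        findings.append("negative order_amount values detected")
    return findings
```

Each finding reads as a sentence a business user can act on, which is exactly what a raw drift score fails to provide.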

2. Alerting comes first, monitoring second

A few months after deployment, the infamous problem of “dashboard fatigue” arises: people pay less and less attention to monitoring. To overcome this, you need to set up alerting mechanisms. From a user’s perspective, alerts are the entry point to monitoring. The hard part is making sure you don’t trigger too many of them.
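One simple way to keep alert volume under control is to fire only when a metric breaches its threshold for several consecutive windows, instead of on every spike. A minimal sketch, with hypothetical threshold and window values:

```python
from collections import deque

class DebouncedAlert:
    """Fire only after `patience` consecutive windows breach the threshold."""

    def __init__(self, threshold: float, patience: int = 3):
        self.threshold = threshold
        self.breaches = deque(maxlen=patience)

    def update(self, metric_value: float) -> bool:
        self.breaches.append(metric_value > self.threshold)
        # Alert only once the window is full and every recent value breached
        return len(self.breaches) == self.breaches.maxlen and all(self.breaches)

# e.g. alert only when the PSI drift score stays above 0.25 for 3 daily windows
drift_alert = DebouncedAlert(threshold=0.25, patience=3)
```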

3. Monitoring is just one piece of the maintenance puzzle

Monitoring is just a tool: necessary but not sufficient. It is useless without people committed to AI system maintenance as well as processes & tools in case things break down. Mature organizations set up rolling on-call schedules for their engineers and invest in debugging tools to solve issues faster.

With Giskard, we want to make it easy to monitor AI models and get actionable alerts.

