September 16, 2021
2 min read

How did the idea of Giskard emerge? #1 🤓 The ML Test Score

The ML Test Score includes verification tests in 4 categories: Features and Data, Model Development, Infrastructure, and Monitoring Tests

Alex Combessie

There is no single answer to this question. The idea grew out of many outside inspirations that stacked up over the years.

One of our inspirations is a 2017 research paper from 5 engineers at Google.

“What’s your ML Test Score? A rubric for ML production readiness and technical debt reduction”

The ML Test Score

It is a straightforward, easy-to-read paper that packs a lot of good ideas into 4 pages. It opens by acknowledging something every AI practitioner knows:

Using machine learning in real-world production systems is complicated. 😓

The paper breaks down potential solutions to this challenge into 2 buckets: testing and monitoring. But how much testing and monitoring is enough?

It then introduces the concept of an ML Test Score, with points awarded for each verified test in 4 categories:

1. Features and Data
2. Model Development
3. Infrastructure
4. Monitoring Tests
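To make the scoring concrete, here is a minimal Python sketch based on our reading of the paper's rubric: a test earns half a point if it is run manually with documented results, and a full point if it runs automatically on a repeated basis. Points are summed within each category, and the final score is the minimum across the 4 categories. The names `TestStatus`, `CATEGORIES` and `ml_test_score` are hypothetical, for illustration only. They are not part of Giskard or any published library.

```python
from enum import Enum


class TestStatus(Enum):
    """How a single rubric test is carried out (points per our reading of the paper)."""
    NOT_DONE = 0.0
    MANUAL = 0.5      # run by hand, with results documented and shared
    AUTOMATED = 1.0   # run automatically on a repeated basis


# Hypothetical labels mirroring the 4 categories listed above.
CATEGORIES = (
    "Features and Data",
    "Model Development",
    "Infrastructure",
    "Monitoring Tests",
)


def ml_test_score(results: dict[str, list[TestStatus]]) -> float:
    """Sum the points within each category, then return the weakest category's total."""
    missing = [c for c in CATEGORIES if c not in results]
    if missing:
        raise ValueError(f"missing categories: {missing}")
    section_scores = [sum(s.value for s in results[c]) for c in CATEGORIES]
    return min(section_scores)


# Usage: an illustrative team with strong model tests but little infrastructure testing.
example = {
    "Features and Data": [TestStatus.AUTOMATED] * 3 + [TestStatus.MANUAL] * 2,
    "Model Development": [TestStatus.AUTOMATED] * 4,
    "Infrastructure": [TestStatus.MANUAL] * 2,
    "Monitoring Tests": [TestStatus.AUTOMATED, TestStatus.MANUAL],
}
print(ml_test_score(example))  # 1.0 (Infrastructure is the weakest category)
```

Taking the minimum rather than the sum means the weakest category caps the overall score: excellent data tests cannot make up for missing infrastructure or monitoring tests.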

Fast forward to 2021: the paper still feels surprisingly modern, even forward-thinking. My one regret is that its approach is quite difficult to apply for companies that lack Google's highly trained software engineers & data scientists. 👩‍💻🧑‍💻👩‍💻🧑‍💻👩‍💻🧑‍💻👩‍💻🧑‍💻

With Giskard, we want to make Test-Driven Data Science (TDDS) accessible to everyone. 🤓
