March 24, 2022

How to test ML models? #1 👉 Introduction

What you need to know before getting started with ML Testing in 3 points

Jean-Marie John-Mathews, Ph.D.

While regulators are asking for quality management systems for AI (Article 17 of the European AI Act), the capacity to create tests is becoming crucial in the AI industry. But ML testing is highly complex and is still an active research area. Here are three reasons:

AI follows a data-driven programming paradigm

According to Paleyes et al. (2020), unlike traditional software products, where changes happen only in the code, AI systems change along three axes: the code, the model, and the data. The model’s behavior evolves as new data is frequently provided.
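To make the point concrete, here is a minimal sketch in which the training code never changes, yet the model's behavior does because the data does. The classifier, the function names (`train_threshold`, `predict`), and the two data snapshots are all hypothetical and chosen only for illustration; they are not taken from the cited paper.

```python
from statistics import mean


def train_threshold(samples):
    """Fit a one-feature classifier: the threshold sits halfway
    between the mean of the positive and the negative examples."""
    positives = [x for x, label in samples if label == 1]
    negatives = [x for x, label in samples if label == 0]
    return (mean(positives) + mean(negatives)) / 2


def predict(threshold, x):
    """Return 1 if x is at or above the learned threshold, else 0."""
    return int(x >= threshold)


# Hypothetical data: the same code is trained on two data snapshots.
data_v1 = [(1.0, 0), (2.0, 0), (6.0, 1), (7.0, 1)]
data_v2 = data_v1 + [(8.0, 0), (9.0, 0), (12.0, 1), (13.0, 1)]

model_v1 = train_threshold(data_v1)  # threshold learned from snapshot 1
model_v2 = train_threshold(data_v2)  # threshold learned from snapshot 2

# The same input is classified differently by the two models,
# although not a single line of code was modified.
print(predict(model_v1, 5.0), predict(model_v2, 5.0))
```

A test suite that only exercises the code would pass for both snapshots, which is why the data and the trained model also need to be tested.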

AI is not easily broken down into small unit components

Some AI properties (e.g., accuracy) only emerge as a combination of different components such as the training data, the learning program, and the learning library. It is hard to break the AI system into smaller components that can be tested in isolation.

AI errors are systemic and self-amplifying

AI is characterized by many feedback loops and interactions between components. The output of one model can be ingested into the training base of another. As a result, AI errors can be difficult to identify, measure, and correct.

In this new series, we will present the best-known AI testing methods each week, with illustrative examples and practical ways to implement them. We’ll cover concepts such as:

  • Behavioral testing: metamorphic testing, heuristics testing
  • Drift testing: Kullback-Leibler divergence, Kolmogorov-Smirnov, and Earth mover’s distance tests, as well as the Population Stability Index (PSI), trust score, and general drift tests
  • Performance testing: model error testing, calibration score, simple model comparison
  • Efficiency testing: carbon footprint, inference time, and energy consumption testing
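As a taste of what one of these tests can look like, here is a minimal sketch of a drift check based on the Population Stability Index, written with the standard library only. The function name `psi`, the equal-width binning, the `eps` floor for empty bins, and the synthetic Gaussian samples are assumptions made for this illustration, not a reference implementation. The 0.1 and 0.25 cut-offs in the comments are a widely used rule of thumb rather than a formal standard.

```python
import math
import random


def psi(reference, current, n_bins=10, eps=1e-6):
    """Population Stability Index between two numeric samples.

    Bins are equal-width over the range of the reference sample
    (assumption: the reference sample is not constant). Values of
    `current` outside that range are clamped into the edge bins.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)  # clamp to a valid bin
            counts[idx] += 1
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


# Synthetic data for illustration only.
random.seed(0)
reference = [random.gauss(0.0, 1.0) for _ in range(5000)]
same_dist = [random.gauss(0.0, 1.0) for _ in range(5000)]
shifted = [random.gauss(1.0, 1.0) for _ in range(5000)]

psi_stable = psi(reference, same_dist)  # expected to be small
psi_drift = psi(reference, shifted)     # expected to be large

# Rule of thumb: below 0.1 is usually read as stable,
# above 0.25 as significant drift.
print(f"stable: {psi_stable:.3f}, drifted: {psi_drift:.3f}")
```

In practice, such a check would compare a feature's training distribution with what the model sees in production, and the binning strategy (equal-width versus quantile bins) is a design choice that affects the score.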

Bibliography

  • Zhang, J. M., Harman, M., Ma, L., & Liu, Y. (2020). Machine learning testing: Survey, landscapes and horizons. IEEE Transactions on Software Engineering.
  • Paleyes, A., Urma, R. G., & Lawrence, N. D. (2020). Challenges in deploying machine learning: a survey of case studies. arXiv preprint arXiv:2011.09926.