🧐 Introduction: Testing and evaluating Machine Learning models with Giskard
In today's interconnected and digitized world, the authenticity and trustworthiness of banknotes are critical to maintaining the stability of financial systems. As technology continues to evolve, so do the techniques employed by fraudsters, making it increasingly challenging to detect fraudulent currency. Tools like Giskard help address this pressing concern by making it easier to test and harden the Machine Learning models used for banknote authentication.
Giskard is an open-source Machine Learning testing library that allows you to quickly scan your model and make sure it is free of errors and hidden vulnerabilities. It slots into your existing workflow between data preprocessing and your trained classifier, so you can evaluate the model without reworking your pipeline. By surfacing issues such as performance bias and spurious correlations, it helps you uncover hidden patterns in your model's behaviour and strengthen its predictive reliability. This is particularly beneficial when working with high-dimensional datasets, where understanding which features drive predictions becomes crucial for achieving accurate results.
In this article, we will examine the components of Giskard, including its user-friendly interface, walk through its installation, and demonstrate a practical implementation of Giskard for banknote authentication. Along the way, we will explore its features and functionalities, discuss the procedure step by step, and share code snippets to illustrate how Giskard can be used.
Whether you are a data scientist looking to level up your ML models, a beginner eager to explore the potential of Giskard, or an individual concerned with safeguarding your financial interests, understanding how to utilize Giskard will give you the information you need to get started.
💶 Binary classification model use case: Banknote Authentication
We will work with the banknote authentication dataset for this article.
The banknote authentication dataset is a collection of real and fake banknote samples. It aims to provide a reliable basis for authenticating banknotes from a handful of measured features. The task is to develop a Machine Learning model that can accurately classify banknotes as genuine or fake based on those features. Banknote authentication is a binary classification problem: it predicts the discrete labels 0 or 1, where 0 indicates a fake banknote and 1 indicates a real one. The dataset can be found here.
Prerequisites
You will need to have Docker installed.
Install Giskard
To get started, run the following commands in your terminal to install the Giskard server on your computer. This sets up your Python backend.
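At the time of writing, the server could be installed and launched with pip and the Giskard CLI; the exact commands may differ between Giskard versions, so check the documentation for yours:

```bash
# Install Giskard with the server extras
pip install "giskard[server]" -U

# Start the server; this pulls and starts the Docker containers
giskard server start
```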
Make sure you have a stable internet connection. When the installation is done, ensure the Docker containers are running.
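You can check this from the terminal (Docker Desktop shows the same information):

```bash
# List the running containers; the Giskard services should appear here
docker ps
```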
[Screenshot: Docker Desktop showing the Giskard containers up and running]
Once docker-compose has started all the modules, you will be able to open Giskard at http://localhost:19000/
- Log in to Giskard.
- Upload the licence sent to your email to be able to interact with the user interface.
📚 Installing Giskard’s Machine Learning Python library
Connect the external worker
Next, start the ML Worker, the component of Giskard that connects your Python environment to the Giskard server you just installed. It executes the model in your working Python environment (notebook, Python IDE, etc.). To start the ML Worker, run the following command in your terminal:
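At the time of writing, the worker was started with the Giskard CLI; the available flags may vary by version, so check the documentation for yours:

```bash
# Start the ML worker; it connects to the local Giskard server
# and prompts you for an API key
giskard worker start
```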
You will be asked to enter an API key, which can be found in the Settings tab of your Giskard instance. Copy the API key and paste it into the terminal when prompted.
You should see output confirming that the worker has started. After that, go to the Settings page of Giskard in your browser and verify that the server is connected to the external worker.
🏃 Training a classification model and uploading it to Giskard
Write the following code inside a Jupyter notebook.
First, import the libraries
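A minimal set of imports for this walkthrough, assuming the Giskard 2.x Python API (import paths may differ in other versions):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

from giskard import Dataset, GiskardClient, Model, scan
```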
Read the dataset
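A minimal sketch, assuming the dataset is fetched from the UCI repository (you can also point read_csv at a local copy of the file):

```python
# The UCI file has no header row; the column names follow the dataset description
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/00267/data_banknote_authentication.txt"
columns = ["variance", "skewness", "curtosis", "entropy", "class"]
df = pd.read_csv(url, header=None, names=columns)
```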
Declare the type of each column in the dataset (example: category, numeric, text)
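One way to do this is to declare the column types explicitly and wrap the DataFrame in a giskard.Dataset, a sketch based on the Giskard 2.x API:

```python
# All four features are numeric; "class" is the binary target (0 = fake, 1 = real)
column_types = {
    "variance": "numeric",
    "skewness": "numeric",
    "curtosis": "numeric",
    "entropy": "numeric",
}

giskard_dataset = Dataset(
    df,
    target="class",
    column_types=column_types,
    name="Banknote authentication dataset",
)
```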
Train a classifier
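A sketch using scikit-learn's RandomForestClassifier with a held-out test set:

```python
# Split the data so the model can be scored on unseen samples
X = df.drop(columns=["class"])
y = df["class"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
```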
Fit and score your model
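Fit the classifier on the training split and report accuracy on both splits:

```python
clf.fit(X_train, y_train)
print(f"Train accuracy: {clf.score(X_train, y_train):.3f}")
print(f"Test accuracy:  {clf.score(X_test, y_test):.3f}")
```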
Why Random Forest Classifier?
The Random forest classifier is a popular machine learning algorithm that is well-suited for various classification tasks, including the authentication of banknotes. Here are some reasons why the Random Forest classifier may be a good choice for the banknote authentication dataset:
- Ensemble Method: Random forest is an ensemble learning method that combines multiple decision trees to make predictions. Each tree is trained using a randomly selected sample of the data, and the final prediction is obtained by aggregating the predictions of individual trees. This ensemble approach improves the overall accuracy and generalization of the model.
- Robust to Overfitting: Random forest helps mitigate overfitting, which occurs when a model learns too much from the training data and fails to generalize well to unseen data. By using random subsets of the data and random subsets of features for each tree, the model reduces the risk of overfitting and provides more reliable performance on unseen data.
- Robust to Outliers: Random forest is less affected by outliers than many other classifiers. Because each tree sees only a sample of the data, the impact of individual outliers is typically reduced, leading to more robust predictions.
- Easy to Use and Interpret: Random forest is relatively easy to implement and tune, making it a popular choice among practitioners. It also measures feature importance, allowing for a better understanding and interpretation of the model's behaviour.
- Feature Importance: Random forest measures feature importance, indicating which features have the most significant impact on the classification task. This information can be valuable for understanding the underlying patterns and characteristics of banknotes that contribute to their authenticity.
Evaluating your Machine Learning model with Giskard
🔍 Scan your ML model to detect issues
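To run the scan, you first wrap the trained classifier in giskard.Model so Giskard knows how to call it, then pass it to giskard.scan together with the wrapped dataset. This is a sketch based on the Giskard 2.x API; argument names may differ in other versions:

```python
# Wrap the trained classifier; the prediction function must return class probabilities
giskard_model = Model(
    model=clf.predict_proba,
    model_type="classification",
    classification_labels=[0, 1],  # 0 = fake, 1 = real
    feature_names=["variance", "skewness", "curtosis", "entropy"],
    name="Banknote authentication classifier",
)

# Scan the model for vulnerabilities
results = scan(giskard_model, giskard_dataset)
```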
Let’s show the scan results:
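In a Jupyter notebook, the results object renders as an interactive report:

```python
display(results)  # shows the scan report inline in the notebook
```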
[Scan report: Giskard lists the issues it detected in the model]
Using Giskard's Machine Learning solution for fraud detection
As we see above, the scan detected four vulnerabilities in the model: performance bias, overconfidence, underconfidence, and spurious correlation. Let's discuss each of them and how they can affect banknote authentication.
Performance bias
The performance bias issue in machine learning refers to a situation where a model exhibits low performance on specific data slices or subsets, despite satisfactory performance on the overall dataset. Some factors that can cause performance bias include:
- Data Imbalance: When the dataset contains imbalanced classes or unequal representation of different groups, the model may prioritize the majority class or dominant groups in its learning process.
- Biased Training Data: If the training data used to train the model contains inherent biases or reflects societal prejudices, the model may learn to reinforce these biases, resulting in performance bias.
- Model Complexity and Capacity: Models with high complexity or excessive capacity can overfit the majority class or dominant groups in the training data.
Performance bias vulnerabilities can affect banknote authentication in several ways. For example, if a model is biased towards certain types of banknotes, it may be more likely to authenticate those notes as real, even if they contain signs of forgery, because it was trained on a dataset disproportionately weighted towards those notes.
Additionally, performance bias can make it more difficult to detect new counterfeits that are designed to exploit the model's biases. The model may be so accustomed to seeing certain types of banknotes that it accepts a counterfeit resembling them without scrutiny. For example, a model trained on a dataset of mostly US banknotes may be more likely to authenticate a fake US banknote as real than a fake note from another country.
To mitigate the risk of performance bias, it is essential to carefully select the training data. It should represent the full range of real and fake banknotes the model is expected to encounter in the real world, so that the model is not biased toward any particular type of note.
Overconfidence
The overconfidence issue in machine learning refers to the phenomenon where a machine learning model produces predictions that are incorrect but are assigned high probabilities or confidence scores. This means the model is overly confident in its predictions, even when inaccurate. Some factors that can cause overconfidence include:
- Data Bias: If the training data used to train the model contains inherent biases or lacks diversity, the model may not be exposed to a wide range of scenarios.
- Overfitting: Overfitting occurs when a model becomes too complex and adapts too closely to the training data. As a result, the model may not generalize well to unseen data and may exhibit overconfidence in its predictions, even though they are inaccurate.
- Imbalanced Classes: In classification tasks, imbalanced class distributions can lead to overconfident predictions. Suppose the model is trained on a dataset where one class is significantly more prevalent than others. In that case, it may assign high probabilities to predictions of the majority class, even when they are incorrect.
Overconfidence vulnerabilities can affect banknote authentication in several ways. For example, if a model is overconfident in its predictions, it may be more likely to accept fake banknotes as real, and less likely to flag a fake note as suspicious even when it contains signs of forgery.
Additionally, overconfidence can make it more difficult to detect new fake notes that are designed to fool the model. The model may be so confident in its predictions that it never considers the possibility that a new note is fake.
Underconfidence
The underconfidence issue for classification in machine learning refers to the phenomenon where a machine learning model produces predictions with low confidence, even when the actual label is highly likely. In underconfident predictions, the probability of the predicted label is very close to that of the next most likely label. Some factors that can cause underconfidence include:
- Insufficient Model Training: If the model is not adequately trained on diverse and representative data, it may lack the necessary information to make confident predictions.
- Imbalanced Classes: When there is a scarcity of examples or a significant class imbalance, the model may struggle to estimate probabilities accurately, leading to underconfident predictions.
- Uncertain Data Characteristics: In scenarios where the input data contains inherent noise, ambiguity, or overlapping feature distributions, the model may find it challenging to make confident predictions. Uncertainty in the data can propagate into the model's output, causing underconfidence.
Underconfidence vulnerabilities can also affect banknote authentication. For example, if a model is underconfident in its predictions, it may be more likely to reject real banknotes as fake, flagging a genuine note as suspicious even when it contains no signs of forgery.
Also, underconfidence can make detecting new fake notes designed to fool the model more difficult: the model may be so uncertain in its predictions that it cannot confidently flag a new fake note as fake.
Spurious Correlation
Spurious correlation refers to a situation in machine learning where a feature and the model prediction appear statistically correlated, but their relationship is coincidental or caused by external factors rather than a genuine causal or meaningful connection. Some factors that can cause spurious correlation include:
- Confounding Variables: Spurious correlations may arise when confounding variables influence both the predicted variable and the feature being considered. These variables can create an illusion of correlation between the feature and the prediction, even though they are not causally related to each other.
- Data Noise: Spurious correlations can occur due to data noise or anomalies unrelated to the underlying problem. This noise may result from errors in data collection, measurement biases, data preprocessing issues, or other data-specific factors.
- Random Chance: In some cases, spurious correlations can occur purely by chance. When working with large datasets or many features, the likelihood of finding coincidental correlations increases. These correlations are not meaningful but are simply random occurrences that can mislead model predictions.
Spurious correlation vulnerabilities can also affect banknote authentication. For example, if a model learns a spurious correlation between two features of banknotes, it may be more likely to authenticate a fake note exhibiting those features, because it cannot distinguish between a real note with those features and a fake note deliberately designed to have them.
Also, spurious correlations can make it more difficult to detect new fake notes designed to exploit them: the model may rely so heavily on the spuriously correlated features that it accepts any note exhibiting them as real. For example, a model trained on a dataset of banknotes scanned in different lighting conditions may learn a spurious correlation between the brightness of a note and its authenticity, since brightness reflects the lighting conditions under which the note was scanned rather than whether it is genuine.
You can read more on the key vulnerabilities here.
📊 Generate a test suite for your classification model
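A sketch of how the scan results can be turned into a reusable test suite and run locally, assuming the Giskard 2.x API (in some versions you may need to pass the model and dataset to run explicitly):

```python
# Turn the scan findings into a reusable test suite
test_suite = results.generate_test_suite("Banknote authentication test suite")

# Run the suite locally against the wrapped model and dataset
test_suite.run()
```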
⬆ Upload your test suite to the Giskard server
- When you run your Giskard server, you'll have a localhost instance running; this is where the uploaded data will be found.
- Next, you need to generate your API token in the Settings tab of the Giskard application.
- You can choose the arguments you want for the following:
your_project = client.create_project("project_key", "PROJECT_NAME", "DESCRIPTION"). In our case, it is bank_note_authentication = client.create_project("bank_note_authentication", "Bank Note Authentication", "Project to classify if a banknote is real or fake"). Note: "project_key" must be unique and lowercase.
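Putting it together, a sketch of the upload step based on the Giskard 2.x client API (YOUR_API_KEY is a placeholder for the token from the Settings tab):

```python
# Connect to the local Giskard server with the API key from the Settings tab
url = "http://localhost:19000"
api_key = "YOUR_API_KEY"  # placeholder: paste your own token here
client = GiskardClient(url, api_key)

# Create the project; the project key must be unique and lowercase
bank_note_authentication = client.create_project(
    "bank_note_authentication",
    "Bank Note Authentication",
    "Project to classify if a banknote is real or fake",
)

# Upload the test suite (together with its model and dataset) to the server
test_suite.upload(client, "bank_note_authentication")
```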
Your dataset and model will be uploaded to Giskard and become available at a URL such as http://localhost:19000/main/projects/439/test-suite/440/overview (the project and suite IDs will differ on your machine). This URL takes you straight to the test suite in the Giskard interface.
You are all set to see Giskard in action! You can now work with the generated test suite.
Check for tests that failed.
Edit the parameters to check whether the test suite will then pass.
Applying data slices
Slices pre-generated by the scan feature:
When the slice to apply is set to none:
As we see from the output, we get a prediction of 1. The model may have learned to predict that all banknotes are real, which can happen if it was trained on a dataset containing only (or mostly) real banknotes.
To avoid this, it is important to use a variety of slices when testing models in Giskard. This helps to ensure that the model is not simply predicting 1 because it has mostly seen real banknotes during training.
When the slice to apply is set to ‘curtosis’ < 0.190 AND ‘curtosis’ >= -3.349e-01:
As we see from the output, we get the prediction 0 when the slice to apply in Giskard is set to ‘curtosis’ < 0.190 AND ‘curtosis’ >= -3.349e-01, because there are no real banknotes in the training data that fall within this slice. The model has not learned to associate this slice with real banknotes and therefore predicts 0, the label for fake banknotes.
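To get a feel for what such a slice contains, you can reproduce it with a plain pandas filter (the thresholds below are the ones reported by the scan above):

```python
# Inspect the rows that fall inside the 'curtosis' slice
slice_df = df[(df["curtosis"] >= -0.3349) & (df["curtosis"] < 0.190)]
print(slice_df["class"].value_counts())  # distribution of real vs fake notes in the slice
```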
When the slice to apply is set to ‘skewness’ < 1.678 AND ‘skewness’ >= 0.807:
We might get a prediction of 0 when we apply the slice ‘skewness’ < 1.678 AND ‘skewness’ >= 0.807 to the model.
One possibility is that no data points fall within the specified range of skewness values. This could be because the model was not trained on any data points with those skewness values, or because such data points were filtered out during the training process.
When the slice to apply is set to ‘variance’ < -1.437e-01 AND ‘variance’ >= -6.512e-01:
We get a prediction of 0 when the slice to apply in Giskard is set to ‘variance’ < -1.437e-01 AND ‘variance’ >= -6.512e-01 because this slice contains no data points: the range of variance values is so narrow that no samples fall within it.
To fix this, you can either widen the range of variance values in the slice or remove the slice altogether. Widening the range includes more data points in the slice, giving you a more reliable picture of the model's behaviour. Removing the slice altogether means you can no longer probe the model's behaviour on that region of the input space.
✅ Conclusion
In this article, we have seen how to use Giskard to test a banknote authentication model: scanning it for vulnerabilities, generating a test suite, and uploading everything to the Giskard server so the model can keep being tested and debugged. I hope you find this helpful. Happy testing!