ML | Credit Card Fraud Detection - GeeksforGeeks (2024)

The challenge is to recognize fraudulent credit card transactions so that customers of credit card companies are not charged for items they did not purchase. The main challenges in credit card fraud detection are:

  1. Enormous volumes of data are processed every day, and the model built must be fast enough to respond to the scam in time.
  2. Imbalanced data: most transactions (99.8%) are not fraudulent, which makes the fraudulent ones hard to detect.
  3. Data availability, as the data is mostly private.
  4. Misclassified data can be another major issue, as not every fraudulent transaction is caught and reported.
  5. Adaptive techniques used against the model by the scammers.

How to tackle these challenges?

  1. The model used must be simple and fast enough to detect the anomaly and classify it as a fraudulent transaction as quickly as possible.
  2. Imbalance can be handled with dedicated techniques such as resampling; this walkthrough first trains on the unbalanced data and balances it only if needed.
  3. To protect the privacy of the user, the dimensionality of the data can be reduced.
  4. A more trustworthy source that double-checks the data must be used, at least for training the model.
  5. We can keep the model simple and interpretable so that, when the scammers adapt to it, a new model can be up and running with just a few tweaks.
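
One common way to handle the imbalance mentioned in point 2 is random undersampling of the majority class. The sketch below is an illustration only, not the method used later in this article; the helper name `undersample` and the toy DataFrame are hypothetical, and only the `Class` column name comes from the dataset.

```python
import numpy as np
import pandas as pd


def undersample(df, label_col="Class", random_state=0):
    """Return a frame where every class has as many rows as the rarest class.

    Hypothetical helper for illustration; it discards majority-class rows,
    so it trades information for balance.
    """
    smallest = df[label_col].value_counts().min()
    parts = [
        group.sample(n=smallest, random_state=random_state)
        for _, group in df.groupby(label_col)
    ]
    # Shuffle so the classes are not stored in contiguous blocks
    return pd.concat(parts).sample(frac=1, random_state=random_state)


# Toy data: 1000 rows, 10 of them labelled as fraud (Class == 1)
rng = np.random.default_rng(0)
toy = pd.DataFrame({"Amount": rng.exponential(50.0, size=1000),
                    "Class": [1] * 10 + [0] * 990})

balanced = undersample(toy)
print(balanced["Class"].value_counts())
```

Undersampling throws away most of the valid transactions, so it is usually worth comparing against a model trained on the full data before adopting it.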

Before going to the code, it is recommended to work in a Jupyter notebook. If it is not installed on your machine, you can use Google Colab. You can download the dataset from this link. If the link is not working, please go to this link and log in to Kaggle to download the dataset.

Code : Importing all the necessary Libraries

# import the necessary packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib import gridspec

Code : Loading the Data

# Load the dataset from the CSV file using pandas.
# The best way is to mount the drive on Colab and
# copy the path for the CSV file.
data = pd.read_csv("credit.csv")

Code : Understanding the Data

# Grab a peek at the data
data.head()

[Figure 1: output of data.head()]

Code : Describing the Data

# Print the shape of the data
# data = data.sample(frac = 0.1, random_state = 48)
print(data.shape)
print(data.describe())

Output :

(284807, 31)
                Time            V1  ...         Amount          Class
count  284807.000000  2.848070e+05  ...  284807.000000  284807.000000
mean    94813.859575  3.919560e-15  ...      88.349619       0.001727
std     47488.145955  1.958696e+00  ...     250.120109       0.041527
min         0.000000 -5.640751e+01  ...       0.000000       0.000000
25%     54201.500000 -9.203734e-01  ...       5.600000       0.000000
50%     84692.000000  1.810880e-02  ...      22.000000       0.000000
75%    139320.500000  1.315642e+00  ...      77.165000       0.000000
max    172792.000000  2.454930e+00  ...   25691.160000       1.000000

[8 rows x 31 columns]

Code : Imbalance in the data

Time to explain the data we are dealing with.

# Determine number of fraud cases in dataset
fraud = data[data['Class'] == 1]
valid = data[data['Class'] == 0]
# Ratio of fraud cases to valid transactions
outlierFraction = len(fraud)/float(len(valid))
print(outlierFraction)
print('Fraud Cases: {}'.format(len(data[data['Class'] == 1])))
print('Valid Transactions: {}'.format(len(data[data['Class'] == 0])))
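
Note that the value printed above is the ratio of fraud cases to valid transactions, not the share of fraud in the whole dataset; with so few fraud cases the two numbers are nearly identical. A quick check, assuming the counts of 492 fraud and 284315 valid transactions that the `describe()` outputs further down report, also shows why accuracy alone says little here: a model that labels everything as valid would already be about 99.83% accurate.

```python
# Counts taken from the describe() outputs later in this article
n_fraud = 492
n_valid = 284315
n_total = n_fraud + n_valid

ratio_to_valid = n_fraud / n_valid      # what outlierFraction computes
share_of_total = n_fraud / n_total      # fraction of all transactions
baseline_accuracy = n_valid / n_total   # always predicting "valid"

print("fraud / valid : {:.6f}".format(ratio_to_valid))
print("fraud / total : {:.6f}".format(share_of_total))
print("all-valid baseline accuracy: {:.4f}".format(baseline_accuracy))
```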

 
 

[Figure 2: output of the code above, showing the outlier fraction and the fraud and valid transaction counts]

Only 0.17% of all transactions are fraudulent, so the data is highly unbalanced. Let's first apply our models without balancing it; if we don't get good results, we can then find a way to balance the dataset.

Code : Print the amount details for Fraudulent Transaction

print("Amount details of the fraudulent transaction")
fraud.Amount.describe()

Output :

Amount details of the fraudulent transaction
count     492.000000
mean      122.211321
std       256.683288
min         0.000000
25%         1.000000
50%         9.250000
75%       105.890000
max      2125.870000
Name: Amount, dtype: float64

Code : Print the amount details for Normal Transaction

print("Amount details of valid transaction")
valid.Amount.describe()

Output :

Amount details of valid transaction
count    284315.000000
mean         88.291022
std         250.105092
min           0.000000
25%           5.650000
50%          22.000000
75%          77.050000
max       25691.160000
Name: Amount, dtype: float64
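
The two `describe()` calls above can be combined into a single side-by-side table with `groupby`. This is a sketch on a small made-up DataFrame (the values are invented for illustration); on the real data the same call would be `data.groupby('Class')['Amount'].describe()`.

```python
import pandas as pd

# Made-up amounts for illustration only; 'Class' and 'Amount' mirror
# the column names of the real dataset.
toy = pd.DataFrame({
    "Amount": [5.0, 22.0, 77.0, 10.0, 1.0, 9.0, 500.0],
    "Class":  [0,   0,    0,    0,    1,   1,   1],
})

# One row of summary statistics per class
summary = toy.groupby("Class")["Amount"].describe()
print(summary[["count", "mean", "50%", "max"]])
```

Comparing the mean with the 50% column matters because a few very large amounts can pull the mean far above the typical transaction.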

As we can see, the mean transaction amount is higher for the fraudulent transactions (122.21 versus 88.29), although their median is lower (9.25 versus 22.00). This makes the problem crucial to deal with.

Code : Plotting the Correlation Matrix

The correlation matrix graphically gives us an idea of how features correlate with each other and can help us predict which features are most relevant for the prediction.

# Correlation matrix
corrmat = data.corr()
fig = plt.figure(figsize = (12, 9))
sns.heatmap(corrmat, vmax = .8, square = True)
plt.show()
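
A heatmap can be hard to read precisely, so it may help to list the correlations with one column numerically. The sketch below uses random stand-in data with an artificial dependency (the column names only mimic the dataset); on the real data the equivalent call would be `corrmat['Amount'].drop('Amount').sort_values()`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 2000

# Stand-in data: V2 is constructed to move against Amount, V1 is pure noise
amount = rng.exponential(80.0, size=n)
toy = pd.DataFrame({
    "V1": rng.normal(size=n),
    "V2": -0.05 * amount + rng.normal(size=n),
    "Amount": amount,
})

corrmat = toy.corr()

# Correlation of every other feature with Amount, most negative first
amount_corr = corrmat["Amount"].drop("Amount").sort_values()
print(amount_corr)
```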

 
 

[Figure 3: correlation heatmap of the dataset features]

In the heatmap we can clearly see that most of the features do not correlate with other features, but some features have either a positive or a negative correlation with each other. For example, V2 and V5 are highly negatively correlated with the feature called Amount. We also see some correlation between V20 and Amount. This gives us a deeper understanding of the data available to us.

Code : Separating the X and the Y values

Dividing the data into input parameters and output values

# dividing the X and the Y from the dataset
X = data.drop(['Class'], axis = 1)
Y = data["Class"]
print(X.shape)
print(Y.shape)
# getting just the values for the sake of processing
# (it's a NumPy array with no column labels)
xData = X.values
yData = Y.values

Output :

(284807, 30)
(284807,)

Training and Testing Data Bifurcation

We will divide the dataset into two main groups: one for training the model and the other for testing our trained model's performance.

# Using Scikit-learn to split data into training and testing sets
from sklearn.model_selection import train_test_split
# Split the data into training and testing sets
xTrain, xTest, yTrain, yTest = train_test_split(
    xData, yData, test_size = 0.2, random_state = 42)
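
With so few fraud cases, a purely random split can leave the test set with a noticeably different share of fraud than the training set. A possible refinement, which the code above does not use, is the `stratify` argument of `train_test_split`. The sketch below runs on synthetic labels, so the numbers are illustrative only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 10000 samples, 20 of them positive (0.2%)
rng = np.random.default_rng(0)
x_toy = rng.normal(size=(10000, 3))
y_toy = np.zeros(10000, dtype=int)
y_toy[:20] = 1

# stratify=y_toy keeps the class proportions the same in both parts
x_tr, x_te, y_tr, y_te = train_test_split(
    x_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy)

print("positives in train:", y_tr.sum())
print("positives in test :", y_te.sum())
```

On the real data the equivalent change would be passing `stratify = yData` to the call shown above.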

 
 

Code : Building a Random Forest Model using scikit learn

# Building the Random Forest Classifier (RANDOM FOREST)
from sklearn.ensemble import RandomForestClassifier
# random forest model creation
rfc = RandomForestClassifier()
rfc.fit(xTrain, yTrain)
# predictions
yPred = rfc.predict(xTest)
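
`RandomForestClassifier()` with default arguments gives slightly different results on every run, and `predict` simply picks the class with the higher predicted probability. The sketch below shows two optional adjustments that are not part of the original code: fixing `random_state` for reproducibility, and applying a lower, hypothetical threshold of 0.3 to `predict_proba` to trade precision for recall. It runs on synthetic data from `make_classification`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (a few percent positives), for illustration only
X, y = make_classification(n_samples=5000, n_features=10,
                           weights=[0.98, 0.02], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# random_state makes the forest reproducible; n_jobs=-1 uses all cores
rfc_demo = RandomForestClassifier(n_estimators=100, random_state=42,
                                  n_jobs=-1)
rfc_demo.fit(X_tr, y_tr)

# Column 1 of predict_proba holds the probability of the positive class
fraud_proba = rfc_demo.predict_proba(X_te)[:, 1]

default_pred = (fraud_proba >= 0.5).astype(int)   # about the same as predict()
lenient_pred = (fraud_proba >= 0.3).astype(int)   # hypothetical threshold

print("flagged at 0.5:", default_pred.sum())
print("flagged at 0.3:", lenient_pred.sum())
```

A lower threshold can only flag the same or more transactions, so recall cannot drop, while precision typically does; the right value depends on how costly a missed fraud is compared with a false alarm.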

 
 

Code : Building all kinds of evaluating parameters

# Evaluating the classifier
# printing every score of the classifier
from sklearn.metrics import classification_report, accuracy_score
from sklearn.metrics import precision_score, recall_score
from sklearn.metrics import f1_score, matthews_corrcoef
from sklearn.metrics import confusion_matrix

n_outliers = len(fraud)
n_errors = (yPred != yTest).sum()
print("The model used is Random Forest classifier")

acc = accuracy_score(yTest, yPred)
print("The accuracy is {}".format(acc))

prec = precision_score(yTest, yPred)
print("The precision is {}".format(prec))

rec = recall_score(yTest, yPred)
print("The recall is {}".format(rec))

f1 = f1_score(yTest, yPred)
print("The F1-Score is {}".format(f1))

MCC = matthews_corrcoef(yTest, yPred)
print("The Matthews correlation coefficient is {}".format(MCC))

Output :

The model used is Random Forest classifier
The accuracy is 0.9995611109160493
The precision is 0.9866666666666667
The recall is 0.7551020408163265
The F1-Score is 0.8554913294797689
The Matthews correlation coefficient is 0.8629589216367891
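
The scores above all derive from the four cells of the confusion matrix. As a worked check, the sketch below recomputes them by hand. The counts are not printed anywhere in the article; they are inferred here on the assumption that the test set holds 56962 rows (20% of 284807, rounded up), which is consistent with the reported precision (74/75), recall (74/98), and accuracy.

```python
from math import sqrt

# Inferred counts (an assumption, see the paragraph above)
tp, fp, fn = 74, 1, 24
n_test = 56962
tn = n_test - tp - fp - fn

accuracy = (tp + tn) / n_test
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
mcc = (tp * tn - fp * fn) / sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

for name, value in [("accuracy", accuracy), ("precision", precision),
                    ("recall", recall), ("F1", f1), ("MCC", mcc)]:
    print("{:9s} {:.6f}".format(name, value))
```

Under these inferred counts, the high accuracy comes almost entirely from the valid transactions, while the recall shows that roughly one fraud in four is still missed.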

Code : Visualizing the Confusion Matrix

# printing the confusion matrix
LABELS = ['Normal', 'Fraud']
conf_matrix = confusion_matrix(yTest, yPred)
plt.figure(figsize =(12, 12))
sns.heatmap(conf_matrix, xticklabels = LABELS,
            yticklabels = LABELS, annot = True, fmt ="d")
plt.title("Confusion matrix")
plt.ylabel('True class')
plt.xlabel('Predicted class')
plt.show()
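
When the exact counts matter more than the picture, the four cells can be unpacked directly. This small sketch uses made-up labels; on the data above the same call would be `confusion_matrix(yTest, yPred).ravel()`.

```python
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration: 0 = Normal, 1 = Fraud
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 0, 1, 0]

# For binary labels, ravel() yields the cells in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

print("missed frauds (false negatives):", fn)
print("false alarms (false positives):", fp)
```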

 
 

Output :

[Figure 4: confusion matrix heatmap, true class versus predicted class for the Normal and Fraud labels]

Comparison with other algorithms without dealing with the imbalance of the data.

[Figure 5: comparison of the Random Forest model with other algorithms]

As you can see, our Random Forest model gives a better result even for recall, which is the trickiest part.
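
The algorithms in the comparison figure are not recoverable from the text, so the following is only a sketch of how such a comparison could be run. The choice of Logistic Regression as the second model is an assumption made for illustration, and the data is synthetic; on the real data the loop would use `xTrain`, `xTest`, `yTrain` and `yTest`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data, for illustration only
X, y = make_classification(n_samples=5000, n_features=10,
                           weights=[0.97, 0.03], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# The choice of models here is an assumption made for this sketch
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    # zero_division=0 avoids a warning if a model predicts no positives
    results[name] = {
        "precision": precision_score(y_te, pred, zero_division=0),
        "recall": recall_score(y_te, pred, zero_division=0),
    }
    print(name, results[name])
```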


Last Updated : 05 Aug, 2022


amankrsharma3



Author: Aron Pacocha
