Bank loan case study: Excel project (2024)

Explore a real-world project focused on risk analytics and Exploratory Data Analysis (EDA). Learn how to clean and analyze extensive loan application datasets, identify outliers, and perform univariate and bivariate analysis using pivot tables and charts. Discover how data-driven insights can reduce the risk of financial losses when lending to consumers in the banking and financial services industry, and build practical Excel and risk analytics skills along the way.

Project Description:

This case study focuses on utilizing Exploratory Data Analysis (EDA) techniques in a real-world business context, specifically in the banking industry. The objective is to demonstrate how EDA can be applied to analyze and mitigate risks associated with lending money to consumers. The project involves working with two extensive datasets: the current loan applications and the previous loan applications. These datasets contained unnecessary columns and missing data, which required initial data cleaning. Once the data was cleaned, the analysis proceeded to identify and handle outliers before conducting univariate and bivariate analysis.

The univariate and bivariate analysis involved utilizing pivot tables and charts to gain insights from the large dataset. This allowed for a comprehensive understanding of the risk factors associated with lending and facilitated the reduction of potential financial losses.

Through this case study, participants will not only apply EDA techniques but also gain a foundational understanding of risk analytics in the banking and financial services domain. The project showcases the practical utilization of data to make informed lending decisions and manage risk effectively.

Approach:

This case study has two enormous datasets: the current application and the previous application. Each included several unneeded columns that would be useless for risk assessment, as well as many blank values. I started by cleaning.

To evaluate this enormous set of data, I first cleaned the data, located some outliers and deleted them, and then began performing univariate and bivariate analysis using pivot tables and charts.

The following technology stack was used:

MySQL Workbench 8.0 CE, Microsoft Excel 2010

Results:

I went through the risk analytics process step by step, task after task. The project outcomes are as follows:

  1. Overall Method to Analysis: The bank’s problem statement is to identify the major causes of bank loan default. The knowledge will be used for risk assessment by the company. Two enormous datasets are provided:
    1. ‘application data.csv’ contains all of the client’s information at the time of application. The information pertains to whether or not a client is having financial issues.
    2. ‘previous application.csv’ provides data from the client’s previous loans. It indicates if the prior application was Accepted, Cancelled, Refused, or Unused.

Both sets of data contained many undesired columns that will not be used for risk analytics, as well as many blanks, so I cleaned up the data.

Task 1: Identify the missing data and use an appropriate method to deal with it (remove the columns or replace the missing values with an appropriate value).

Following the data cleaning procedure, I split the columns in the dataset into two categories of variables.

  1. Categorical variables
  2. Numerical variables

Categorical variables (non-numerical variables): a person’s occupation, education status, etc.

Numerical variables: income, credit, etc.

The following are some of the categorical and numerical variables from the provided data set.

Categorical variables    Numeric variables
Gender                   Age
Name contract type       Days employed
Income type              Amount Income
Education                Amount Annuity
Housing type             Amount Credit

I completed a full EDA on the current application and then on the previous application. Then, in this report, I summarised the results of both applications and provided business insights.

Steps to clean the data:

  1. Identify and handle missing data: Identify any missing values in your dataset and decide how to handle them. You can either delete rows or columns with missing data, fill in the missing values with averages or other suitable values, or use advanced techniques like imputation to estimate missing values.
  2. Remove duplicate entries: Check for duplicate rows or records in your dataset and remove them to ensure data integrity.

Task 2 (Find Missing Data): The existing application sheet included 161 columns.

  1. I deleted columns with more than 5% blank data.
  2. I deleted a large number of useless columns.

To count the blank values in each column before deciding what to delete, I used the COUNTBLANK function.
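The 5% rule above can also be sketched outside Excel. The following is a minimal Python illustration, not the workbook used in this project: the rows, the column names (income, occupation, car_age) and the helper names are hypothetical, and a cell is treated as blank when it is None or an empty string.

```python
# Hypothetical sample rows; None or "" stands for a blank cell.
rows = [
    {"income": 50000, "occupation": "Laborer", "car_age": None},
    {"income": 72000, "occupation": "", "car_age": None},
    {"income": 41000, "occupation": "Manager", "car_age": 4},
    {"income": 65000, "occupation": "Driver", "car_age": None},
]


def blank_share(rows, column):
    """Fraction of rows whose value in `column` is blank (like COUNTBLANK / row count)."""
    blanks = sum(1 for row in rows if row.get(column) in (None, ""))
    return blanks / len(rows)


def drop_sparse_columns(rows, threshold=0.05):
    """Keep only the columns whose blank share does not exceed `threshold`."""
    keep = [col for col in rows[0] if blank_share(rows, col) <= threshold]
    return [{col: row[col] for col in keep} for row in rows]


cleaned = drop_sparse_columns(rows)
print(list(cleaned[0].keys()))  # columns that survive the 5% rule
```

In this tiny sample only the income column has no blanks, so it is the only one kept. On a real dataset the threshold is a judgment call and can be passed in as a different value.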

Task 3: Identify if there are outliers in the dataset

The method below applies only to numeric variables. To identify outliers in Excel, follow these simple steps:

  1. Calculate the interquartile range (IQR) by subtracting the first quartile (25th percentile) from the third quartile (75th percentile) using the formula: =QUARTILE.INC(range,3) - QUARTILE.INC(range,1)
  2. Determine the lower bound for outliers by subtracting 1.5 times the IQR from the first quartile using the formula: =QUARTILE.INC(range,1) - (1.5 * IQR), where IQR is the cell holding the result of step 1.
  3. Determine the upper bound for outliers by adding 1.5 times the IQR to the third quartile using the formula: =QUARTILE.INC(range,3) + (1.5 * IQR)
  4. Use an IF formula to flag outliers. In a new column, use the formula: =IF(OR(data<lower_bound, data>upper_bound), "Outlier", ""). Replace “data” with the cell reference of the data point you want to check, and “lower_bound” and “upper_bound” with the cells holding the results of steps 2 and 3.

By following these steps, you can calculate the IQR, determine the lower and upper bounds, and flag outliers in your Excel dataset.
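The same IQR rule can be sketched in Python to check the logic on a small sample. This is only an illustration under stated assumptions: the values are made up, the function names are hypothetical, and `statistics.quantiles` with `method="inclusive"` is used because it interpolates quartiles the same way as QUARTILE.INC.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) outlier bounds using the k * IQR rule."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, k=1.5):
    """Mirror the Excel IF formula: "Outlier" outside the bounds, "" otherwise."""
    lower, upper = iqr_bounds(values, k)
    return ["Outlier" if v < lower or v > upper else "" for v in values]


# Hypothetical numeric column with one extreme value.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 100]
print(iqr_bounds(sample))     # (-3.0, 13.0)
print(flag_outliers(sample))  # only the last value is flagged
```

Here the first and third quartiles are 3 and 7, so the IQR is 4 and anything below -3 or above 13 is flagged.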

Repeat the same steps for the other numeric columns.

Task 4: Identify if there is data imbalance in the data. Find the ratio of data imbalance.

Data imbalance occurs when data is distributed unequally, that is, when records are spread unevenly across categories or classes. It can cause issues when analyzing or modeling the data, especially if one class is significantly underrepresented compared to others.

To address data imbalance, techniques like resampling (adjusting class frequencies), synthetic data generation, adjusting class weights, using ensemble methods, and feature engineering can be employed. These methods help ensure a more balanced representation of classes and improve the performance of models on imbalanced data.

How to plot data imbalance using Pivot charts?

Data imbalance can be effectively visualized and analyzed using Pivot charts in Excel. Pivot charts offer a convenient way to examine the distribution of categorical variables and identify any potential data imbalance. Here’s a step-by-step guide:

  1. Select the dataset in Excel that contains the categorical variable you want to assess for data imbalance.
  2. Navigate to the “Insert” tab in the Excel ribbon and click on “PivotTable.” A dialog box will appear.
  3. Specify the range of your dataset and choose the destination for the Pivot Table, such as a new worksheet.
  4. In the PivotTable Field List, drag the categorical variable representing the class or category you want to analyze into the “Rows” area.
  5. Drag the same categorical variable into the “Values” area. Excel will automatically calculate the count of occurrences for each category.
  6. Create a Pivot chart based on the PivotTable. With the PivotTable selected, go to the “Insert” tab and choose the desired chart type, such as a bar chart, column chart, or pie chart.

Repeat the same steps for the other categorical columns.
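The pivot table count and the imbalance ratio asked for in this task can be reproduced with a short Python sketch. The labels below are hypothetical (0 for all other cases, 1 for clients with payment difficulties), and the ratio is taken here as the majority count divided by the minority count, which is one common convention rather than the only one.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Count each class and return (counts, majority count / minority count)."""
    counts = Counter(labels)
    majority = max(counts.values())
    minority = min(counts.values())
    return counts, majority / minority


# Hypothetical target column: 92 non-defaulters and 8 defaulters.
targets = [0] * 92 + [1] * 8
counts, ratio = imbalance_ratio(targets)
print(dict(counts))  # {0: 92, 1: 8}
print(ratio)         # 11.5
```

A ratio of 11.5 means there are 11.5 records of the larger class for every record of the smaller one.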

Task 5 (EDA): Univariate Analysis and Bivariate Analysis

What is Univariate Analysis?

Univariate analysis is a statistical approach that focuses on analyzing and interpreting data for a single variable without considering its relationship with other variables. It involves examining the distribution, measures of central tendency, variability, and other characteristics of the variable to gain insights and understand its behavior.

The main objective of the univariate analysis is to summarize and describe the data for a single variable. It helps in identifying patterns, detecting outliers, and assessing the overall distribution of the variable. By calculating summary statistics such as mean, median, mode, range, and standard deviation, analysts can understand the central tendency and variability of the data.

Visualization techniques, such as histograms, box plots, and bar charts, are commonly used in univariate analysis to provide a visual representation of the variable’s distribution and characteristics.

The univariate analysis serves as a fundamental step in data exploration, providing a comprehensive understanding of individual variables. It forms the basis for further analysis and decision-making in various fields such as research, business, and data science.
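As a small illustration of the summary statistics mentioned above, the following Python sketch computes them for a single numeric variable. The ages are invented for the example and the function name is hypothetical. In Excel the equivalent functions would be AVERAGE, MEDIAN, MODE.SNGL, MAX minus MIN, and STDEV.S.

```python
from statistics import mean, median, mode, stdev


def summarize(values):
    """Univariate summary: central tendency and spread of one variable."""
    return {
        "mean": mean(values),
        "median": median(values),
        "mode": mode(values),
        "range": max(values) - min(values),
        "stdev": stdev(values),  # sample standard deviation
    }


# Hypothetical applicant ages.
ages = [25, 30, 35, 35, 40, 45, 50]
summary = summarize(ages)
for name, value in summary.items():
    print(name, value)
```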

Analysis

This analysis reveals interesting trends regarding loan applications and applicant characteristics. It shows that individuals with higher incomes exhibit lower loan application rates. The credit amount for bank loans typically falls within the range of 45,000 to 1,045,000. The majority of loan applications originate from individuals aged between 35 and 50. Furthermore, those with 0 to 8 years of work experience are the most inclined to seek loans. Homeownership correlates with a higher likelihood of applying for loans. Additionally, married individuals tend to have a greater number of loan applications. Working individuals are more prone to requesting loans, while applicants who applied unaccompanied also account for a notable share of loan applications.

Bivariate Analysis:

What is Bivariate analysis?

Bivariate analysis is a statistical technique that examines the relationship between two variables. It explores how changes in one variable correspond to changes in another variable. The purpose is to understand the association, patterns, and dependencies between the two variables. Bivariate analysis aids in making informed decisions and predicting outcomes based on the observed relationship between variables. It is an essential tool in data analysis and provides valuable insights for various fields of study.
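One simple form of bivariate analysis for this kind of data is the default rate per category, which is what a pivot table with a category in Rows and the average of the target in Values would show. The sketch below is a hedged Python illustration: the records, the field names (education, target) and the function name are hypothetical, with target equal to 1 for a client with payment difficulties.

```python
from collections import defaultdict


def default_rate_by(records, key, target="target"):
    """Share of records with target == 1 within each value of `key`."""
    totals = defaultdict(int)
    defaults = defaultdict(int)
    for record in records:
        totals[record[key]] += 1
        defaults[record[key]] += record[target]
    return {group: defaults[group] / totals[group] for group in totals}


# Hypothetical records: one categorical variable and the target flag.
records = [
    {"education": "Secondary", "target": 1},
    {"education": "Secondary", "target": 0},
    {"education": "Secondary", "target": 1},
    {"education": "Secondary", "target": 0},
    {"education": "Higher", "target": 0},
    {"education": "Higher", "target": 0},
    {"education": "Higher", "target": 0},
    {"education": "Higher", "target": 1},
]
rates = default_rate_by(records, "education")
print(rates)  # {'Secondary': 0.5, 'Higher': 0.25}
```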

Analysis

This analysis uncovers significant patterns regarding loan defaults and customer characteristics. It reveals that customers residing in low-rating areas are more prone to higher default rates. Individuals with lower incomes have a higher likelihood of defaulting on their loans. Additionally, young people have a greater tendency to default, while the likelihood of default gradually decreases with age. Furthermore, females exhibit a lower inclination towards defaults compared to males. Maternity leave and unemployment are identified as factors contributing to an increased likelihood of defaults. Customers with larger family sizes, consisting of more than five members, are more likely to default on their bank loans. Similarly, customers with fewer educational qualifications have a higher probability of loan default. Lastly, customers with limited work experience are more likely to face defaults in loan repayments.

Task 6: Find the top 10 correlations for clients with payment difficulties and for all other cases (Target variable)

Top ten reasons for loan cancellation and refusal

  1. Amount Application
  2. Cash loan Purpose
  3. Goods Category
  4. Product Combination
  5. Product type
  6. Channel type
  7. Months Decision
  8. Contract type
  9. Client type
  10. Payment type

What is Correlation?

Correlation measures the relationship between two variables. It shows how they change together: positively, negatively, or not at all. The correlation coefficient, denoted as “r,” ranges from -1 to 1. A coefficient of 1 means a perfect positive correlation, -1 means a perfect negative correlation, and 0 means no correlation. Correlation helps us understand the strength and direction of the relationship, but it doesn’t imply causation. It’s used to analyze data and predict outcomes in various fields.

To find correlations between the top ten reasons for loan cancellation and refusal using Excel, you can follow these steps:

  1. Open Microsoft Excel and import your loan dataset containing the relevant variables.
  2. Select an empty cell where you want the correlation result to be displayed.
  3. Enter the following Excel command in that cell: =CORREL(
  4. Select the range of data for the first variable, “Amount Application.” For example, if its values are in cells A2 to A100, you would enter A2:A100 after the CORREL( command.
  5. Type a comma (,) and select the range for the second variable, for example B2:B100. CORREL accepts exactly two ranges, and they must contain the same number of values.
  6. Close the formula with a closing bracket ) and press Enter. The finished formula looks like =CORREL(A2:A100,B2:B100).
  7. Excel will calculate the correlation coefficient for that pair of variables and display the result in the selected cell.
  8. Repeat the formula for each remaining pair of variables. CORREL uses numeric values only, so categorical variables such as Contract type need to be encoded as numbers first.
  9. Repeat the process if you have multiple datasets or subsets of data to analyze.

By following these steps, you can obtain the correlation coefficient for each pair of variables among the top ten reasons for loan cancellation and refusal in your dataset.
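Because a correlation coefficient is defined for one pair of variables at a time, finding the top correlations means computing it for every pair and ranking the results. The following Python sketch shows one way to do that. It is an illustration only: the column names and values are hypothetical, the function names are made up for this example, and pairs are ranked by the absolute value of Pearson's r. To compare clients with payment difficulties against all other cases, the same function would be run once on each subset of rows.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient r for two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def top_pairs(columns, top_n=10):
    """Rank every pair of columns by |r|, strongest first."""
    pairs = [
        ((a, b), pearson(columns[a], columns[b]))
        for a, b in combinations(columns, 2)
    ]
    return sorted(pairs, key=lambda item: abs(item[1]), reverse=True)[:top_n]


# Hypothetical numeric columns for one subset of clients.
columns = {
    "credit": [100, 200, 300, 400, 500],
    "annuity": [10, 20, 30, 40, 50],
    "age": [50, 40, 45, 30, 35],
}
for (a, b), r in top_pairs(columns):
    print(a, b, round(r, 2))
```

In this sample, credit and annuity move together perfectly (r = 1), while age is negatively correlated with both (r = -0.8).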

Author: Dr. Pierre Goyette