Joining of Dataframes in R Programming - GeeksforGeeks (2024)

In the R language, dataframes are generic data objects used to store tabular data. They are among the most popular data objects in R programming because data is more comfortable to analyze in tabular form. A dataframe can also be thought of as a matrix in which each column may have a different data type. A dataframe is made up of three principal components: the data, the rows, and the columns. In R, we use the merge() function to merge two dataframes. This function is part of base R; the dplyr package provides a comparable family of join functions (inner_join(), left_join(), and so on). The most important condition for joining two dataframes is that the columns on which the merge happens hold values of compatible types. The merge() function works much like a join in a DBMS. The types of merging available in R are:

  1. Natural Join or Inner Join
  2. Left Outer Join
  3. Right Outer Join
  4. Full Outer Join
  5. Cross Join
  6. Semi Join
  7. Anti Join

Basic Syntax of merge() function in R:

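Syntax: merge(x, y, by, by.x, by.y, all, all.x, all.y, sort = TRUE)

Parameters:

  • x: one dataframe (df1 in the examples)
  • y: another dataframe (df2 in the examples)
  • by, by.x, by.y: the names of the columns to join on. by is used when the key columns have the same name in both dataframes; by.x and by.y are used when the names differ.
  • all, all.x, all.y: logical values that specify the type of merge.
  • sort: logical value; if TRUE, the result is sorted on the join columns.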
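The by.x and by.y arguments matter when the key column is named differently in the two dataframes. The following is a minimal sketch of that case; the dataframes students and addresses and their column names are hypothetical and are not part of the article's running example.

```r
# Hypothetical data: the key column has a different name in each dataframe.
students <- data.frame(StudentId = c(101, 102, 103),
                       Subject = c("Hindi", "English", "Maths"))
addresses <- data.frame(Id = c(102, 103, 104),
                        State = c("Mangalore", "Mysore", "Pune"))

# by.x names the key column in x, by.y names the key column in y.
# all is left at its default (FALSE), so only matching rows are kept.
joined <- merge(x = students, y = addresses,
                by.x = "StudentId", by.y = "Id")
print(joined)
```

In the result, the key column carries the name it had in x (here StudentId).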

Now let's try to understand all types of merging one by one. First of all, we will create two dataframes that will help us to understand each join easily.

# Data frame 1
df1 = data.frame(StudentId = c(101:106),
                 Product = c("Hindi", "English",
                             "Maths", "Science",
                             "Political Science",
                             "Physics"))
df1

Output:

  StudentId           Product
1       101             Hindi
2       102           English
3       103             Maths
4       104           Science
5       105 Political Science
6       106           Physics

# Data frame 2
df2 = data.frame(StudentId = c(102, 104, 106,
                               107, 108),
                 State = c("Mangalore", "Mysore",
                           "Pune", "Dehradun", "Delhi"))
df2

Output:

  StudentId     State
1       102 Mangalore
2       104    Mysore
3       106      Pune
4       107  Dehradun
5       108     Delhi
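Before merging, it can help to check how the key column is stored in each dataframe. The sketch below rebuilds the two dataframes and inspects the class of StudentId; the variable name inner is only illustrative. In this data the key is stored as integer in one dataframe and as double in the other, and merge() still matches the values because both are numeric.

```r
# Rebuild the two example dataframes so the snippet runs on its own.
df1 <- data.frame(StudentId = c(101:106),
                  Product = c("Hindi", "English", "Maths", "Science",
                              "Political Science", "Physics"))
df2 <- data.frame(StudentId = c(102, 104, 106, 107, 108),
                  State = c("Mangalore", "Mysore", "Pune",
                            "Dehradun", "Delhi"))

# 101:106 creates integers, while c(102, 104, ...) creates doubles.
print(class(df1$StudentId))
print(class(df2$StudentId))

# The keys are still comparable, so the merge finds the common students.
inner <- merge(df1, df2, by = "StudentId")
print(nrow(inner))
```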

Natural Join or Inner Join

An inner join keeps only those rows that match in both dataframes. For this we specify the argument all = FALSE, which is also the default. In terms of set theory, an inner join performs the intersection operation. For example:

A = [1, 2, 3, 4, 5]
B = [2, 3, 5, 6]

Then the output of the natural join will be (2, 3, 5).
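The same intersection can be reproduced on plain vectors with base R's intersect() function. This is only a small illustration of the set idea, not a dataframe join; the variable name common is an assumption made for the sketch.

```r
# The two example sets, stored as numeric vectors.
A <- c(1, 2, 3, 4, 5)
B <- c(2, 3, 5, 6)

# intersect() keeps the values that occur in both vectors.
common <- intersect(A, B)
print(common)
```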

It is the simplest and most common type of join available in R. Now let us try to understand this using an R program:

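Example:

# R program to illustrate
# Joining of dataframes
df = merge(x = df1, y = df2, by = "StudentId")
df

Output:

  StudentId Product     State
1       102 English Mangalore
2       104 Science    Mysore
3       106 Physics      Pune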
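The name "natural join" reflects the fact that merge() joins on the columns the two dataframes have in common when by is omitted. The sketch below, which rebuilds the example dataframes, compares that default call with the explicit one; the variable names natural and explicit are illustrative.

```r
# Rebuild the two example dataframes so the snippet runs on its own.
df1 <- data.frame(StudentId = c(101:106),
                  Product = c("Hindi", "English", "Maths", "Science",
                              "Political Science", "Physics"))
df2 <- data.frame(StudentId = c(102, 104, 106, 107, 108),
                  State = c("Mangalore", "Mysore", "Pune",
                            "Dehradun", "Delhi"))

# Without by, merge() joins on the shared column names (here StudentId).
natural <- merge(df1, df2)

# The same join with the key and the join type spelled out.
explicit <- merge(x = df1, y = df2, by = "StudentId", all = FALSE)

# Both calls should produce the same dataframe.
print(identical(natural, explicit))
```

Spelling out by is usually safer in real code, since an unrelated column that happens to share a name would otherwise become part of the key.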

Left Outer Join

A Left Outer Join includes all the rows of dataframe x and only those rows of y that match. For this we specify the argument all.x = TRUE. In terms of basic set theory, we are displaying the complete set x. Now let us try to understand this using an R program:

Example:

# R program to illustrate
# Joining of dataframes
df = merge(x = df1, y = df2, by = "StudentId",
           all.x = TRUE)
df

Output:

  StudentId           Product     State
1       101             Hindi      <NA>
2       102           English Mangalore
3       103             Maths      <NA>
4       104           Science    Mysore
5       105 Political Science      <NA>
6       106           Physics      Pune
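The NA values in a left join mark the rows of x that found no partner in y, so they can be used to list the unmatched keys. A minimal sketch, rebuilding the example dataframes; the names left and unmatched are illustrative.

```r
# Rebuild the two example dataframes so the snippet runs on its own.
df1 <- data.frame(StudentId = c(101:106),
                  Product = c("Hindi", "English", "Maths", "Science",
                              "Political Science", "Physics"))
df2 <- data.frame(StudentId = c(102, 104, 106, 107, 108),
                  State = c("Mangalore", "Mysore", "Pune",
                            "Dehradun", "Delhi"))

# Left outer join: every row of df1 is kept.
left <- merge(x = df1, y = df2, by = "StudentId", all.x = TRUE)

# Rows whose State is NA had no matching StudentId in df2.
unmatched <- left[is.na(left$State), ]
print(unmatched$StudentId)
```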

Right Outer Join

A Right Outer Join includes all the rows of dataframe y and only those rows of x that match. For this we specify the argument all.y = TRUE. In terms of basic set theory, we are displaying the complete set y. Now let us try to understand this using an R program:

Example:

# R program to illustrate
# Joining of dataframes
df = merge(x = df1, y = df2, by = "StudentId",
           all.y = TRUE)
df

Output:

  StudentId Product     State
1       102 English Mangalore
2       104 Science    Mysore
3       106 Physics      Pune
4       107    <NA>  Dehradun
5       108    <NA>     Delhi
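A right outer join is a left outer join with the roles of the two dataframes swapped. The sketch below illustrates this with the example data; the names right and swapped are illustrative. The two results hold the same rows, but the non-key columns appear in a different order.

```r
# Rebuild the two example dataframes so the snippet runs on its own.
df1 <- data.frame(StudentId = c(101:106),
                  Product = c("Hindi", "English", "Maths", "Science",
                              "Political Science", "Physics"))
df2 <- data.frame(StudentId = c(102, 104, 106, 107, 108),
                  State = c("Mangalore", "Mysore", "Pune",
                            "Dehradun", "Delhi"))

# Right outer join: every row of df2 is kept.
right <- merge(x = df1, y = df2, by = "StudentId", all.y = TRUE)

# The same rows, obtained as a left outer join with df2 as x.
swapped <- merge(x = df2, y = df1, by = "StudentId", all.x = TRUE)

print(right)
# Reorder the columns of the swapped result to match before comparing.
print(swapped[, names(right)])
```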

Full Outer Join

A Full Outer Join keeps all rows from both dataframes. For this we specify the argument all = TRUE. In terms of basic set theory, we are performing the union operation. Now let us try to understand this using an R program:

Example:

# R program to illustrate
# Joining of dataframes
df = merge(x = df1, y = df2, by = "StudentId",
           all = TRUE)
df

Output:

  StudentId           Product     State
1       101             Hindi      <NA>
2       102           English Mangalore
3       103             Maths      <NA>
4       104           Science    Mysore
5       105 Political Science      <NA>
6       106           Physics      Pune
7       107              <NA>  Dehradun
8       108              <NA>     Delhi
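The union interpretation can be checked directly: the keys of the full outer join should be exactly the union of the keys of the two inputs. A minimal sketch with the example data; the names full and all_ids are illustrative.

```r
# Rebuild the two example dataframes so the snippet runs on its own.
df1 <- data.frame(StudentId = c(101:106),
                  Product = c("Hindi", "English", "Maths", "Science",
                              "Political Science", "Physics"))
df2 <- data.frame(StudentId = c(102, 104, 106, 107, 108),
                  State = c("Mangalore", "Mysore", "Pune",
                            "Dehradun", "Delhi"))

# Full outer join: rows from both sides are kept.
full <- merge(x = df1, y = df2, by = "StudentId", all = TRUE)

# union() returns every key that appears in either dataframe, once.
all_ids <- sort(union(df1$StudentId, df2$StudentId))

# The joined keys (sorted by merge) should equal the sorted union.
print(all(full$StudentId == all_ids))
```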

Cross Join

A Cross Join, also known as a cartesian join, joins every row of one dataframe to every row of the other dataframe. In set theory, this type of join is known as the cartesian product of two sets. Now let us try to understand this using an R program:

Example:

# R program to illustrate
# Joining of dataframes
df = merge(x = df1, y = df2, by = NULL)
df

Output:

   StudentId.x           Product StudentId.y     State
1          101             Hindi         102 Mangalore
2          102           English         102 Mangalore
3          103             Maths         102 Mangalore
4          104           Science         102 Mangalore
5          105 Political Science         102 Mangalore
6          106           Physics         102 Mangalore
7          101             Hindi         104    Mysore
8          102           English         104    Mysore
9          103             Maths         104    Mysore
10         104           Science         104    Mysore
11         105 Political Science         104    Mysore
12         106           Physics         104    Mysore
13         101             Hindi         106      Pune
14         102           English         106      Pune
15         103             Maths         106      Pune
16         104           Science         106      Pune
17         105 Political Science         106      Pune
18         106           Physics         106      Pune
19         101             Hindi         107  Dehradun
20         102           English         107  Dehradun
21         103             Maths         107  Dehradun
22         104           Science         107  Dehradun
23         105 Political Science         107  Dehradun
24         106           Physics         107  Dehradun
25         101             Hindi         108     Delhi
26         102           English         108     Delhi
27         103             Maths         108     Delhi
28         104           Science         108     Delhi
29         105 Political Science         108     Delhi
30         106           Physics         108     Delhi
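Because a cross join pairs every row of one dataframe with every row of the other, the size of the result is the product of the two row counts. The sketch below checks this on the example data; the variable name cross is illustrative.

```r
# Rebuild the two example dataframes so the snippet runs on its own.
df1 <- data.frame(StudentId = c(101:106),
                  Product = c("Hindi", "English", "Maths", "Science",
                              "Political Science", "Physics"))
df2 <- data.frame(StudentId = c(102, 104, 106, 107, 108),
                  State = c("Mangalore", "Mysore", "Pune",
                            "Dehradun", "Delhi"))

# by = NULL requests the cartesian product of the two dataframes.
cross <- merge(x = df1, y = df2, by = NULL)

# 6 rows in df1 times 5 rows in df2.
print(nrow(cross) == nrow(df1) * nrow(df2))

# The shared column name is disambiguated with the .x and .y suffixes.
print(names(cross))
```

Since the row count grows multiplicatively, a cross join on large dataframes can exhaust memory quickly.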

Semi Join

A semi join is somewhat like an inner join, except that only the columns and values of the left dataframe are kept. Now let us try to understand this using an R program:

Example:

# R program to illustrate
# Joining of dataframes

# Import required library
library(dplyr)

df = df1 %>% semi_join(df2, by = "StudentId")
df

Output:

  StudentId Product
1       102 English
2       104 Science
3       106 Physics
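If dplyr is not available, the same filtering can be sketched in base R with the %in% operator. This is an alternative to the semi_join() call, not what dplyr does internally; the variable name semi is illustrative.

```r
# Rebuild the two example dataframes so the snippet runs on its own.
df1 <- data.frame(StudentId = c(101:106),
                  Product = c("Hindi", "English", "Maths", "Science",
                              "Political Science", "Physics"))
df2 <- data.frame(StudentId = c(102, 104, 106, 107, 108),
                  State = c("Mangalore", "Mysore", "Pune",
                            "Dehradun", "Delhi"))

# Keep the rows of df1 whose key also occurs in df2.
# No columns from df2 are added to the result.
semi <- df1[df1$StudentId %in% df2$StudentId, ]
print(semi)
```

Unlike the dplyr result, this subset keeps the original row names of df1 rather than renumbering them.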

Anti Join

In terms of set theory, an anti-join is a set difference operation. For example, if A = (1, 2, 3, 4) and B = (2, 3, 5), then the output of A - B will be the set (1, 4). This join is somewhat like df1 - df2, as it selects all rows from df1 that are not present in df2. Now let us try to understand this using an R program:

Example:

# R program to illustrate
# Joining of dataframes

# Import required library
library(dplyr)

df = df1 %>% anti_join(df2, by = "StudentId")
df

Output:

  StudentId           Product
1       101             Hindi
2       103             Maths
3       105 Political Science
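The set difference view can also be sketched in base R, using setdiff() on the keys and a negated %in% test on the rows. This is an alternative to the anti_join() call, not a description of dplyr's implementation; the names only_in_df1 and anti are illustrative.

```r
# Rebuild the two example dataframes so the snippet runs on its own.
df1 <- data.frame(StudentId = c(101:106),
                  Product = c("Hindi", "English", "Maths", "Science",
                              "Political Science", "Physics"))
df2 <- data.frame(StudentId = c(102, 104, 106, 107, 108),
                  State = c("Mangalore", "Mysore", "Pune",
                            "Dehradun", "Delhi"))

# Keys that occur in df1 but not in df2 (set difference).
only_in_df1 <- setdiff(df1$StudentId, df2$StudentId)
print(only_in_df1)

# Keep the rows of df1 whose key does not occur in df2.
anti <- df1[!(df1$StudentId %in% df2$StudentId), ]
print(anti)
```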



Last Updated : 23 May, 2022


The key concepts discussed in the article are summarized below:

1. Dataframes in R:

Dataframes are generic data objects in R used for storing tabular data. They consist of three main components: data, rows, and columns. The tabular format makes it convenient for data analysis, and each column can have different data types.

2. merge() Function:

In R, the merge() function is employed to combine two dataframes. This function is part of base R, while the dplyr package offers its own family of join functions. A crucial condition for merging is that the columns used for merging hold compatible types.

3. Types of Merging Available in R:

The article covers various types of merging operations:

  • Natural Join or Inner Join:

    • Keep only the rows that are matched between dataframes.
    • Implemented using merge(x = df1, y = df2, by = "StudentId").
  • Left Outer Join:

    • Include all rows from the left (df1) and matched rows from the right (df2).
    • Specified with merge(x = df1, y = df2, by = "StudentId", all.x = TRUE).
  • Right Outer Join:

    • Include all rows from the right (df2) and matched rows from the left (df1).
    • Indicated by merge(x = df1, y = df2, by = "StudentId", all.y = TRUE).
  • Full Outer Join:

    • Keep all rows from both dataframes.
    • Utilized through merge(x = df1, y = df2, by = "StudentId", all = TRUE).
  • Cross Join:

    • Also known as cartesian join, results in every row of one dataframe being joined to every row of another.
    • Achieved by merge(x = df1, y = df2, by = NULL).
  • Semi Join:

    • Similar to an inner join, selecting only columns and values from the left dataframe.
    • Implemented using the semi_join function from the dplyr package.
  • Anti Join:

    • Similar to set difference operation, selecting rows from the left dataframe not present in the right.
    • Utilized through the anti_join function from the dplyr package.

4. Basic Syntax of merge() Function:

The syntax for the merge() function is outlined as follows:

merge(x, y, by, by.x, by.y, all, all.x, all.y, sort = TRUE)

Parameters:

  • x and y: Dataframes to be merged.
  • by, by.x, by.y: Names of the columns to join on.
  • all, all.x, all.y: Logical values specifying the type of merging.

Understanding these concepts equips you with the tools to effectively merge and analyze data using R programming, a crucial skill in the realm of data science and analysis.

Article information

Author: Msgr. Refugio Daniel
