Arrange, Filter, & Group Rows In R Using dplyr (2024)

This tutorial continues the discussion of the dplyr package. You’ll learn how to arrange, filter, and group rows in R.

The previous lesson was on column operations. This time, the focus will be on row operations in dplyr.

We’ll cover the basics: sorting and filtering a dataset, and aggregating and summarizing records. For an idea of what to expect in this lesson, think of a pivot table in MS Excel.

Getting Started

Open a new R script in RStudio.

Similar to the column operations lesson, this demonstration will use the Lahman package, which contains baseball datasets. If it isn’t installed yet, run install.packages("Lahman") in the Console.

To bring the Lahman package into R, run library(Lahman). To enable the dplyr package, run library(tidyverse). Also, remember that a best practice for naming conventions in R is using lowercase letters, so assign Teams to teams by running teams <- Teams.
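
The setup described above could look like the following sketch. It assumes both packages are already installed, and it loads dplyr directly instead of the whole tidyverse to keep the example light; library(tidyverse) would attach dplyr as well.

```r
# Install once if needed (assumes a CRAN mirror is reachable):
# install.packages(c("Lahman", "dplyr"))

library(Lahman)  # provides the Teams data frame
library(dplyr)   # row operations: arrange(), filter(), group_by(), summarise()

# Copy the built-in Teams data frame to a lowercase name
teams <- Teams

# Quick look at three of the columns used later in this lesson
head(teams[, c("yearID", "teamID", "W")])
```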

Basic Functions For Row Operations

1. Arrange Rows In R

The first row operation in dplyr is arrange(). This function allows you to reorder rows. It takes the data frame (df) as its first argument, followed by the fields to sort by.

For example, let’s sort by teamID. Run arrange(teams, teamID).

If you want the rows arranged in descending order, use the desc() function.

As an example, if you want to sort by year in descending order, run arrange(teams, desc(yearID)).

When you do this, you’re not assigning the output back to teams. You’re just seeing the result in the Console.
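
As a minimal sketch of the two sorts above, and of the difference between printing a result and keeping it, consider the following. The name teams_by_year is hypothetical and only used for illustration.

```r
library(Lahman)
library(dplyr)

teams <- Teams

# Ascending sort by team code; the result is only printed, teams is unchanged
head(arrange(teams, teamID))

# Descending sort by year; again only printed
head(arrange(teams, desc(yearID)))

# To keep a sorted copy, assign the result to a name
# (teams_by_year is a hypothetical name for this example)
teams_by_year <- arrange(teams, desc(yearID))
```

Overwriting teams itself would also work, but keeping the original untouched makes it easier to rerun earlier steps.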

It’s also possible to sort by multiple criteria. For example, to sort by teamID and then by yearID in descending order, run arrange(teams, teamID, desc(yearID)).

When you’re sorting rows, you’re not changing the data. The data is just being moved around. Nothing is being added or removed.
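
The point that sorting only moves rows around can be checked with a short sketch like this one. The name teams_sorted is hypothetical.

```r
library(Lahman)
library(dplyr)

teams <- Teams

# Sort by team code first, then by year from newest to oldest within each team
# (teams_sorted is a hypothetical name for this example)
teams_sorted <- arrange(teams, teamID, desc(yearID))

head(teams_sorted[, c("teamID", "yearID")])

# Same number of rows and columns as before; only the order differs
identical(dim(teams_sorted), dim(teams))
```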

2. Filter Rows In R

The filter() function keeps the rows that meet the criteria you specify and drops the rest. Its basic form is filter(df, condition), where df is the data frame and condition is a logical test on one or more columns.

As an example, let’s get all the data where the yearID is greater than or equal to 2000. Follow the filter function’s format and input the needed information: filter(teams, yearID >= 2000). Don’t forget to assign the result to a new object. In this case, it’s assigned to modern.

To check if the rows were indeed filtered, you can use the dim() function. It gives the number of rows and columns in the data frame.

If you run dim(teams), you’ll see that the data frame has 2,955 rows and 48 columns.

If you run the dim function on modern, you’ll see that the number of rows has been reduced to 630 while the number of columns remains the same.

The number of rows dropped because every record from before the year 2000 was filtered out. The exact counts may differ if your version of the Lahman package covers more seasons.
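
A sketch of the filter and the dimension check described above follows. The row counts quoted in this lesson depend on the installed Lahman version, so no exact numbers are shown in the comments.

```r
library(Lahman)
library(dplyr)

teams <- Teams

# Keep only the seasons from 2000 onward
modern <- filter(teams, yearID >= 2000)

dim(teams)   # rows and columns before filtering
dim(modern)  # fewer rows, same number of columns

# The earliest season left in the filtered data
min(modern$yearID)
```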

Filter Rows By Multiple Fields

It’s also possible to filter rows by multiple conditions in R. You’ll need to use the AND and OR operators.

For example, let’s filter teams by area. In this case, a new object ohio is created. The first attempt requires teamID to match both Cleveland AND Cincinnati.

You need to use the double equal sign (==) to check equality. A single equal sign means assignment in R, or names an argument when it’s used inside a function call. Use the ampersand (&) to represent AND.

To check, use the dim function. You’ll see that the number of rows is 0.

This means that no row has a teamID matching both Cleveland and Cincinnati at once, since each row holds a single team.

Next, let’s try Cleveland OR Cincinnati. The OR operator is represented by the vertical bar (|). So, all you need to do is replace the ampersand with the vertical bar and then run it. Afterwards, run the dim function again.

You’ll see that there are 251 rows rather than zero.
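
The AND and OR filters above might be written as in this sketch. It assumes the team codes "CLE" and "CIN" for Cleveland and Cincinnati, since the original code isn’t reproduced in the text, and the names ohio_and and ohio_in are hypothetical. The %in% version is an alternative not covered in this lesson.

```r
library(Lahman)
library(dplyr)

teams <- Teams

# AND: a row would need both codes at once, so nothing matches
# ("CLE" and "CIN" are assumed codes; ohio_and is a hypothetical name)
ohio_and <- filter(teams, teamID == "CLE" & teamID == "CIN")
nrow(ohio_and)  # 0

# OR: a row matches if it has either code
ohio <- filter(teams, teamID == "CLE" | teamID == "CIN")
dim(ohio)

# Alternative spelling of the same OR test using %in%
ohio_in <- filter(teams, teamID %in% c("CLE", "CIN"))
nrow(ohio_in) == nrow(ohio)
```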

Now what if you forget to use a double equal sign and instead use just one? dplyr stops with an error, and the Console shows a very helpful message reminding you to use the double equal sign.
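
To see that message without halting a script, one option is to wrap the call in tryCatch(), as in this sketch. The "CLE" code and the name msg are assumptions for illustration, and the exact wording of the message varies between dplyr versions.

```r
library(Lahman)
library(dplyr)

teams <- Teams

# A single = inside filter() names an argument instead of testing equality,
# so dplyr raises an error; tryCatch() captures its text instead of stopping
# ("CLE" is an assumed team code; msg is a hypothetical name)
msg <- tryCatch(
  filter(teams, teamID = "CLE"),
  error = function(e) conditionMessage(e)
)

cat(msg, "\n")
```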

3. Group By And Summarise Rows In R

The group_by() function groups records by the selected columns. Based on that grouping, you can then summarise another column.

The general pattern is to call group_by(df, column) first and then pass the grouped result to summarise().

As an example, let’s group by teamID and assign it to a new object. In this case, the new object is called teams_ID. Then, print it.

In the Console, you’ll notice that the first line says it’s a tibble.

A tibble is a tidyverse improvement over the basic data frame. It’s a feature in the package that augments and improves what’s available out of the box.

The second line is Groups. So, the data is now grouped by the teamID column.
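
The grouping step can be sketched as follows. The helper calls is_grouped_df(), group_vars(), and n_groups() come from dplyr and are added here only to inspect the result; they aren’t part of the original walkthrough.

```r
library(Lahman)
library(dplyr)

teams <- Teams

# Group the rows by team code; the data itself is unchanged
teams_ID <- group_by(teams, teamID)

# Printing shows the tibble header followed by a Groups line
teams_ID

# Inspect the grouping directly
is_grouped_df(teams_ID)  # TRUE
group_vars(teams_ID)     # "teamID"
n_groups(teams_ID)       # number of distinct team codes
```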

And with that, you can now use the summarise ( ) function on those groups.

Note: dplyr accepts both spellings, summarise() (British English) and summarize() (American English), and they behave identically.

For example, let’s summarise teams_ID and get some basic summary statistics. Let’s look for the mean, minimum, and maximum of the wins (the W column) for each team. If your code spans several lines, remember to highlight all of it before choosing to Run.

You can then see in the Console that a summary of each team’s statistics is displayed. This is very similar to a pivot table where you’re aggregating and summarizing data.
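
A sketch of the summary described above is shown below. The output column names mean_w, min_w, and max_w and the object name team_wins are hypothetical choices for this example.

```r
library(Lahman)
library(dplyr)

teams <- Teams
teams_ID <- group_by(teams, teamID)

# One output row per team, with summary statistics of the W (wins) column
# (team_wins, mean_w, min_w, and max_w are hypothetical names)
team_wins <- summarise(
  teams_ID,
  mean_w = mean(W),
  min_w = min(W),
  max_w = max(W)
)

team_wins
```

Each named argument to summarise() becomes one column in the result, much like choosing the value fields of a pivot table.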

Conclusion

To recap, two operations in dplyr have been discussed. A previous tutorial focused on column operations. Meanwhile, this current lesson showed you how to perform row operations using the dplyr package in RStudio. Specifically, you learned how to arrange, filter, and group rows in R.

The next thing to learn is how to combine these two kinds of operations. The functions you’ve learned so far will go a long way when you write code in R. An even more helpful technique is the pipeline, which lets the steps flow from one to the next. So, make sure to review the next tutorials as well.

George

Author: Fredrick Kertzmann