R select() Function from dplyr - Usage with Examples - Spark By {Examples} (2024)

select() is a function from the dplyr R package that selects data frame variables by name or by index. It can also rename variables while selecting them and drop variables by name. In this article, I will explain the syntax of the select() function and its usage with examples such as selecting specific variables by name, by position, from a list of names, and more. Note that in R, columns are referred to as variables and rows are referred to as observations.

dplyr is an R package that provides a grammar of data manipulation: a set of commonly used verbs that help data analysts solve the most common data manipulation tasks. To use it, install it with install.packages('dplyr') and load it with library(dplyr).
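As a minimal sketch of the install-and-load step, the snippet below installs dplyr only when it is missing. The CRAN mirror URL passed to repos is an assumption chosen for illustration; any mirror you normally use works as well.

```r
# Install dplyr only when it is not already available, then load it.
# The repos URL is an assumed CRAN mirror, not something the article specifies.
if (!requireNamespace('dplyr', quietly = TRUE)) {
  install.packages('dplyr', repos = 'https://cloud.r-project.org')
}
library(dplyr)
```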

If you also need to change variable names, see Rename Data Frame Columns in R.
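Renaming while selecting can be sketched as follows. The three-column data frame and the new name full_name are made up for illustration; the point is the difference between select(), which keeps only the listed columns, and rename(), which keeps all of them.

```r
library(dplyr)

# Small illustrative data frame (a subset of the columns used in this article)
df <- data.frame(
  id = c(10, 11, 12, 13),
  name = c('sai', 'ram', 'deepika', 'sahithi'),
  gender = c('M', 'M', 'F', 'F')
)

# select() keeps only the listed columns and applies the new name
renamed_subset <- df %>% select(id, full_name = name)
names(renamed_subset)  # "id" "full_name"

# rename() changes the name but keeps every column
renamed_all <- df %>% rename(full_name = name)
names(renamed_all)     # "id" "full_name" "gender"
```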

1. dplyr select() Syntax

Following is the syntax of the select() function of the dplyr package in R. It returns an object of the same class as x (the input object).

# Syntax of select()
select(x, variables_to_select)
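To illustrate the point about the return class, here is a small sketch. The data frame people is hypothetical and exists only for this check; as_tibble() is the tibble constructor that dplyr re-exports.

```r
library(dplyr)

# Hypothetical two-column input, used only to inspect the returned class
people <- data.frame(id = c(1, 2), name = c('a', 'b'))

# A base data.frame in gives a data.frame out
picked_df <- select(people, name)
class(picked_df)   # "data.frame"

# A tibble in gives a tibble out
picked_tbl <- select(as_tibble(people), name)
class(picked_tbl)  # "tbl_df" "tbl" "data.frame"
```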

Let’s create an R data frame, run these examples, and explore the output. If you already have data in a CSV file, you can import the CSV file into an R data frame. Also, refer to Import Excel File into R.

# Create DataFrame
df <- data.frame(
  id = c(10,11,12,13),
  name = c('sai','ram','deepika','sahithi'),
  gender = c('M','M','F','F'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16')),
  state = c('CA','NY','DE',NA),
  row.names=c('r1','r2','r3','r4')
)
df

Printing df displays the four rows r1 to r4 with the columns id, name, gender, dob, and state.

2. Select Variables by Index Position

To select columns of an R data frame, you can use the %>% operator together with the select() function of the dplyr package. %>% is the pipe operator, which chains multiple operations in sequence. When we use the dplyr package, we mostly use the infix operator %>% from magrittr.

In this case, the pipe takes the data frame df and passes it as the first argument to the select() function from dplyr. The select() function then selects the specified columns.

  • df %>% select(2,3) returns a data frame with the 2nd and 3rd columns. Remember that indexing in R starts from 1.
  • df %>% select(c(2,3)) passes the column indexes as a vector to select(), which returns the columns at those positions.
  • df %>% select(2:3) returns the columns in the specified range.

Let’s pass the specified column indexes into this function and get the corresponding columns of the passed indexes.

# Load dplyr
library('dplyr')

# Select columns
df %>% select(2,3)

# Select columns by list of index or position
df %>% select(c(2,3))

# Select columns by index range
df %>% select(2:3)

Each of the three calls returns the name and gender columns.
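The following sketch checks two things described in this section: that piping df into select() is the same as passing df as the first argument, and that the three index forms pick the same columns. The variable names piped, direct, by_vector, and by_range are introduced here for illustration.

```r
library(dplyr)

df <- data.frame(
  id = c(10, 11, 12, 13),
  name = c('sai', 'ram', 'deepika', 'sahithi'),
  gender = c('M', 'M', 'F', 'F'),
  dob = as.Date(c('1990-10-02', '1981-3-24', '1987-6-14', '1985-8-16')),
  state = c('CA', 'NY', 'DE', NA),
  row.names = c('r1', 'r2', 'r3', 'r4')
)

# The pipe passes df as the first argument of select()
piped <- df %>% select(2, 3)
direct <- select(df, 2, 3)
identical(piped, direct)  # TRUE

# A vector of positions and a range give the same columns
by_vector <- df %>% select(c(2, 3))
by_range <- df %>% select(2:3)
names(by_range)           # "name" "gender"
```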

3. Select Variables by Name

You can also select variables by name, either by passing several names separated by commas or by passing a character vector of names. The first example below passes the names to select() as separate arguments. The second passes them as a single vector created with c(), and select() returns every variable named in that vector.

# Select columns by label name & gender
df %>% select('name','gender')
df %>% select(c('name','gender'))
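When the names live in a separate character vector, wrapping that vector in all_of() makes the intent explicit. The sketch below assumes a vector called wanted, which is not part of the original example, and also shows the unquoted form.

```r
library(dplyr)

df <- data.frame(
  id = c(10, 11, 12, 13),
  name = c('sai', 'ram', 'deepika', 'sahithi'),
  gender = c('M', 'M', 'F', 'F'),
  dob = as.Date(c('1990-10-02', '1981-3-24', '1987-6-14', '1985-8-16')),
  state = c('CA', 'NY', 'DE', NA),
  row.names = c('r1', 'r2', 'r3', 'r4')
)

# Names stored in a character vector (hypothetical variable for illustration)
wanted <- c('name', 'gender')
by_vector <- df %>% select(all_of(wanted))

# Unquoted names select the same columns
by_bare <- df %>% select(name, gender)

names(by_vector)  # "name" "gender"
```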

4. Drop Variables

By using select(), you can also drop columns from the DataFrame by name. To drop variables, prefix them with the minus sign (-). Note that this just returns a new DataFrame without the specified variables; the original DataFrame is not modified.

# Select columns except name & gender
df %>% select(-c('name','gender'))
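A short sketch of dropping columns, assuming dplyr 1.0.0 or later for the ! form. It also confirms that df itself keeps all five columns.

```r
library(dplyr)

df <- data.frame(
  id = c(10, 11, 12, 13),
  name = c('sai', 'ram', 'deepika', 'sahithi'),
  gender = c('M', 'M', 'F', 'F'),
  dob = as.Date(c('1990-10-02', '1981-3-24', '1987-6-14', '1985-8-16')),
  state = c('CA', 'NY', 'DE', NA),
  row.names = c('r1', 'r2', 'r3', 'r4')
)

# Drop with the minus sign
dropped_minus <- df %>% select(-c(name, gender))

# Drop with the complement operator (assumes dplyr >= 1.0.0)
dropped_bang <- df %>% select(!c(name, gender))

names(dropped_minus)  # "id" "dob" "state"
ncol(df)              # 5, the original data frame is unchanged
```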

5. Select All Variables Between 2 Variables

You can also select all variables between two variables by using the range operator (:). The left-hand side of the operator is the starting variable and the right-hand side is the ending variable; both ends are included. The following example selects all variables from name through state.

# Select columns between name and state
df %>% select('name':'state')
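The range selection can be checked against the equivalent positional range. This is a sketch using unquoted names; the positions 2:5 follow from the column order of df.

```r
library(dplyr)

df <- data.frame(
  id = c(10, 11, 12, 13),
  name = c('sai', 'ram', 'deepika', 'sahithi'),
  gender = c('M', 'M', 'F', 'F'),
  dob = as.Date(c('1990-10-02', '1981-3-24', '1987-6-14', '1985-8-16')),
  state = c('CA', 'NY', 'DE', NA),
  row.names = c('r1', 'r2', 'r3', 'r4')
)

# Range by name, both ends included
between <- df %>% select(name:state)
names(between)  # "name" "gender" "dob" "state"

# Same columns selected by position
by_position <- df %>% select(2:5)
```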

6. Select All Variables that start with

Use starts_with() along with select() to get all variables whose names start with a given character string. The following example selects all variables that start with the string gen.

# Select columns that start with a string
df %>% select(starts_with('gen'))
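A sketch of starts_with() on the article's data frame, including the negated form and the ignore.case argument, which defaults to TRUE. The result variable names are introduced here for illustration.

```r
library(dplyr)

df <- data.frame(
  id = c(10, 11, 12, 13),
  name = c('sai', 'ram', 'deepika', 'sahithi'),
  gender = c('M', 'M', 'F', 'F'),
  dob = as.Date(c('1990-10-02', '1981-3-24', '1987-6-14', '1985-8-16')),
  state = c('CA', 'NY', 'DE', NA),
  row.names = c('r1', 'r2', 'r3', 'r4')
)

# Columns whose names start with 'gen'
starts_gen <- df %>% select(starts_with('gen'))
names(starts_gen)      # "gender"

# Matching ignores case by default; switch it off to match exactly
case_sensitive <- df %>% select(starts_with('GEN', ignore.case = FALSE))
ncol(case_sensitive)   # 0

# Everything except the columns that start with 'gen'
not_gen <- df %>% select(-starts_with('gen'))
names(not_gen)         # "id" "name" "dob" "state"
```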

7. Select All Variables that end with

Use ends_with() along with select() to get all variables whose names end with a given character string. The following example selects all variables that end with the string e.

# Select columns that end with a string
df %>% select(ends_with('e'))
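As a sketch, ends_with() on the article's data frame picks name and state. It also accepts a vector of suffixes, in which case a column matching any of them is selected; the suffix 'b' is added here only for illustration.

```r
library(dplyr)

df <- data.frame(
  id = c(10, 11, 12, 13),
  name = c('sai', 'ram', 'deepika', 'sahithi'),
  gender = c('M', 'M', 'F', 'F'),
  dob = as.Date(c('1990-10-02', '1981-3-24', '1987-6-14', '1985-8-16')),
  state = c('CA', 'NY', 'DE', NA),
  row.names = c('r1', 'r2', 'r3', 'r4')
)

# Columns whose names end with 'e'
ends_e <- df %>% select(ends_with('e'))
names(ends_e)  # "name" "state"

# Several suffixes at once: a column is kept if it matches any of them
ends_e_or_b <- df %>% select(ends_with(c('e', 'b')))
```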

8. Select Variables containing character

If you want to select all variables whose names contain a character or string, use contains(). The following example selects all variables that contain the character a.

# Select columns that contain a string
df %>% select(contains('a'))
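contains() matches a literal string. When a pattern is needed, the related helper matches() takes a regular expression instead. The sketch below shows both; the pattern '^(id|dob)$' is an arbitrary example.

```r
library(dplyr)

df <- data.frame(
  id = c(10, 11, 12, 13),
  name = c('sai', 'ram', 'deepika', 'sahithi'),
  gender = c('M', 'M', 'F', 'F'),
  dob = as.Date(c('1990-10-02', '1981-3-24', '1987-6-14', '1985-8-16')),
  state = c('CA', 'NY', 'DE', NA),
  row.names = c('r1', 'r2', 'r3', 'r4')
)

# Literal substring match
has_a <- df %>% select(contains('a'))
names(has_a)          # "name" "state"

# Regular expression match: names that are exactly 'id' or 'dob'
regex_match <- df %>% select(matches('^(id|dob)$'))
names(regex_match)    # "id" "dob"
```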

9. Select All Numeric Variables

Selecting all numeric variables is one of the most common operations. If a data frame mixes string and numeric variables, some statistical operations on the entire data frame result in an error, so you first need to select the numeric columns and then perform the operation on the result. Note that select_if() still works but is superseded as of dplyr 1.0.0; the current equivalent is select(where(is.numeric)).

# Select all numeric columns
df %>% select_if(is.numeric)
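A sketch of the same selection with where(), assuming dplyr 1.0.0 or later, followed by a statistical operation on the result. In this data frame only id is numeric, since dob is a Date.

```r
library(dplyr)

df <- data.frame(
  id = c(10, 11, 12, 13),
  name = c('sai', 'ram', 'deepika', 'sahithi'),
  gender = c('M', 'M', 'F', 'F'),
  dob = as.Date(c('1990-10-02', '1981-3-24', '1987-6-14', '1985-8-16')),
  state = c('CA', 'NY', 'DE', NA),
  row.names = c('r1', 'r2', 'r3', 'r4')
)

# Current form: where() applies a predicate to each column
numeric_cols <- df %>% select(where(is.numeric))
names(numeric_cols)   # "id"

# Superseded form from the article, same columns
numeric_cols_old <- df %>% select_if(is.numeric)

# A statistic that would fail on the mixed data frame works on the subset
col_means <- colMeans(numeric_cols)
col_means             # id = 11.5
```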

10. Complete Example

# Create DataFrame
df <- data.frame(
  id = c(10,11,12,13),
  name = c('sai','ram','deepika','sahithi'),
  gender = c('M','M','F','F'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16')),
  state = c('CA','NY','DE',NA),
  row.names=c('r1','r2','r3','r4')
)
df

# Load dplyr
library('dplyr')

# Select columns by list of index or position
df %>% select(c(2,3))

# Select columns by index range
df %>% select(2:3)

# Select columns by label name & gender
df %>% select(c('name','gender'))
df %>% select('name','gender')

# Select columns except name & gender
df %>% select(-c('name','gender'))

# Select columns between name and state
df %>% select('name':'state')

# Select columns that start with a string
df %>% select(starts_with('gen'))

# Select columns that do not start with a string
df %>% select(-starts_with('gen'))

# Select columns that end with a string
df %>% select(ends_with('e'))

# Select columns that contain a string
df %>% select(contains('a'))

# Select all numeric columns
df %>% select_if(is.numeric)

11. Conclusion

In this article, you have learned the syntax of the select() function from the dplyr package and how to select variables by index position and by name, as well as variables whose names start with, end with, or contain a string, and all numeric variables.

Related Articles

  • R subset() Function
  • R Select All Columns Except Column
  • R filter() Function
  • R Filter DataFrame by Column Value
  • How to Import Text File as a String in R
  • How to Read Text File to DataFrame in R
  • How to Read CSV From URL in R
  • How to Read Multiple CSV Files in R
  • How to Read CSV Files in R
  • How to Export CSV in R Using write.csv()
  • How to Export Excel files in R
  • How to join Data Frames in R
  • How to select columns in R
  • R dplyr rename() Function
  • R dplyr distinct() function

Article information

Author: Manual Maggio
