Selecting columns | Just Enough R (2024)

To pick out one or more columns, use the select() function from the dplyr package.

The select() function expects a dataframe as its first input (‘argument’, in R language), followed by the names of the columns you want to extract, with a comma between each name.

It returns a new dataframe with just those columns, in the order you specified:

    head(select(mtcars, cyl, hp))
                      cyl  hp
    Mazda RX4           6 110
    Mazda RX4 Wag       6 110
    Datsun 710          4  93
    Hornet 4 Drive      6 110
    Hornet Sportabout   8 175
    Valiant             6 105
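The call above assumes dplyr has already been loaded. As a minimal self-contained sketch (the variable name hp_first is just an illustration), this loads the package and shows that the order in which you name the columns sets their order in the result:

```r
library(dplyr)  # select() comes from dplyr; mtcars ships with base R

# Same two columns as above, but named in the opposite order
hp_first <- select(mtcars, hp, cyl)

names(hp_first)   # "hp" "cyl"
ncol(hp_first)    # 2
```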

Saving a subset of the data

Because dplyr functions return a new dataframe, we can assign the results to a variable:

    justcylandweight <- select(mtcars, cyl, wt)
    summary(justcylandweight)
          cyl              wt
     Min.   :4.000   Min.   :1.513
     1st Qu.:4.000   1st Qu.:2.581
     Median :6.000   Median :3.325
     Mean   :6.188   Mean   :3.217
     3rd Qu.:8.000   3rd Qu.:3.610
     Max.   :8.000   Max.   :5.424
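One consequence worth checking for yourself is that select() does not modify the dataframe you pass in. A short sketch, reusing the variable name from the example above:

```r
library(dplyr)

justcylandweight <- select(mtcars, cyl, wt)

ncol(mtcars)             # 11: the original keeps all of its columns
ncol(justcylandweight)   # 2: the new dataframe has only cyl and wt
nrow(justcylandweight)   # 32: no rows are dropped, only columns
```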

Excluding columns

If you want to keep most of the columns (perhaps you just want to get rid of one and keep the rest), put a minus (-) sign in front of the name of the column to drop. This then selects everything except the column you named:

    # Note we are just dropping the Ozone column
    head(select(airquality, -Ozone))
      Solar.R Wind Temp Month Day
    1     190  7.4   67     5   1
    2     118  8.0   72     5   2
    3     149 12.6   74     5   3
    4     313 11.5   62     5   4
    5      NA 14.3   56     5   5
    6      NA 14.9   66     5   6
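The same idea extends to dropping more than one column. The sketch below is an illustration rather than part of the original example: it negates each name in turn, and also shows the equivalent form that negates a c() of names.

```r
library(dplyr)  # provides select(); airquality ships with base R

# Negate each column you want to drop
no_ozone_solar <- select(airquality, -Ozone, -Solar.R)

# Equivalent: negate a c() of names in one go
same_columns <- select(airquality, -c(Ozone, Solar.R))

names(no_ozone_solar)   # "Wind" "Temp" "Month" "Day"
identical(names(no_ozone_solar), names(same_columns))   # TRUE
```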

Matching specific columns

You can use patterns to match a subset of the columns you want. For example, here we select all the columns where the name contains the letter d:

    head(select(mtcars, contains("d")))
                      disp drat
    Mazda RX4          160 3.90
    Mazda RX4 Wag      160 3.90
    Datsun 710         108 3.85
    Hornet 4 Drive     258 3.08
    Hornet Sportabout  360 3.15
    Valiant            225 2.76
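One detail the example does not show is that contains() ignores case by default. The sketch below uses airquality, whose column names mix upper and lower case, and the ignore.case argument to compare the two behaviours:

```r
library(dplyr)

# Default: case-insensitive, so "O" also matches a lower-case "o"
names(select(airquality, contains("O")))
# "Ozone" "Solar.R" "Month"

# Case-sensitive: only names with a capital "O" match
names(select(airquality, contains("O", ignore.case = FALSE)))
# "Ozone"
```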

And you can combine these techniques to make more complex selections:

    head(select(mtcars, contains("d"), -drat))
                      disp
    Mazda RX4          160
    Mazda RX4 Wag      160
    Datsun 710         108
    Hornet 4 Drive     258
    Hornet Sportabout  360
    Valiant            225
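Named columns, pattern matches and exclusions can all appear in one call. As a small sketch (the variable name mpg_and_disp is only for illustration):

```r
library(dplyr)

# Start with mpg, add every column whose name contains "d", then drop drat
mpg_and_disp <- select(mtcars, mpg, contains("d"), -drat)

names(mpg_and_disp)   # "mpg" "disp"
```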

Other methods of selection

As a quick reference, you can use the following helper functions to select columns in different ways:

  • starts_with()
  • ends_with()
  • contains()
  • everything()
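A brief sketch of the helpers listed above that were not demonstrated earlier, applied to mtcars. The patterns "c" and "t" are arbitrary choices for illustration:

```r
library(dplyr)

# Columns whose names begin with "c"
names(select(mtcars, starts_with("c")))   # "cyl" "carb"

# Columns whose names end with "t"
names(select(mtcars, ends_with("t")))     # "drat" "wt"

# everything() matches all columns; naming carb first moves it to the front
carb_first <- select(mtcars, carb, everything())
names(carb_first)[1:3]                    # "carb" "mpg" "cyl"
```

Putting one column name before everything() is a common way to reorder columns without dropping any.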

See the help files for more information (type ??dplyr::select into the console).
