To pick out one or more columns, use the select() function.
The select() function expects a dataframe as its first input (an ‘argument’, in R terminology), followed by the names of the columns you want to extract, with a comma between each name.
It returns a new dataframe with just those columns, in the order you specified:
```r
library(dplyr)

head(select(mtcars, cyl, hp))
#>                   cyl  hp
#> Mazda RX4           6 110
#> Mazda RX4 Wag       6 110
#> Datsun 710          4  93
#> Hornet 4 Drive      6 110
#> Hornet Sportabout   8 175
#> Valiant             6 105
```
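A quick way to check the "in the order you specified" behaviour is to ask for columns in the reverse of their order in mtcars and look at the names of the result. This is a minimal sketch that assumes dplyr is installed; the variable name hp_first is just an illustrative choice.

```r
library(dplyr)

# In mtcars, cyl comes before hp; asking for hp first reverses them
hp_first <- select(mtcars, hp, cyl)

# The column names follow the order given to select(): "hp", then "cyl"
names(hp_first)

# Only the columns change: all 32 rows and the car-name row names are kept
nrow(hp_first)
head(rownames(hp_first), 3)
```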
Saving a subset of the data
Because dplyr functions return a new dataframe, we can assign the results to a variable:
```r
justcylandweight <- select(mtcars, cyl, wt)
summary(justcylandweight)
#>       cyl              wt
#>  Min.   :4.000   Min.   :1.513
#>  1st Qu.:4.000   1st Qu.:2.581
#>  Median :6.000   Median :3.325
#>  Mean   :6.188   Mean   :3.217
#>  3rd Qu.:8.000   3rd Qu.:3.610
#>  Max.   :8.000   Max.   :5.424
```
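Because select() returns a new dataframe rather than modifying its input, mtcars itself is left untouched after the assignment. The sketch below checks this by comparing column counts; it assumes dplyr is loaded and reuses the justcylandweight name from the example above.

```r
library(dplyr)

justcylandweight <- select(mtcars, cyl, wt)

# The saved subset has only the two requested columns...
ncol(justcylandweight)  # 2
# ...while mtcars still has all 11 of its original columns
ncol(mtcars)            # 11

# The subset behaves like any other dataframe, e.g. for a column mean
mean(justcylandweight$wt)
```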
Excluding columns
If you want to keep most of the columns (perhaps you just want to get rid of one and keep the rest), put a minus (-) sign in front of the name of the column to drop. This selects everything except the column you named:
```r
# Note we are just dropping the Ozone column
head(select(airquality, -Ozone))
#>   Solar.R Wind Temp Month Day
#> 1     190  7.4   67     5   1
#> 2     118  8.0   72     5   2
#> 3     149 12.6   74     5   3
#> 4     313 11.5   62     5   4
#> 5      NA 14.3   56     5   5
#> 6      NA 14.9   66     5   6
```
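The same minus sign works for more than one column. This sketch (assuming dplyr is loaded; the result names are illustrative) drops both Ozone and Solar.R from airquality, first with one minus sign per column and then by negating a c() of names. Both forms should give the same result.

```r
library(dplyr)

# One minus sign per column to drop
no_ozone_solar <- select(airquality, -Ozone, -Solar.R)

# Equivalent: negate a whole set of column names wrapped in c()
no_ozone_solar2 <- select(airquality, -c(Ozone, Solar.R))

# Both keep the remaining columns in their original order:
# "Wind", "Temp", "Month", "Day"
names(no_ozone_solar)
identical(no_ozone_solar, no_ozone_solar2)  # TRUE
```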
Matching specific columns
You can use patterns to match a subset of the columns you want. For example, here we select all the columns whose names contain the letter d:
```r
head(select(mtcars, contains("d")))
#>                   disp drat
#> Mazda RX4          160 3.90
#> Mazda RX4 Wag      160 3.90
#> Datsun 710         108 3.85
#> Hornet 4 Drive     258 3.08
#> Hornet Sportabout  360 3.15
#> Valiant            225 2.76
```
And you can combine these techniques to make more complex selections:
```r
head(select(mtcars, contains("d"), -drat))
#>                   disp
#> Mazda RX4          160
#> Mazda RX4 Wag      160
#> Datsun 710         108
#> Hornet 4 Drive     258
#> Hornet Sportabout  360
#> Valiant            225
```
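One detail worth knowing about contains(): it ignores case by default (its ignore.case argument defaults to TRUE). mtcars has no upper-case column names, so this makes no difference there, but it does with the built-in iris dataset, which is used here purely for illustration. A minimal sketch, assuming dplyr is loaded:

```r
library(dplyr)

# Default ignore.case = TRUE: lower-case "length" still matches
# "Sepal.Length" and "Petal.Length"
names(select(iris, contains("length")))

# With ignore.case = FALSE the match is case-sensitive; no iris column
# name contains lower-case "length", so zero columns are selected
names(select(iris, contains("length", ignore.case = FALSE)))

# Combining a pattern with a minus sign works as with mtcars:
# only Sepal.Length is left
names(select(iris, contains("length"), -Petal.Length))
```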
Other methods of selection
As a quick reference, you can use the following helper functions inside select() to choose columns in different ways:
starts_with()
ends_with()
contains()
everything()
See the help files for more information (type ??dplyr::select into the console).
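The helpers listed above can be tried directly on mtcars. This is a sketch assuming dplyr is loaded (dplyr makes these tidyselect helpers available inside select()); the comments describe the columns each call should pick out.

```r
library(dplyr)

# Names beginning with "c": cyl and carb, in their mtcars order
names(select(mtcars, starts_with("c")))

# Names ending in "t": drat and wt
names(select(mtcars, ends_with("t")))

# Names containing "ar": gear and carb
names(select(mtcars, contains("ar")))

# everything() picks up all remaining columns, so this moves wt to the
# front without having to type out the other ten names
names(select(mtcars, wt, everything()))
```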
To recap, the select() function from the dplyr package picks out specific columns from a dataframe. Here is a breakdown of the concepts covered in the article:
- select() Function:
  - Purpose: The select() function is used to extract specific columns from a dataframe.
  - Syntax: select(dataframe, column1, column2, ...)
  - Example: select(mtcars, cyl, hp) selects only the 'cyl' and 'hp' columns from the 'mtcars' dataframe.
- Subset of Data:
  - The article demonstrates that, since dplyr functions return new dataframes, the results can be assigned to variables. For instance, justcylandweight <- select(mtcars, cyl, wt) creates a new dataframe with only the 'cyl' and 'wt' columns.
- Summary Function:
  - Purpose: The summary() function is applied to provide a statistical summary of the selected dataframe.
  - Example: summary(justcylandweight) provides summary statistics for the 'justcylandweight' dataframe, including minimum, maximum, mean, and quartile values for each selected column.
- Excluding Columns:
  - To exclude a specific column, a minus (-) sign is used in front of the column name. For example, head(select(airquality, -Ozone)) selects all columns from the 'airquality' dataframe except for 'Ozone'.
- Matching Specific Columns:
  - Patterns can be used to match a subset of columns. The article demonstrates the use of contains("d") to select the columns of the 'mtcars' dataframe whose names contain the letter 'd'.
- Combining Selection Techniques:
  - The article shows how to combine techniques, such as selecting columns that match a pattern while excluding a particular column. For instance, head(select(mtcars, contains("d"), -drat)) selects the columns containing 'd' but excludes the 'drat' column.
- Other Methods of Selection:
  - Alongside contains(), the article lists starts_with(), ends_with() and everything() as quick references. starts_with() and ends_with() match columns by the beginning or end of their names, while everything() selects all columns.
For more detailed information on these selection techniques, users are encouraged to refer to the help files by typing ??dplyr::select into the R console.