We are often interested in only a subset of our complete data, and R provides simple mechanisms for viewing and editing particular subsets of a data frame or other objects.
One common approach is to use the values of one of the variables to select a subset. Square brackets placed after the name of an object indicate that we are interested in specific rows or columns, and there are many ways to specify them. For example, with the olive oil data set used in ggobi demonstrations stored in a data frame called olive.df, we could view the data for one of the areas using the following code:
olive.df[olive.df$Area == "North-Apulia", ]
which would give the following output:
  Region         Area palmitic palmitoleic stearic oleic linoleic linolenic arachidic eicosenoic
1      1 North-Apulia     1075          75     226  7823      672        36        60         29
2      1 North-Apulia     1088          73     224  7709      781        31        61         29
3      1 North-Apulia      911          54     246  8113      549        31        63         29
...
Within the square brackets in this example we test a condition that returns a logical vector of TRUE and FALSE values, one for each row, and R returns only those rows where the condition is TRUE. The comma separates the row index from the column index, which is needed because a data frame has two dimensions. Because nothing follows the comma, all columns are included.
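To see the logical vector that the condition produces, the sketch below builds a small stand-in data frame, olive_excerpt (a hypothetical name), from the North-Apulia and South-Apulia rows printed in this section, keeping only a few of the columns. It is an assumption for illustration only and does not replace the full olive.df.

```r
# Hypothetical excerpt: six rows printed in this section, reduced to a
# few columns. It stands in for the full olive.df, which is not built here.
olive_excerpt <- data.frame(
  Area     = rep(c("North-Apulia", "South-Apulia"), each = 3),
  palmitic = c(1075, 1088, 911, 1410, 1509, 1317),
  stearic  = c(226, 224, 246, 280, 257, 256),
  oleic    = c(7823, 7709, 8113, 6715, 6647, 7036),
  stringsAsFactors = FALSE
)

# The condition yields one TRUE/FALSE value per row:
# TRUE for the three North-Apulia rows, FALSE for the others
keep <- olive_excerpt$Area == "North-Apulia"
print(keep)

# Indexing rows with the logical vector keeps only the TRUE rows;
# the empty slot after the comma keeps every column
north <- olive_excerpt[keep, ]
print(north)
```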
It is possible to combine multiple conditions, so for the olive oil data we could select another area, South-Apulia, and keep only the rows where the stearic variable is greater than 250 units. The code used in this case would be:
olive.df[olive.df$Area == "South-Apulia" & olive.df$stearic > 250, ]
which gives the following output:
   Region         Area palmitic palmitoleic stearic oleic linoleic linolenic arachidic eicosenoic
88      1 South-Apulia     1410         232     280  6715     1233        32        60         24
89      1 South-Apulia     1509         209     257  6647     1240        42        62         30
90      1 South-Apulia     1317         197     256  7036     1067        40        60         22
...
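The & operator combines the two logical vectors element by element, so a row is kept only when both conditions are TRUE (the double && compares single values only and is not suitable here). The sketch below repeats the query on a hypothetical six-row excerpt built from the rows printed in this section, and also shows base R's subset(), an alternative not used in the original text, which evaluates the condition inside the data frame so the prefix before each column name can be dropped. One difference worth knowing: if a condition evaluates to NA (for example, a missing stearic value), square-bracket indexing returns a row of NAs, whereas subset() drops that row.

```r
# Hypothetical excerpt standing in for olive.df (rows printed in this section)
olive_excerpt <- data.frame(
  Area     = rep(c("North-Apulia", "South-Apulia"), each = 3),
  palmitic = c(1075, 1088, 911, 1410, 1509, 1317),
  stearic  = c(226, 224, 246, 280, 257, 256),
  oleic    = c(7823, 7709, 8113, 6715, 6647, 7036),
  stringsAsFactors = FALSE
)

# Element-wise AND: TRUE only where both conditions hold
cond <- olive_excerpt$Area == "South-Apulia" & olive_excerpt$stearic > 250
south_high <- olive_excerpt[cond, ]

# Same selection with subset(); column names are looked up in the data frame
south_high2 <- subset(olive_excerpt, Area == "South-Apulia" & stearic > 250)
print(south_high)
```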
If we are interested in a particular column of data, we specify the name of the column after the comma in the square brackets. For example, we could view only the palmitic column with this code:
olive.df[, "palmitic"]
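When a single column is selected this way, R simplifies the result to a plain vector rather than a one-column data frame, because drop = TRUE is the default when indexing a data frame. The sketch below, again on a hypothetical six-row excerpt built from the printed rows, contrasts this with drop = FALSE and with the $ and [[ ]] forms, which also return the column as a vector.

```r
# Hypothetical excerpt standing in for olive.df (rows printed in this section)
olive_excerpt <- data.frame(
  Area     = rep(c("North-Apulia", "South-Apulia"), each = 3),
  palmitic = c(1075, 1088, 911, 1410, 1509, 1317),
  stearic  = c(226, 224, 246, 280, 257, 256),
  oleic    = c(7823, 7709, 8113, 6715, 6647, 7036),
  stringsAsFactors = FALSE
)

col_vec <- olive_excerpt[, "palmitic"]                # simplified to a numeric vector
col_df  <- olive_excerpt[, "palmitic", drop = FALSE]  # stays a one-column data frame

# $ and [[ ]] extract the same column as a vector
col_dollar  <- olive_excerpt$palmitic
col_bracket <- olive_excerpt[["palmitic"]]

str(col_vec)
str(col_df)
```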
Multiple columns could be selected by providing a vector of column names, such as:
olive.df[, c("palmitic", "oleic")]
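Row conditions and column names can be used together inside the same brackets, and the same bracket expression can appear on the left-hand side of <- to edit only the selected cells. The sketch below uses a hypothetical six-row excerpt built from the printed rows; the high_stearic column it creates is an illustrative name, not part of the olive oil data.

```r
# Hypothetical excerpt standing in for olive.df (rows printed in this section)
olive_excerpt <- data.frame(
  Area     = rep(c("North-Apulia", "South-Apulia"), each = 3),
  palmitic = c(1075, 1088, 911, 1410, 1509, 1317),
  stearic  = c(226, 224, 246, 280, 257, 256),
  oleic    = c(7823, 7709, 8113, 6715, 6647, 7036),
  stringsAsFactors = FALSE
)

# Rows from the South-Apulia query, restricted to two columns
sel <- olive_excerpt[olive_excerpt$Area == "South-Apulia" &
                       olive_excerpt$stearic > 250,
                     c("palmitic", "oleic")]
print(sel)

# Editing: add a flag column, then overwrite only the selected cells
olive_excerpt$high_stearic <- FALSE
olive_excerpt[olive_excerpt$stearic > 250, "high_stearic"] <- TRUE
print(olive_excerpt)
```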
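More complicated expressions are possible with a bit of imagination. For example, to view only the even-numbered rows we can supply row positions directly instead of a condition: seq(2, 10, 2) produces the numbers 2, 4, 6, 8 and 10, so olive.df[seq(2, 10, 2), ] displays the even-numbered rows among the first ten. Using seq(2, nrow(olive.df), 2) extends this to every even-numbered row in the data frame.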
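The sketch below, on a hypothetical six-row excerpt built from the printed rows, indexes rows by position with seq(). It also shows a logical alternative using the modulo operator %%, which marks the even-numbered rows without listing their positions. Note that seq(2, nrow(x), 2) assumes at least two rows (with one row it stops with an error), and asking for positions beyond the last row, as seq(2, 10, 2) would on this six-row excerpt, returns rows filled with NA.

```r
# Hypothetical excerpt standing in for olive.df (rows printed in this section)
olive_excerpt <- data.frame(
  Area     = rep(c("North-Apulia", "South-Apulia"), each = 3),
  palmitic = c(1075, 1088, 911, 1410, 1509, 1317),
  stearic  = c(226, 224, 246, 280, 257, 256),
  oleic    = c(7823, 7709, 8113, 6715, 6647, 7036),
  stringsAsFactors = FALSE
)

print(seq(2, 10, 2))  # the numbers 2, 4, 6, 8 and 10

# Positions of every even-numbered row in this data frame
even_pos <- seq(2, nrow(olive_excerpt), 2)
evens <- olive_excerpt[even_pos, ]

# Logical alternative: TRUE where the row position is divisible by 2
is_even <- seq_len(nrow(olive_excerpt)) %% 2 == 0
evens2 <- olive_excerpt[is_even, ]
print(evens)
```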