The S programming language, and R with it, has facilities for applying a function to each element, row or column of a vector, matrix or data frame without writing an explicit loop. Loops in R are generally discouraged where such an alternative exists, as they tend to be more verbose and can slow down the calculations, but there will of course be some situations where a loop is unavoidable.
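As a rough sketch of what avoiding a loop looks like, the snippet below computes column means twice, once with an explicit loop and once with the apply function described next. The matrix m is a made-up toy example, not data from this post.

```r
# Hypothetical toy matrix, used only for illustration
m <- matrix(1:6, nrow = 2)

# Explicit loop: preallocate the result, then fill it column by column
loop_means <- numeric(ncol(m))
for (j in seq_len(ncol(m))) {
  loop_means[j] <- mean(m[, j])
}

# The same calculation with apply (2 = work over columns)
apply_means <- apply(m, 2, mean)
```

Both versions should give the same numbers; the apply version simply states the intent in one line.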
There is a function called apply that runs a given function on each row or column of a matrix or data frame. For example, we could calculate row or column means or variances using apply, or we could define a more complicated function that is better suited to the statistics we want to calculate. If we take a look at the Olive oil data used in some of the other posts, we might be interested in calculating the mean of each variable (the columns in this case), and we would use this code:
apply(olive.df[, c("palmitic", "palmitoleic", "stearic", "oleic", "linoleic", "linolenic", "arachidic", "eicosenoic")], 2, mean)
The first thing we do is indicate which columns we are interested in, as Region and Area are not relevant for these mean calculations. The square brackets specify a subset of our data frame, and we provide a vector of column names after the comma. The second argument, 2, tells apply to work over columns (1 would work over rows), and the third argument is the function to run. The output from this function call is:
  palmitic palmitoleic    stearic      oleic   linoleic  linolenic  arachidic eicosenoic
1231.74126   126.09441  228.86538 7311.74825  980.52797   31.88811   58.09790   16.28147
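Since olive.df is not reproduced here, the following is a minimal sketch on a made-up data frame (toy.df and its values are hypothetical) showing how the second argument switches between columns and rows. It also shows why the non-numeric column is dropped first: apply converts a data frame to a matrix, and a character column would turn every value into text.

```r
# Hypothetical toy data frame standing in for olive.df (values are made up)
toy.df <- data.frame(
  Region   = c("North", "South", "South"),
  palmitic = c(10, 20, 30),
  stearic  = c(1, 2, 6)
)

# Keep only the numeric columns, as in the call above
num.df <- toy.df[, c("palmitic", "stearic")]

col.means <- apply(num.df, 2, mean)  # 2 = columns: one value per variable
row.means <- apply(num.df, 1, mean)  # 1 = rows: one value per observation

# colMeans() is a built-in shortcut for the column case
col.means.builtin <- colMeans(num.df)
```

For plain means, colMeans and rowMeans are usually the simpler choice; apply becomes useful once the function is something other than a mean or sum.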
We could quite easily adjust this function call to use a different function on the data. Let’s say that we are interested in the maximum value of each variable; then we would replace mean with max:
apply(olive.df[, c("palmitic", "palmitoleic", "stearic", "oleic", "linoleic", "linolenic", "arachidic", "eicosenoic")], 2, max)
which returns:
palmitic palmitoleic stearic oleic linoleic linolenic arachidic eicosenoic
    1753         280     375  8410     1470        74       105         58
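The function passed to apply does not have to be a built-in one. As a small sketch of the "more complicated function" idea mentioned earlier, the example below uses a made-up data frame (toy.df is hypothetical) to compute the range of each column with an anonymous function, and to pass an extra argument through to max.

```r
# Hypothetical toy data (values are made up for illustration)
toy.df <- data.frame(palmitic = c(10, 20, 30), stearic = c(1, 2, 6))

# Anonymous function: the spread (max minus min) of each column
col.range <- apply(toy.df, 2, function(x) max(x) - min(x))

# Arguments given after the function name are passed on to that function;
# here na.rm = TRUE tells max to ignore the missing value
toy.df$stearic[2] <- NA
col.max <- apply(toy.df, 2, max, na.rm = TRUE)
```

Without na.rm = TRUE, the maximum for a column containing a missing value would itself be NA.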
There are other associated functions (tapply, lapply and sapply) that perform a similar routine on different types and formats of data, which will be discussed in subsequent posts.