In statistical analysis, regular structures occur frequently, for example in designed experiments, and R has facilities for generating the corresponding data frames in a simple way.
The function expand.grid can be used to create a design by specifying a series of factors and the levels of each factor. It returns a data frame containing every combination of the factor levels. For example, if we had a two-factor experiment where the first factor had four levels labelled A, B, C and D and the second factor had three levels labelled I, II and III, then we could create the data frame for the design using this code:
expand.grid(Factor1 = c("A", "B", "C", "D"), Factor2 = c("I", "II", "III"))
which would produce the following output:
   Factor1 Factor2
1        A       I
2        B       I
3        C       I
4        D       I
5        A      II
6        B      II
7        C      II
8        D      II
9        A     III
10       B     III
11       C     III
12       D     III
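The design above can be checked programmatically. The following is a minimal sketch rather than a definitive recipe: the object names design and design_rep and the Replicate column are illustrative assumptions, not part of the original example.

```r
# Build the 4 x 3 design described above
design <- expand.grid(Factor1 = c("A", "B", "C", "D"),
                      Factor2 = c("I", "II", "III"))

# One row per combination: 4 levels x 3 levels
nrow(design)

# expand.grid converts character vectors to factors by default
# (its stringsAsFactors argument defaults to TRUE)
str(design)

# Hypothetical extension: a third argument adds replicate runs,
# doubling the number of rows
design_rep <- expand.grid(Factor1 = c("A", "B", "C", "D"),
                          Factor2 = c("I", "II", "III"),
                          Replicate = 1:2)
nrow(design_rep)
```

Note that the first factor varies fastest, which is why A, B, C, D cycle within each level of Factor2.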
It is also possible to create various other sequences using the colon operator and the seq and rep functions. To create the numbers from 1 to 10 we could run this code:
> 1:10
 [1]  1  2  3  4  5  6  7  8  9 10
Alternatively, the seq function provides greater control over the start and end values and the step between successive values. A couple of examples are shown below:
> seq(1, 5)
[1] 1 2 3 4 5
> seq(10, 1, -2)
[1] 10  8  6  4  2
The negative step indicates that the sequence is decreasing.
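The seq examples above pass their arguments by position. As a small sketch, the same calls can be written with the named arguments from, to, by and length.out, which some readers find easier to follow. The variable names here are illustrative only.

```r
# Step of 2 from 1 up to 9
by_two <- seq(from = 1, to = 9, by = 2)
by_two

# Decreasing sequence: same call as seq(10, 1, -2)
down <- seq(from = 10, to = 1, by = -2)
down

# Fix the number of values instead of the step:
# five equally spaced points between 0 and 1
five_points <- seq(from = 0, to = 1, length.out = 5)
five_points
```

Using length.out is convenient when the number of points matters more than the exact step, for example when building a grid of values to evaluate a function on.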
Another common requirement is to repeat a value a given number of times. To get ten replicates of the number one we use the rep function:
> rep(1, 10)
 [1] 1 1 1 1 1 1 1 1 1 1
If we want to repeat a sequence multiple times then we provide the sequence as the first argument to the rep function. So to repeat the numbers one to ten twice we would write:
> rep(1:10, 2)
 [1]  1  2  3  4  5  6  7  8  9 10  1  2  3  4  5  6  7  8  9 10
Here 1:10 is evaluated first to give the numbers one to ten, and the whole sequence is then repeated twice. A further arrangement, where each element of the sequence is repeated a given number of times, can be obtained by nesting one rep call inside another. The second argument is then a vector of the same length as the first, and each of its elements gives the number of times the corresponding element of the first argument is repeated. As an example:
> rep(1:5, rep(3, 5))
 [1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5
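The nested call is one way to repeat each element. The sketch below shows the each and times arguments of rep as an alternative, and then uses them to rebuild the two-factor layout from the expand.grid example by hand. The object names are illustrative assumptions.

```r
# Nested form from the text and the equivalent each = form
nested    <- rep(1:5, rep(3, 5))
with_each <- rep(1:5, each = 3)
identical(nested, with_each)

# A times vector need not be constant: repeat A once, B twice, C three times
uneven <- rep(c("A", "B", "C"), times = c(1, 2, 3))
uneven

# Rebuilding the 4 x 3 design by hand: the first factor cycles fastest,
# the second factor changes after every four rows
manual_design <- data.frame(
  Factor1 = rep(c("A", "B", "C", "D"), times = 3),
  Factor2 = rep(c("I", "II", "III"), each = 4)
)
manual_design
```

In practice expand.grid is usually the more convenient choice for full factorial layouts, while rep gives finer control when the pattern is irregular.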