Reading data into a statistical software package is not always straightforward, and different software systems use many different file formats. Text files are popular for sharing small or medium-sized data sets, while full-blown relational databases are more appropriate for larger ones. The R environment has functions for importing data stored in text format, and it can also interact with external database systems.
When storing data in text files there are a number of common issues to consider. A delimiter character, such as a comma or a tab, separates the columns of data. An optional first line may give the names of the columns (variables); alternatively, the column names can be specified explicitly when importing the data. Missing values often cause problems when handling data, so a special string can be specified to indicate a missing value.
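As a minimal sketch of these three issues, the following R code writes a small comma-delimited file (its contents and the temporary path are invented for illustration) and reads it back with read.table. The sep argument gives the delimiter, header says whether the first line holds column names, and na.strings lists the strings to treat as missing; here "-" is assumed to be the missing-value marker.

```r
# Hypothetical example data: comma-delimited, a header row,
# and "-" marking a missing weight.
path <- tempfile(fileext = ".txt")
writeLines(c("Weight,Group",
             "5.2,A",
             "-,B",
             "4.8,A"), path)

# sep: column delimiter; header = TRUE: names come from the first line;
# na.strings: strings converted to NA (the default is "NA").
dat <- read.table(path, header = TRUE, sep = ",", na.strings = "-")
print(dat)
```

Because "-" is converted to NA, the Weight column is still read as numeric rather than as text.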
The comma-separated values (CSV) text file format is straightforward to handle in R with the function read.csv, where we specify the file name as the source of data. A simple example of using this function is:
read.csv("Data\\exampledata1.csv")
The data is loaded and, assuming there are no errors, converted into a data frame that can be saved and subsequently analysed. To save the data frame as an object we could have run this code:
data1 = read.csv("Data\\exampledata1.csv")
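The sketch below assumes a small invented CSV file written to a temporary path in place of Data\\exampledata1.csv. It imports the file into data1, confirms the result is a data frame, and then saves the object with saveRDS so that it can be restored later with readRDS. saveRDS is only one of several ways to store an R object.

```r
# Hypothetical CSV contents, written to a temporary file so the example
# runs anywhere; in practice this would be "Data\\exampledata1.csv".
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Weight,Group",
             "5.2,A",
             "4.8,B",
             "6.1,A"), csv_path)

# Assign the imported data frame to an object for later analysis.
data1 <- read.csv(csv_path)
str(data1)

# Save the object to disk and read it back.
rds_path <- tempfile(fileext = ".rds")
saveRDS(data1, rds_path)
data1_copy <- readRDS(rds_path)
```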
This function is a wrapper around the more general-purpose function read.table, which accepts a wider range of options for describing the delimited text file being imported into the R environment.
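To illustrate the relationship, this sketch reads the same hypothetical file twice: once with read.csv and once with read.table, passing the argument values that read.csv sets by default (header = TRUE, sep = ",", quote = "\"", dec = ".", fill = TRUE, comment.char = ""). Under those settings the two results should be identical.

```r
# Hypothetical two-row CSV file in a temporary location.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Weight,Group", "5.2,A", "4.8,B"), csv_path)

via_csv <- read.csv(csv_path)

# The same import spelled out with read.table, using read.csv's defaults.
via_table <- read.table(csv_path, header = TRUE, sep = ",", quote = "\"",
                        dec = ".", fill = TRUE, comment.char = "")

identical(via_csv, via_table)
```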
The first argument to read.table is also the file name. If no other options are specified, the default assumption is that white space (one or more spaces or tabs) separates the data in different columns and that the first line of the text file does not contain column names. Setting the header argument to TRUE uses the information in the first row as column names. An example of specifying this option is:
read.table("Data\\exampledata2.txt", header = TRUE)
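A minimal sketch, assuming a hypothetical white-space-delimited file whose first line holds the column names, shows what the header argument changes. Without header = TRUE the first line is treated as data and R supplies default names V1, V2 and so on.

```r
# Hypothetical white-space-delimited file with a header line.
txt_path <- tempfile(fileext = ".txt")
writeLines(c("Weight Group",
             "5.2 A",
             "4.8 B"), txt_path)

# First line used as column names: 2 data rows.
with_header <- read.table(txt_path, header = TRUE)
names(with_header)   # "Weight" "Group"

# First line read as data: 3 rows with default names.
no_header <- read.table(txt_path)
names(no_header)     # "V1" "V2"
```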
If the data file does not have a row of column headings, the col.names argument can be used to specify the names to give these columns. An example of using this option is:
read.table("Data\\exampledata3.txt", col.names = c("Weight", "Group"))
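The following sketch assumes a hypothetical headerless file with a weight and a group on each line, standing in for Data\\exampledata3.txt, and supplies the column names through col.names:

```r
# Hypothetical headerless, white-space-delimited data.
txt_path <- tempfile(fileext = ".txt")
writeLines(c("5.2 A",
             "4.8 B",
             "6.1 A"), txt_path)

# Every line is data; col.names replaces the default V1, V2 names.
weights <- read.table(txt_path, col.names = c("Weight", "Group"))
print(weights)
```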
The character used to separate the data on each row can also be specified via the sep argument. An example of importing a data file that uses a semicolon is shown here:
read.table("Data\\exampledata4.txt", header = TRUE, sep = ";")
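A minimal sketch with a hypothetical semicolon-delimited file, in place of Data\\exampledata4.txt:

```r
# Hypothetical semicolon-delimited file with a header line.
txt_path <- tempfile(fileext = ".txt")
writeLines(c("Weight;Group",
             "5.2;A",
             "4.8;B"), txt_path)

# sep = ";" splits each line at the semicolons.
semi <- read.table(txt_path, header = TRUE, sep = ";")
print(semi)
```

Where a file also uses a comma as the decimal mark, read.csv2 may be more convenient, since it presets sep = ";" and dec = ",".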