There are many situations where data is presented in a format that is not ready for exploratory data analysis or for the statistical method we want to use. The reshape2 package for R provides useful functionality that avoids having to hack the data around in a spreadsheet before importing it into R.
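As a minimal sketch of the kind of reshaping reshape2 handles, the example below converts a small "wide" data frame (one column per time point) into "long" format with melt() and back again with dcast(). The data frame, its column names and the variable.name/value.name choices are hypothetical, and the reshape2 package is assumed to be installed.

```r
# Assumes reshape2 is installed: install.packages("reshape2")
library(reshape2)

# Hypothetical "wide" data: one row per subject, one column per time point
wide <- data.frame(
  subject = c("A", "B", "C"),
  t1 = c(5.1, 4.8, 6.0),
  t2 = c(5.5, 5.0, 6.2)
)

# melt() stacks the measurement columns (t1, t2) into a single "score"
# column, with a new "time" column recording which column each value came from
long <- melt(wide, id.vars = "subject",
             variable.name = "time", value.name = "score")
print(long)

# dcast() reverses the operation: one row per subject, one column per time
wide_again <- dcast(long, subject ~ time, value.var = "score")
print(wide_again)
```

The long format, with one observation per row, is what many modelling and plotting functions in R expect.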
Melt
April 5th, 2012
Programming with R – Processing Football League Data Part II
December 3rd, 2010
Following on from the previous post about creating a function to process football results from the football-data.co.uk website, we will add code to the function that generates a league table from the results to date.
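One way to build a league table from match results is sketched below. It assumes the results use the column names HomeTeam, AwayTeam, FTHG (full-time home goals) and FTAG (full-time away goals), as in the football-data.co.uk CSV files, but the four matches are made up for illustration and the league_table() function is hypothetical rather than the function from the post. It awards 3 points for a win and 1 for a draw, and ranks teams by points, then goal difference, then goals scored.

```r
# Hypothetical results in the assumed football-data.co.uk column layout
results <- data.frame(
  HomeTeam = c("Arsenal", "Chelsea", "Everton", "Arsenal"),
  AwayTeam = c("Chelsea", "Everton", "Arsenal", "Everton"),
  FTHG = c(2, 1, 0, 3),
  FTAG = c(1, 1, 0, 0)
)

# Hypothetical helper: turn a results data frame into a league table
league_table <- function(results) {
  # One row per team per match, seen from that team's point of view
  home <- data.frame(Team = results$HomeTeam,
                     GF = results$FTHG, GA = results$FTAG)
  away <- data.frame(Team = results$AwayTeam,
                     GF = results$FTAG, GA = results$FTHG)
  games <- rbind(home, away)

  # Played, won, drawn, lost and points for each individual match
  games$P <- 1
  games$W <- as.integer(games$GF > games$GA)
  games$D <- as.integer(games$GF == games$GA)
  games$L <- as.integer(games$GF < games$GA)
  games$Pts <- 3 * games$W + games$D

  # Sum every column by team, then add goal difference
  tab <- aggregate(cbind(P, W, D, L, GF, GA, Pts) ~ Team,
                   data = games, FUN = sum)
  tab$GD <- tab$GF - tab$GA

  # Rank by points, then goal difference, then goals scored
  tab <- tab[order(-tab$Pts, -tab$GD, -tab$GF), ]
  rownames(tab) <- NULL
  tab
}

tab <- league_table(results)
print(tab)
```

Working from a "one row per team per match" view keeps the home and away logic identical, so each statistic is computed once and then summed.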
Useful functions for data frames
August 9th, 2010
R is primarily command-line based, so when there are large sets of data it is not easy to browse the data frames. There are various useful functions for inspecting and working with data frames.
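The functions below are a selection of standard base R functions for getting a feel for a data frame without printing all of it; the post itself may cover a different set. The sketch uses the built-in mtcars data set, and at the interactive console each call prints its result.

```r
# Built-in example data set shipped with R
df <- mtcars

dim(df)        # number of rows and columns: 32 11
names(df)      # column names
head(df, 3)    # first three rows
tail(df, 3)    # last three rows
str(df)        # compact overview of each column's type and first values
summary(df[, c("mpg", "hp")])  # min, quartiles, median and mean per column
```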
Creating Date Objects using Character Strings
September 10th, 2009
Dates can frequently be problematic because there is such a wide range of formats used to store date information. R has various facilities for defining and working with dates and can handle a wide range of the formats that might be encountered in a set of data.
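A minimal sketch of converting character strings to Date objects with as.Date() follows; the three example strings are hypothetical. The format argument uses strptime-style codes (%d day, %m month number, %B full month name, %Y four-digit year). Note that parsing month names with %B depends on the locale, so the third conversion may return NA outside an English locale.

```r
# Character strings in different formats, all meaning 10 September 2009
x1 <- "2009-09-10"          # ISO 8601, the default format for as.Date()
x2 <- "10/09/2009"          # day/month/year
x3 <- "September 10, 2009"  # full month name (locale dependent)

d1 <- as.Date(x1)
d2 <- as.Date(x2, format = "%d/%m/%Y")
d3 <- as.Date(x3, format = "%B %d, %Y")  # may be NA in a non-English locale

# Once parsed, dates can be compared and used in arithmetic
identical(d1, d2)             # TRUE
d1 + 30                       # "2009-10-10"
format(d1, "%d %b %Y")        # reformat for display
```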
Using R to Access Data in a MySQL Database
July 24th, 2009
The R Data Import/Export manual discusses various approaches to handling data and mentions that R is not well suited to working with large data sets because data objects are stored in memory during a session. In these situations it can be useful to hold the data in a database and use one of the R packages for database connectivity to access the data or to save it.
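The sketch below shows the general pattern with the DBI interface: connect, write a table, let the database do the filtering, and disconnect. So that it runs without a MySQL server, it swaps in an in-memory SQLite database through the RSQLite package (assumed installed, along with DBI); against MySQL the dbConnect() call would instead use a MySQL driver such as RMySQL::MySQL() with dbname, host, user and password arguments. The table name "cars" is just an example.

```r
# Assumes DBI and RSQLite are installed. SQLite stands in for MySQL here;
# for MySQL, replace RSQLite::SQLite() with a MySQL driver and connection details.
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Save a data frame to the database as a table
dbWriteTable(con, "cars", mtcars)

# Run the query inside the database and bring only the result into R's memory
heavy <- dbGetQuery(con, "SELECT mpg, wt FROM cars WHERE wt > 4")
print(heavy)

dbDisconnect(con)
```

Pushing the filtering into SQL is what keeps the memory footprint small: only the rows that match are transferred into the R session.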
Transformations to Create New Variables
May 18th, 2009
There are many situations where we might want to create a new variable by transforming one of the variables already in the data frame. R can be used for simple transformations as well as more complicated mathematical expressions where necessary.
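Here is a short sketch of creating new variables from existing ones, using a small hypothetical data frame of heights and weights. It shows a direct assignment with $, a derived quantity (body mass index) computed with base R's transform(), and a log transformation.

```r
# Hypothetical data: heights in centimetres, weights in kilograms
df <- data.frame(height_cm = c(160, 175, 182),
                 weight_kg = c(55, 72, 90))

# Simple transformation: unit conversion assigned to a new column
df$height_m <- df$height_cm / 100

# More complicated expression: body mass index, evaluated inside the data frame
df <- transform(df, bmi = weight_kg / height_m^2)

# Log transformation, often used for right-skewed variables
df$log_weight <- log(df$weight_kg)

print(df)
```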
Cross-tabulation of Data
May 15th, 2009
A contingency table summarises data when there are factors in the data set and we are interested in counting the number of occurrences of each combination of factor levels. In R there are different ways to produce these tables and manipulate them as required.
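A minimal sketch of two common base R ways to build a contingency table, table() and xtabs(), followed by row proportions and marginal totals. The survey data are hypothetical.

```r
# Hypothetical survey data with two factors
survey <- data.frame(
  gender = factor(c("F", "M", "F", "F", "M", "M", "F")),
  smoker = factor(c("yes", "no", "no", "no", "yes", "no", "no"))
)

# Count each gender/smoker combination
tab <- table(survey$gender, survey$smoker)
print(tab)

# Formula interface: same counts, with named dimensions
xt <- xtabs(~ gender + smoker, data = survey)
print(xt)

# Proportions within each row (each gender), and totals added to the margins
print(prop.table(xt, margin = 1))
print(addmargins(xt))
```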
Working with Subsets of Data
May 8th, 2009
There are often situations where we are interested in only a subset of our complete data, and R has simple mechanisms for viewing and editing particular subsets of a data frame or other objects.
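The sketch below contrasts logical indexing with [ ] and the subset() function for viewing part of a data frame, then edits only the rows that meet a condition. It uses the built-in mtcars data set; the new power column and its labels are hypothetical.

```r
# Built-in data set
df <- mtcars

# Logical indexing: rows with mpg above 25, keeping two columns
efficient <- df[df$mpg > 25, c("mpg", "hp")]

# subset() expresses the same selection without repeating df$
efficient2 <- subset(df, mpg > 25, select = c(mpg, hp))
print(efficient2)

# Editing a subset: label only the cars with more than 200 hp
df$power <- "standard"
df$power[df$hp > 200] <- "high"
table(df$power)
```

One difference worth knowing: subset() drops rows where the condition is NA, whereas [ ] with a logical index containing NA returns rows filled with NA.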