Seasonal Trend Decomposition in R
January 11th, 2013
Seasonal and Trend decomposition using Loess (STL) is an algorithm that divides a time series into three components: trend, seasonality and remainder. The method was presented by Robert Cleveland, William Cleveland, Jean McRae and Irma Terpenning in the Journal of Official Statistics in 1990. STL is available in R via the stl function.
Programming with R – Processing Football League Data Part II
December 3rd, 2010
Following on from the previous post about creating a function to process results data from the football-data.co.uk website, we will add code to the function to generate a league table based on the results to date.
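As a rough illustration of the idea, a league table can be built by stacking each match from the home and the away point of view and then summing per team. This is only a sketch, not the function from the post: the results below are made up, the function name league_table is hypothetical, and the column names (HomeTeam, AwayTeam, FTHG, FTAG) assume the usual football-data.co.uk CSV layout with three points for a win and one for a draw.

```r
# Made-up results for illustration; column names assume the football-data.co.uk layout
results <- data.frame(
  HomeTeam = c("A", "B", "C", "A"),
  AwayTeam = c("B", "C", "A", "C"),
  FTHG = c(2, 1, 0, 1),   # full-time home goals
  FTAG = c(1, 1, 3, 0),   # full-time away goals
  stringsAsFactors = FALSE
)

# Hypothetical helper: one row per team per match, then totals per team
league_table <- function(results) {
  home <- data.frame(Team = results$HomeTeam, GF = results$FTHG, GA = results$FTAG)
  away <- data.frame(Team = results$AwayTeam, GF = results$FTAG, GA = results$FTHG)
  long <- rbind(home, away)
  # Assumed scoring: 3 points for a win, 1 for a draw, 0 for a loss
  long$Pts <- ifelse(long$GF > long$GA, 3, ifelse(long$GF == long$GA, 1, 0))
  long$P <- 1  # each row is one match played
  tab <- aggregate(cbind(P, GF, GA, Pts) ~ Team, data = long, FUN = sum)
  tab$GD <- tab$GF - tab$GA
  # Sort by points, then goal difference, then goals scored
  tab[order(-tab$Pts, -tab$GD, -tab$GF), ]
}

tab <- league_table(results)
print(tab)
```

Stacking the home and away views keeps the points logic in one place, so adding further columns (wins, draws, losses) only needs extra indicator columns before the aggregation.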
Charting the performance of cricket all-rounders – IT Botham
August 16th, 2010
Cricket is a sport that generates a large volume of performance data and corresponding debate about the relative qualities of various players over their careers and in relation to their contemporaries. The cricinfo website has an extensive database of statistics for professional cricketers that can be searched to access the information in various formats.
R and Tolerance Intervals
April 19th, 2010
Confidence intervals and prediction intervals are used by statisticians on a regular basis. Another useful interval is the tolerance interval, which gives a range of values expected to contain at least a specified proportion of a distribution with a stated level of confidence. The R package tolerance can be used to create a variety of tolerance intervals of interest.
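To make the definition concrete, here is a minimal base R sketch of a two-sided tolerance interval for normally distributed data, using Howe's approximation for the tolerance factor. It is an illustration under that normality assumption rather than the post's own code; the function name normal_tolerance and the simulated data are hypothetical. The tolerance package mentioned above is the more complete route in practice.

```r
# Hypothetical helper: two-sided normal tolerance interval via Howe's approximation.
# P    = proportion of the population the interval should cover
# conf = confidence level attached to that coverage claim
normal_tolerance <- function(x, P = 0.95, conf = 0.95) {
  n <- length(x)
  z <- qnorm((1 + P) / 2)               # normal quantile for the coverage
  chi <- qchisq(1 - conf, df = n - 1)   # lower chi-squared quantile
  k <- sqrt((n - 1) * (1 + 1 / n) * z^2 / chi)
  c(lower = mean(x) - k * sd(x), upper = mean(x) + k * sd(x), k = k)
}

set.seed(42)                            # simulated data, for illustration only
x <- rnorm(20, mean = 100, sd = 5)
ti <- normal_tolerance(x, P = 0.95, conf = 0.95)
print(ti)
```

The factor k is larger than the plain normal quantile because it allows for the uncertainty in the estimated mean and standard deviation, and it shrinks towards that quantile as the sample size grows.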
Box and Whisker Plots for Summarising Data
August 11th, 2009
We have considered using a histogram to summarise univariate data, but other types of plot, such as the box and whisker plot, can also be used to summarise univariate data. The box and whisker plot is a graphical method for summarising numerical data based on a five-number summary. These five numbers are the minimum, lower quartile, median, upper quartile and maximum value.
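A short sketch of the five-number summary and the corresponding plot in base R follows; the data vector is made up for illustration. Note that fivenum() reports Tukey's hinges, which can differ slightly from the quartiles returned by quantile() for some sample sizes.

```r
# Made-up sample with one unusually large value
x <- c(2, 4, 4, 5, 7, 9, 10, 12, 30)

# Minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(x)
print(five)

# Points beyond 1.5 times the box length are flagged individually
outliers <- boxplot.stats(x)$out
print(outliers)

# Draw the box and whisker plot itself
boxplot(x, horizontal = TRUE, main = "Box and whisker plot of x")
```

In the plot the whiskers stop at the most extreme points that are not flagged, so the box still reflects the bulk of the data even when a value such as 30 is present.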
Using Histograms to Summarise Data
June 8th, 2009
Tabular displays are not the only way to summarise a data set. A graphical display such as a histogram is often a more effective way to visualise the data than summary statistics such as the mean or standard deviation.
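As a small illustration, the base R hist function both draws the histogram and returns the bin boundaries and counts behind it. The data here are simulated purely for the example.

```r
# Simulated data, for illustration only
set.seed(1)
x <- rnorm(200, mean = 50, sd = 10)

# breaks is a suggestion; R picks "pretty" bin boundaries close to it
h <- hist(x, breaks = 10, main = "Histogram of simulated data", xlab = "Value")

# The returned object holds the bin edges and the count in each bin
print(h$breaks)
print(h$counts)
```

Looking at the returned counts alongside the plot is a quick check that the chosen bin width is not hiding or exaggerating features of the data.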
Working with Probability Distributions
May 31st, 2009
Probability distributions have a central role in Statistics, and the R software has functions to work with a large range of distributions. The syntax has been chosen to provide some consistency based on the type of information required about a distribution.
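The consistency referred to here is the prefix convention: d for the density or probability mass, p for the cumulative probability, q for the quantile and r for random draws, followed by the distribution name. A brief sketch using the normal and binomial distributions:

```r
# Density of the standard normal at zero
d0 <- dnorm(0)

# Cumulative probability P(Z <= 1.96) and the matching quantile
p <- pnorm(1.96)
q <- qnorm(0.975)

# Five random draws from a normal with mean 10 and standard deviation 2
set.seed(123)
draws <- rnorm(5, mean = 10, sd = 2)

# The same prefixes apply to other distributions, e.g. the binomial:
# probability of exactly 3 successes in 10 trials with success probability 0.5
b <- dbinom(3, size = 10, prob = 0.5)

print(c(d0 = d0, p = p, q = q, b = b))
print(draws)
```

Because the prefixes are shared, moving from one distribution to another is mostly a matter of swapping the name and its parameters.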
Vector Calculations to avoid Explicit Loops
May 23rd, 2009
The S programming language, on which R is based, has facilities for applying a function to all the individual elements of a vector, matrix or data frame, which avoid the need to make explicit use of loops. Explicit loops in R are often slower than the vectorised alternatives and are best avoided where possible, but there will of course be some situations where they are unavoidable.
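A minimal comparison of the three styles, using made-up values: an explicit loop, a vectorised expression and the apply family.

```r
x <- c(1.5, 2.0, 3.5, 4.0)

# Explicit loop: pre-allocate the result, then fill it element by element
squares_loop <- numeric(length(x))
for (i in seq_along(x)) {
  squares_loop[i] <- x[i]^2
}

# Vectorised arithmetic: the operator is applied to every element at once
squares_vec <- x^2

# sapply applies a function to each element and simplifies the result
squares_apply <- sapply(x, function(v) v^2)

# apply works over the rows (1) or columns (2) of a matrix
m <- matrix(1:6, nrow = 2)
row_totals <- apply(m, 1, sum)

print(squares_vec)
print(row_totals)
```

All three approaches give the same squares; the vectorised form is usually the shortest to write and the quickest to run.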
Cross-tabulation of Data
May 15th, 2009
The contingency table is used to summarise data when the data set contains factors and we are interested in counting the number of occurrences of each combination of factor levels. In R there are different ways that these types of table can be produced and manipulated as required.
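Two of the base R routes can be sketched with the built-in mtcars data set, used here only as a convenient example: table() takes the factors directly, while xtabs() uses a formula interface.

```r
# Count cars by number of cylinders and number of gears
tab <- table(cyl = mtcars$cyl, gear = mtcars$gear)
print(tab)

# The same counts through the formula interface
tab2 <- xtabs(~ cyl + gear, data = mtcars)

# Add row and column totals
print(addmargins(tab))

# Convert counts to proportions within each row
print(round(prop.table(tab, margin = 1), 2))
```

The formula interface is convenient when the variables already sit in a data frame, and the resulting tables can be passed on to functions such as addmargins and prop.table in the same way.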