Getting started with GAMLSS
January 19th, 2014
Generalized Additive Models for Location, Scale and Shape (GAMLSS) are a recent development that provides a framework with access to a large set of distributions and the ability to model all of the parameters of these distributions as functions of the explanatory variables in a data set.
Book on Time Series Forecasting
May 6th, 2013
The online book on time series forecasting methods by Rob Hyndman and George Athanasopoulos has been completed and was announced on the Hyndsight blog. It is a very accessible book and worth reading to understand time series methodology and useful strategies for making predictions using these models.
Seasonal Trend Decomposition in R
January 11th, 2013
Seasonal Trend Decomposition using Loess (STL) is an algorithm that decomposes a time series into three components: trend, seasonality and remainder. The methodology was presented by Robert Cleveland, William Cleveland, Jean McRae and Irma Terpenning in the Journal of Official Statistics in 1990. STL is available within R via the stl function.
Graph Types: Pie Charts
October 13th, 2012
The pie chart is a frequently seen graph that uses area to compare percentages for a set of categories. Although this type of graph compares a single metric across categories, the display is two-dimensional and sometimes even appears in three dimensions.
Graph Design Principles
June 25th, 2012
There is a set of basic principles that hold true for the design of many graphs, although various authors have their own preferences. One author who is prominent for his work on data visualisation and the presentation of evidence to support decision making is Edward Tufte.
Logistic Regression and Bias Reduction
May 22nd, 2012
David Firth published a paper in 1993 on maximum likelihood estimation and the reduction of bias when using this approach. The research in this area appears to provide benefit for logistic regression in small data sets where there is complete or quasi-complete separation. This approach has been implemented for Generalized Linear Models in the brglm package.
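To see why separation is a problem for ordinary maximum likelihood, here is a minimal Python sketch using only the standard library. It does not implement Firth's bias-reduced estimator; it only illustrates the motivation. The data and the helper name log_likelihood are made up for illustration, and the model is assumed to be a logistic regression with a single slope and no intercept. With completely separated data the log-likelihood keeps increasing as the slope grows, so no finite maximum likelihood estimate exists.

```python
import math

def log_likelihood(slope, x, y):
    """Log-likelihood of a no-intercept logistic model P(y=1) = 1 / (1 + exp(-slope * x))."""
    total = 0.0
    for xi, yi in zip(x, y):
        sign = 1.0 if yi == 1 else -1.0
        # log P(observed outcome) = -log(1 + exp(-sign * slope * xi))
        total += -math.log1p(math.exp(-sign * slope * xi))
    return total

# Hypothetical, completely separated data: every negative x has y = 0,
# every positive x has y = 1.
x = [-2.0, -1.0, 1.0, 2.0]
y = [0, 0, 1, 1]

slopes = [1.0, 10.0, 100.0]
values = [log_likelihood(b, x, y) for b in slopes]

for b, ll in zip(slopes, values):
    print(f"slope = {b:6.1f}   log-likelihood = {ll:.3e}")
```

Each larger slope gives a log-likelihood closer to zero, which is the behaviour that penalised or bias-reduced approaches are designed to tame.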
Generalized Linear Models – Poisson Regression
June 26th, 2011
The Generalized Linear Model (GLM) allows us to model responses with distributions other than the Normal distribution, which is one of the assumptions underlying linear regression as it is commonly used. When the data are counts of events (or items), a discrete distribution is usually more appropriate than approximating with a continuous distribution, especially as counts are bounded below at zero: negative counts do not make sense.
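As a rough illustration of the idea, the following Python sketch fits a Poisson regression with a log link, log(mu) = b0 + b1 * x, by Newton-Raphson using only the standard library. The data are invented and the function name fit_poisson is hypothetical; in R the same kind of model would normally be fitted with glm and family = poisson. Because the fitted mean is exp(b0 + b1 * x), it can never be negative, which respects the lower bound of zero for counts.

```python
import math

def fit_poisson(x, y, iterations=25):
    """Fit log(mu) = b0 + b1 * x by Newton-Raphson (one predictor plus intercept)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the log-likelihood)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Fisher information matrix entries
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system information * step = score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Made-up count data for illustration only
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1, 2, 2, 5, 7, 12]

b0, b1 = fit_poisson(x, y)
fitted = [math.exp(b0 + b1 * xi) for xi in x]

print("intercept:", round(b0, 3), "slope:", round(b1, 3))
print("fitted means:", [round(m, 2) for m in fitted])
```

A straight-line fit to the same data could predict negative counts for small x, whereas the exponential mean function here stays strictly positive.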
Data Mining with WEKA
January 30th, 2011
There are a number of good open source projects for statistics and data mining, for example the WEKA software developed at the University of Waikato.
Gapminder
January 6th, 2011
As many people are aware, Hans Rosling is an enthusiastic Swedish academic with a passion for statistics who recently presented the program The Joy of Stats. One of the great things about Hans Rosling is his presentations and the interactive graphics that he uses to make his points.
Plotting Time Series data using ggplot2
September 30th, 2010
There are various ways to plot time series data in R. The ggplot2 package has scales that can handle dates reasonably easily.