A 3d wireframe plot is a type of graph that is used to display a surface – geographic data is an example of where this type of graph would be used or it could be used to display a fitted model with more than one explanatory variable. These plots are related to contour plots which are the two dimensional equivalent. Read the rest of this entry »
Variable selection using automatic methods
May 22nd, 2010When we have a set of data with a small number of variables we can easily use a manual approach to identifying a good set of variables and the form they take in our statistical model. In other situations we may have a large number of potentially important variables and it soon becomes a time consuming effort to follow a manual variable selection process. In this case we may consider using automatic subset selection tools to remove some of the burden of the task. Read the rest of this entry »
Linear regression models with robust parameter estimation
May 15th, 2010There are situations in regression modelling where robust methods could be considered to handle unusual observations that do not follow the general trend of the data set. There are various packages in R that provide robust statistical methods which are summarised on the CRAN Robust Task View. Read the rest of this entry »
Manual variable selection using the dropterm function
May 12th, 2010When fitting a multiple linear regression model to data a natural question is whether a model can be simplified by excluding variables from the model. There are automatic procedures for undertaking these tests but some people prefer to follow a more manual approach to variable selection rather than pressing a button and taking what comes out. Read the rest of this entry »
Using the update function during variable selection
May 9th, 2010When fitting statistical models to data where there are multiple variables we are often interested in adding or removing terms from our model and in cases where there are a large number of terms it can be quicker to use the update function to start with a formula from a model that we have already fitted and to specify the terms that we want to add or remove as opposed to a copy and paste and manually editing the formula to our needs. Read the rest of this entry »
Analysis of Covariance – Extending Simple Linear Regression
April 28th, 2010The simple linear regression model considers the relationship between two variables and in many cases more information will be available that can be used to extend the model. For example, there might be a categorical variable (sometimes known as a covariate) that can be used to divide the data set to fit a separate linear regression to each of the subsets. We will consider how to handle this extension using one of the data sets available within the R software package. Read the rest of this entry »
Summarising data using box and whisker plots
April 25th, 2010A box and whisker plot is a type of graphical display that can be used to summarise a set of data based on the five number summary of this data. The summary statistics used to create a box and whisker plot are the median of the data, the lower and upper quartiles (25% and 75%) and the minimum and maximum values. Read the rest of this entry »
Simple Linear Regression
April 23rd, 2010One of the most frequent used techniques in statistics is linear regression where we investigate the potential relationship between a variable of interest (often called the response variable but there are many other names in use) and a set of one of more variables (known as the independent variables or some other term). Unsurprisingly there are flexible facilities in R for fitting a range of linear models from the simple case of a single variable to more complex relationships. Read the rest of this entry »
R and Tolerance Intervals
April 19th, 2010Confidence intervals and prediction intervals are used by statisticians on a regular basis. Another useful interval is the tolerance interval that describes the range of values for a distribution with confidence limits calculated to a particular percentile of the distribution. The R package tolerance can be used to create a variety of tolerance intervals of interest. Read the rest of this entry »