Summarising data using scatter plots
April 18th, 2010
A scatter plot is a graph used to investigate the relationship between two variables in a data set. The x and y axes show the values of the two variables, and each symbol on the graph represents one pair of values in the data set. This type of graph is used in many common situations and can convey a lot of useful information.
Working with themes in Lattice Graphics
April 12th, 2010
The Trellis graphics approach provides facilities for creating effective graphs with a consistent look and feel. One of the good things about the system is its use of themes to define the colour, size and other features of the components that make up a graph. The lattice package in R is an implementation of the approach, and in this post we will consider how to change the default settings.
Summarising data using histograms
April 11th, 2010
The histogram is a standard type of graphic used to summarise univariate data. The range of values in the data set is divided into regions and a bar (usually vertical) is plotted in each region, with height proportional to the frequency of observations in that region. In some cases the proportion of data points in each region is shown instead of counts.
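The binning step described in this excerpt can be sketched in a few lines. This is only an illustration in Python using the standard library, not the code from the post itself; the function name `histogram_counts`, its parameters and the sample values are all assumptions made for the example.

```python
def histogram_counts(values, lower, width, n_bins):
    """Count observations in n_bins equal-width regions starting at lower.

    Regions are [lower, lower + width), [lower + width, lower + 2 * width), ...
    with the final region also including its upper edge.
    """
    counts = [0] * n_bins
    upper = lower + width * n_bins
    for v in values:
        idx = int((v - lower) // width)
        if v == upper:
            # Put a value sitting exactly on the upper edge in the last region.
            idx = n_bins - 1
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts


# Hypothetical data, invented for illustration only.
data = [1, 2, 2, 3, 5, 7, 8, 8, 9, 10]
counts = histogram_counts(data, lower=0, width=5, n_bins=2)

# Proportions are the alternative to counts mentioned in the excerpt.
proportions = [c / len(data) for c in counts]

# A crude text rendering: one bar per region, length equal to the count.
for i, c in enumerate(counts):
    print(f"region {i}: {'#' * c}")
```

Values outside the overall range are silently ignored here, which is a design choice for the sketch rather than a rule of histograms.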
Summarising data using dot plots
March 26th, 2010
A dot plot is a type of display that compares counts, frequencies, totals or other summary measures for a series of categories. The categories can be placed on either the vertical or the horizontal axis, which allows comparison between categories as well as comparison within categories where multiple symbols are used to denote, say, different years.
Contingency Tables – Fisher’s Exact Test
March 6th, 2010
A contingency table is used in statistics to provide a tabular summary of categorical data; each cell in the table holds the number of occasions on which a particular combination of variable values occurs in a set of data. The relationship between variables in a contingency table is often investigated using Chi-squared tests.
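As a rough illustration of the two ideas in this excerpt, the sketch below tallies pairs of categories into a table and then computes the Pearson Chi-squared statistic from the observed and expected counts. It is written in Python with the standard library only and is not taken from the post; the function names and the treatment/outcome records are hypothetical, and no continuity correction or p-value calculation is attempted.

```python
from collections import Counter


def contingency_table(pairs):
    """Tabulate (row category, column category) pairs into a table of counts."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table


def chi_squared_statistic(table):
    """Pearson Chi-squared statistic: sum of (observed - expected)^2 / expected."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical records, invented for illustration only.
records = (
    [("treatment", "improved")] * 3
    + [("treatment", "unchanged")] * 1
    + [("control", "improved")] * 1
    + [("control", "unchanged")] * 3
)
rows, cols, table = contingency_table(records)
statistic = chi_squared_statistic(table)
print(rows, cols, table, statistic)
```

With counts as small as these, an exact test such as the Fisher's exact test named in the post title is generally preferred over the Chi-squared approximation.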
Two-way Analysis of Variance (ANOVA)
February 15th, 2010
The analysis of variance (ANOVA) model can be extended from making a comparison between multiple groups to take into account additional factors in an experiment. The simplest extension is from one-way to two-way ANOVA where a second factor is included in the model as well as a potential interaction between the two factors.
One-way ANOVA (cont.)
February 12th, 2010
In a previous post we considered using R to fit one-way ANOVA models to data. In this post we consider a few additional ways that we can look at the analysis.
One-way Analysis of Variance (ANOVA)
February 3rd, 2010
Analysis of Variance (ANOVA) is a commonly used statistical technique for investigating data by comparing the means of subsets of the data. The base case is the one-way ANOVA, which is an extension of the two-sample t test for independent groups, covering situations where there are more than two groups being compared.
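To make the comparison of group means concrete, here is a minimal sketch of the one-way ANOVA F statistic: the between-group mean square divided by the within-group mean square. It is plain Python rather than the R code the post goes on to use, and the function name `one_way_f_statistic` and the three small groups are assumptions made for the example.

```python
def one_way_f_statistic(groups):
    """F statistic for one-way ANOVA on a list of groups of observations."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares: spread of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical measurements for three groups, invented for illustration only.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value = one_way_f_statistic(groups)
print(f_value)
```

A large F value suggests the group means differ by more than the within-group variation would explain; turning it into a p-value requires the F distribution, which this sketch leaves out.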
Graph Examples from Visualizing Data by William Cleveland
November 12th, 2009
The trellis graphics approach was pioneered by various statistical researchers and the ideas are used extensively in the book “Visualizing Data” by William Cleveland. There are various resources on the website for trellis graphics, including S code for creating the majority of the graphs that appear in the book. Inspired by efforts on the Learning R blog to recreate the examples from Deepayan Sarkar’s book on lattice using ggplot2, I have decided to undertake a similar exercise based on the scripts that have been made available for creating the graphs from the book.