Program Flow – Loops

September 4th, 2009

The R programming language has keywords that allow the user to define loops in various ways – as a for, while or repeat statement. These statements can be used to ensure that a section of code is repeated until a defined condition has been satisfied. Read the rest of this entry »
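As a minimal sketch of the three loop forms named above (the variable names are illustrative and not taken from the post), each loop below sums the integers 1 to 5:

```r
# for: iterate over a fixed sequence of values
total_for <- 0
for (i in 1:5) {
  total_for <- total_for + i
}

# while: keep going as long as the condition is TRUE
total_while <- 0
i <- 1
while (i <= 5) {
  total_while <- total_while + i
  i <- i + 1
}

# repeat: loop indefinitely until an explicit break
total_repeat <- 0
i <- 1
repeat {
  total_repeat <- total_repeat + i
  i <- i + 1
  if (i > 5) break
}

print(c(total_for, total_while, total_repeat))  # 15 15 15
```

The repeat form has no built-in stopping rule, so the break condition has to be written inside the loop body.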

Box and Whisker Plots for Summarising Data

August 11th, 2009

We have considered using a histogram to summarise univariate data, but there are other types of plot, such as the box and whisker plot, that can also be used to summarise univariate data. The box and whisker plot is a graphical method for summarising numerical data based on a five-number summary. These five numbers are the minimum, lower quartile, median, upper quartile and maximum value. Read the rest of this entry »
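A short sketch of the five-number summary and the corresponding plot, using a small made-up vector (the data are an assumption for illustration only). Note that fivenum reports hinges, which can differ slightly from the quartiles returned by quantile for some sample sizes:

```r
# Illustrative data, not taken from the post
x <- c(2, 4, 4, 5, 7, 9, 10, 12, 15)

# Minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(x)
print(five)  # 2 4 7 10 15

# Draw the box and whisker plot based on the same summary
boxplot(x, horizontal = TRUE, main = "Box and whisker plot of x")
```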

Using SQL to Access Data in MySQL Databases

July 28th, 2009

In a previous post we looked at opening and closing connections from R to a MySQL database and at some basic operations for creating and deleting tables. In this post we will consider using SQL queries to extract parts of a table from the database based on different search criteria. Read the rest of this entry »

Using R to Access Data in a MySQL Database

July 24th, 2009

The R import/export manual discusses various approaches to handling data and mentions that R is not well suited to working with large data sets because data objects are stored in memory during a session. There are situations where it makes sense to hold the data in a database and to use one of the R libraries for database connectivity to access or save the data. Read the rest of this entry »

One and Two Sample Hypothesis Testing

June 26th, 2009

The t-test is regularly used in Classical Statistics to investigate one or two samples of data and to test a particular hypothesis. There are variants on the t-test that are all handled by the same function, t.test, in R. Read the rest of this entry »
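As a hedged illustration of how the one-sample and two-sample variants go through the same t.test function, the sketch below uses simulated data (the samples, seed and hypothesised mean are assumptions for illustration only):

```r
set.seed(1)  # make the simulated samples reproducible
x <- rnorm(20, mean = 5, sd = 1)
y <- rnorm(20, mean = 6, sd = 1)

# One-sample test of the hypothesis that the mean of x is 5
one_sample <- t.test(x, mu = 5)

# Two-sample test; the default is the Welch test with unequal variances
two_sample_welch <- t.test(x, y)

# Two-sample test assuming equal variances (pooled estimate)
two_sample_pooled <- t.test(x, y, var.equal = TRUE)

print(one_sample$p.value)
print(two_sample_pooled$parameter)  # degrees of freedom: 38
```

Each call returns an object of class "htest", so the test statistic, degrees of freedom, p-value and confidence interval can be extracted in the same way for every variant.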

Program Flow – Using Conditional Statements

June 21st, 2009

The R environment provides access to the R programming language which, like other programming languages, has a variety of constructs for controlling the flow of an analysis by making decisions based on tested conditions. For example, in a function for numerical optimisation we might want to specify a tolerance level that determines when the algorithm should stop, or a maximum number of iterations. Read the rest of this entry »
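The tolerance and maximum-iteration idea can be sketched with a small iterative routine. The function newton_sqrt below is hypothetical (it is not from the post) and uses Newton's iteration for a square root purely as a stand-in for a numerical algorithm; it assumes a positive input:

```r
# Hypothetical helper: approximate sqrt(a) for a > 0
newton_sqrt <- function(a, tol = 1e-8, max_iter = 50) {
  x <- a
  for (iter in 1:max_iter) {
    x_new <- 0.5 * (x + a / x)
    # Conditional statement: stop once the change is below the tolerance
    if (abs(x_new - x) < tol) {
      return(list(root = x_new, iterations = iter, converged = TRUE))
    }
    x <- x_new
  }
  # Reached only when the iteration limit is hit first
  list(root = x, iterations = max_iter, converged = FALSE)
}

res <- newton_sqrt(2)
print(res$root)
print(res$converged)
```

Returning a converged flag alongside the estimate lets the caller distinguish between stopping on the tolerance and stopping on the iteration limit.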

Using Histograms to Summarise Data

June 8th, 2009

Tabular displays are not the only way to summarise a data set, and we will often be interested in using a graphical display, as this can be a more effective way to visualise our data than statistics such as the mean or standard deviation. Read the rest of this entry »
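A minimal sketch of a histogram in base R, using the built-in faithful data set as an assumed example (the post may use different data):

```r
# Eruption durations from the faithful data set shipped with R
eruptions <- faithful$eruptions

# breaks is a suggestion for the number of bins; hist returns the
# bin boundaries and counts as well as drawing the plot
h <- hist(eruptions, breaks = 20,
          main = "Old Faithful eruption durations",
          xlab = "Duration (minutes)")

print(h$counts)
```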

Plotting Probability Distributions

June 2nd, 2009

Many distributions are available within the base R Statistical System and it is possible to use these functions to visualise the density or cumulative distribution function of a distribution with a given set of parameters. Read the rest of this entry »
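One way this might look, as a sketch using the standard normal distribution (the choice of distribution and plotting range are assumptions for illustration):

```r
# Density of a normal distribution with mean 0 and standard deviation 1
dens <- curve(dnorm(x, mean = 0, sd = 1), from = -4, to = 4,
              ylab = "Density", main = "Standard normal density")

# Cumulative distribution function of the same distribution
cdf <- curve(pnorm(x, mean = 0, sd = 1), from = -4, to = 4,
             ylab = "Cumulative probability", main = "Standard normal CDF")
```

curve evaluates the expression on a grid of x values and invisibly returns the points it plotted, so the same values can be reused elsewhere.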

Working with Probability Distributions

May 31st, 2009

Probability distributions have a central role in Statistics and the R software has functions to work with a large range of distributions – the syntax has been selected to provide some consistency based on the type of information required about a distribution. Read the rest of this entry »
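The consistent syntax referred to above is the prefix convention in base R: d for the density, p for the cumulative probability, q for the quantile and r for random generation. A short sketch with the normal and binomial distributions:

```r
dnorm(0)        # density at 0, about 0.3989
pnorm(1.96)     # P(X <= 1.96), about 0.975
qnorm(0.975)    # 97.5% quantile, about 1.96

set.seed(42)    # arbitrary seed for reproducibility
draws <- rnorm(5)  # five random draws

# The same prefixes apply to other distributions, e.g. the binomial
p_two_heads <- dbinom(2, size = 5, prob = 0.5)
print(p_two_heads)  # 0.3125
```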

Sequences and Other Regular Arrangements of Data

May 26th, 2009

In statistical analysis there are frequently situations where regular structures occur, such as in designed experiments, and R has facilities for generating data frames in a simple way. Read the rest of this entry »
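A brief sketch of some base R functions for regular structures; the factor names block and treatment are hypothetical labels for a small designed experiment:

```r
# Evenly spaced sequence from 0 to 1
steps <- seq(0, 1, by = 0.25)
print(steps)  # 0.00 0.25 0.50 0.75 1.00

# Repeat a pattern three times
labels <- rep(c("a", "b"), times = 3)

# All combinations of two factors, returned as a data frame
design <- expand.grid(block = 1:2, treatment = c("A", "B", "C"))
print(design)
```

expand.grid varies the first factor fastest, so the six rows of design cycle through the blocks within each treatment.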