Program Flow – Loops
The R programming language provides keywords that allow the user to define loops in various ways: as a for, while or repeat statement. These statements ensure that a section of code is repeated until a defined condition has been satisfied. Read the rest of this entry »
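As a minimal sketch of the three loop forms named in this excerpt (the variable names and stopping values are made up for illustration), each loop below repeats a block until its condition is met:

```r
# for: iterate over a known sequence of values
total <- 0
for (i in 1:5) {
  total <- total + i
}

# while: keep going as long as the condition is TRUE
n <- 1
while (n < 100) {
  n <- n * 2
}

# repeat: loop indefinitely until an explicit break
count <- 0
repeat {
  count <- count + 1
  if (count >= 3) break
}

print(c(total = total, n = n, count = count))
```

A for loop suits a known number of iterations, while suits a condition checked before each pass, and repeat suits cases where the exit test belongs in the middle or at the end of the body.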
September 4th, 2009
Box and Whisker Plots for Summarising Data
August 11th, 2009
We have considered using a histogram to summarise univariate data, but other types of plot, such as the box and whisker plot, can also be used to summarise univariate data. The box and whisker plot is a graphical method for summarising numerical data based on a five-number summary. These five numbers are the minimum, lower quartile, median, upper quartile and maximum value. Read the rest of this entry »
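The five-number summary described here can be sketched in base R as follows. The data vector is invented for illustration. Note that fivenum() reports Tukey's hinges, which can differ slightly from the quartiles returned by quantile() for some sample sizes.

```r
# Made-up sample, used only to illustrate the summary
x <- c(2, 4, 4, 5, 7, 9, 10, 12, 15)

# Minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(x)
print(five)

# Draw the box and whisker plot; the returned list holds the plotted statistics
bp <- boxplot(x, horizontal = TRUE, main = "Box and whisker plot of x")
```

The stats component of the object returned by boxplot() contains the values used to draw the whiskers, box edges and median line.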
Using SQL to Access Data in MySQL Databases
July 28th, 2009
In a previous post we looked at opening and closing connections from R to a MySQL database and some basic operations for creating and deleting tables. In this post we will consider using SQL queries to extract parts of a table from the database based on different search criteria. Read the rest of this entry »
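The kind of query described here might look like the sketch below. It assumes the DBI and RSQLite packages are installed and uses an in-memory SQLite database as a stand-in, so that it runs without a MySQL server; against MySQL only the connection call would change. The table name and its columns are hypothetical.

```r
library(DBI)

# Stand-in connection (assumption): in-memory SQLite rather than MySQL
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical table, created only for this illustration
dbWriteTable(con, "measurements", data.frame(
  id    = 1:5,
  site  = c("A", "A", "B", "B", "C"),
  value = c(3.2, 4.8, 5.1, 2.9, 6.4)
))

# Extract the rows that satisfy a single search criterion
high <- dbGetQuery(con,
  "SELECT id, site, value FROM measurements WHERE value > 4")

# Combine criteria and sort the result
site_b <- dbGetQuery(con,
  "SELECT * FROM measurements WHERE site = 'B' AND value < 5 ORDER BY value")

dbDisconnect(con)
print(high)
print(site_b)
```

dbGetQuery() returns the selected rows as a data frame, so the result can be passed straight to other R functions.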
Using R to Access Data in a MySQL database
July 24th, 2009
The R import/export manual discusses various approaches to handling data and mentions that R is not well suited to working with large data sets because data objects are stored in memory during a session. There are situations where it makes sense to hold the data in a database and use one of the R libraries for database connectivity to access or save the data. Read the rest of this entry »
One and Two Sample Hypothesis Testing
June 26th, 2009
The t-test is regularly used in classical statistics to investigate one or two samples of data and to test a particular hypothesis. There are variants on the t-test that are all handled by the same function, t.test, in R. Read the rest of this entry »
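To illustrate how one function covers the variants mentioned here, the sketch below calls t.test on simulated data (the samples, means and seed are invented for illustration):

```r
set.seed(42)
a <- rnorm(20, mean = 5, sd = 1)  # simulated sample 1
b <- rnorm(20, mean = 6, sd = 1)  # simulated sample 2

# One-sample test of the hypothesis that the mean of a is 5
one <- t.test(a, mu = 5)

# Two-sample test; the default is the Welch test with unequal variances
two <- t.test(a, b)

# Two-sample test assuming equal variances (pooled estimate)
pooled <- t.test(a, b, var.equal = TRUE)

# Paired test, treating a[i] and b[i] as measurements on the same unit
paired <- t.test(a, b, paired = TRUE)

print(two)
```

Each call returns an object of class "htest", whose components such as statistic, parameter (degrees of freedom), p.value and conf.int can be extracted by name.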
Program Flow – Using Conditional Statements
June 21st, 2009
The R environment provides access to the R programming language, which like other programming languages has a variety of constructs for controlling the flow of an analysis by testing conditions. For example, in a function for numerical optimisation we might want to specify a tolerance level or a maximum number of iterations to determine when the algorithm should stop. Read the rest of this entry »
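The tolerance and maximum-iteration idea in this excerpt can be sketched with if statements inside a simple iterative routine. The function newton_sqrt below is a hypothetical helper written for this illustration, not something taken from the post.

```r
# Hypothetical helper: Newton iteration for the square root of a positive number
newton_sqrt <- function(a, tol = 1e-8, max_iter = 50) {
  if (a <= 0) {
    stop("a must be positive")
  }
  x <- a
  for (iter in 1:max_iter) {
    x_new <- 0.5 * (x + a / x)
    # Stop early once successive estimates agree to within the tolerance
    if (abs(x_new - x) < tol) {
      return(list(root = x_new, iterations = iter, converged = TRUE))
    }
    x <- x_new
  }
  # Reached the iteration limit without meeting the tolerance
  list(root = x, iterations = max_iter, converged = FALSE)
}

res <- newton_sqrt(2)
print(res$root)
print(res$converged)
```

Calling newton_sqrt(2, max_iter = 2) shows the other branch: the loop runs out of iterations and the result is flagged as not converged.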
Using Histograms to Summarise Data
June 8th, 2009
Tabular displays are not the only way to summarise a data set. We will often be interested in a graphical display, as this can be a more effective way to visualise our data than statistics such as the mean or standard deviation. Read the rest of this entry »
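A minimal sketch of a histogram in base R, using simulated data in place of a real data set (the sample, seed and labels are assumptions made for illustration):

```r
set.seed(123)
x <- rnorm(200, mean = 50, sd = 10)  # simulated measurements

# Draw the histogram; breaks is a suggestion that R rounds to "pretty" values
h <- hist(x, breaks = 20, main = "Histogram of simulated data", xlab = "Value")

# The returned object stores the bin boundaries and the count in each bin
print(h$breaks)
print(h$counts)
```

Comparing the plot with mean(x) and sd(x) shows what the graphical display adds: the shape of the distribution, not only its centre and spread.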
Plotting Probability Distributions
June 2nd, 2009
Many distributions are available within the base R statistical system, and it is possible to use these functions to visualise the density or cumulative distribution function for a distribution with a given set of parameters. Read the rest of this entry »
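As one way of visualising a density and its cumulative distribution function, the sketch below uses curve() with the standard normal distribution. The choice of distribution, parameters and plotting range is an assumption made for illustration.

```r
# Density of a normal distribution with mean 0 and standard deviation 1
dens <- curve(dnorm(x, mean = 0, sd = 1), from = -4, to = 4,
              ylab = "Density", main = "Normal density")

# Cumulative distribution function for the same parameters
cdf <- curve(pnorm(x, mean = 0, sd = 1), from = -4, to = 4,
             ylab = "Cumulative probability", main = "Normal CDF")
```

Swapping dnorm and pnorm for another pair, such as dexp and pexp, plots a different distribution with the same pattern.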
Working with Probability Distributions
May 31st, 2009
Probability distributions have a central role in statistics, and the R software has functions to work with a large range of distributions. The syntax has been selected to provide some consistency based on the type of information required about a distribution. Read the rest of this entry »
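The consistency referred to here comes from a prefix convention: d for the density, p for the cumulative probability, q for the quantile and r for random draws, followed by the distribution's short name. A brief sketch with the normal and binomial distributions (the argument values are chosen only for illustration):

```r
# Normal distribution: norm
d0 <- dnorm(0)        # density at 0
p  <- pnorm(1.96)     # P(X <= 1.96)
q  <- qnorm(0.975)    # value with 97.5% of the probability below it

set.seed(1)
draws <- rnorm(5)     # five random values

# Binomial distribution: binom, with its own parameters
pb <- dbinom(3, size = 10, prob = 0.5)  # P(exactly 3 successes in 10 trials)

print(c(density = d0, probability = p, quantile = q, binomial = pb))
```

The p and q functions are inverses of each other, so pnorm(qnorm(0.975)) recovers 0.975.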