The t-test is regularly used in Classical Statistics to investigate one or two samples of data and to test a particular hypothesis. There are several variants of the t-test (one-sample, two-sample and paired), and all of them are handled by the same R function, t.test.
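Here is a minimal sketch of the three variants. It uses the built-in sleep data set, which records the extra hours of sleep for the same 10 patients under two drugs. We chose this data for illustration, and it is not necessarily the data used in the full entry. The paired test is called with two vectors, which works across R versions.

```r
# Extra hours of sleep for the same 10 patients under each of two drugs
drug1 <- sleep$extra[sleep$group == "1"]
drug2 <- sleep$extra[sleep$group == "2"]

# One-sample test: is the mean extra sleep under drug 1 different from 0?
one_sample <- t.test(drug1, mu = 0)

# Two-sample test comparing the groups; by default t.test uses the
# Welch version, which does not assume equal variances
two_sample <- t.test(extra ~ group, data = sleep)

# Paired test: each patient received both drugs
paired <- t.test(drug1, drug2, paired = TRUE)

one_sample
two_sample
paired
```

Setting var.equal = TRUE in the two-sample call would give the classical pooled-variance test instead of the Welch version.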
One and Two Sample Hypothesis Testing
June 26th, 2009
Program Flow – Using Conditional Statements
June 21st, 2009
The R environment provides access to the R programming language, which, like other programming languages, has a variety of constructs that control the flow of an analysis by making decisions based on tested conditions. For example, in a function for numerical optimisation we might want to specify a tolerance level that determines when the algorithm should stop, or a maximum number of iterations.
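The sketch below shows these ideas using a simple Newton iteration for a square root as the stand-in "optimisation" routine. The names newton_sqrt, tol and max_iter are hypothetical and were chosen only for this example. The function uses if to check its input and to test for convergence. It runs a for loop capped at max_iter and calls return to exit early once the tolerance is met.

```r
# Hypothetical helper (illustration only): Newton's method for sqrt(a),
# stopping when successive estimates differ by less than tol, or after
# max_iter iterations if that happens first
newton_sqrt <- function(a, tol = 1e-10, max_iter = 100) {
  if (!is.numeric(a) || length(a) != 1 || a <= 0) {
    stop("'a' must be a single positive number")
  }
  x <- a                               # starting value
  for (i in seq_len(max_iter)) {
    x_new <- 0.5 * (x + a / x)         # Newton update for f(x) = x^2 - a
    if (abs(x_new - x) < tol) {        # converged: leave the function early
      return(list(root = x_new, iterations = i, converged = TRUE))
    }
    x <- x_new
  }
  warning("maximum number of iterations reached without convergence")
  list(root = x, iterations = max_iter, converged = FALSE)
}

res <- newton_sqrt(2)
res$root
```

Calling newton_sqrt(2, max_iter = 1) exercises the other branch. It returns converged = FALSE with a warning.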
Using Histograms to Summarise Data
June 8th, 2009
Tabular displays are not the only way to summarise a data set. We will often be interested in a graphical display, as this can be a more effective way to visualise our data than statistics such as the mean or standard deviation.
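As a small sketch, the code below simulates some data and compares the numerical summaries with a histogram from base R's hist function. The sample is purely illustrative.

```r
# Simulated data for illustration: 200 draws from a normal distribution
set.seed(1)
x <- rnorm(200, mean = 50, sd = 10)

# Numerical summaries, for comparison with the plot
c(mean = mean(x), sd = sd(x))

# Histogram; breaks is only a suggestion, R picks "pretty" bin boundaries
h <- hist(x, breaks = 15, col = "grey",
          main = "Histogram of simulated data", xlab = "x")

# hist() also returns the bin boundaries and counts it used
h$breaks
h$counts
```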
Plotting Probability Distributions
June 2nd, 2009
Many distributions are available within the base R Statistical System, and their functions can be used to visualise the density or cumulative distribution function of a distribution with a given set of parameters.
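A minimal sketch of this approach uses curve to draw the density and cumulative distribution function of a standard normal. It then adds a binomial example plotted with type = "h". The parameters are arbitrary choices for illustration.

```r
op <- par(mfrow = c(1, 2))   # two plots side by side

# Density and cumulative distribution function of a standard normal
dens <- curve(dnorm(x, mean = 0, sd = 1), from = -4, to = 4,
              main = "Normal density", ylab = "f(x)")
cdf <- curve(pnorm(x, mean = 0, sd = 1), from = -4, to = 4,
             main = "Normal CDF", ylab = "F(x)")

par(op)                      # restore the previous graphics settings

# A discrete distribution: Binomial(10, 0.3) probabilities as vertical lines
k <- 0:10
plot(k, dbinom(k, size = 10, prob = 0.3), type = "h",
     main = "Binomial(10, 0.3)", xlab = "k", ylab = "P(X = k)")
```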
Working with Probability Distributions
May 31st, 2009
Probability distributions have a central role in Statistics, and the R software has functions to work with a large range of distributions. The function names follow a consistent scheme based on the type of information required about a distribution: the prefix d gives the density, p the cumulative probability, q the quantile and r random generation (for example dnorm, pnorm, qnorm and rnorm for the normal distribution).
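The short sketch below shows the four prefixes for the normal distribution and then for a binomial. The values in the comments are approximate.

```r
# Standard normal distribution (mean 0, sd 1 are the defaults)
normal_vals <- c(
  density  = dnorm(0),      # density at 0, equal to 1 / sqrt(2 * pi)
  prob     = pnorm(1.96),   # P(X <= 1.96), roughly 0.975
  quantile = qnorm(0.975)   # value with 97.5% of the probability below it, roughly 1.96
)
normal_vals

set.seed(42)
draws <- rnorm(5)           # five random values
draws

# The same prefixes work for other distributions, e.g. Binomial(10, 0.5)
dbinom(3, size = 10, prob = 0.5)   # P(X = 3) = choose(10, 3) / 2^10
pbinom(3, size = 10, prob = 0.5)   # P(X <= 3)
```

The p and q functions are inverses of each other, so qnorm(pnorm(z)) returns z up to rounding error.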
Vector Calculations to avoid Explicit Loops
May 23rd, 2009
The S programming language has facilities for applying a function to all the individual elements of a vector, matrix or data frame (for example apply, sapply and lapply), which avoid the need for explicit loops. In fact, explicit loops are generally discouraged in R because they can slow down calculations, but there will of course be some situations where they are unavoidable.
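This minimal sketch uses apply over the rows and columns of a matrix and sapply and lapply over a vector and a list. It also shows the equivalent explicit loop. In practice the apply family mainly makes code more concise. The largest speed-ups usually come from operations that are vectorised internally, such as arithmetic on whole vectors.

```r
m <- matrix(1:6, nrow = 2)   # 2 x 3 matrix, filled column by column

row_sums  <- apply(m, 1, sum)    # MARGIN = 1 works over rows: 9 12
col_means <- apply(m, 2, mean)   # MARGIN = 2 works over columns: 1.5 3.5 5.5

# sapply() simplifies the result to a vector; lapply() always returns a list
squares <- sapply(1:5, function(i) i^2)         # 1 4 9 16 25
lens    <- lapply(list(a = 1:3, b = letters), length)

# The equivalent explicit loop, for comparison
squares_loop <- numeric(5)
for (i in 1:5) squares_loop[i] <- i^2

# Arithmetic on vectors is vectorised directly, with no loop or apply needed
(1:5)^2
```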
Transformations to Create New Variables
May 18th, 2009
There are many situations where we might want to create a new variable by transforming one of the variables already in a data frame. R can be used for simple transformations or, where necessary, for more complicated mathematical expressions.
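Here is a sketch using a copy of the built-in mtcars data, chosen only for illustration. It creates new variables with direct assignment and with transform(). The new column names (lp100km, log_wt, wt_kg, heavy) and the 3.5 threshold are hypothetical choices.

```r
mt <- mtcars   # copy of the built-in data: mpg in miles per US gallon, wt in 1000 lb

# Simple transformation: fuel consumption in litres per 100 km
# (1 US gallon = 3.785411784 litres, 1 mile = 1.609344 km)
mt$lp100km <- 100 * 3.785411784 / (1.609344 * mt$mpg)

# A mathematical function of an existing variable
mt$log_wt <- log(mt$wt)

# transform() creates several new columns in a single call
mt <- transform(mt,
                wt_kg = wt * 1000 * 0.45359237,   # 1 lb = 0.45359237 kg
                heavy = wt > 3.5)                 # logical indicator

head(mt[, c("mpg", "lp100km", "wt", "wt_kg", "heavy")])
```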
Cross-tabulation of Data
May 15th, 2009
The contingency table summarises data when there are factors in the data set and we want to count the number of occurrences of each combination of factor levels. In R there are several ways to produce and manipulate these tables, for example with the table and xtabs functions.
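As a small sketch with the built-in mtcars data, the code below cross-tabulates the number of cylinders against the number of forward gears. It does this with table and with xtabs and then adds margins and row proportions. The choice of variables is ours, for illustration.

```r
# Counts for each combination of cylinders (cyl) and forward gears (gear)
tab <- table(cyl = mtcars$cyl, gear = mtcars$gear)
tab

# xtabs() builds the same table from a formula
tab2 <- xtabs(~ cyl + gear, data = mtcars)

addmargins(tab)                 # add row and column totals
prop.table(tab, margin = 1)     # proportions within each cylinder group
```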
Producing Data Summaries
May 11th, 2009
The first stage of most investigations is to produce summaries of the data, both to identify any unusual records and to get an overall feel for the contents of the data. This initial data analysis usually involves tabulating and plotting the data, and R provides a variety of functions to generate the summaries of interest.
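A minimal sketch of a first look at a data set follows. It uses the built-in airquality data (daily air quality measurements in New York, May to September 1973), chosen here only as an example. It combines summary statistics, missing-value counts, a frequency table, a check for extreme records and a plot.

```r
summary(airquality)            # quartiles, mean and count of NAs for each column
str(airquality)                # variable types and the first few values

na_counts <- colSums(is.na(airquality))   # missing values per variable
na_counts

month_counts <- table(airquality$Month)   # number of records per month
month_counts

# Look for unusual records, e.g. the three largest ozone readings
head(airquality[order(airquality$Ozone, decreasing = TRUE), ], 3)

# A graphical summary: ozone by month
boxplot(Ozone ~ Month, data = airquality)
```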