Outline content for the workshop

# Data Exploration and Presentation with R

## Kuala Lumpur, 12-15 March 2018

Checking and organising the data are necessary steps before they can be loaded into any statistical package, and this is true for R too. Becoming thoroughly familiar with the data is the first step in any modelling process. For even moderately large data sets, graphs are the only way to grasp the meaning of the data. Graphs also allow you to discuss the data with colleagues and to present them in reports and papers.

Here we discuss the process from recording in the field to ready-to-use data in R and the presentation of the data and results using charts.

## Outline

The first stage is to check and format the data so that they can be used in R:

• Collecting field data with data sheets or handheld devices, recording metadata.
• Transcribing or uploading data to the computer, while checking the raw data for missing values and doubtful or implausible entries.
• Use of spreadsheet tools to check the data. Saving the data in a standard format (CSV or TSV). Incorporating metadata.
• Loading the data into R and using R tools to check them; this often means going back to the spreadsheet to make the necessary corrections.
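As a rough illustration of the loading-and-checking step, here is a minimal sketch in base R. The data, column names and plausibility thresholds are invented for the example; in practice the CSV would be read from a file (for instance with `read.csv("survey.csv")`, a hypothetical file name) rather than from inline text.

```r
# Hypothetical field data, inlined so the example runs as-is.
# Note the blank count, the negative count and the wing length of 615.
csv_text <- "site,observer,date,count,wing_mm
A,KL,2018-03-12,4,61.5
A,KL,2018-03-13,,60.0
B,MT,2018-03-12,7,615
B,MT,2018-03-13,3,59.5
C,MT,2018-03-12,-1,62.0"

birds <- read.csv(text = csv_text, stringsAsFactors = FALSE)

str(birds)             # did numbers load as numbers, text as text?
print(summary(birds))  # ranges and NA counts for each column

# Number of missing values in each column
n_missing <- colSums(is.na(birds))
print(n_missing)

# Rows with implausible entries (thresholds are illustrative assumptions)
suspect <- which(birds$count < 0 | birds$wing_mm > 100)
print(birds[suspect, ])

# Dates load as text; convert them and confirm that none failed to parse
birds$date <- as.Date(birds$date, format = "%Y-%m-%d")
print(anyNA(birds$date))
```

Rows flagged in `suspect` would normally be corrected in the spreadsheet (after checking the original data sheets), and the file reloaded, rather than patched silently in R.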

Once we have a usable data set in R, we can start to explore:

• Indications of data quality: rounding, heaping, differences among observers, change with time of day.
• Summary statistics for each variable.
• Simple plots: scatter plots with trend line, pairs plots (scatter plot matrix), histograms, dot-plots or boxplots or beeswarm plots, time series plots with trend lines or seasonal decomposition, bar charts, ...
• Reformatting data: changing wide to long format (or vice-versa), creating matrices or arrays for further analysis.
• Trying simple models such as regression and examining residuals.
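The exploration steps listed above might look something like the following in base R. This is only a sketch on simulated data: the variable names, the observer who rounds to the nearest 5 mm, and the small table of counts are all assumptions made for illustration.

```r
set.seed(1)  # simulated data, for illustration only

# Hypothetical lengths: observer B rounds to the nearest 5 mm (heaping)
obs <- data.frame(
  observer  = rep(c("A", "B"), each = 50),
  length_mm = c(round(rnorm(50, mean = 120, sd = 10)),
                5 * round(rnorm(50, mean = 120, sd = 10) / 5))
)

# Data quality: tabulate the final digit of each measurement by observer
last_digit  <- obs$length_mm %% 10
digit_table <- table(obs$observer, last_digit)
print(digit_table)

# Summary statistics and simple plots
print(tapply(obs$length_mm, obs$observer, summary))
hist(obs$length_mm, main = "All lengths", xlab = "Length (mm)")
boxplot(length_mm ~ observer, data = obs, ylab = "Length (mm)")

# Reformatting: wide to long, then a site-by-year matrix of counts
wide <- data.frame(site = c("S1", "S2", "S3"),
                   count_2016 = c(12, 7, 30),
                   count_2017 = c(15, 9, 24))
long <- reshape(wide, direction = "long",
                varying = c("count_2016", "count_2017"),
                v.names = "count", timevar = "year",
                times = c(2016, 2017), idvar = "site")
count_matrix <- xtabs(count ~ site + year, data = long)

# A simple regression and a residual plot
x   <- 1:30
y   <- 2 + 0.5 * x + rnorm(30)
fit <- lm(y ~ x)
res <- residuals(fit)
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```

In the digit table, observer B's measurements should all end in 0 or 5, which is the kind of pattern that signals rounding rather than a real feature of the animals or plants measured.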

Basic R plots are usually all you need for your own exploration - though you will often need different symbols or colours to distinguish groups within the data. When you need to show your colleagues the data, or present them in a report or a paper, it's important that plots are clear and titles are informative.

• How the reader perceives the elements of a graph and interprets these as quantities.
• Chart junk and the perils of prettification; charts in Excel.
• Customising basic functions: titles, axis labels, plotting symbols, line width and type, colours, axes.
• Using colour in plots; colours for the colour-blind.
• Meeting editors' demands for plot formatting.
• Saving plots in PDF and graphics formats.
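To give an idea of what customising a basic plot involves, here is a small sketch using the built-in `iris` data as a stand-in for participants' own data. The colours are three of the Okabe-Ito colours, which are widely recommended as colour-blind friendly; the file name, plot size and title are arbitrary choices for the example.

```r
# Three Okabe-Ito colours: orange, sky blue, bluish green
okabe_ito <- c("#E69F00", "#56B4E9", "#009E73")
symbols   <- c(16, 17, 15)  # filled circle, triangle, square

grp  <- iris$Species
cols <- okabe_ito[as.integer(grp)]
pchs <- symbols[as.integer(grp)]

# Hypothetical output file in the session's temporary directory
out_file <- file.path(tempdir(), "iris_petals.pdf")

pdf(out_file, width = 6, height = 4.5)     # size in inches
par(mar = c(4.5, 4.5, 2.5, 1), las = 1)    # margins; horizontal axis labels
plot(iris$Petal.Length, iris$Petal.Width,
     col = cols, pch = pchs, cex = 1.1,
     xlab = "Petal length (cm)", ylab = "Petal width (cm)",
     main = "Petal width vs length in three iris species",
     bty = "l")
abline(lm(Petal.Width ~ Petal.Length, data = iris), lwd = 2, lty = 2)
legend("topleft", legend = levels(grp),
       col = okabe_ito, pch = symbols, bty = "n")
dev.off()  # close the device so the file is written
```

Groups are distinguished by both colour and symbol, so the plot still works in greyscale. For raster formats, base R's `png()` or `tiff()` can be opened in place of `pdf()`, with the resolution set through their `res` argument.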

Before the end of the workshop we will explore topics such as 3D plots (contour plots, wireframe plots), animated graphics, or other topics requested by participants.
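As a taste of the 3D options, the sketch below draws a contour plot and a wireframe-style perspective plot of R's built-in `volcano` elevation matrix using base graphics. The viewing angles, colours and output file name are arbitrary choices, and the lattice package's `wireframe()` is an alternative not shown here.

```r
# volcano: built-in matrix of elevations on a 10 m by 10 m grid
z <- volcano
x <- 10 * seq_len(nrow(z))  # grid positions in metres
y <- 10 * seq_len(ncol(z))

# Hypothetical output file in the session's temporary directory
out_file <- file.path(tempdir(), "volcano_3d.pdf")

pdf(out_file, width = 9, height = 4.5)
par(mfrow = c(1, 2), mar = c(4, 4, 2, 1))  # two panels side by side

contour(x, y, z, xlab = "x (m)", ylab = "y (m)",
        main = "Contour plot")

persp(x, y, z, theta = 135, phi = 30, expand = 0.4,
      col = "grey90", border = "grey40",
      xlab = "x (m)", ylab = "y (m)", zlab = "Elevation (m)",
      main = "Perspective plot")
dev.off()
```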

Participants are encouraged to bring their own data sets OR results of their analysis to the workshop and to experiment with different plotting options.

Please note that this is NOT a statistics workshop; if you want a stats workshop, come to a Boot Camp! We will focus here on DATA exploration and graphics and will not discuss analysis.

All participants should come with a laptop computer with R and a spreadsheet package installed.