Experimental Design & Data Analysis – Basic course
This six-day intensive course is condensed from a full-semester course taught at the University of Canterbury, New Zealand. It is based on the open-source program R, which is introduced to participants at the start of the course and used throughout.
Module I: Scientific methodology and experimental design (~1.5 days)
This module starts with a review of the basics of scientific methodology and statistical concepts, including the deductive reasoning and test statistics used in classical hypothesis testing. Examples of simple regression and ANOVA are worked by hand to build a thorough understanding of the fundamentals behind typical ecological questions. Students then learn how to fit regression and ANOVA models in R.
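As an illustration of the kind of hand calculation this module refers to, the sketch below computes a least-squares regression line and a one-way ANOVA F statistic from first principles. It is written in plain Python (standard library only) rather than R so that every step of the arithmetic stays visible; the data and the function names are hypothetical. In R, the same quantities would normally come from lm() and anova().

```python
# Hypothetical worked example: simple regression and one-way ANOVA "by hand".

def simple_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares of x about their means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    all_obs = [v for g in groups for v in g]
    grand_mean = sum(all_obs) / len(all_obs)
    means = [sum(g) / len(g) for g in groups]
    # Between-group SS: squared deviation of each group mean from the grand mean,
    # weighted by group size
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group (residual) SS: squared deviation of each observation from its group mean
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_obs) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


a, b = simple_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"y = {a:.2f} + {b:.2f} x")  # y = 2.20 + 0.60 x

f_stat, df1, df2 = one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f"F({df1}, {df2}) = {f_stat:.1f}")  # F(2, 6) = 27.0
```

The F value is then compared with the F distribution on (2, 6) degrees of freedom, which is the step where tables or software take over from hand calculation.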
Experimental design is introduced in a structured manner, starting from the concepts of treatment, control, replication and randomisation. Different types of single-factor designs are covered first. Students then move on to nested designs for testing multiple factors operating at hierarchical spatial or temporal scales, after which variance components (fixed and random effects) are introduced. Examples of every type of design are provided and worked through by the students in the corresponding practicals.
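Randomisation can be made concrete with a short sketch of a completely randomised single-factor design: a set of experimental units (here 12 hypothetical plots) is allocated at random to three hypothetical treatments with equal replication. Python is used only for illustration; in R, sample() would serve the same purpose. The function name is hypothetical, and the seed is fixed only so that the allocation can be reproduced.

```python
import random


def randomize_treatments(units, treatments, n_rep, seed=None):
    """Completely randomised design: assign each treatment to n_rep units at random."""
    if len(units) != len(treatments) * n_rep:
        raise ValueError("number of units must equal treatments x replicates")
    # One label per replicate of each treatment, e.g. 4 x "control", 4 x "low N", ...
    labels = [t for t in treatments for _ in range(n_rep)]
    rng = random.Random(seed)
    rng.shuffle(labels)  # random order removes any systematic allocation of treatments
    return dict(zip(units, labels))


plots = [f"plot{i}" for i in range(1, 13)]
design = randomize_treatments(plots, ["control", "low N", "high N"], n_rep=4, seed=1)
for plot, trt in design.items():
    print(plot, trt)
```

Because every unit has the same chance of receiving any treatment, unmeasured differences between plots are spread across treatments rather than confounded with them.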
Module II: Parametric models and non-parametric models (~1 day)
Assumptions of parametric models are reviewed, with an emphasis on what can cause each assumption to be violated, what happens when it is violated and how to check it with model-checking tools in R. Suitable data transformations are then introduced for dealing with “problematic” data in parametric models. Finally, non-parametric rank tests are introduced and their limitations discussed.
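The core idea of a rank test fits in a few lines: observations are replaced by their ranks in the pooled sample (tied values share the average rank), and the test statistic is computed from the ranks alone. The sketch below computes the Mann-Whitney U statistic for two hypothetical samples. As far as we are aware this corresponds to the W statistic reported by R's wilcox.test() for two samples, but the sketch stops short of computing a p-value.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over a run of tied values starting at sorted position i
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def mann_whitney_u(sample_a, sample_b):
    """U for sample_a: its rank sum minus the smallest possible rank sum."""
    ranks = average_ranks(list(sample_a) + list(sample_b))
    n_a = len(sample_a)
    rank_sum_a = sum(ranks[:n_a])
    return rank_sum_a - n_a * (n_a + 1) / 2


a = [1.1, 2.3, 3.0]
b = [2.5, 4.0, 5.2, 6.1]
print(mann_whitney_u(a, b))                      # 1.0
print(mann_whitney_u(a, [2.5, 4.0, 5.2, 61.0]))  # 1.0: only the ordering matters
```

Replacing 6.1 with 61 leaves U unchanged, which shows both the robustness of rank tests to extreme values and the information about effect magnitude that they discard.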
Module III: General linear models, model fit and simplification (~1.5 days)
At this stage the students will be reasonably familiar with simple regression and ANOVA models in R. They then learn that both belong to a family of models called general linear models, and how to handle continuous and categorical predictors in the same model (ANCOVA). Their understanding of the underlying mathematics is reinforced by learning to interpret model outputs and to work out model predictions by hand.
Assumptions of parametric and linear models are revisited at this point, as the students learn how to assess the validity of model assumptions and the fit of a model. They then learn how the most parsimonious model is identified using Akaike’s Information Criterion (AIC). Students are guided through model simplification step by step using a few examples, and are given ample exercises during the practicals.
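To make the AIC comparison concrete, the sketch below compares an intercept-only model with a simple regression on hypothetical data, computing AIC = -2 log L + 2k from the normal log-likelihood with the residual variance estimated as RSS/n. Here k counts the regression coefficients plus the residual variance; to our understanding this is the same convention that R's AIC() uses for lm() fits, but treat the exact constant as an assumption. The function name gaussian_aic is hypothetical. The model with the lower AIC is preferred.

```python
import math


def gaussian_aic(residuals, n_coef):
    """AIC = -2 log L + 2k for a linear model with normally distributed errors.

    sigma^2 is estimated by maximum likelihood as RSS / n, and k counts the
    regression coefficients plus the residual variance.
    """
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    log_lik = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    k = n_coef + 1
    return -2 * log_lik + 2 * k


# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(y)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Model 1: intercept only (every fitted value is the mean of y)
res_null = [yi - mean_y for yi in y]

# Model 2: simple regression y = a + b*x fitted by least squares
b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
     / sum((xi - mean_x) ** 2 for xi in x))
a = mean_y - b * mean_x
res_reg = [yi - (a + b * xi) for xi, yi in zip(x, y)]

aic_null = gaussian_aic(res_null, n_coef=1)
aic_reg = gaussian_aic(res_reg, n_coef=2)
print(f"AIC intercept-only: {aic_null:.2f}")
print(f"AIC with slope:     {aic_reg:.2f}")
```

In this example the slope lowers AIC by about 2.6 despite costing one extra parameter, so simplifying to the intercept-only model would not be supported.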
Module IV: Generalised linear models (~2 days)
When the underlying error distribution of the data does not meet the assumptions of linear models, is there a better option than resorting to data transformation or non-parametric models? This question is discussed in detail, and the advantages of generalised linear models are introduced. The underlying mathematics of generalised linear models is covered only briefly, to avoid becoming overly technical; it includes the concepts of error distribution families, the linear predictor and the link function. Examples of common generalised linear models with different error families are then worked through, mainly for count data, binary data and proportion data, which are often encountered in ecological studies.
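The roles of the linear predictor and the link function can be seen in a small Poisson regression for count data. The sketch below, in Python rather than R and with hypothetical data and function names, fits log(mu) = b0 + b1*x by iteratively reweighted least squares, the standard fitting algorithm for generalised linear models; in R the equivalent fit would be glm(y ~ x, family = poisson). For brevity it runs a fixed number of iterations instead of checking convergence, and it assumes at least one non-zero count.

```python
import math


def poisson_glm(x, y, n_iter=25):
    """Fit log(mu) = b0 + b1*x to counts y by iteratively reweighted least squares."""
    # Start from the intercept-only fit: log of the mean count, zero slope
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(n_iter):
        eta = [b0 + b1 * xi for xi in x]  # linear predictor
        mu = [math.exp(e) for e in eta]   # inverse of the log link
        # For a log link with Poisson variance, the IRLS weight is mu and the
        # working response is eta + (y - mu) / mu
        w = mu
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        # Solve the 2x2 weighted least-squares normal equations for (b0, b1)
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swz = sum(wi * zi for wi, zi in zip(w, z))
        swxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = sw * swxx - swx ** 2
        b0 = (swxx * swz - swx * swxz) / det
        b1 = (sw * swxz - swx * swz) / det
    return b0, b1


# Hypothetical counts (e.g. individuals per quadrat) along a gradient x
x = [0, 1, 2, 3, 4]
y = [1, 2, 4, 7, 12]
b0, b1 = poisson_glm(x, y)
print(f"log(mu) = {b0:.3f} + {b1:.3f} x")
print(f"each unit of x multiplies the expected count by {math.exp(b1):.3f}")
```

Because of the log link, the coefficients act additively on the log scale and multiplicatively on the count scale, and the fitted counts can never be negative, which is one of the advantages over transforming the data and using a linear model.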
On the last day of the course, a few hours are set aside for students to discuss their own data and how they might analyse them using the modelling tools in R. This can take the form of voluntary, informal presentations. Discussion between the presenters and the audience is strongly encouraged.
Modus operandi
The course is taught by two postdoctoral-level instructors experienced in ecological field studies and data analysis in R. One or two teaching assistants accompany the class to provide one-on-one help to students, especially during the practicals. The pace of the lessons is carefully monitored and adequate time is provided for questions, so that at the end of each day most participants feel reasonably stretched but not overwhelmed.
Sample schedule for the statistics course (adapted from the 2011 course in Bangkok, Thailand)
Day 1
08:30 – 09:30 Introduction to the course; hypothesis testing using the t-test as an example
09:30 – 10:30 ANOVA and regression
10:30 – 11:00 – Break –
11:00 – 12:00 Practical: ANOVA and regression in Excel
12:00 – 13:00 – Lunch Break –
13:00 – 14:00 Practical: ANOVA and regression in Excel (continued); revision and questions if necessary
14:00 – 15:00 Multifactor analysis (multiple regression, multifactor ANOVA)
15:00 – 15:30 – Break –
15:30 – 17:30 Practical: Introduction to R. ANOVA and regression in R.
Day 2
08:30 – 09:30 Assumptions of parametric tests
09:30 – 10:30 Data transformations
10:30 – 11:00 – Break –
11:00 – 12:00 Non-parametric tests
12:00 – 13:00 – Lunch Break –
13:00 – 15:00 Practical: Non-parametric rank tests in R
15:00 – 15:30 – Break –
15:30 – 17:30 Practical: Graphics in R
Day 3
08:30 – 09:30 General linear models (same-slopes models) and dummy variables
09:30 – 10:30 General linear models (separate-slopes models)
10:30 – 11:00 – Break –
11:00 – 12:00 Model simplification
12:00 – 13:00 – Lunch Break –
13:00 – 15:00 Practical: Multifactor models and general linear models in R
15:00 – 15:30 – Break –
15:30 – 16:30 Sampling and experimental design
16:30 – 17:30 Practical: Sampling and power analysis in R
Day 4
08:30 – 09:30 Experimental design
09:30 – 10:30 Single factor designs
10:30 – 11:00 – Break –
11:00 – 12:00 Nested designs and variance component analysis
12:00 – 13:00 – Lunch Break –
13:00 – 15:00 Discussion: methodological and analysis problems with published papers
15:00 – 15:30 – Break –
15:30 – 17:30 Practical: Single factor designs in R
Day 5
08:30 – 09:30 Multifactor designs
09:30 – 10:30 Multifactor designs including time
10:30 – 11:00 – Break –
11:00 – 12:00 Discussion: Dealing with background variation
12:00 – 13:00 – Lunch Break –
13:00 – 15:00 Practical: Multifactor designs
15:00 – 15:30 – Break –
15:30 – 16:30 Intro to Generalised linear models
16:30 – 17:30 Poisson regression and contingency tables
Day 6
08:30 – 09:30 Practical: Data collection for contingency tables
09:30 – 10:30 Practical: Analysis of contingency tables
10:30 – 11:00 – Break –
11:00 – 12:00 Binomial ANOVA and logistic regression
12:00 – 13:00 – Lunch Break –
13:00 – 15:00 Practical: Poisson and binomial generalised linear models in R
15:00 – 15:30 – Break –
15:30 – 17:30 Revision, questions etc.