## Experimental Design & Data Analysis

We regularly teach Experimental Design & Data Analysis at two levels: Basic and Advanced. These are usually taught as six-day courses immediately before the Association for Tropical Biology & Conservation (ATBC) Asia-Pacific chapter meetings (see http://tropicalbiology.org/). In addition, for the ATBC meeting in Cairns, Australia (2014) (http://www.atbc2014.org/), we will be holding an advanced course in spatial modeling using R-INLA.

**Basic course**

This is an intensive six-day course condensed from a former full-semester course taught at the University of Canterbury, NZ. It is based on the open source program R, which is introduced to course participants at the beginning of the course and used throughout.

**Module I: Scientific methodology and experimental design (~1.5 days)**

This module starts with a review of the basics of scientific methodology and statistical concepts, including the deductive reasoning and test statistics used in classic hypothesis testing. Examples of simple regression and ANOVA are worked by hand to facilitate a thorough understanding of the fundamentals. Students then learn how to fit regression and ANOVA models in R.
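To give a flavour of the "by hand, then in R" approach, here is a minimal sketch using toy data invented for illustration (these values are not course material). It computes the least-squares slope and intercept from their textbook formulas and checks them against `lm()`.

```r
# Toy data (hypothetical values, for illustration only)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Least-squares estimates worked "by hand":
# slope = Sxy / Sxx, intercept = mean(y) - slope * mean(x)
slope     <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
intercept <- mean(y) - slope * mean(x)

# The same model fitted with lm()
fit <- lm(y ~ x)

# Both routes should agree on the coefficients
agree <- isTRUE(all.equal(unname(coef(fit)), c(intercept, slope)))
agree
```

Working through the sums once makes it easier to see what `summary(fit)` is reporting.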

Experimental design is introduced to the students in a structured manner, starting from the concepts of treatment, control, replication and randomization. Different types of single factor designs are then covered. Students then move on to nested design for testing multiple factors operating at hierarchical spatial or temporal scales. Variance components (fixed and random effects) are then introduced. Examples for every type of design are provided and worked out by the students in corresponding practicals.

**Module II: Parametric models and non-parametric models (~1 day)**

Assumptions of parametric models are reviewed, with the emphasis on what can cause each assumption to be broken, what happens if it is broken and how to check it with model checking tools in R. Suitable data transformations are then introduced to deal with “problematic” data in parametric models. Non-parametric rank tests are then introduced and their limitations discussed.
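The kind of workflow this module describes might look like the sketch below. The data are simulated (an assumption made purely for illustration), and the particular checks shown are common base-R choices rather than a fixed list from the course.

```r
# Simulated right-skewed response in three groups (hypothetical data)
set.seed(1)
group <- factor(rep(c("A", "B", "C"), each = 20))
y <- rlnorm(60, meanlog = rep(c(1, 1.5, 2), each = 20), sdlog = 0.6)

fit_raw <- lm(y ~ group)        # untransformed response
fit_log <- lm(log(y) ~ group)   # log-transformed response

# Standard diagnostic plots for the untransformed model
op <- par(mfrow = c(2, 2))
plot(fit_raw)
par(op)

# Formal checks of normality of residuals and homogeneity of variance
shapiro.test(residuals(fit_raw))
shapiro.test(residuals(fit_log))
bartlett.test(y ~ group)

# Rank-based alternative to one-way ANOVA
kw <- kruskal.test(y ~ group)
kw
```

Comparing the diagnostics before and after the transformation shows whether it has addressed the problem. The rank test avoids the distributional assumption but gives no effect sizes on the original scale.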

**Module III: General linear models, model fit and simplification (~1.5 days)**

At this stage the students will be relatively familiar with simple regression and ANOVA models. They learn that both belong to a group of models called general linear models, and how to deal with both continuous and categorical predictors in the same model (ANCOVA). Their understanding of the underlying mathematics is reinforced through learning to interpret the model outputs and work out the model predictions by hand.

Assumptions of parametric and linear models are revisited at this point as the students learn how to assess the validity of model assumptions and model fit. They then learn what defines the most parsimonious model based on Akaike’s Information Criterion (AIC). Students will be guided step by step through model simplification using a few examples and given adequate exercises during practicals.
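As an illustration of ANCOVA and AIC-based simplification, the sketch below uses the `mtcars` data set that ships with R. The choice of data set and variables is ours for this example and is not taken from the course practicals.

```r
# Recode transmission type as a factor (0 = automatic, 1 = manual in mtcars)
cars <- transform(mtcars, am = factor(am, labels = c("automatic", "manual")))

# ANCOVA: continuous predictor (wt) and categorical predictor (am)
full    <- lm(mpg ~ wt * am, data = cars)   # separate slopes per group
reduced <- lm(mpg ~ wt + am, data = cars)   # common slope

# Lower AIC indicates the better trade-off between fit and complexity
aic_table <- AIC(full, reduced)
aic_table

# Automated backward simplification by AIC, starting from the full model
simplified <- step(full, trace = 0)
formula(simplified)
```

`step()` automates the term-by-term deletions that students first carry out manually, so it is best read as a check on the manual procedure rather than a replacement for it.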

**Module IV: Generalised linear models (~2 days)**

When the underlying error distribution of the data does not meet the assumptions of linear models, is there a better way than to resort to data transformation or non-parametric models? This question is discussed in detail and the advantages of generalized linear models are introduced. The underlying mathematics of generalized linear models is briefly covered, including the concepts of families of error distribution, the linear predictor and the link function. Examples of the common types of generalized linear models are then given with different families of error distribution. These include models for count data, binary data and proportion data, which are often encountered in ecological studies.
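The three data types mentioned above map onto `glm()` calls roughly as in this sketch. The data are simulated and the coefficient values are arbitrary assumptions chosen only to produce plausible-looking responses.

```r
set.seed(42)
n <- 100
x <- runif(n, 0, 2)

# Count data: Poisson errors, log link
counts   <- rpois(n, lambda = exp(0.5 + 0.8 * x))
fit_pois <- glm(counts ~ x, family = poisson(link = "log"))

# Binary data: binomial errors, logit link
present <- rbinom(n, size = 1, prob = plogis(-1 + 1.5 * x))
fit_bin <- glm(present ~ x, family = binomial(link = "logit"))

# Proportion data: successes and failures out of a known number of trials
trials   <- 20
succ     <- rbinom(n, size = trials, prob = plogis(-1 + 1.5 * x))
fit_prop <- glm(cbind(succ, trials - succ) ~ x, family = binomial)

# The linear predictor lives on the link scale; the inverse link
# (here exp) maps it back to the scale of the response
newdat  <- data.frame(x = c(0.5, 1.5))
on_link <- predict(fit_pois, newdat, type = "link")
on_resp <- predict(fit_pois, newdat, type = "response")
all.equal(exp(on_link), on_resp)
```

The last lines make the role of the link function concrete: predictions on the response scale are the back-transformed linear predictor.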

On the last day of the course, a few hours are set aside for the students to discuss their own data and how they think they can analyze these data using the modeling tools in R. This may be done in voluntary and relatively casual presentations. Interactions and discussions between the presenters and the audience are highly encouraged.

*Modus operandi*

The course is taught by two post-doctoral level instructors, experienced in ecological field studies and data analysis in R. One or two teaching assistants will be with the class to provide one-on-one help for the students, especially during the practicals. The pace of the lessons is carefully monitored and adequate Q&A time is provided so that after each day, most participants feel reasonably stretched but not overwhelmed.

**Advanced course**

The advanced course centres on fitting hierarchical models in R, including linear mixed effects models and generalised linear mixed effects models.

**Day 1**

The first day of the course will cover generalised linear models with a focus on understanding the fitting procedures and their mathematical basis. This may serve as a recap for some students but will facilitate learning over the subsequent days.

**Day 2**

On the second day we will build on the material from day 1 and move on to linear mixed effects models. We will concentrate on understanding experimental designs that require the use of linear mixed effects models and on fitting models to simple data sets.
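A simple repeated-measures example gives an idea of the kind of model involved. This sketch assumes the `nlme` package (distributed with R as a recommended package) and its `Orthodont` data set. The course itself may use a different package, such as `lme4`, and different data.

```r
library(nlme)  # assumption: nlme is used here; the course may use another package

# Orthodont: repeated growth measurements on the same children (ships with nlme)
# Random intercept per Subject accounts for non-independence within a child
fit_lme <- lme(distance ~ age, random = ~ 1 | Subject, data = Orthodont)

fixef(fit_lme)                       # population-level intercept and slope
VarCorr(fit_lme)                     # between-subject and residual variation
intervals(fit_lme, which = "fixed")  # approximate 95% confidence intervals
```

The design, with several measurements per subject, is what calls for the random effect. Ignoring it would treat the repeated measurements as independent.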

**Day 3**

We will continue to explore mixed effects models with further examples. We will tackle more complex designs (e.g. crossed random effects). We will also use mixed effects models to make predictions and calculate confidence intervals.

**Day 4**

We will now move to generalised linear mixed effects models (GLMMs). On this first day of GLMMs we will cover fairly basic models and explore model checking and interpretation.

**Day 5**

We will continue to develop our understanding of GLMMs and consider making predictions from models, calculating standard errors and confidence intervals using various methods.

**Day 6**

On the final day, we will cover generalised least squares and models that account for residual dependence and heteroscedasticity.
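As a rough sketch of what such models look like, the example below again assumes the `nlme` package and its `Orthodont` data. The variance and correlation structures are chosen only for illustration and are not prescribed by the course.

```r
library(nlme)  # assumption: gls() from nlme; data set chosen for illustration

# Baseline: ordinary least squares fitted via gls()
fit_ols <- gls(distance ~ age * Sex, data = Orthodont)

# Heteroscedasticity: allow a different residual variance for each sex
fit_var <- update(fit_ols, weights = varIdent(form = ~ 1 | Sex))

# Residual dependence: equal correlation among measurements on the same child
fit_cor <- update(fit_var, correlation = corCompSymm(form = ~ 1 | Subject))

# The models share the same fixed effects, so their AICs can be compared
aic_gls <- AIC(fit_ols, fit_var, fit_cor)
aic_gls
```

Each step relaxes one assumption of the ordinary linear model, which makes it easy to see which relaxation the data actually support.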

**Advanced course in spatial modeling using R-INLA**

Integrated nested Laplace approximation (INLA) facilitates the fitting of a large range of complex statistical models, such as hierarchical models or spatial point process models, by dramatically reducing computation time.

This workshop discusses how spatial models may be fitted with INLA using the package R-INLA. We discuss a wide range of different types of spatial models, in particular complex spatial models, spatial point process models and hierarchical models.

Teaching will be a combination of lectures, computer sessions and discussions. Participants are encouraged to discuss their own data sets with the instructors.

The programme (lectures and practicals) will include:

- a general introduction to integrated nested Laplace approximation (INLA)
- an overview of types of spatial models and examples
- examples of simple spatial models in R-INLA
- complex spatial models: joint models with several likelihoods, marked point patterns, models with multivariate spatial fields

*References*

Beguin J, Martino S, Rue H, Cumming SG (2012), Hierarchical analysis of spatially autocorrelated ecological data using integrated nested Laplace approximation, *Methods in Ecology and Evolution*, DOI: 10.1111/j.2041-210X.2012.00211.x

Illian JB, Martino S, Sørbye S, Gallego-Fernandez J, Travis J (2013), Fitting complex ecological point processes with integrated nested Laplace approximation (INLA), *Methods in Ecology and Evolution*, DOI: 10.1111/2041-210x.12017

Illian JB, Sørbye SH, Rue H (2012a), A toolbox for fitting complex spatial point process models using integrated nested Laplace approximation (INLA), *Annals of Applied Statistics*, 6:1499-1530.

Illian JB, Sørbye SH, Rue H, Hendrichsen DK (2012b), Fitting a log Gaussian Cox process with temporally varying effects – a case study, *Journal of Environmental Statistics*, 3:1-25.

Rue H, Martino S, Chopin N (2009), Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations (with discussion), *JRSS B*, 71:319-392.