# NRES 798 201501 Lab 7 # Objectives of today's labs # Develop a binomial GLM: plot and interpret the results # Develop a Poisson GLM: plot and interpret the results # Run a GAM and use the output to evaluate the functional relationship # Compare the GAM fitted model to GLM and LM model fits # Survival analysis # AIC model selection ######################################### # Generalized Linear Models (GLM): Binary ######################################### # Install package "labdsv" that included data to be used in GLM and GAM analysis install.packages("labdsv") library(labdsv) # Dave Roberts Bryce Data data(bryceveg) data(brycesite) attach(bryceveg) attach(brycesite) # This data set is from Bryce Canyon and includes two components: # "brycesite" contain plot information from 160 sites # "bryceveg" is a presence absence data set for a wide range of plant species # We will be focusing on the data for Berberis repens ("berrep"), a common shrub, and how its occurrence if influenced by elevation and aspect # TODO: Inspect the data for any inconsistencies or problems (focus on "berrep") # TODO: how should the inconsistencies be dealt with. # TODO: run a linear model on "berrep" with elevation as the independent variable # TODO: Inspect the residuals. Does this pattern make sense? What distribution defines the data? # TODO: run a glm on the berrep occurrence data using elevation as the predictor variable # Note: the general form for a glm is <- glm (y > 0 ~ x, family = binomial) # TODO: plot the fitted values from the model # e.g. plot(elev,fitted(model),xlab='Elevation', ylab='Probability of Occurrence') # TODO: evaluate the fitted glm model (summary(model) # TODO: What do the residual deviance and df values tell us? # TODO: run a glm model that includes both elevation "elev" and aspect "av", and evaluate if the fit is improved # TODO: use anova(model,test="Chi") to evaluate how each term in the model reduces deviance ######################################### # Generalized Linear Models (GLM): Poisson ######################################### # TODO: Read in the data set "beetle.txt" # TODO: Make sure "condit" and "disper" are recognized as factors (i.e. categorical variables, not continuous) # In the "beetle' data set the number of mates each female attracted was also recorded. # TODO: What distribution likely best represents the number of mates? # TODO: run linear first order models on beetle$ummates using first only the categorical variables, and then the categorical and continuous variables # TODO: evaluate the overall fits of these models # TODO: specify glm models that have the same structure as the liner model used above, using an appropriate linker function # TODO: evaluate the fit of the glm models compared to the linear models ######################################### # Generalized Additive Models (GAM) ######################################### # The occurrence of Berberis repens in the Bryce Canyon data set can also be analysed using a GAM # TODO: run a gam on "berrep" that just includes elevation as a smoothed predictor variable # Note: the general form for a gam is <- gam(y > 0 ~ s(x), family=binomial) # TODO: use summary(model) to examine the gam model fit # TODO: plot the fitted gam model to visualize the smoothing function # Note: plot(gam.model,se=TRUE) is the easiest way to do this. # TODO: how does the gam fitted function visually compare to the glm fitted function? # TODO: run a GAM that includes both elevation and aspect # TODO: compare the elevation and the elevation and aspect models using anova # Note: the structure is anova(mod1,mod2,test='Chi') ######################################### # Survival analysis ######################################### library(survival) # Read in the data set "seedlings.txt" # Read in the data set "Roaches.txt" # 60 Tree seedlings were planted and their survival time recorded # 30 trees were planted in September and 30 in October # When planted, the amount of light reaching the seedlings was recorded (gapsize) # TODO: survival input objects require the "status" of the individual to be identified at the end of the observation period # TODO: create a status column using "status <- 1*(seedlings$death>0)" # TODO: What is this doing, and how are live and dead individuals coded? # TODO: Use the function Surv() to create a survival object. # Note: the function takes "survival time" and "status" as inputs # What is the structure of the resulting Surv() object? Does this differ from a cbind() object? # TODO: Use the Kaplan-Meier estimation to fit a survival curve to the seedling data (i.e. use survfit(survival_object~1)) # TODO: Plot the estimated survival curve using plot(), and add appropriate axis labels. # TODO: What do the dashed lines represent? # hint: use summary() to examine the fitted object # TODO: Fit separate survival curves for the September and October cohort # TODO: fitted lines using different colours and line types # TODO: Fit a Cox proportional hazards regression model that only includes an intercept term # hint coxph(survival_object~1) # TODO: produce a survival curve for the coxph() model and plot the results # hint: survfit(coxph()) # TODO: does the survival curve differ from the previous one # TODO: Fit a Cox proportional hazards models that includes "gapsize" as a predictor variable # TODO: Use summary() to examine the fitted object # TODO: Does "gapsize" influence the survival curve? # TODO: use anova() to examine the fitted Cox object. How does this information differ from survival()? How should the probability values be interpreted? # TODO: Fit a Cox proportional hazards model that includes both "gapsize" and "cohort", but not interaction. # TODO: which factors significantly influence the survival curve # TODO: Fit a Cox model that includes "gapsize", "cohort" and the interaction term # note: use strata(seedlings$cohort) to identify the components of "cohort" # TODO: Does the inclusion of the interaction term improve the fit? # hint: use anova("model without interaction", "model with interaction") # TODO: Using the data set "roaches" # TODO: create a survival object using Surv() # TODO: Perform a parametric survival analysis using the function survfit() # TODO: Specify the model to use a "weibull" distribution # TODO: Specify the model to include "weight" and "group, and the interaction between them. # TODO: Which terms are important in the model? ######################################### # AIC model selection ######################################### # TODO: Read in the data set "beetle.txt" # TODO: create a lm for "eggmass" that includes the predictors "condit","disper","elytra" but no interaction terms # TODO: use AIC(model) to calculate the AIC value for this model # TODO: what does this value mean? # TODO: For the response variable "eggmass" construct all posible model combinations using "condit", "disper" and "elytra" # TODO: Calculate AIC values for all models. # TODO: Determine which model is best. # TODO: Determine which models can be discarded and which shoudl be kept for evaluation. # TODO: create a lm for "eggmass" that includes the predictors "condit","disper","elytra" and all 2-way interactions # Use the function step() to choose model structure using AIC in a stepwise selection # TODO: what does the function return? # TODO: what is the best model, and what models should be kept?