`install.packages("SMPracticals")`

# 3 R practicals

## 3.1 Getting started

For running the code below, you will need the R packages `MASS`

and `SMPracticals`

. The `MASS`

package (Venables & Ripley, 2002) is one of the recommended R package and is included with the binary distributions of R, so you should have it. The `SMPracticals`

package (Davison, 2024) is the R package providing the datasets and a few functions for use with the practicals outlined in Davison (2003, Appendix A), and can be installed by running

The packages can be loaded and attached by doing

```
library("MASS")
library("SMPracticals")
```

## 3.2 `trees`

data

`trees`

contains data on the volume of timber, height and girth (diameter) of 31 felled black cherry trees; girth is measured four feet six inches above ground (Atkinson, 1985, p. 63). The problem is to find a simple linear model for predicting volume from height and girth. See `?trees`

for more details.

```
data("trees", package = "datasets")
pairs(trees, panel = panel.smooth)
pairs(log(trees), panel = panel.smooth)
```

`coplot()`

generates conditioning plots, in which the relationship between two variables is displayed conditional on subsets of values of other variables. This is useful to see if the relationship is stable over the range of other variables. The plots should be read from left to right, starting from the bottom row, and each plot corresponds to the ranges of values (from left to right) shown on the top plot for the conditioning variable. For the relationship of log volume and log girth, conditional on height we get:

`coplot(log(Volume) ~ log(Girth) | Height, data = trees, panel = panel.smooth)`

Produce and interpret the conditioning plots on the orginal scale.

For an initial fit, we take a linear model and assess model fit using diagnostic plots:

```
<- glm(Volume ~ Girth + Height, data = trees)
m_trees summary(m_trees)
plot.glm.diag(m_trees)
```

What do you make of the `m_trees`

fit?

To assess the possibility of transformation:

`boxcox(m_trees)`

Both \(\lambda = 1\) and \(\lambda = 0\) lie outside the confidence interval, though the latter is better supported. One possibility is to take \(\lambda = 1/3\), corresponding to response \(\texttt{Volume}^{1/3}\).

What transformations for `Girth`

and `Height`

are then needed for dimensional compatibility? Fit this model, give interpretations of the parameter estimates, and discuss its suitability.

An alternative is to suppose that a tree is conical in shape, in which case

\[
\texttt{Volume} \propto \texttt{Height} \times \texttt{Girth}^2 \, .
\] Equivalently, we fit

```
<- glm(log(Volume) ~ log(Girth) + log(Height), data = trees)
m_trees_log summary(m_trees_log)
plot.glm.diag(m_trees_log)
```

Are the parameter estimates consistent with this model? Does it fit adequately? What advantage has it over the others for prediction of future volumes?

## 3.3 `salinity`

data

`salinity`

contains \(n = 28\) observations on the salinity of water in Pamlico Sound, North Carolina (Atkinson, 1985, p. 48; Ruppert & Carroll, 1980). The response `sal`

is the bi-weekly average of salinity. The other three columns contain values of the covariates, respectively a lagged value of salinity `lag`

, a trend indicator `trend`

, and the river discharge `dis`

.

Using the techniques from the analysis of the `tree`

data set as a guide, find a model suitable for prediction of salinity from the covariates. The data contain at least one outlier.

## 3.4 `shuttle`

data

`shuttle`

contains the data in Davison (2003, Table 1.3) on O-ring failures for the space shuttle (Dalal et al., 1989).

```
data("shuttle", package = "SMPracticals")
row.names(shuttle) <- NULL
```

To fit a binomial logistic regression model with covariate temperature, we do:

```
<- glm(cbind(r, m - r) ~ temperature, family = binomial(), data = shuttle)
m_shuttle anova(m_shuttle)
summary(m_shuttle)
```

Try fitting with and without both covariates. To assess model fit, try

`plot.glm.diag(m_shuttle)`

Do you find these diagnostics useful?

## 3.5 `bliss`

data

`bliss`

provides data on mortality of flour-beetles as a function of dose of a poison (Bliss, 1935). To plot the death rates and fit a logistic regression model, we do:

```
data("bliss", package = "SMPracticals")
<- glm(cbind(r, m - r) ~ log(dose), family = binomial(), data = bliss)
m_bliss_logit summary(m_bliss_logit)
with(bliss, {
plot(log(dose), r/m, ylim = c(0, 1), ylab = "Proportion dead")
points(log(dose), fitted(m_bliss_logit), pch = 3, col = 2)
})
```

Does the fit seem reasonble to you?

Try again with the probit and cloglog link functions.

For example, for the cloglog link function we have:

```
<- glm(cbind(r, m-r) ~ log(dose), family = binomial("cloglog"), data = bliss)
m_bliss_cloglog with(bliss, {
plot(log(dose), r/m, ylim = c(0, 1), ylab = "Proportion dead")
points(log(dose), fitted(m_bliss_logit), pch = 3, col = 2)
points(log(dose), fitted(m_bliss_cloglog), pch = 3, col = 3)
})
```

Which link function fits best? Give a careful interpretation of the resulting model.