It is one of those times when I am really excited, and somewhat obsessed, with the research project I am currently working on. It is an old idea that, after going around my head for many years, finally seems to be coming close to fruition. That is not the reason I am writing, though.

Part of that research project involves heavy use of the Bartlett relations (or Bartlett identities). The Bartlett relations (see Bartlett 1953, Section 2) are at the core of much of statistical theory and practice. They describe how the null expectations of products of log-likelihood derivatives are linked to each other.

For example, you may have heard or been taught that the variance of the maximum likelihood estimator (MLE) can be estimated using the inverse of the expected value of the negative second derivative of the log-likelihood, evaluated at the maximum likelihood estimates. This is a direct consequence of the so-called second Bartlett identity, according to which that expectation is equal to the variance of the first derivative of the log-likelihood, whose inverse is, in turn, the variance to use based on standard results about the limiting distribution of the MLE.

Let’s introduce some notation. Suppose that \(M_\theta\) is a continuous family of distributions indexed by a parameter \(\theta\), and write \(\log f(y; \theta)\) for the log-likelihood about \(\theta\). Then, assuming that we can interchange integration over the sample space and differentiation with respect to the parameter (and a few other regularity conditions that are not essential to report here), taking derivatives on both sides of \(\int f(y; \theta) dy = 1\) we obtain \[ E_{M_\theta}\left(\frac{d}{d\theta}\log f(Y; \theta) \right) = 0 \, , \] where \(E_{M_\theta}\) denotes expectation with respect to \(M_\theta\). This is the first Bartlett identity. Taking derivatives once more gives \[ E_{M_\theta}\left(\frac{d^2}{d\theta^2}\log f(Y; \theta) \right) + E_{M_\theta}\left\{\left(\frac{d}{d\theta}\log f(Y; \theta) \right)^2\right\} = 0 \, . \] This is the second Bartlett identity, and it is exactly what justifies using the inverse of the expected value of the negative second derivative of the log-likelihood at the maximum likelihood estimates as an estimate of the variance of the MLE. We can keep differentiating to get higher-order Bartlett relations.
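
As a quick numerical sanity check of these two identities, here is a minimal simulation sketch under the \(N(\theta, 1)\) model, where the first derivative of the log-likelihood is \(y - \theta\) and the second is \(-1\) (the value of \(\theta\) and the seed are just illustrative choices):

## Monte Carlo check of the first two Bartlett identities under N(theta, 1),
## where the first derivative of the log-likelihood is y - theta and the
## second derivative is -1
set.seed(123)
theta <- 2
y <- rnorm(1e6, mean = theta, sd = 1)
mean(y - theta)           ## first identity: approximately 0
-1 + mean((y - theta)^2)  ## second identity: approximately 0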

Denote \[ \nu_{rst\dots} = E_{M_\theta}\left(\frac{d^r}{d\theta^r}\log f(Y; \theta) \frac{d^s}{d\theta^s}\log f(Y; \theta) \frac{d^t}{d\theta^t}\log f(Y; \theta) \cdots \right) \quad r,s,t,\ldots \in \{ 1, 2, \ldots\}\, . \] Then, the first Bartlett identity is \(\nu_1 = 0\) and the second is \(\nu_{11} + \nu_{2} = 0\). You may be wondering now whether there is a general rule that allows computing higher-order Bartlett relations quickly. The answer is yes! Skovgaard (1986) showed that the derivative of the expectation of any product of log-likelihood derivatives is simply the sum of the moments that result from incrementing each available index by 1, one at a time, plus the moment that results from appending an extra index of 1. For example, \[ \frac{d}{d\theta} \nu_{11} = \nu_{12} + \nu_{21} + \nu_{111} = 2\nu_{12} + \nu_{111} \, . \] We can now compute Bartlett relations of arbitrary order! But the expressions become cumbersome quickly, and collecting terms soon becomes a daunting exercise (remember, I need to go to quite high orders for the project I am working on). There are books with those relations, but I wanted something that I can reproduce on my laptop and also extend.

So, I wrote an R function (differentiate_moment) that implements the differentiation rule I just mentioned. It takes as input a vector of indices and applies Skovgaard’s theorem to return a list with the indices of each term in the sum that forms the derivative. The code chunk below gives the implementation and illustrates its use.

differentiate_moment <- function(index = 1) {
    ## One term per increment of an existing index, plus one extra term
    ## with an appended index of 1
    n_terms <- length(index) + 1
    terms <- as.list(rep(NA, n_terms))
    ## Increment each available index by 1, one at a time
    for (j in seq.int(n_terms - 1)) {
        current_index <- index
        current_index[j] <- index[j] + 1
        terms[[j]] <- current_index
    }
    terms[[n_terms]] <- c(index, 1)
    ## Sort the indices of each term so that equal terms can be collected
    lapply(terms, sort)
}

(der <- differentiate_moment(c(1,1)))
## [[1]]
## [1] 1 2
## 
## [[2]]
## [1] 1 2
## 
## [[3]]
## [1] 1 1 1
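
As an extra check, differentiating the moment in the first Bartlett identity recovers the terms of the second one, \(\nu_{2}\) and \(\nu_{11}\):

differentiate_moment(1)
## [[1]]
## [1] 2
## 
## [[2]]
## [1] 1 1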

I actually wrote a function that gets the derivatives ready for input into a LaTeX-capable document.

prettify <- function(terms, equals = NA) {
    ## Collapse each index vector into a string, e.g. c(1, 2) -> "12"
    terms <- lapply(terms, paste, collapse = "")
    ## Count how many times each term appears
    tab <- table(unlist(terms))
    freq <- as.character(unname(tab))
    ## Suppress coefficients that are equal to 1
    freq[freq == "1"] <- ""
    str <- paste(paste0(freq, "\\nu_{", names(tab), "}"), collapse = " + ")
    if (is.na(equals)) {
        paste("$$", str, "$$", sep = "")
    } else {
        paste("$$", paste(str, "=", equals), "$$", sep = "")
    }
}
cat(prettify(der))

\[\nu_{111} + 2\nu_{12}\]
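
The equals argument appends a right-hand side, which comes in handy below when displaying full identities:

cat(prettify(der, equals = 0))

\[\nu_{111} + 2\nu_{12} = 0\]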

Now that we have the means to differentiate null moments, the Bartlett relations can be computed elegantly through recursion: since the right-hand side of each identity is zero, differentiating both sides of the identity of order \(k - 1\) yields the identity of order \(k\). The code chunk below implements this recursion.

bartlett_identity <- function(order = 1, index_list = list(1)) {
    ## `index_list` holds the terms of the first identity, nu_1 = 0, and is
    ## differentiated `order - 1` times by the recursion below
    if (order == 1) {
        return(index_list)
    }
    ## Differentiate every term and flatten into a single list of terms
    out <- unlist(lapply(index_list, differentiate_moment), recursive = FALSE)
    bartlett_identity(order = order - 1, index_list = out)
}

With bartlett_identity we can now compute all Bartlett relations up to a pre-specified order. The chunk of code below does it up to order 6.

for (o in seq.int(6))
    cat(prettify(bartlett_identity(order = o), equals = 0), "\n")

\[\nu_{1} = 0\]
\[\nu_{11} + \nu_{2} = 0\]
\[\nu_{111} + 3\nu_{12} + \nu_{3} = 0\]
\[\nu_{1111} + 6\nu_{112} + 4\nu_{13} + 3\nu_{22} + \nu_{4} = 0\]
\[\nu_{11111} + 10\nu_{1112} + 10\nu_{113} + 15\nu_{122} + 5\nu_{14} + 10\nu_{23} + \nu_{5} = 0\]
\[\nu_{111111} + 15\nu_{11112} + 20\nu_{1113} + 45\nu_{1122} + 15\nu_{114} + 60\nu_{123} + 6\nu_{15} + 15\nu_{222} + 15\nu_{24} + 10\nu_{33} + \nu_{6} = 0\]
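
Just as a sanity check of, say, the third relation, here is a quick Monte Carlo sketch under the exponential model with rate \(\theta\), for which the first three derivatives of the log-likelihood \(\log\theta - \theta y\) are \(1/\theta - y\), \(-1/\theta^2\), and \(2/\theta^3\) (the model, rate, and seed are just illustrative choices):

## Monte Carlo check of nu_111 + 3 nu_12 + nu_3 = 0 under the exponential
## model with rate theta
set.seed(123)
theta <- 0.5
y <- rexp(1e6, rate = theta)
l1 <- 1 / theta - y  ## first derivative of the log-likelihood
l2 <- -1 / theta^2   ## second derivative (constant in y)
l3 <- 2 / theta^3    ## third derivative (constant in y)
mean(l1^3) + 3 * mean(l1 * l2) + l3  ## approximately 0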

I am sure there are faster and more advanced ways of deriving a Bartlett relation of a given order (see, for example, the excellent book by McCullagh 1987), but the above was just neat to develop and… fun! The code above can also be optimised further, simply by grouping the terms that are the same at the level of differentiate_moment, as sketched below.
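
Here is a rough sketch of that grouping idea: represent an identity as a named numeric vector whose names are collapsed index strings (for example, "112") and whose values are the coefficients, merging equal terms after every differentiation step. The names differentiate_grouped and bartlett_grouped are hypothetical, and the string representation assumes single-digit derivative orders, so identities up to order 9.

## Sketch of a grouped variant (hypothetical names): carry a coefficient
## along with each term and merge equal terms as soon as they appear.
## Collapsing index vectors to strings assumes all derivative orders are < 10.
differentiate_grouped <- function(coefs = c("1" = 1)) {
    out <- numeric(0)
    for (term in names(coefs)) {
        index <- as.numeric(strsplit(term, "")[[1]])
        for (new_index in differentiate_moment(index)) {
            key <- paste(new_index, collapse = "")
            cur <- out[key]
            out[key] <- if (is.na(cur)) coefs[[term]] else cur + coefs[[term]]
        }
    }
    out
}

bartlett_grouped <- function(order = 1) {
    coefs <- c("1" = 1)
    for (o in seq_len(order - 1)) coefs <- differentiate_grouped(coefs)
    coefs
}

bartlett_grouped(4)  ## coefficients should match the fourth relation above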

References

Bartlett, M. S. 1953. “Approximate Confidence Intervals. II. More Than One Unknown Parameter.” Biometrika 40 (3/4): 306–17. http://www.jstor.org/stable/2333349.

McCullagh, P. 1987. Tensor Methods in Statistics. London: Chapman & Hall.

Skovgaard, Ib. 1986. “A Note on the Differentiation of Cumulants of Log Likelihood Derivatives.” International Statistical Review 54 (1): 29–32. http://www.jstor.org/stable/1403256.