1 Introduction

Bivariate extreme models are powerful statistical tools used to analyse extreme values of two related variables simultaneously. These models are particularly valuable in environmental sciences, finance, and risk management, where understanding the joint behaviour of extreme events is essential. By modelling the dependence structure, bivariate extreme models allow us to assess the joint tail behaviour, quantify risk, and make informed decisions.

2 Why bivariate extremes?

The statistical analysis of bivariate extremes helps us understand the joint behaviour of extreme occurrences, which univariate analysis cannot capture, leading to more accurate risk assessments. To showcase this, Figure 1 shows the daily closing prices of two leading European stock markets, CAC 40 and DAX 30, from January 1, 1988, to January 1, 2014. We can see there is a clear relationship between the behaviour of these stock markets over the years:

**Figure 1**. Closing daily prices for CAC 40 and DAX 30 for the period 1988 to 2014.


Extreme losses can be characterised through negative log returns. Figure 2 shows time series plots of the negative log returns, which exhibit strong heteroskedasticity:
**Figure 2.** Negative log returns for CAC 40 and DAX 30.


Figure 3 shows pairs of (filtered) negative log returns, where we can clearly see a strong dependence at very high values (extreme losses). I used a time-varying colour palette to highlight how the relationship between the pair changes with time. From a risk-management perspective, we would be interested in studying simultaneous losses of these two indices.
**Figure 3.** Scatterplots using a time-varying colour palette for GARCH-filtered CAC 40 and DAX 30 residuals. Extreme losses are in the top-right.


Of course, bivariate extremes can be of interest not only in finance but also in environmental, engineering, and medical applications, and in general, any application where rare but high-impact events are crucial. Figure 4 shows a scatterplot of wind gusts (km/h) at two monitoring stations, in Valkenburg and Schiphol. Bivariate extremes can help us understand the relationship between extreme wind gusts at two different locations.
**Figure 4.** Another example of the importance of extreme values: a scatterplot of wind gusts at two locations in The Netherlands.


We could also be interested in the study of extremes of two environmental variables at a single spatial location, as shown in Figure 5.
**Figure 5.** Daily concentrations of CO (Carbon Monoxide) and NOX (Nitrogen Oxides) at a monitoring station in London during 2020.


For all the problems introduced above, we need to characterise the distribution of two random variables in the joint (upper) tail region. By definition, there is not much data in the joint tail, so we cannot use empirical representations of the distribution of joint extremes. Bivariate (and more generally, multivariate) extreme value theory (EVT) provides strong mathematical arguments for calculating joint tail probabilities. The main focus of multivariate EVT is on the dependence structure; as we can see from the examples above, if extremes occur together, the overall risk increases.

3 Lecture structure

This lecture focuses on componentwise maxima, i.e., the random vector constructed from taking the maximum of each component of a bivariate vector. Specifically, in Section 4, we will study the limiting behaviour of the componentwise maxima vector. In Section 5, we will learn measures of extremal dependence that are very useful for statistical modelling. Section 6 will use the limiting representation in Section 4 to construct parametric and semi-parametric models for bivariate componentwise maxima. In Section 7, I will talk about regression for bivariate extremes and conclude in Section 8 with comments on the limitations of the methods, alternative representations for bivariate/multivariate extremes and R packages for analysis of extremes.

4 Componentwise maxima

4.1 Asymptotic characterisation

Let \((X_1,Y_1),(X_2,Y_2),\ldots\) be a sequence of independent and identically distributed continuous random vectors with distribution \(F_{XY}(x,y)\). Let \(F_X(x)\) and \(F_Y(y)\) be the marginal distributions of \(X\) and \(Y\), respectively. We define the vector of componentwise maxima as \[\boldsymbol{M}_n = (M_{x,n},M_{y,n}) = \left(\max_{i=1,\ldots,n}{X_i},\max_{i=1,\ldots,n}{Y_i}\right).\] The index \(j\) for which the maximum of the \(X_i\) sequence occurs is not necessarily the same index \(k\) for which the maximum of the \(Y_i\) sequence occurs. In other words, \(\boldsymbol{M}_n\) might not correspond to an actual observation. To see this, the following R code generates \(N\times n = 20\times100=2,000\) samples from \(X\) and \(Y\) (\(X\perp Y\)) following Gaussian distributions and constructs the componentwise maxima vectors using blocks of size \(n=100\). We can see that only one componentwise maximum coincides with an actual bivariate observation.
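
Below is a minimal sketch of such a simulation (the seed and plotting details are illustrative):

# Simulate N = 20 blocks of size n = 100 of independent Gaussian pairs and
# take componentwise maxima within each block.
set.seed(1)
N = 20; n = 100
x = matrix(rnorm(N*n), nrow = N)  # each row is one block of X
y = matrix(rnorm(N*n), nrow = N)  # each row is one block of Y
Mx = apply(x, 1, max)             # componentwise maxima per block
My = apply(y, 1, max)
# Blocks where the maxima of X and Y occur at the same index, i.e., where
# (Mx, My) is an actual bivariate observation
sum(apply(x, 1, which.max) == apply(y, 1, which.max))
plot(as.vector(x), as.vector(y), col = 'grey', pch = 16, xlab = 'x', ylab = 'y')
points(Mx, My, pch = 17, col = 2)  # componentwise maxima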

**Figure 6.** Simulation illustrating that the vector of componentwise maxima might not correspond to an actual observation.


As in the univariate case, the asymptotic theory of bivariate extremes starts by studying the limiting behaviour of \(\boldsymbol{M}_n\). Since \(X\) and \(Y\) are independent, we know that the three-types theorem applies to \(M_{x,n}\) and \(M_{y,n}\) (if the necessary normalising sequences exist such that the limiting distribution of renormalised maxima is non-degenerate). When characterising the limiting behaviour of random vectors, it is standard practice to assume some common marginal distribution for each component. By means of the probability integral transform theorem\(^{\boldsymbol{(1)}}\), we can always assume virtually any distribution we want, and here we assume that both \(X\) and \(Y\) follow a standard Fréchet distribution, i.e., \[F_X(z) = F_Y(z) = F_\text{Fréchet}(z)= \exp(-1/z),\quad z>0.\] In that case, it is possible to show that the random variables \[\begin{equation}\label{eq:cw_smaxima} M_{x,n}^\star = \frac{\max_{i=1,\ldots,n}{X_i}}{n},\quad M_{y,n}^\star = \frac{\max_{i=1,\ldots,n}{Y_i}}{n} \end{equation}\] also follow a standard Fréchet distribution.
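
To see why, note that for standard Fréchet margins, \[\text{Pr}(M_{x,n}^\star \leq z) = \text{Pr}\left(\max_{i=1,\ldots,n}X_i \leq nz\right) = \{F_\text{Fréchet}(nz)\}^n = \exp\{-n/(nz)\} = \exp(-1/z),\] and similarly for \(M_{y,n}^\star\).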

Optional material

Read the following if you want to know more about the use of the probability integral transform to obtain standard Fréchet random variables. If not interested, skip to the next section.

\(^{\boldsymbol{(1)}}\): The probability integral transform states that if \(Z\) has continuous distribution \(F_Z\), then \(U = F_Z(Z)\) follows a uniform distribution. So we can transform \(Z\) to a random variable \(W\) with the desired standard Fréchet distribution by defining \(W = F_\text{Fréchet}^{-1}(U)\). Indeed, note that \[F_{W}(z) = \text{Pr}(W \leq z) = \text{Pr}(F_\text{Fréchet}^{-1}(U) \leq z) = \text{Pr}(U \leq F_\text{Fréchet}(z)) = F_\text{Fréchet}(z),\] in other words, the cdf of \(W\) is \(F_\text{Fréchet}\) (the Fréchet distribution). In real-data applications, we don’t usually know \(F_Z\), so a common practice is to use its empirical distribution (ecdf). The following example illustrates how we can use the above results:

G = function(x){
  stopifnot(x > 0)
  exp(-1/x)
}
Ginv = function(p){
  if(p > 0 & p < 1)
    return(-1/log(p))
  else
    return(NA)
}
z = rnorm(100, sd = 10)
# We assume we don't know that z is normally distributed, so we use the ecdf.
# Rescale by n/(n+1) so the sample maximum is not mapped to exactly 1
# (Ginv(1) would be undefined).
n = length(z)
U = ecdf(z)(z) * n/(n + 1)
W = sapply(U, Ginv)

# Does W follow a std Frechet distribution? H0: W ~ stdF
library(evd)
ks.test(W, pfrechet)

4.1.1 The bivariate extreme value distribution

Let \(\boldsymbol{M}_n^\star = (M_{x,n}^\star, M_{y,n}^\star)\) with \(M_{x,n}^\star\) and \(M_{y,n}^\star\) defined as in \(\eqref{eq:cw_smaxima}\). If \[\text{Pr}(M_{x,n}^\star\leq x, M_{y,n}^\star\leq y)\overset{\mathcal{d}}\rightarrow G(x,y),\] where \(G\) is a non-degenerate distribution, then \(G\) is called the bivariate extreme value distribution and has the form \[G_H(x,y) = \exp\{-V_H(x,y)\},\quad x,y> 0,\] where \[V_H(x,y) = 2\int_0^1\max\left(\frac{w}{x},\frac{1-w}{y}\right)\text{d}H(w)\] is called the exponent measure and \(H\) is the spectral distribution function defined on \([0,1]\) satisfying the mean constraint \[\begin{equation}\label{eq:meanc} \int_0^1w\text{d}H(w) = \frac{1}{2}. \end{equation}\]

Remarks

  1. Note that the subscript in \(G_H\) indicates that there is a family of limiting distributions (indexed by \(H\)). Since \(H\) is a distribution, this means that there are infinitely many possible limits. This is very different from the univariate case, where the family of GEV distributions can be characterised by a finite number of parameters.

  2. Assuming standard Fréchet margins is a choice, meaning that other margins can be assumed. In general, we can apply increasing affine transformations to each margin and consider the sequence of random vectors \[\left(\frac{M_{x,n}-b_{x,n}}{a_{x,n}},\frac{M_{y,n}-b_{y,n}}{a_{y,n}}\right).\] As in the univariate case, the specific form of the normalising constants \(a_{x,n}, b_{x,n},a_{y,n}\) and \(b_{y,n}\) will depend on the distribution of the original variables \(X_i, Y_i\). The particular choice is not so important from an asymptotic point of view, but some useful properties can be more easily studied with a specific selection. Popular choices include the exponential, Gumbel, uniform, and Weibull distributions. Examples of other margins employed in the literature can be found in Beirlant et al. (2004).

  3. If \(H\) is differentiable, the spectral density is denoted by \(h(w) = \frac{\text{d}H(w)}{\text{d} w}\).

All the information about extremal dependence is contained in the spectral distribution \(H\) (or equivalently in the measure \(V\)). Another equivalent way to characterise it is through the so-called Pickands’ dependence function \(A\), defined as \[\begin{equation}\label{eq:pickands} A_H(w) = V_H\left(w^{-1},(1-w)^{-1}\right),\quad w\in[0,1]. \end{equation}\] We can see from \(\eqref{eq:pickands}\) that there is a one-to-one mapping between \(A\), \(V\) and \(H\). Indeed, we can express \(V\) as \[V_H(x,y) = \left(\frac{1}{x}+\frac{1}{y}\right)A_H\left(\frac{y}{x+y}\right),\quad x,y>0.\] Additionally, \(V\) and \(A\) have the following properties:

  1. \(V_H(\infty,z) = V_H(z,\infty) = 1/z\), \(z>0\).
  2. Homogeneity of order \(-1\): \(V_H(tx,ty) = t^{-1}V_H(x,y)\), \(x,y,t>0\).
  3. \(A_H\) is a bounded function: \(\max(w,1-w)\leq A_H(w)\leq 1\).
  4. \(A_H\) is convex: \(A_H\{tw_1 + (1-t)w_2\}\leq tA_H(w_1) + (1-t)A_H(w_2)\), \(t\in[0,1]\), \(w_1,w_2\in[0,1]\).
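
As a quick check, property 1 follows directly from the definition of \(V_H\) together with the mean constraint \(\eqref{eq:meanc}\): \[V_H(\infty,z) = 2\int_0^1\max\left(\frac{w}{\infty},\frac{1-w}{z}\right)\text{d}H(w) = \frac{2}{z}\int_0^1(1-w)\text{d}H(w) = \frac{2}{z}\left(1-\frac{1}{2}\right) = \frac{1}{z}.\]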

Exercise 1. Show the max-stability property: \({G_H(tx, ty)}^t = G_H(x,y)\), for \(x,y,t>0\).

Solution. \[{G_H(tx, ty)}^t = \exp\{-tV_H(tx,ty)\} = \exp\{-V_H(x,y)\} = G_H(x,y).\] The second equality is due to the homogeneity of order \(-1\) property of \(V_H\).

5 Coefficients of extremal dependence

The main goal of bivariate extremes is to characterise the extremal dependence between variables. Before we further describe extremal dependence, let’s study the two boundary cases of dependence: independence and perfect dependence.

  • Case of independence. This case arises when \(H\) places mass 0.5 at each of the boundary points \(w=0\) and \(w=1\). In that case, \[V_H(x,y) = 2\left\{0.5\times\max\left(\frac{0}{x},\frac{1}{y}\right) + 0.5\times\max\left(\frac{1}{x},\frac{0}{y}\right) \right\} = \frac{1}{y} + \frac{1}{x}.\] The corresponding bivariate extreme value distribution \(G\) and dependence function \(A\) are \[\begin{align*} G(x,y) &= \exp\{-(x^{-1} + y^{-1})\} = \exp\{-x^{-1}\}\times \exp\{-y^{-1}\} = F_X(x)\times F_Y(y) = F_\text{Fréchet}(x)\times F_\text{Fréchet}(y).\\ &\\ A(w) &= 1,\quad\forall w\in[0,1]. \end{align*}\]
**Figure 7.** Spectral distribution and Pickands' dependence function for the case of independence.


  • Exercise 2. Case of perfect dependence. This case arises when \(H\) places unit mass on \(w=0.5\). Obtain expressions for \(V_H, G_H\) and \(A_H\).

Solution. \[\begin{align*} V_H(x,y) &= 2 \max\left(\frac{0.5}{x},\frac{0.5}{y}\right) = \max\left({x}^{-1},{y}^{-1}\right)\\ G_H(x,y) &= \exp\{-\max\left({x}^{-1},{y}^{-1}\right)\}\\ A_H(w) &= V_H(w^{-1},(1-w)^{-1}) = \max\left(w,1-w\right) \end{align*}\]

Between these two boundary cases, the extremes of \(X\) and \(Y\) can exhibit different types of asymptotic (in)dependence. In the following section, we provide easily interpretable coefficients to characterise the level of extremal dependence of pairs of variables.

5.1 The \(\chi\) coefficient

One way to measure the strength of extremal dependence is through the limiting conditional probability \[\chi = \lim_{u\to 1}\text{Pr}\{F_X(X)>u\mid F_Y(Y)>u\}.\] Intuitively, \(\chi\) is the probability that \(X\) is extreme given that \(Y\) is extreme. Clearly, \(0\leq\chi\leq1\) and we have the following cases:

  • If \(\chi = 0\), the variables \(X\) and \(Y\) are said to be asymptotically independent (AI).

  • If \(\chi > 0\), the variables \(X\) and \(Y\) are said to be asymptotically dependent (AD). Moreover, the value of \(\chi\) increases with the strength of dependence at extreme levels.

If \(\chi > 0\) extremes can occur together in the limit, and if \(\chi = 0\) they cannot. But \(X\) and \(Y\) can still be dependent at any finite levels (even very high ones!).

In practical applications, we approximate \(\chi\). Let \[\chi(u) = 2 - \frac{\log\text{Pr}\{F_X(X)\leq u, F_Y(Y)\leq u\}}{\log u}.\] It can be shown that \(\chi = \lim_{u\to1}\chi(u)\) and that \(2-\log(2u-1)/\log u\leq \chi(u)\leq 1\). Note that \(\chi(u)\) can be negative, and its sign determines whether \(X\) and \(Y\) are negatively or positively associated at the level \(u\).

5.2 The \(\bar\chi\) coefficient

When \(X\) and \(Y\) are asymptotically independent, \(\chi\) and \(\chi(u)\) fail to provide any measure of the strength of extremal dependence. For AI variables, we instead use the quantity \[\bar\chi(u) = \frac{2\log(1-u)}{\log\text{Pr}\{F_X(X)>u,F_Y(Y)>u\}}-1.\] The limit of \(\bar\chi(u)\) as \(u\to1\) is denoted \(\bar\chi\), and we have that \(-1\leq\bar\chi\leq1.\) For AI variables, \(\bar\chi\) increases with the strength of dependence at extreme levels.
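
As a concrete illustration, the following minimal sketch estimates \(\chi(u)\) and \(\bar\chi(u)\) empirically, straight from the definitions above (the data-generating model and sample size are illustrative; in practice, the function chiplot from the evd package does this for us, as we will see below):

chi_u = function(x, y, u){
  Fx = ecdf(x)(x); Fy = ecdf(y)(y)
  Cuu = mean(Fx <= u & Fy <= u)   # empirical Pr{F_X(X) <= u, F_Y(Y) <= u}
  2 - log(Cuu)/log(u)
}
chibar_u = function(x, y, u){
  Fx = ecdf(x)(x); Fy = ecdf(y)(y)
  Suu = mean(Fx > u & Fy > u)     # empirical Pr{F_X(X) > u, F_Y(Y) > u}
  2*log(1 - u)/log(Suu) - 1
}
set.seed(1)
x = rnorm(5000); y = 0.6*x + sqrt(1 - 0.6^2)*rnorm(5000)  # correlation 0.6
chi_u(x, y, 0.95)     # Gaussians are AI, so this drifts towards 0 as u -> 1
chibar_u(x, y, 0.95)  # for Gaussian vectors, chibar = rho = 0.6 in the limit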

5.3 Comments on asymptotic dependence and independence

  1. The pair of coefficients \((\chi,\bar\chi)\) and their finite-sample counterparts, \((\chi(u),\bar\chi(u))\), are very useful to determine the type of asymptotic dependence:

    • \(\chi>0\) and \(\bar\chi=1\): AD \(\Rightarrow\) \(\chi\) reflects strength of extremal dependence.

    • \(\chi=0\) and \(\bar\chi<1\): AI \(\Rightarrow\) \(\bar\chi\) reflects strength of extremal dependence.

  2. For statistical modelling, the type of asymptotic dependence assumed has a severe impact on extrapolation into the joint upper tail:

    • If AI is assumed, but the data are in fact AD, the joint occurrence probability of extremes will likely be underestimated.

    • If AD is assumed, but data are in fact AI, the joint occurrence probability of extremes will likely be overestimated.

    • In terms of risk management, assuming AD when data are AI might not be as bad as assuming AI when data are AD. In general, AD can be interpreted as a fairly conservative approach, which provides upper bounds on the joint occurrence probabilities of extremes.

  3. Some known cases:

    • Bivariate Gaussian vectors with correlation \(\rho<1\) always have \(\chi=0\), i.e., they are AI.
    • On the contrary, bivariate Student t vectors with finite degrees of freedom are AD.
    • Bivariate max-stable distributions (a.k.a. the bivariate GEVD family) are developed under the framework of bivariate regular variation and are therefore AD as well (except in the boundary case of exact independence).

Exercise 3. The following figures show plots of \(\chi(u)\) and \(\bar\chi(u)\) for bivariate Gaussian, Student t and max-stable vectors. Can you recognise which curve corresponds to each model?

**Figure 8.** Example of dependence coefficients for different models.


Solution. Gaussian is red, Student t is blue, and max-stable is black.

Exercise 4. We can use the function chiplot in the library evd to estimate \(\chi(u)\) and \(\bar\chi(u)\) non-parametrically. Check the implementation by generating \(N=2000\) copies of bivariate Gaussian vectors with correlation 0.6. Repeat the above now generating copies of zero-mean bivariate Student t vectors with 3 degrees of freedom and diagonal scale matrix.

library(mvtnorm)  # for rmvnorm and rmvt; chiplot is in evd (loaded earlier)
N = 2000
par(pty = 's', mfrow = c(2,3), mar = c(3,3,3,1))
set.seed(191023)
# Unit variances so that the off-diagonal entry 0.6 is indeed the correlation
xy.gauss = rmvnorm(N, mean = c(1,1.5), sigma = matrix(c(1,0.6,0.6,1), ncol=2))
plot(xy.gauss, pch = 16, xlab = 'x', ylab = 'y', main = 'Bivariate Gaussian')
chiplot(xy.gauss, which = 1)
chiplot(xy.gauss, which = 2)

xy.t = rmvt(N, sigma = diag(2), df = 3)
plot(xy.t, pch = 16, xlab = 'x', ylab = 'y', main = 'Bivariate Student t')
chiplot(xy.t, which = 1)
chiplot(xy.t, which = 2)

6 Bivariate extremes in practice

We saw in Section 4 that the bivariate extreme value distribution \(G_H\) arises as the unique limiting family of distributions for the properly renormalised vector of componentwise maxima. But since there is a one-to-one map between the spectral distribution \(H\) and \(G_H\), there are infinitely many possible forms for \(G_H\). Any distribution function on \([0,1]\) satisfying the mean constraint \(\eqref{eq:meanc}\) gives rise to a valid limit.

Exercise 5. Can you think of an example of a distribution function defined on \([0,1]\) that has mean 1/2?

An example is the Beta\((a,a)\) distribution (both shape parameters equal), whose mean is \(a/(a+a) = 1/2\) for any \(a>0\).

So how can we use this very powerful and elegant mathematical result for statistical modelling? Well, in general, there are three possible strategies:

  1. Restrict ourselves to parametric representations of \(H\) (and therefore, of \(G_H, V_H, A_H\)).
  2. Use semi-parametric techniques.
  3. Use non-parametric techniques to estimate the dependence structure.

One of the main challenges with non-parametric and semi-parametric techniques is the need to restrict the estimators to satisfy the mean constraint in \(\eqref{eq:meanc}\). Non-parametric techniques were widely explored in the early and mid-2000s, notably via the copula approach. The interested reader can refer to Abdous and Ghoudi (2005), Schmidt and Stadtmüller (2006), Genest and Segers (2009) and references therein. In Section 6.1 we review different parametric models proposed in the literature. Sections 6.2-6.8 are devoted to statistical modelling assuming parametric forms. Section 6.9 summarises semi-parametric approaches.

6.1 Parametric bivariate models

The goal here is to use parametric sub-families of distributions for either \(H, V\) or \(A\). This means that \(G_H\) can now be described through a finite set of parameters, which greatly simplifies inference at the cost of reducing model flexibility. In the following, we provide a (non-exhaustive) list of parametric models that have been proposed throughout the years. First, a bit of notation is needed. Let \(z_1\) and \(z_2\) be general terms of the vector of componentwise maxima and define \[ \tilde{z}_i= \{1+\xi_i(z_i - \mu_i)/\sigma_i\}^{-1/\xi_i},\quad i=1,2,\] where \((\mu_i,\sigma_i,\xi_i)\) are the GEV parameters for \(z_i\), \(i=1,2\). In each of the bivariate distribution functions below, the univariate margins are GEV, so \(G(z_i) = \exp\{-\tilde{z}_i\}\), for \(i=1,2\).

  1. Logistic model. One of the simplest parametric bivariate models, with only one dependence parameter (a numerical check of this formula is given after this list): \[G_H(z_1,z_2) = \exp\left\{-\left(\tilde{z}_1^{1/\alpha} + \tilde{z}_2^{1/\alpha}\right)^{\alpha}\right\}, \quad \alpha\in(0,1]\]

  2. Asymmetric logistic model. An extension of the previous model with two additional parameters that allow for asymmetry in the dependence structure: \[G_H(z_1,z_2) = \exp\left\{-(1-\psi_1)\tilde{z}_1 - (1-\psi_2)\tilde{z}_2 - \left[\left(\psi_1\tilde{z}_1\right)^{1/\alpha} + \left(\psi_2\tilde{z}_2\right)^{1/\alpha}\right]^\alpha\right\},\quad \alpha\in(0,1], \psi_1,\psi_2\in [0,1]\]

  3. Hüsler-Reiss model. For \(a>0\), \[G_H(z_1,z_2) = \exp\left\{-\tilde{z}_1\Phi\left\{\frac{1}{a}+\frac{a}{2}\log\left(\frac{\tilde{z}_1}{\tilde{z}_2}\right)\right\} - \tilde{z}_2\Phi\left\{\frac{1}{a}+\frac{a}{2}\log\left(\frac{\tilde{z}_2}{\tilde{z}_1}\right)\right\}\right\}\]

  4. Bilogistic model. Another extension to the logistic model: \[G_H(z_1,z_2) = \exp\left\{-\tilde{z}_1q^{1-\alpha} - \tilde{z}_2(1-q)^{1-\beta}\right\},\quad\alpha,\beta>0,\] where \(q\) is the solution of \((1-\alpha)\tilde{z}_1(1-q)^{\beta} - (1-\beta)\tilde{z}_2q^\alpha = 0.\) If \(\alpha = \beta\), the Bilogistic model reduces to the Logistic one.

  5. Negative bilogistic model. The negative bilogistic distribution function with parameters \(\alpha>0\) and \(\beta>0\) is \[G_H(z_1,z_2) = \exp\left\{-\tilde{z}_1 - \tilde{z}_2 + \tilde{z}_1q^{1+\alpha} + \tilde{z}_2(1-q)^{1+\beta}\right\},\] where \(q\) is the solution to the equation \((1+\alpha)\tilde{z}_1q^\alpha - (1+\beta)\tilde{z}_2(1-q)^\beta = 0\).

  6. Coles-Tawn model. The Coles-Tawn distribution function with parameters \(\alpha>0\) and \(\beta>0\) is \[G_H(z_1,z_2) = \exp\left\{-\tilde{z}_1[1-\text{Be}(q;\alpha+1,\beta)] - \tilde{z}_2\text{Be}(q;\alpha,\beta+1)\right\},\] where \(q = \alpha\tilde{z}_2/(\alpha \tilde{z}_2 + \beta\tilde{z}_1)\) and \(\text{Be}(q;\alpha,\beta)\) is the beta distribution function evaluated at \(q\) with shape parameters \(\alpha\) and \(\beta\).
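
To make the notation concrete, the short check below evaluates the Logistic model by hand and compares it against pbvevd from the evd package; with standard Gumbel margins (GEV with \(\mu=0,\sigma=1,\xi=0\), the default in evd), the transformation reduces to \(\tilde{z} = \exp(-z)\):

library(evd)
alpha = 0.5
z     = c(1.2, 0.7)
ztil  = exp(-z)  # GEV(0,1,0) margins: tilde{z} = exp(-z)
G.hand = exp(-(ztil[1]^(1/alpha) + ztil[2]^(1/alpha))^alpha)
G.evd  = pbvevd(z, dep = alpha, model = 'log')  # default margins are GEV(0,1,0)
c(G.hand, G.evd)  # the two values should agree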

Exercise 6.

Using the function abvevd from the package evd, plot the Pickands’ dependence function for the following models and parameter configurations:

  • Logistic: \(\alpha=0.3, 0.6, 0.9\)

  • Asymmetric logistic: \(\psi_1=0.2,\psi_2=0.8, \alpha = 0.4\); \(\psi_1=\psi_2=0.5, \alpha = 0.4\); and \(\psi_1=0.8,\psi_2=0.2, \alpha = 0.4\)

  • Husler-Reiss: \(a=1,2,3\)

    Use a single plot for each model. You should end up with three plots, each containing three different lines. Note that abvevd adds two reference lines indicating the lower and upper bounds of the Pickands’ dependence function.

Solution.

par(pty = 's', mfrow = c(1,3), mar = c(3,3,3,1))
# Logistic
abvevd(dep = 0.3, model = "log", plot = TRUE, main = 'Logistic', lwd = 2, ylim = c(0,1))
abvevd(dep = 0.6, model = "log", plot = TRUE, add = T, col = 2, lwd = 2)
abvevd(dep = 0.9, model = "log", plot = TRUE, add = T, col = 3, lwd = 2)
legend('bottom', 
       c(expression(paste(alpha, " = ", 0.3)),
         expression(paste(alpha, " = ", 0.6)),
         expression(paste(alpha, " = ", 0.9))),
       col = 1:3, lwd = rep(2,3))

# Asymmetric logistic
abvevd(dep = 0.4, asy = c(0.2, 0.8), model = "alog", plot = TRUE, main = 'Asymmetric logistic', lwd = 2, ylim = c(0,1))
abvevd(dep = 0.4, asy = c(0.5, 0.5), model = "alog", plot = TRUE, add = T, col = 2, lwd = 2)
abvevd(dep = 0.4, asy = c(0.8, 0.2), model = "alog", plot = TRUE, add = T, col = 3, lwd = 2)
legend('bottom', 
       c(expression(paste(alpha, " = ", 0.4, " ", psi[1], "=", 0.2, " ", psi[2], "=", 0.8)),
         expression(paste(alpha, " = ", 0.4, " ", psi[1], "=", 0.5, " ", psi[2], "=", 0.5)),
         expression(paste(alpha, " = ", 0.4, " ", psi[1], "=", 0.8, " ", psi[2], "=", 0.2))),
       col = 1:3, lwd = rep(2,3))

# Husler-Reiss
abvevd(dep = 1, model = "hr", plot = TRUE, main = 'Husler-Reiss', lwd = 2, ylim = c(0,1))
abvevd(dep = 2, model = "hr", plot = TRUE, add = T, col = 2, lwd = 2)
abvevd(dep = 3, model = "hr", plot = TRUE, add = T, col = 3, lwd = 2)
legend('bottom', c('a = 1', 'a = 2', 'a = 3'), col = 1:3, lwd = rep(2,3))

6.2 Maximum likelihood inference for parametric \(G_H\)

It can be shown that the density associated with the bivariate extreme value distribution \(G_H\) is \[g_H(z_1,z_2) = \frac{\partial^2 G_H(z_1,z_2)}{\partial z_1\partial z_2} = \{V_{1}(z_1,z_2)V_2(z_1,z_2) - V_{12}(z_1,z_2)\}\exp\{-V(z_1,z_2)\},\quad z_1,z_2>0,\] where the subscripts in \(V_1, V_2\) and \(V_{12}\) denote partial differentiation of \(V_H\) with respect to \(z_1\) and/or \(z_2\). If \((\boldsymbol{z}_{1},\boldsymbol{z}_{2}) = ((z_{1,1},z_{2,1}),\ldots,(z_{1,n},z_{2,n}))\) is a sample of size \(n\) from \(G_H\), the log-likelihood is given by \[\ell(\theta;\boldsymbol{z}_{1},\boldsymbol{z}_{2}) = \sum_{i=1}^n\left[\log\{V_{1}(z_{1,i},z_{2,i})V_2(z_{1,i},z_{2,i}) - V_{12}(z_{1,i},z_{2,i})\} - V(z_{1,i},z_{2,i})\right],\] where \(\theta\) is the vector that parametrises \(G_H\) (for instance, if \(G_H\) is assumed to be the bivariate Logistic model, then \(\theta = \alpha\)). Solutions to the likelihood equations \(\frac{\partial\ell(\theta;\boldsymbol{z}_{1},\boldsymbol{z}_{2})}{\partial\theta}\mid_{\theta=\hat\theta} = 0\) do not usually have closed form, so the maximum likelihood estimator is obtained by numerically maximising \(\ell(\theta;\boldsymbol{z}_{1},\boldsymbol{z}_{2})\).\(^{\boldsymbol{(2)}}\)

\(^{\boldsymbol{(2)}}\): Since we are talking about the likelihood equations, we are implicitly assuming that they are well defined, i.e., the parametric models for \(G_H\) are well-behaved. In particular, the support of the distribution does not depend on the parameter vector \(\theta\).
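
As an illustration, the sketch below performs this numerical maximisation for the Logistic model, assuming known standard Fréchet margins so that only the dependence parameter \(\alpha\) is estimated. The closed-form density used here follows from differentiating \(V_H(z_1,z_2) = (z_1^{-1/\alpha}+z_2^{-1/\alpha})^{\alpha}\) twice, as in the expression for \(g_H\) above:

library(evd)
# Negative log-likelihood of the Logistic model on standard Frechet margins.
# With s = x^(-1/alpha) + y^(-1/alpha) and V = s^alpha, differentiating gives
# g(x,y) = (xy)^(-1/alpha - 1) s^(alpha - 2) {V + (1 - alpha)/alpha} exp(-V).
nll.log = function(alpha, x, y){
  s = x^(-1/alpha) + y^(-1/alpha)
  V = s^alpha
  logg = (-1/alpha - 1)*(log(x) + log(y)) + (alpha - 2)*log(s) +
         log(V + (1 - alpha)/alpha) - V
  -sum(logg)
}
set.seed(1)
xy  = exp(rbvevd(500, dep = 0.5, model = 'log'))  # Gumbel -> std Frechet margins
opt = optimise(nll.log, interval = c(0.05, 1), x = xy[,1], y = xy[,2])
opt$minimum  # should be close to the true alpha = 0.5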

6.3 Return levels and return periods

A return level for a bivariate extreme value distribution can be constructed as follows. We know that any bivariate extreme value distribution can be represented through the Pickands’ dependence function as \[G(x,y) = \exp\left\{-\left(\frac{1}{\tilde{x}} + \frac{1}{\tilde{y}}\right)A(w)\right\},\] where \(\tilde{x}\) and \(\tilde{y}\) correspond, respectively, to \(x\) and \(y\) transformed to be unit Fréchet distributed, and \(w = \tilde{y}/(\tilde{x}+\tilde{y})\in[0,1].\) Then, for a fixed probability \(p\) and \(w\), the equation \(G(x,y) = p\) leads to \[\tilde{x} = -\frac{A(w)}{w\log p} \text{ and } \tilde{y} = \frac{w\tilde{x}}{1-w}.\]

We can then easily transform \(\tilde{x}, \tilde{y}\) back to the original \(x,y\) scale.
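
For instance, the following sketch traces the resulting \((\tilde{x},\tilde{y})\) pairs on the standard Fréchet scale for the Logistic model (the values of \(p\) and \(\alpha\) are illustrative):

library(evd)
p     = 0.99; alpha = 0.5
w     = seq(0.01, 0.99, length.out = 99)
A     = abvevd(w, dep = alpha, model = 'log')  # Pickands' dependence function
x.til = -A/(w*log(p))                          # solve G(x, y) = p for fixed w
y.til = w*x.til/(1 - w)
plot(x.til, y.til, type = 'l', log = 'xy',
     xlab = 'x (Frechet scale)', ylab = 'y (Frechet scale)')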

More generally, we can define the \(p\)-probability return curve as the set \[\text{RC}(p) = \{(x,y)\in\mathbb{R}^2:\text{Pr}(X>x,Y>y) = p\},\] where \(\text{Pr}(X>x,Y>y)\) is called the joint survival function (see, e.g., Murphy-Barltrop and Wadsworth, 2023). We are of course interested in values of \(p\) close to 0, which correspond to rare joint exceedance events. Similarly, we can define the return period to be \(1/p\) since, given any point in RC(\(p\)), we would expect to observe the event \(\{X>x, Y>y\}\) once, on average, each return period. Note that since return curves are sets in \(\mathbb{R}^2\), the usual return level plot cannot be naturally extended to this setting. But one way to visualise return curves, proposed by Murphy-Barltrop and Wadsworth (2023), is to consider different return periods and plot the corresponding curves individually (as in the left plot in Figure 9) or simultaneously (as in the right plot in Figure 9). Figure 9 shows, on the left, a return curve for 1000 bivariate standard normal observations with correlation 0.5 for a return period of 100 (\(p = 1/100\)). For a sample of size \(n = 1000\), we expect to observe \(np = 10\) points in the blue shaded region. On the right, Figure 9 shows return curves of the same bivariate normal data set for return periods in the set {10, 100, 1000, 10,000}. In their article, Murphy-Barltrop and Wadsworth (2023) proposed a novel way to estimate extremal bivariate return curves using methods based on Heffernan and Tawn (2004) and Wadsworth and Tawn (2013).

**Figure 9.** *Left*: Return curve for 1000 bivariate standard normal observations with correlation 0.5 for return period of 100. *Right*: Return curves of the same bivariate normal data set for return periods in the set {10,100, 1000, 10,000}. Taken from Murphy-Barltrop and Wadsworth (2023).


6.4 Diagnostics tools

Most graphical diagnostic tools used in the univariate case can be extended to the bivariate case. A non-comprehensive list of diagnostic plots is given below.

  • Conditional PP plots (probability-probability plots), conditioning on each margin.

  • Bivariate density plot.

  • Plot of Pickands’ dependence function (\(A(w)\)).

  • Quantile curves plot.

  • Plot of spectral density \(h(w)\).

Additionally, a plot of the empirical and model-based measures \(\chi(u)\) and \(\bar\chi(u)\) is helpful to check how well the model captures extremal dependence. You should include model-based confidence/credible intervals in the \(\chi(u)\) and \(\bar\chi(u)\) plots whenever possible. Finally, a good diagnostic approach should include checks for marginal fits (marginal QQ plots, PP plots, density plots, etc.).

6.5 Inference for parametric \(G_H\)

The following can be considered a cooking recipe to perform parametric inference based on block maxima.

  1. Suppose that we observe \((X_1,Y_1),\ldots,(X_N,Y_N)\).

  2. Choose a block size, say of length \(n\), and extract the block maximum for each component. Let \(\{Z_{1,1},\ldots,Z_{1,m}\}\) and \(\{Z_{2,1},\ldots,Z_{2,m}\}\), with \(m = N/n\), denote the collections of block maxima for \(X\) and \(Y\), respectively.

  3. Fit marginal GEV distributions to each component and obtain fitted GEV parameters \((\hat\mu_i,\hat\sigma_i,\hat\xi_i)\), \(i = 1,2.\)

  4. Using the fitted parameters in step 3, transform the variables to a standard Fréchet scale.

  5. Test different bivariate extreme parametric models over \((\tilde{Z}_{1,1},\tilde{Z}_{2,1}),\ldots,(\tilde{Z}_{1,m},\tilde{Z}_{2,m})\) and choose a suitable one using both numerical and graphical (marginal and bivariate) diagnostic tools.

  6. Summarise the extremal dependence using \(\chi(u), \bar\chi(u)\), Pickands’ dependence function, etc. Compute return levels/curves and draw conclusions. (A code sketch of steps 1-4 follows this list.)
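
A minimal sketch of steps 1-4, with hypothetical raw data and an illustrative block size:

library(evd)
set.seed(1)
N = 5000; n = 100                        # step 1: raw data; blocks of size n
x = rnorm(N); y = 0.5*x + rnorm(N)       # hypothetical observations
blk = rep(seq_len(N/n), each = n)
Z1 = tapply(x, blk, max)                 # step 2: block maxima per component
Z2 = tapply(y, blk, max)
fit1 = fgev(Z1); fit2 = fgev(Z2)         # step 3: marginal GEV fits
# step 4: probability integral transform to the standard Frechet scale
U1 = pgev(Z1, loc = fit1$estimate[1], scale = fit1$estimate[2], shape = fit1$estimate[3])
U2 = pgev(Z2, loc = fit2$estimate[1], scale = fit2$estimate[2], shape = fit2$estimate[3])
Zt1 = -1/log(U1); Zt2 = -1/log(U2)       # standard Frechet quantile transform
# step 5 would then fit and compare candidate models, e.g.,
# fbvevd(cbind(Z1, Z2), model = 'log')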

6.6 Applications

In this section we will test what we have learned about parametric bivariate models for extremes using simulations and real data.

6.7 Simulations

It is usually unclear what parametric model for \(G_H\) one should use, so it is important to test how robust the models are under model misspecification. Misspecified scenarios refer to cases where the fitted model does not belong to the family of probability distributions from which the data came. In the following exercise, we will check how robust some of the models listed above are when the data are simulated from a Bilogistic model (remember that in a real-data scenario, we would not know this!).

Exercise 7. For this exercise, you can use the R package evd.

  1. For \(\alpha = 0.2\) and \(\beta = 0.8\), simulate \(n=100\) independent bivariate vectors from the Bilogistic model.
  2. Using the function fbvevd, fit the following models: Logistic, Coles-Tawn, and Negative Bilogistic.
  3. Summarise the performance of each model with the deviance statistic and the AIC.
  4. Repeat the above \(N = 1000\) times.
  5. Report average deviance and AIC for each model and draw conclusions.
  6. Check average performance by plotting the Pickands’ dependence function of the true and fitted models. Use the average parameter estimates for each fitted model.
npar   = c(1, 2, 2) # number of params for log, ct, negbilog
models = c('log', 'ct', 'negbilog')
N      = 15         # the exercise asks for N = 1000; reduced here to keep runtime short
dev    = aic = matrix(NA, N, length(npar))
plog   = rep(NA, N)
pct    = pnb = matrix(NA, N, 2)
for(i in 1:N){
  n        = 100
  data     = rbvevd(n, alpha = 0.2, beta = 0.8, model = 'bilog')
  fit.log  = fbvevd(data, model = models[1])
  fit.ct   = fbvevd(data, model = models[2])
  fit.nb   = fbvevd(data, model = models[3], std.err = F)
  plog[i]  = fit.log$estimate[7]
  pct[i,]  = fit.ct$estimate[7:8]
  pnb[i,]  = fit.nb$estimate[7:8]
  dev[i,]  = c(deviance(fit.log), deviance(fit.ct), 
              deviance(fit.nb))
  aic[i,]  = c(AIC(fit.log), AIC(fit.ct), AIC(fit.nb))
}

res = data.frame(model = models,'n par' = npar, AIC = colMeans(aic), Dev = colMeans(dev))
res
##      model n.par      AIC      Dev
## 1      log     1 598.1955 584.1955
## 2       ct     2 591.0992 575.0992
## 3 negbilog     2 588.5376 572.5376
which.min(res$AIC)
## [1] 3
which.min(res$Dev)
## [1] 3
# Diagnostic plots
abvevd(alpha = 0.2, beta = 0.8, model = 'bilog', plot = TRUE, 
       lwd = 2, ylim = c(0,1)) # true A(w)
abvevd(dep = mean(plog), model = models[1], plot = TRUE, add = TRUE,
       lwd = 2, col = 2, lty = 3)
abvevd(alpha = colMeans(pct)[1], beta = colMeans(pct)[2], model = models[2], 
       plot = TRUE, add = TRUE,
       lwd = 2, col = 3, lty = 3)
abvevd(alpha = colMeans(pnb)[1], beta = colMeans(pnb)[2], model = models[3], 
       plot = TRUE, add = TRUE,
       lwd = 2, col = 4, lty = 3)
legend('bottom', c('True', 'log', 'ct', 'negbilog'), col = 1:4, 
       lty = c(1,3,3,3), lwd = rep(2,4))

Note. The function fbvevd produces estimates not only for the dependence models but also for the marginal distributions.

6.8 Case studies

Using observed data, in this section we conduct a typical bivariate extremes analysis for stationary (Section 6.8.1) and non-stationary (Section 6.8.2) margins.

6.8.1 Stationary margins

We use the dataset rainfall from the SpatialExtremes package, which contains summer maximum daily rainfall during the period 1962-2008 at 79 locations in Switzerland. We chose the two locations highlighted in red and green squares in the top plot of Figure 10. The bottom plot shows time series plots for rainfall at both locations.
**Figure 10.** Two selected locations (top) and time series plots for rainfall at both locations (bottom).


We first need to know the type of dependence class that we can assume for our bivariate data, so we plot empirical estimates of \(\chi(u)\) and \(\bar\chi(u)\). We also plot an empirical estimate of Pickands’ dependence function.
**Figure 11.** Dependence coefficients and Pickands' dependence function for the rainfall data illustrated in Figure 10.


It seems that \(\chi(u)>0\), although CIs are rather wide, so we cannot be 100% sure. It also seems that the Pickands’ dependence function is not far from symmetric. At the risk of overestimating the joint occurrence probability of extremes, we fit two bivariate extreme value models (one symmetric and the other asymmetric) and compare their performance.
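
The object rain.sub used below is assumed to hold the maxima series at the two selected stations; a minimal sketch of its construction (the station indices are illustrative; the actual stations are those highlighted in Figure 10):

library(evd)
library(SpatialExtremes)
data(rainfall)               # loads the maxima matrix `rain` and locations `coord`
rain.sub = rain[, c(1, 2)]   # hypothetical pair of stations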

fit1  = fbvevd(rain.sub, model = 'log')
fit2  = fbvevd(rain.sub, model = 'bilog')

# Compare both models via A(w)
par(mfrow = c(1,1))
abvnonpar(data = rain.sub, plot = T, main = "Dependence function", ylim = c(0,1))
abvevd(dep = fit1$estimate[7], model = 'log', plot = TRUE, add = TRUE,
       lwd = 2, col = 2, lty = 3)
abvevd(alpha =fit2$estimate[7], beta =fit2$estimate[8], model = 'bilog', 
       plot = TRUE, add = TRUE, lwd = 2, col = 3, lty = 3)
legend('bottom', c('empirical','log', 'bilog'), col = 1:3, lty = c(1,2,2),
       lwd = rep(2,3))

which.min(c(AIC(fit1), AIC(fit2)))
## [1] 1
# Diagnostic for bivariate model
par(mfrow = c(2,3))
plot(fit1)

# Univariate diagnostic plots for first component
par(mfrow = c(2,2))
plot(fit1, mar = 1)

# Univariate diagnostic plots for second component
par(mfrow = c(2,2))
plot(fit1, mar = 2)

Remark. Note that the function fbvevd provides GEV fits for each margin. The function also accepts linear models for all marginal parameters.

6.8.2 Non-stationary margins

In practical applications, observing some type of non-stationarity in the marginal distributions is very common. This non-stationarity needs to be addressed before fitting a model for the bivariate dependence; otherwise, marginal effects can heavily influence conclusions regarding dependence. Take, for instance, our motivational example in Section 2 on negative log returns for CAC 40 and DAX 30. As we saw in Figure 2, both time series exhibit strong heteroskedasticity. To obtain stationary sequences, we can apply volatility filters. Figure 12 shows unfiltered negative returns and GARCH-filtered residuals. We can see that homoscedasticity can be safely assumed, and models for bivariate extremes can then be fitted. But we are not doing that just yet! This application calls for more refined techniques, which we will explore in Section 7.
**Figure 12.** Unfiltered daily negative returns and GARCH-filtered residuals for the time series data introduced in Section 2. Taken from Castro-Camilo and de Carvalho (2018).


6.9 Non- and semi-parametric modelling for bivariate extremes

The parametric approaches introduced in Section 6.1 are based on obtaining a small subset of the class of limiting distributions for \(G_H\) by building parametric sub-families of distributions for \(H\). Although parametric models may offer a good balance between model flexibility and analytical tractability, they are far from covering the general class of bivariate extreme value distributions, which, as we know, contains infinitely many members.

Non- and semi-parametric estimators offer a more flexible approach to model \(H\), relying mostly on the data (semi-parametric estimators combine empirical information with modelling assumptions). To understand the origin of these estimators, we first need to recall a very important result related to the spectral distribution.

Consider the pseudo-polar transformation \[W_i = \frac{X_{i}}{X_i + Y_i},\quad R_i = X_i + Y_i,\quad i = 1,\ldots,N.\] Let \(W\) and \(R\) be general terms of \(\{W_i\}_{i=1}^N\) and \(\{R_i\}_{i=1}^N\). de Haan and Resnick (1977) showed that the spectral distribution is the asymptotic distribution of the ratios \(W\) given that the sum \(R\) is large, i.e., \[\text{Pr}(W\in \cdot \mid R>\tau)\to H(\cdot), \quad \tau\to\infty.\]

The above result gives us a straightforward method to estimate \(H\), as \(\{W_i\mid R_i>\tau\}\) can be used as an approximate sample from \(H\), for large \(\tau\). In practice, we do not observe the pseudo-polar coordinates, but we can construct proxies by setting \[\hat{W}_i = \frac{\hat{X}_{i}}{\hat{X}_i + \hat{Y}_i},\quad \hat{R}_i = \hat{X}_i + \hat{Y}_i,\quad i = 1,\ldots,N,\] where \(\hat{X}\) and \(\hat{Y}\) have standard Fréchet margins. In the literature, three estimators of the form \[\tilde{H}_l(w) = \sum_{i\in I_N}\tilde{p}_{l,i}\mathbf{1}\{\hat{W}_i\leq w\},\qquad w\in[0,1],\quad l\in\{1,2,3\},\] have been proposed; they differ in the way the weights \(\tilde{p}_{l,i}\) are defined:

  • Empirical spectral measure estimator (Einmahl et al., 2001): the weights are \(\tilde{p}_{1,i} = 1/n,\) where \(n = |I_N|\) is the number of extreme observations among the original \(N\) observations. This estimator does not necessarily satisfy the mean constraint in \(\eqref{eq:meanc}\).

  • Maximum empirical likelihood estimator (Einmahl and Segers, 2009): the weights \(\tilde{p}_{2,i}\) are the solution to the optimisation problem \[\max_{p\in\mathbb{R}_+^n}\sum_{i\in I_N}\log p_{2,i},\qquad\text{s.t. }\sum_{i\in I_N}p_{2,i} = 1\quad\text{and}\quad\sum_{i\in I_N}\hat{W}_ip_{2,i} = \frac{1}{2}.\] This estimator satisfies the mean constraint by construction, but it is not easy to compute since its expression is not free of Lagrange multipliers, which also makes asymptotic theory less manageable.

  • Euclidean likelihood estimator (de Carvalho et al., 2013): the weights \(\tilde{p}_{3,i}\) are the solution to the optimisation problem \[\max_{p\in\mathbb{R}_+^n}-\frac{1}{2}\sum_{i\in I_N}(np_{3,i} - 1)^2,\qquad\text{s.t. }\sum_{i\in I_N}p_{3,i} = 1\quad\text{and}\quad\sum_{i\in I_N}\hat{W}_ip_{3,i} = \frac{1}{2}.\] This quadratic optimisation problem with linear constraints can be solved explicitly with the method of Lagrange multipliers, yielding \[\tilde{p}_{3,i} = \frac{1}{n}\{1-(\overline{W} - 1/2)S_W^{-2}(\hat{W}_i - \overline{W})\},\quad i\in I_N,\] where \(\overline{W}\) and \(S_W^{2}\) denote the sample mean and sample variance of \(\{\hat{W}_1,\ldots,\hat{W}_n\}\) (see the code sketch after this list).
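
A minimal sketch of the closed-form Euclidean likelihood weights (the pseudo-angles below are illustrative; note that the weights may be negative in small samples, a known feature of the Euclidean likelihood):

# Euclidean likelihood weights from the closed-form expression above
euclid.weights = function(W){
  n    = length(W)
  Wbar = mean(W)
  S2   = mean((W - Wbar)^2)  # variance with divisor n, so constraints hold exactly
  (1/n)*(1 - (Wbar - 0.5)*(W - Wbar)/S2)
}
W = rbeta(200, 2, 2)         # illustrative pseudo-angles
p = euclid.weights(W)
c(sum(p), sum(W*p))          # should be exactly 1 and 1/2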

Einmahl and Segers (2009) and de Carvalho et al. (2013) showed that \(\tilde{H}_2\) and \(\tilde{H}_3\) are more efficient than \(\tilde{H}_1\). Moreover, asymptotically, there is no difference between \(\tilde{H}_2\) and \(\tilde{H}_3\). Figure 13 shows typical trajectories of the three nonparametric estimators where the data have been generated from a bivariate Logistic model. We can see that the performance of the empirical spectral measure is better for lower values of \(\alpha\) (i.e., when the model is closer to complete dependence), although in both cases the maximum Euclidean and empirical likelihood estimators perform better. We can also see the closeness of the maximum Euclidean and empirical likelihood estimators.

**Figure 13.** Trajectories of the empirical measure (dotted line), the maximum empirical likelihood estimator (dashed line), and the maximum Euclidean likelihood estimator (solid line). The solid grey line corresponds to the true spectral measure of the bivariate logistic model with dependence parameters 0.4 (left) and 0.8 (right).


By construction, the three estimators of the spectral measure are discrete. A smooth version that still obeys the mean constraint can be obtained by smoothing the maximum Euclidean or empirical likelihood estimator using kernel smoothing techniques. For the Euclidean likelihood estimator, de Carvalho et al. (2013) propose a smooth estimator of the spectral density by combining beta distributions with the weights \(\tilde{p}_{3,i}\). The so-called smooth Euclidean spectral density estimator is defined as \[\tilde{h}(w) = \sum_{i=1}^n\tilde{p}_{3,i}\beta\{w;W_i\nu,(1-W_i)\nu\},\quad w\in(0,1),\] where \(\nu>0\) is a concentration parameter, asymptotically inversely proportional to the variance of the beta kernel, whose main role is to control the amount of smoothing. The corresponding spectral distribution can be easily computed via integration (although it depends on the regularised incomplete beta function). The following code partially reproduces Figure 7 in de Carvalho et al. (2013). In that paper, the authors study the extremal dependence between temperatures observed at two very close locations, one in the open air and the other under forest cover. Extreme radii \(R_i\) are those exceeding the \(98\%\) empirical quantile. For simplicity, the smoothing parameter \(\nu\) was set to 163, the value used in the original 2013 article, obtained via cross-validation.

library(extremis)
library(evd)
data(beatenberg)
par(mfrow = c(1,2), mar = c(4,4,2,1))
# Fitting spectral density and distribution
fit.h = angdensity(beatenberg, tau = 0.98, nu = 163, raw = FALSE)
fit.H = angcdf(beatenberg, tau = 0.98, raw = FALSE)
plot(fit.H, lwd = 2, col = 2)
title('Spectral distribution')
rug(fit.H$w, lwd = 2)

# Getting the histogram of pseudo-angles
Y = beatenberg
R = rowSums(Y)            # pseudo-radii R_i
W = Y$x/R                 # pseudo-angles W_i
Rtau = quantile(R, 0.98)
W = W[R > Rtau]           # keep only the angles with large radii
hist(W, freq = F, ylim = c(0,3), main = 'Spectral density')
lines(fit.h$grid, fit.h$h, col = 2, lwd = 2)
**Figure 14.** *Left*: estimated spectral distribution. *Right*: histogram of pseudo-angles and fitted smooth Euclidean spectral density.


7 Regression models for bivariate extremes

[Notation disclaimer: in this section, \(x\) represents a covariate, \(x\in\mathcal{X}\)]

Regression models help us analyse relationships between variables, make predictions, and understand trends. Regression models for univariate extremes can be easily constructed by applying the framework of generalised (additive) linear models, whenever possible\(^{\boldsymbol{(3)}}\). In this case, we assume that the GEV parameters \(\mu, \sigma\) and \(\xi\) (location, scale and shape) can be described in terms of covariates and link functions \(g_j\), \(j=1,2,3\): \[\mu(\boldsymbol{x}) = g_1(\boldsymbol{x}),\quad \sigma(\boldsymbol{x}) = g_2 (\boldsymbol{x}), \quad \xi(\boldsymbol{x}) = g_3(\boldsymbol{x}),\] with suitable restrictions over the link functions to match the parameter space \(\mathbb{R}\times \mathbb{R}^+\times \mathbb{R}\).

\(^{\boldsymbol{(3)}}\): Special care needs to be taken when dealing with heavy-tailed distributions. In some cases, the population moments may not be finite, and the techniques for GLMs/GAMs cannot be used for statistical analysis.

A natural way to extend the regression framework to the bivariate setting is to model the spectral distribution/density as a function of a covariate. The object of interest is therefore the family of densities \(\{h_x(w) :w\in[0,1], x\in \mathcal{X}\}\). In the parametric setting, extensions of the models introduced in Section 6.1 are constructed assuming that the parameters are covariate-dependent. For instance, a covariate-dependent Logistic regression model can be expressed in terms of its so-called spectral surface as \[\begin{equation}\label{eq:spectSurf} h_x(w) = \frac{1}{2}(\alpha_x^{-1}-1)\{w(1-w)\}^{-1-1/\alpha_x}\{w^{-1/\alpha_x} + (1-w)^{-1/\alpha_x}\}^{\alpha_x-2},\quad \alpha_x\in(0,1]\text{ for each }x. \end{equation}\] Spectral surfaces with complex dynamics can be based on \(\alpha_x = \mathcal{A}(x)\), where \(\mathcal{A}:\mathcal{X}\rightarrow\mathbb{R}\) is a continuous function. Figure 15 shows the Logistic spectral surface for \(\alpha_x = \Phi(x), x\in[\Phi^{-1}(0.2),\Phi^{-1}(0.4)]\), where \(\Phi(\cdot)\) is the cdf of the standard Gaussian distribution.
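
To visualise slices of this surface, here is a short sketch plotting the logistic spectral density for a few fixed values of \(\alpha_x\):

# Logistic spectral density h(w) for several fixed values of alpha
h.log = function(w, a)
  0.5*(1/a - 1)*(w*(1 - w))^(-1 - 1/a)*(w^(-1/a) + (1 - w)^(-1/a))^(a - 2)
w = seq(0.01, 0.99, length.out = 199)
plot(w, h.log(w, 0.3), type = 'l', ylim = c(0, 4), xlab = 'w', ylab = 'h(w)')
lines(w, h.log(w, 0.6), col = 2)
lines(w, h.log(w, 0.9), col = 3)
legend('top', legend = c(0.3, 0.6, 0.9), col = 1:3, lty = 1, title = 'alpha')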

**Figure 15.** Spectral surface from a logistic family. Figure taken from Castro-Camilo and de Carvalho (2018).


Although their construction is straightforward, there doesn’t seem to be any implementation of parametric bivariate regression models for extremes. On the contrary, in the semi-parametric setting, different techniques have been proposed to model extremal dependence structures where covariates can be incorporated (see, e.g., de Carvalho and Davison, 2014; Castro-Camilo and de Carvalho, 2017; Mhalla et al., 2019). Here, I will introduce one approach proposed by Castro-Camilo and de Carvalho (2018) based on the smooth Euclidean spectral density introduced in Section 6.9. Assume observations \(\{(X_i, W_i)\}_{i=1}^n\), where the covariates \(X_i\) are continuous and in \(\mathcal{X} \subseteq \mathbb{R}\). Let \(K_b(x)=(1/b)K(x/b)\) be a kernel with bandwidth \(b>0\). For any \(x \in \mathcal{X}\), we define the estimator \[\begin{equation}\label{eq:h*} \widehat{h}_x(w) = \sum_{i=1}^n \pi_{b,i}(x) \beta(w; \nu W_{i} \theta_b(x) + \tau, \nu\{1-W_{i} \theta_b(x)\} + \tau), \quad w \in (0,1), \end{equation}\] where \[\begin{align*} \theta_b(x) = \frac{1/2}{\sum_{i=1}^n \pi_{b,i}(x) W_{i}}, \quad \pi_{b,i}(x) = \frac{K_b(x - X_i)}{\sum_{j=1}^n K_b(x-X_j)}, \quad i = 1,\ldots, n. \end{align*}\] The moment constraint is satisfied, since \[\begin{align*} \int_0^1 w \widehat{h}_x(w) \, \text{d} w = \frac{\sum_{i=1}^n K_b(x-X_i)\{\nu W_{i} \theta_b(x) + \tau\}}{(\nu + 2\tau)\sum_{i=1}^n K_b(x-X_i)} = \frac{\nu/2+\tau}{\nu+2\tau}=1/2, \end{align*}\] for all valid \(\tau \geq 0\), upon substitution of \(\theta_b(x)\).
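
A direct transcription of \(\eqref{eq:h*}\) into code, using a Gaussian kernel and \(\tau = 0\) (the data-generating mechanism and all tuning values are illustrative):

# Covariate-dependent smooth spectral density estimator hat{h}_x(w)
h.x = function(w, x, W, X, b = 0.1, nu = 50, tau = 0){
  K     = dnorm((x - X)/b)      # kernel weights K_b(x - X_i), Gaussian kernel
  pi.b  = K/sum(K)
  theta = 0.5/sum(pi.b*W)       # theta_b(x), which enforces the mean constraint
  a1 = nu*W*theta + tau
  a2 = nu*(1 - W*theta) + tau
  sapply(w, function(wj) sum(pi.b*dbeta(wj, a1, a2)))
}
# Illustrative use: pseudo-angles W_i observed along a covariate X_i
set.seed(1)
X = runif(300); W = rbeta(300, 1 + 2*X, 1 + 2*(1 - X))
wgrid = seq(0.01, 0.99, length.out = 99)
plot(wgrid, h.x(wgrid, x = 0.5, W = W, X = X), type = 'l',
     xlab = 'w', ylab = expression(hat(h)[x](w)))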

Figure 16 shows spectral surface estimates based on the estimator in \(\eqref{eq:h*}\) (left) for the Logistic spectral surface (right) in \(\eqref{eq:spectSurf}\).
**Figure 16.** Spectral surface from a logistic family and estimation using the smooth estimator in (5). Figure taken from Castro-Camilo and de Carvalho (2018)


To finalise, let’s return to our motivational example on the extremal dependence of joint losses between CAC 40 and DAX 30, characterised through their joint negative log returns (see Figure 2). As mentioned in Section 6.8.2, we filtered both time series using GARCH models due to the strong heteroskedasticity. Figure 17 shows the estimated spectral surface fitted to these GARCH-filtered residuals. We can clearly see the change from weaker dependence around 1990 to strong dependence starting from 2005, suggesting that in recent decades there has been an increase in the extremal dependence of the losses for these leading European stock markets.

**Figure 17.** Estimated spectral surface for the GARCH-filtered residuals of the CAC 40 and DAX 30 negative log returns. Figure taken from Castro-Camilo and de Carvalho (2018).


8 Concluding remarks

Bivariate extreme value theory is the first step in studying extremes of two or more processes. The theory is well-developed, and statistical models are still being proposed to tackle different challenges. The methods presented here are not comprehensive; first of all, we only focused on componentwise maxima, ignoring other representations, such as bivariate threshold excess models and point process representations. Additionally, not all the models shown in Section 6 can be scaled to higher dimensions (non-parametric methods in higher dimensions are questionable). Still, even very complex multivariate or spatial problems can be approached via bivariate building blocks. Sometimes, bivariate is all we can do! Likelihood-based inference for a \(k\)-dimensional max-stable random vector remains challenging: the number of terms in the likelihood is the \(k\)-th Bell number, which grows super-exponentially. This is why methods to approximate the likelihood, such as pairwise likelihood, are so prevalent in extremes even today (a quick Google Scholar search on “pairwise likelihood for extremes” will show you many examples). On top of that, bivariate summary measures (such as \(\chi\), which can also be extended to higher dimensions) are still widely used as exploratory and validation tools.

Statistical modelling of uni/bi/multi-variate and spatial extremes is a very active area of research. New methods are usually accompanied by code or even R packages. If you are interested in R packages for extremes, Christophe Dutang maintains a comprehensive list of packages for extreme value analysis on CRAN. There are also published guides to extreme value software.

More on regression models for bivariate extremes (Section 7) will be available in Chapter 9 of the Handbook on Statistics of Extremes (2024+).

9 References

Abdous, B., & Ghoudi, K. (2005). Non-parametric estimators of multivariate extreme dependence functions. Nonparametric Statistics, 17(8), 915-935.

Beirlant, J., Goegebeur, Y., Segers, J., & Teugels, J. L. (2004). Statistics of Extremes: Theory and Applications. John Wiley & Sons.

Castro-Camilo, D., & de Carvalho, M. (2017). Spectral density regression for bivariate extremes. Stochastic Environmental Research and Risk Assessment, 31, 1603-1613.

Castro-Camilo, D., de Carvalho, M., & Wadsworth, J. (2018). Time-varying extreme value dependence with application to leading European stock markets. Annals of Applied Statistics, 12(1), 283-309.

Coles, S. G. (2001). An Introduction to Statistical Modeling of Extreme Values. London: Springer.

De Carvalho, M., Oumow, B., Segers, J., & Warchoł, M. (2013). A Euclidean likelihood estimator for bivariate tail dependence. Communications in Statistics-Theory and Methods, 42(7), 1176-1192.

De Carvalho, M., & Davison, A. C. (2014). Spectral density ratio models for multivariate extremes. Journal of the American Statistical Association, 109(506), 764-776.

De Haan, L., & Resnick, S. I. (1977). Limit theory for multivariate sample extremes. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 40(4), 317-337.

Einmahl, J. H., Piterbarg, V. I., & De Haan, L. (2001). Nonparametric estimation of the spectral measure of an extreme value distribution. The Annals of Statistics, 29(5), 1401-1423.

Einmahl, J. H., & Segers, J. (2009). Maximum empirical likelihood estimation of the spectral measure of an extreme-value distribution. The Annals of Statistics, 2953-2989.

Genest, C., & Segers, J. (2009). Rank-based inference for bivariate extreme-value copulas. The Annals of Statistics, 37(5B), 2990-3022.

Heffernan, J. E., & Tawn, J. A. (2004). A conditional approach for multivariate extreme values. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 66(3), 497-546.

Mhalla, L., de Carvalho, M., & Chavez‐Demoulin, V. (2019). Regression‐type models for extremal dependence. Scandinavian Journal of Statistics, 46(4), 1141-1167.

Murphy‐Barltrop, C. J. R., Wadsworth, J. L., & Eastoe, E. F. (2023). New estimation methods for extremal bivariate return curves. Environmetrics, e2797.

Schmidt, R., & Stadtmüller, U. (2006). Non‐parametric estimation of tail dependence. Scandinavian journal of statistics, 33(2), 307-335.

Wadsworth, J. L., & Tawn, J. A. (2013). A new representation for multivariate tail probabilities. Bernoulli, 19(5B), 2689–2714.

These notes were created by Daniela Castro-Camilo inspired by many different resources (some of them cited here), including lecture notes by Raphaël Huser for course STAT380 at KAUST.