Bivariate extreme models are powerful statistical tools used to analyse extreme values of two related variables simultaneously. These models are particularly valuable in environmental sciences, finance, and risk management, where understanding the joint behaviour of extreme events is essential. By modelling the dependence structure, bivariate extreme models allow us to assess joint tail behaviour, quantify risk, and make informed decisions.
The statistical analysis of bivariate extremes helps us understand the joint behaviour of extreme occurrences, which univariate analyses cannot capture, leading to more accurate risk assessments. To showcase this, Figure 1 shows the daily closing prices of two leading European stock markets, the CAC 40 and DAX 30, from January 1, 1988, to January 1, 2014. We can see there is a clear relationship between the behaviour of these stock markets over the years:
For all the problems introduced above, we need to characterise the distribution of two random variables in the joint (upper) tail region. By definition, there is not much data in the joint tail, so we cannot use empirical representations of the distribution of joint extremes. Bivariate (and more generally, multivariate) extreme value theory (EVT) provides strong mathematical arguments for calculating joint tail probabilities. The main focus of multivariate EVT is on the dependence structure; as we can see from the examples above, if extremes occur together, the overall risk increases.
This lecture focuses on componentwise maxima, i.e., the random vector constructed from taking the maximum of each component of a bivariate vector. Specifically, in Section 4, we will study the limiting behaviour of the componentwise maxima vector. In Section 5, we will learn measures of extremal dependence that are very useful for statistical modelling. Section 6 will use the limiting representation in Section 4 to construct parametric and semi-parametric models for bivariate componentwise maxima. In Section 7, I will talk about regression for bivariate extremes and conclude in Section 8 with comments on the limitations of the methods, alternative representations for bivariate/multivariate extremes and R packages for analysis of extremes.
Let \((X_1,Y_1),(X_2,Y_2),\ldots\) be a sequence of independent and identically distributed continuous random vectors with distribution \(F_{XY}(x,y)\). Let \(F_X(x)\) and \(F_Y(y)\) be the marginal distributions of \(X\) and \(Y\), respectively. We define the vector of componentwise maxima as \[\boldsymbol{M}_n = (M_{x,n},M_{y,n}) = \left(\max_{i=1,\ldots,n}{X_i},\max_{i=1,\ldots,n}{Y_i}\right).\] The index \(j\) at which the maximum of the \(X_i\) sequence occurs is not necessarily the same as the index \(k\) at which the maximum of the \(Y_i\) sequence occurs. In other words, \(\boldsymbol{M}_n\) might not correspond to an actual observation. To see this, the following R code generates \(N\times n = 20\times100=2,000\) samples from \(X\) and \(Y\) (\(X\perp Y\)) following Gaussian distributions and constructs the componentwise maxima vector using blocks of size \(n=100\). Typically, only very few (if any) of the componentwise maxima coincide with actual bivariate observations.
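A minimal sketch of such a simulation (the seed is an arbitrary choice):
set.seed(42)
N = 20; n = 100
x = matrix(rnorm(N * n), nrow = n)  # each column is a block of X values
y = matrix(rnorm(N * n), nrow = n)  # each column is a block of Y values
M = cbind(apply(x, 2, max), apply(y, 2, max))  # componentwise maxima per block
# Count blocks where the X and Y maxima occur at the same index,
# i.e., where the componentwise maxima vector is an actual observation
sum(apply(x, 2, which.max) == apply(y, 2, which.max))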
As in the univariate case, the asymptotic theory of bivariate extremes starts by studying the limiting behaviour of \(\boldsymbol{M}_n\). Since each of the sequences \(\{X_i\}\) and \(\{Y_i\}\) is independent and identically distributed, the three-types theorem applies marginally to \(M_{x,n}\) and \(M_{y,n}\) (if the necessary normalising sequences exist such that the limiting distribution of the renormalised maxima is non-degenerate). When characterising the limiting behaviour of random vectors, it is standard practice to assume some common marginal distribution for each component. By means of the probability integral transform theorem\(^{\boldsymbol{(1)}}\), we can always assume virtually any distribution we want, and here we assume that both \(X\) and \(Y\) follow a standard Fréchet distribution, i.e., \[F_X(z) = F_Y(z) = F_\text{Fréchet}(z)= \exp(-1/z),\quad z>0.\] In that case, the random variables \[\begin{equation}\label{eq:cw_smaxima} M_{x,n}^\star = \frac{\max_{i=1,\ldots,n}{X_i}}{n},\quad M_{y,n}^\star = \frac{\max_{i=1,\ldots,n}{Y_i}}{n} \end{equation}\] also follow a standard Fréchet distribution; indeed, \(\text{Pr}(M_{x,n}^\star\leq z) = \{F_\text{Fréchet}(nz)\}^n = \exp(-1/z)\).
Optional material
Read the following if you want to know more about the use of the probability integral transform to obtain standard Fréchet random variables. If not interested, skip to the next section.
\(^{\boldsymbol{(1)}}\): The probability integral transform states that if \(Z\) has continuous distribution \(F_Z\), then \(U = F_Z(Z)\) follows a uniform distribution. So we can transform \(Z\) to a random variable \(W\) with the desired standard Fréchet distribution by defining \(W = F_\text{Fréchet}^{-1}(U)\). Indeed, note that \[F_{W}(z) = \text{Pr}(W \leq z) = \text{Pr}(F_\text{Fréchet}^{-1}(U) \leq z) = \text{Pr}(U \leq F_\text{Fréchet}(z)) = F_\text{Fréchet}(z),\] in other words, the cdf of \(W\) is \(F_\text{Fréchet}\) (the Fréchet distribution). In real-data applications, we don't usually know \(F_Z\), so a common practice is to use its empirical distribution (ecdf). The following example illustrates how we can use the above results:
# Standard Fréchet cdf and its inverse (quantile function)
G = function(x){
  stopifnot(x > 0)
  exp(-1/x)
}
Ginv = function(p){
  if(p > 0 & p < 1)
    return(-1/log(p))
  else
    return(NA)  # e.g., the sample maximum has ecdf value 1, which maps to NA
}
z = rnorm(100, sd = 10)
U = ecdf(z)(z)  # we assume we don't know that z is normally distributed
W = sapply(U, Ginv)
# Does W follow a standard Fréchet distribution? H0: W ~ std Fréchet
library(evd)
ks.test(W, pfrechet)  # missing values (from U = 1) are silently omitted
Let \(\boldsymbol{M}_n^\star = (M_{x,n}^\star, M_{y,n}^\star)\) with \(M_{x,n}^\star\) and \(M_{y,n}^\star\) defined as in \(\eqref{eq:cw_smaxima}\). If \[\text{Pr}(M_{x,n}^\star\leq x, M_{y,n}^\star\leq y)\overset{\mathcal{d}}\rightarrow G(x,y),\] where \(G\) is a non-degenerate distribution, then \(G\) is called the bivariate extreme value distribution and has the form \[G_H(x,y) = \exp\{-V_H(x,y)\},\quad x,y> 0,\] where \[V_H(x,y) = 2\int_0^1\max\left(\frac{w}{x},\frac{1-w}{y}\right)\text{d}H(w)\] is called the exponent measure and \(H\) is the spectral distribution function, defined on \([0,1]\) and satisfying the mean constraint \[\begin{equation}\label{eq:meanc} \int_0^1w\text{d}H(w) = \frac{1}{2}. \end{equation}\]
Remarks
Note that the subscript in \(G_H\) indicates that there is a family of limiting distributions (indexed by \(H\)). Since \(H\) is a distribution, the above means that there are infinitely many possible limits. This is very different from the univariate case, where the family of GEV distributions can be characterised by a finite number of parameters.
Assuming standard Fréchet margins is a choice, meaning that other margins can be assumed. In general, we can apply increasing affine transformations to each margin and consider the sequence of random vectors \[\left(\frac{M_{x,n}-b_{x,n}}{a_{x,n}},\frac{M_{y,n}-b_{y,n}}{a_{y,n}}\right).\] As in the univariate case, the specific form of the normalising constants \(a_{x,n}, b_{x,n},a_{y,n}\) and \(b_{y,n}\) will depend on the distribution of the original variables \(X_i, Y_i\). The particular choice is not so important from an asymptotic point of view, but some useful properties can be more easily studied from a specific selection. Popular choices include exponential, Gumbel, uniform, and Weibull margins. Examples of other margins employed in the literature can be found in Beirlant et al. (2004).
If \(H\) is differentiable, the spectral density is denoted by \(h(w) = \frac{\text{d}H(w)}{\text{d} w}\).
All the information about extremal dependence is contained in the spectral distribution \(H\) (or equivalently in the exponent measure \(V\)). Another equivalent way to characterise it is through the so-called Pickands' dependence function \(A\), defined as \[\begin{equation}\label{eq:pickands} A_H(w) = V_H({w^{-1},(1-w)^{-1}}),\quad w\in[0,1]. \end{equation}\] We can see from \(\eqref{eq:pickands}\) that there is a one-to-one mapping between \(A\), \(V\) and \(H\). Indeed, we can express \(V\) as \[V_H(x,y) = \left(\frac{1}{x}+\frac{1}{y}\right)A_H\left(\frac{y}{x+y}\right),\quad x,y>0.\] Additionally, \(V\) and \(A\) have the following properties: \(V_H\) is homogeneous of order \(-1\), i.e., \(V_H(tx,ty) = t^{-1}V_H(x,y)\) for \(t>0\); and \(A_H\) is convex, satisfies \(A_H(0)=A_H(1)=1\), and is bounded by \(\max(w,1-w)\leq A_H(w)\leq 1\), where the lower and upper bounds correspond to perfect dependence and independence, respectively.
Exercise 1. Show the max-stability property: \({G_H(tx, ty)}^t = G_H(x,y)\), for \(x,y,t>0\).
Solution. \[{G_H(tx, ty)}^t = \exp\{-tV_H(tx,ty)\} = \exp\{-V_H(x,y)\} = G_H(x,y).\] The second equality is due to the homogeneity of order \(-1\) property of \(V_H\).
The main goal of bivariate extremes is to characterise the extremal dependence between variables. Before we further describe extremal dependence, let’s study the two boundary cases of dependence: independence and perfect dependence.
Solution. Under perfect dependence, \(H\) places all of its mass at \(w=1/2\), so \[\begin{align*} V_H(x,y) &= 2 \max\left(\frac{0.5}{x},\frac{0.5}{y}\right) = \max\left({x}^{-1},{y}^{-1}\right)\\ G_H(x,y) &= \exp\{-\max\left({x}^{-1},{y}^{-1}\right)\}\\ A_H(w) &= V_H(w^{-1},(1-w)^{-1}) = \max\left(w,1-w\right) \end{align*}\]
Between these two boundary cases, the extremes of \(X\) and \(Y\) can exhibit different types of asymptotic (in)dependence. In the following section, we provide easily interpretable coefficients to characterise the level of extremal dependence of pairs of variables.
One way to measure the strength of extremal dependence is through the limiting conditional probability \[\chi = \lim_{u\to 1}\text{Pr}\{F_X(X)>u\mid F_Y(Y)>u\}.\] Intuitively, \(\chi\) is the probability that \(X\) is extreme given that \(Y\) is extreme. Clearly, \(0\leq\chi\leq1\) and we have the following cases:
If \(\chi = 0\), the variables \(X\) and \(Y\) are said to be asymptotically independent (AI).
If \(\chi > 0\), the variables \(X\) and \(Y\) are said to be asymptotically dependent (AD). Moreover, the value of \(\chi\) increases with the strength of dependence at extreme levels.
If \(\chi > 0\) extremes can occur together in the limit, and if \(\chi = 0\) they cannot. But \(X\) and \(Y\) can still be dependent at any finite levels (even very high ones!).
In practical applications, we approximate \(\chi\). Let \[\chi(u) = 2 - \frac{\log\text{Pr}\{F_X(X)\leq u, F_Y(Y)\leq u\}}{\log u}.\] It can be shown that \(\chi = \lim_{u\to1}\chi(u)\) and that \(2-\log(2u-1)/\log u\leq \chi(u)\leq 1\). Note that \(\chi(u)\) can be negative, and its sign determines whether \(X\) and \(Y\) are negatively or positively associated at the level \(u\).
When \(X\) and \(Y\) are asymptotically independent, \(\chi\) and \(\chi(u)\) fail to provide any measure of the level of extremal dependence. So for AI variables, we have the quantity \[\bar\chi(u) = \frac{2\log(1-u)}{\log\text{Pr}\{F_X(X)>u,F_Y(Y)>u\}}-1.\] The limit of \(\bar\chi(u)\) when \(u\to1\) is denoted \(\bar\chi\) and we have that \(-1\leq\bar\chi\leq1.\) For AI variables, \(\bar\chi\) increases with strength of dependence at extreme levels.
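As a quick illustration, both quantities can be estimated empirically using pseudo-observations (ranks); a minimal sketch, where the rank-based transform and the level \(u\) are illustrative choices:
chi_fun = function(x, y, u){
  Fx = rank(x)/(length(x) + 1)  # pseudo-observations for F_X(X)
  Fy = rank(y)/(length(y) + 1)  # pseudo-observations for F_Y(Y)
  Cuu = mean(Fx <= u & Fy <= u) # empirical Pr{F_X(X) <= u, F_Y(Y) <= u}
  Suu = mean(Fx > u & Fy > u)   # empirical Pr{F_X(X) > u, F_Y(Y) > u}
  c(chi = 2 - log(Cuu)/log(u), chibar = 2*log(1 - u)/log(Suu) - 1)
}
set.seed(1)
xy = matrix(rnorm(4000), ncol = 2)  # independent Gaussian pairs
chi_fun(xy[,1], xy[,2], u = 0.95)   # both near 0 for independent data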
We saw in Section 4 that the bivariate extreme value distribution \(G_H\) arises as the limiting family of distributions for the properly renormalised vector of componentwise maxima. But since there is a one-to-one map between the spectral distribution \(H\) and \(G_H\), there are infinitely many possible forms for \(G_H\). Any distribution function on \([0,1]\) satisfying the mean constraint \(\eqref{eq:meanc}\) gives rise to a valid limit.
Exercise 5. Can you think of an example of a distribution function defined on \([0,1]\) that has mean 1/2? Solution. An example is the Beta distribution where both shape parameters are equal.
So how can we use this very powerful and elegant mathematical result for statistical modelling? Well, in general, there are three possible strategies: assume a parametric model for \(H\) (or, equivalently, for \(V\) or \(A\)); estimate \(H\) non-parametrically; or combine empirical information with modelling assumptions in a semi-parametric approach.
One of the main challenges with non-parametric and semi-parametric techniques is the need to restrict the estimators to satisfy the mean constraint in \(\eqref{eq:meanc}\). Non-parametric techniques were widely explored in the early and mid 2000s, notably using the copula approach. The interested reader can refer to Abdous and Ghoudi (2005), Schmidt and Stadtmüller (2006), Genest and Segers (2009) and references therein. In Section 6.1, we review different parametric models proposed in the literature. Sections 6.2-6.8 are devoted to statistical modelling assuming parametric forms. Section 6.9 summarises semi-parametric approaches.
The goal here is to use parametric sub-families of distributions for either \(H\), \(V\) or \(A\). This means that \(G_H\) can now be described through a finite set of parameters, which greatly simplifies inference at the cost of reducing model flexibility. In the following, we provide a (non-exhaustive) list of parametric models that have been proposed throughout the years (a short simulation sketch follows the list). First, a bit of notation is needed. Let \(z_1\) and \(z_2\) be general terms of the vector of componentwise maxima and define \[ \tilde{z}_i= \{1+\xi_i(z_i - \mu_i)/\sigma_i\}^{-1/\xi_i},\quad i=1,2,\] where \((\mu_i,\sigma_i,\xi_i)\) are the GEV parameters for \(z_i\), \(i=1,2\). In each of the bivariate distribution functions below, the univariate margins are GEV, so \(G(z_i) = \exp\{-\tilde{z}_i\}\), for \(i=1,2\).
Logistic model. One of the simplest parametric bivariate models with only one dependence parameter: \[G_H(z_1,z_2) = \exp\left\{-\left(\tilde{z}_1^{1/\alpha} + \tilde{z}_2^{1/\alpha}\right)^{\alpha}\right\}, \quad \alpha\in(0,1]\]
Asymmetric logistic model. An extension of the previous model with two additional parameters that allow for asymmetry in the dependence structure: \[G_H(z_1,z_2) = \exp\left\{-(1-\psi_1)\tilde{z}_1 - (1-\psi_2)\tilde{z}_2 - \left[\left(\psi_1\tilde{z}_1\right)^{1/\alpha} + \left(\psi_2\tilde{z}_2\right)^{1/\alpha}\right]^\alpha\right\},\quad \alpha\in(0,1], \psi_1,\psi_2\in [0,1]\]
Hüsler-Reiss model. For \(a>0\), \[G_H(z_1,z_2) = \exp\left\{-\tilde{z}_1\Phi\left\{\frac{1}{a}+\frac{a}{2}\log\left(\frac{\tilde{z}_1}{\tilde{z}_2}\right)\right\} - \tilde{z}_2\Phi\left\{\frac{1}{a}+\frac{a}{2}\log\left(\frac{\tilde{z}_2}{\tilde{z}_1}\right)\right\}\right\}\]
Bilogistic model. Another extension of the logistic model: \[G_H(z_1,z_2) = \exp\left\{-\tilde{z}_1q^{1-\alpha} - \tilde{z}_2(1-q)^{1-\beta}\right\},\quad\alpha,\beta\in(0,1),\] where \(q\) is the solution of \((1-\alpha)\tilde{z}_1(1-q)^{\beta} - (1-\beta)\tilde{z}_2q^\alpha = 0.\) If \(\alpha = \beta\), the Bilogistic model reduces to the Logistic one.
Negative bilogistic model. The negative bilogistic distribution function with parameters \(\alpha>0\) and \(\beta>0\) is \[G_H(z_1,z_2) = \exp\left\{-\tilde{z}_1 - \tilde{z}_2 + \tilde{z}_1q^{1+\alpha} + \tilde{z}_2(1-q)^{1+\beta}\right\},\] where \(q\) is the solution to the equation \((1+\alpha)\tilde{z}_1q^\alpha - (1+\beta)\tilde{z}_2(1-q)^\beta = 0\).
Coles-Tawn model. The Coles-Tawn distribution function with parameters \(\alpha>0\) and \(\beta>0\) is \[G_H(z_1,z_2) = \exp\left\{-\tilde{z}_1[1-\text{Be}(q;\alpha+1,\beta)] - \tilde{z}_2\text{Be}(q;\alpha,\beta+1)\right\},\] where \(q = \alpha\tilde{z}_2/(\alpha \tilde{z}_2 + \beta\tilde{z}_1)\) and \(\text{Be}(q;\alpha,\beta)\) is the beta distribution function evaluated at \(q\) with shape parameters \(\alpha\) and \(\beta\).
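To get a feel for these models, we can simulate from them with rbvevd from the evd package (the parameter values are illustrative; by default, rbvevd uses standard Gumbel margins):
library(evd)
set.seed(1)
par(mfrow = c(1, 2))
plot(rbvevd(1000, dep = 0.3, model = "log"),
     main = "Logistic, alpha = 0.3")  # small alpha: strong dependence
plot(rbvevd(1000, dep = 0.9, model = "log"),
     main = "Logistic, alpha = 0.9")  # alpha near 1: weak dependence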
Exercise 6. Using the function abvevd from the package evd, plot the Pickands' dependence function for the following models and parameter configurations:
Logistic: \(\alpha=0.3, 0.6, 0.9\)
Asymmetric logistic: \(\psi_1=0.2,\psi_2=0.8, \alpha = 0.4\); \(\psi_1=\psi_2=0.5, \alpha = 0.4\); and \(\psi_1=0.8,\psi_2=0.2, \alpha = 0.4\)
Husler-Reiss: \(a=1,2,3\)
Use a single plot for each model. You should end up with three plots, each containing three different lines. Note that abvevd adds two reference lines indicating the lower and upper bounds of the Pickands' dependence function.
Solution.
par(pty = 's', mfrow = c(1,3), mar = c(3,3,3,1))
# Logistic
abvevd(dep = 0.3, model = "log", plot = TRUE, main = 'Logistic', lwd = 2, ylim = c(0,1))
abvevd(dep = 0.6, model = "log", plot = TRUE, add = T, col = 2, lwd = 2)
abvevd(dep = 0.9, model = "log", plot = TRUE, add = T, col = 3, lwd = 2)
legend('bottom',
c(expression(paste(alpha, " = ", 0.3)),
expression(paste(alpha, " = ", 0.6)),
expression(paste(alpha, " = ", 0.9))),
col = 1:3, lwd = rep(2,3))
# Asymmetric logistic
abvevd(dep = 0.4, asy = c(0.2, 0.8), model = "alog", plot = TRUE, main = 'Asymmetric logistic', lwd = 2, ylim = c(0,1))
abvevd(dep = 0.4, asy = c(0.5, 0.5), model = "alog", plot = TRUE, add = T, col = 2, lwd = 2)
abvevd(dep = 0.4, asy = c(0.8, 0.2), model = "alog", plot = TRUE, add = T, col = 3, lwd = 2)
legend('bottom',
c(expression(paste(alpha, " = ", 0.4, " ", psi[1], "=", 0.2, " ", psi[2], "=", 0.8)),
expression(paste(alpha, " = ", 0.4, " ", psi[1], "=", 0.5, " ", psi[2], "=", 0.5)),
expression(paste(alpha, " = ", 0.4, " ", psi[1], "=", 0.8, " ", psi[2], "=", 0.2))),
col = 1:3, lwd = rep(2,3))
# Husler-Reiss
abvevd(dep = 1, model = "hr", plot = TRUE, main = 'Husler-Reiss', lwd = 2, ylim = c(0,1))
abvevd(dep = 2, model = "hr", plot = TRUE, add = T, col = 2, lwd = 2)
abvevd(dep = 3, model = "hr", plot = TRUE, add = T, col = 3, lwd = 2)
legend('bottom', c('a = 1', 'a = 2', 'a = 3'), col = 1:3, lwd = rep(2,3))
It can be shown that the density associated with the bivariate extreme value distribution \(G_H\) is \[g_H(z_1,z_2) = \frac{\partial^2 G_H(z_1,z_2)}{\partial z_1\partial z_2} = \{V_{1}(z_1,z_2)V_2(z_1,z_2) - V_{12}(z_1,z_2)\}\exp\{-V(z_1,z_2)\},\quad z_1,z_2>0,\] where the subscripts in \(V_1, V_2\) and \(V_{12}\) denote partial differentiation of \(V_H\) with respect to \(z_1\) and/or \(z_2\). If \((\boldsymbol{z}_{1},\boldsymbol{z}_{2}) = ((z_{1,1},z_{2,1}),\ldots,(z_{1,n},z_{2,n}))\) is a sample of size \(n\) from \(G_H\), the log-likelihood is given by \[\ell(\theta;\boldsymbol{z}_{1},\boldsymbol{z}_{2}) = \sum_{i=1}^n\left[\log\{V_{1}(z_{1,i},z_{2,i})V_2(z_{1,i},z_{2,i}) - V_{12}(z_{1,i},z_{2,i})\} - V(z_{1,i},z_{2,i})\right],\] where \(\theta\) is the vector that parametrises \(G_H\) (for instance, if \(G_H\) is assumed to be the bivariate Logistic model, then \(\theta = \alpha\)). Solutions to the likelihood equations \(\frac{\partial\ell(\theta;\boldsymbol{z}_{1},\boldsymbol{z}_{2})}{\partial\theta}\big|_{\theta=\hat\theta} = 0\) do not usually have closed form, so the maximum likelihood estimator is obtained by numerically maximising \(\ell(\theta;\boldsymbol{z}_{1},\boldsymbol{z}_{2})\).\(^{\boldsymbol{(2)}}\)
\(^{\boldsymbol{(2)}}\): Since we are talking about the likelihood equations, we are implicitly assuming that they are well defined, i.e., the parametric models for \(G_H\) are well-behaved. In particular, the support of the distribution does not depend on the parameter vector \(\theta\).
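To illustrate, the following sketch numerically maximises the Logistic log-likelihood on standard Fréchet margins (the sample size and true parameter are illustrative; exponentiating maps the default standard Gumbel margins of rbvevd to standard Fréchet):
# Logistic model on standard Fréchet margins: V(x,y) = (x^(-1/a) + y^(-1/a))^a
nll_log = function(a, x, y){
  s   = x^(-1/a) + y^(-1/a)
  V1  = -s^(a - 1) * x^(-1/a - 1)                     # dV/dx
  V2  = -s^(a - 1) * y^(-1/a - 1)                     # dV/dy
  V12 = ((a - 1)/a) * s^(a - 2) * (x * y)^(-1/a - 1)  # d2V/dxdy
  -sum(log(V1 * V2 - V12) - s^a)                      # negative log-likelihood
}
library(evd)
set.seed(1)
z = exp(rbvevd(500, dep = 0.5, model = "log"))  # standard Fréchet margins
optimize(nll_log, c(0.01, 0.99), x = z[,1], y = z[,2])$minimum  # approx 0.5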
A return level for a bivariate extreme value distribution can be constructed as follows. We know that any bivariate extreme value distribution can be represented through the Pickands' dependence function as \[G(x,y) = \exp\left\{-\left(\frac{1}{\tilde{x}} + \frac{1}{\tilde{y}}\right)A(w)\right\},\] where \(\tilde{x}\) and \(\tilde{y}\) correspond, respectively, to \(x\) and \(y\) transformed to be unit Fréchet distributed, and \(w = \tilde{y}/(\tilde{x}+\tilde{y})\in[0,1].\) Then, for a fixed probability \(p\) and \(w\), the equation \(G(x,y) = p\) leads to \[\tilde{x} = -\frac{A(w)}{w\log p} \quad\text{ and }\quad \tilde{y} = \frac{w\tilde{x}}{1-w}.\]
We can then easily transform \(\tilde{x}\) and \(\tilde{y}\) back to their original \(x,y\) scale using the (fitted) marginal distributions.
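As a sketch, for the Logistic model with an assumed dependence parameter \(\alpha = 0.5\), the calculation above reads:
A_log = function(w, a) (w^(1/a) + (1 - w)^(1/a))^a  # Logistic Pickands function
p = 0.99                            # fixed probability, G(x, y) = p
w = seq(0.01, 0.99, length = 99)
xt = -A_log(w, 0.5)/(w * log(p))    # return level on the unit Fréchet scale
yt = w * xt/(1 - w)
plot(xt, yt, type = "l", xlab = "x (Fréchet scale)", ylab = "y (Fréchet scale)")
# Back-transform with the fitted GEV quantile function, e.g.
# x = qgev(exp(-1/xt), loc, scale, shape) using evd::qgev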
More generally, we can define the \(p\)-probability return curve as the set \[\text{RC}(p) = \{(x,y)\in\mathbb{R}^2:\text{Pr}(X>x,Y>y) = p\},\] where \(\text{Pr}(X>x,Y>y)\) is called the joint survival function (see, e.g., Murphy-Barltrop et al., 2023). We are of course interested in values of \(p\) close to 0, which correspond to rare joint exceedance events. Similarly, we can define the return period to be \(1/p\), since given any point in RC(\(p\)), we would expect to observe the event \(\{X>x, Y>y\}\) once, on average, each return period. Note that since return curves define a curve in \(\mathbb{R}^2\), the two-dimensional return level plot cannot be naturally extended to this setting. One way to visualise return curves, proposed by Murphy-Barltrop et al. (2023), is to consider different return periods and plot the corresponding curves individually (as in the left plot in Figure 9) or simultaneously (as in the right plot in Figure 9). Figure 9 shows, on the left, a return curve for 1000 bivariate standard normal observations with correlation 0.5 and a return period of 100 (\(p = 1/100\)). For a sample of size \(n = 1000\), we expect to observe \(np = 10\) points in the blue shaded region. On the right, Figure 9 shows return curves of the same bivariate normal data set for return periods in the set \(\{10, 100, 1000, 10000\}\). In their article, Murphy-Barltrop et al. (2023) proposed a novel way to estimate extremal bivariate return curves using methods based on Heffernan and Tawn (2004) and Wadsworth and Tawn (2013).
Most graphical diagnostic tools used in the univariate case can be extended to the bivariate case. A non-comprehensive list of diagnostic plots is given below.
Conditional PP plots (probability-probability plots), conditioning on each margin.
Bivariate density plot.
Plot of Pickands’ dependence function (\(A(w)\)).
Quantile curves plot.
Plot of the spectral density \(h(w)\).
Additionally, a plot of the empirical and model-based measures \(\chi(u)\) and \(\bar\chi(u)\) is helpful to check how well the model captures extremal dependence. You should include model-based confidence/credible intervals to the \(\chi(u)\) and \(\bar\chi(u)\) plots whenever possible. Finally, a good diagnostic approach should include checks for marginal fits (marginal qqplots, PP plots, density plots, etc).
The following can be considered a cooking recipe to perform parametric inference based on block maxima (a code sketch implementing the steps follows the list).
Suppose that we observe \((X_1,Y_1),\ldots,(X_N,Y_N)\).
Choose a block size, say of length \(n\), and extract the block maximum for each component. Let \(\{Z_{1,1},\ldots,Z_{1,m}\}\) and \(\{Z_{2,1},\ldots,Z_{2,m}\}\), with \(m = \lfloor N/n\rfloor\) the number of blocks, denote the collections of block maxima for \(X\) and \(Y\), respectively.
Fit marginal GEV distributions to each component and obtain fitted GEV parameters \((\hat\mu_i,\hat\sigma_i,\hat\xi_i)\), \(i = 1,2.\)
Using the fitted parameters in step 3, transform the variables to a standard Fréchet scale.
Test different bivariate extreme value parametric models on \((\tilde{Z}_{1,1},\tilde{Z}_{2,1}),\ldots,(\tilde{Z}_{1,m},\tilde{Z}_{2,m})\) and choose a suitable one using both numerical and graphical (marginal and bivariate) diagnostic tools.
Summarise the extremal dependence using \(\chi(u), \bar\chi(u)\), Pickands’ dependence function, etc. Compute return levels/curves and draw conclusions.
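A minimal sketch of the recipe using evd (the series x and y and the block size are placeholders; note that fbvevd carries out steps 3-5 in a single call, fitting the GEV margins and the dependence model jointly):
library(evd)
set.seed(1)
x = rnorm(9000); y = rnorm(9000)    # placeholder daily series
n = 90                               # block size
m = floor(length(x)/n)               # number of blocks
blk = rep(1:m, each = n)
Z = cbind(tapply(x[1:(m*n)], blk, max),
          tapply(y[1:(m*n)], blk, max))  # componentwise block maxima
fit1 = fbvevd(Z, model = "log")          # GEV margins + Logistic dependence
fit2 = fbvevd(Z, model = "bilog", std.err = FALSE)
AIC(fit1); AIC(fit2)                     # numerical model comparison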
In this section we will test what we have learned about parametric bivariate models for extremes using simulations and real data.
It is usually unclear what parametric model for \(G_H\) one should use, so it is important to test how robust the models are under model misspecification. Misspecified scenarios refer to cases where the fitted model does not belong to the family of probability distributions from which the data came. In the following exercise, we will check how robust some of the models listed above are when the data are simulated from a Bilogistic model (remember that in a real data scenario, we would not know this!).
Exercise 7. For this exercise, you can use the R package evd. Simulate \(N = 15\) datasets of size \(n = 100\) from a Bilogistic model with \(\alpha = 0.2\) and \(\beta = 0.8\). Using the function fbvevd, fit the following models: Logistic, Coles-Tawn and Negative Bilogistic. Compare the fits numerically (e.g., via AIC and deviance) and graphically through the Pickands' dependence function.
npar = c(1, 2, 2) # number of params for log, ct, negbilog
models = c('log', 'ct', 'negbilog')
N = 15
dev = aic = matrix(NA, N, length(npar))
plog = rep(NA, N)
pct = pnb = matrix(NA, N, 2)
for(i in 1:N){
n = 100
data = rbvevd(n, alpha = 0.2, beta = 0.8, model = 'bilog')
fit.log = fbvevd(data, model = models[1])
fit.ct = fbvevd(data, model = models[2])
fit.nb = fbvevd(data, model = models[3], std.err = F)
plog[i] = fit.log$estimate[7]
pct[i,] = fit.ct$estimate[7:8]
pnb[i,] = fit.nb$estimate[7:8]
dev[i,] = c(deviance(fit.log), deviance(fit.ct),
deviance(fit.nb))
aic[i,] = c(AIC(fit.log), AIC(fit.ct), AIC(fit.nb))
}
res = data.frame(model = models,'n par' = npar, AIC = colMeans(aic), Dev = colMeans(dev))
res
## model n.par AIC Dev
## 1 log 1 598.1955 584.1955
## 2 ct 2 591.0992 575.0992
## 3 negbilog 2 588.5376 572.5376
which.min(res$AIC)
## [1] 3
which.min(res$Dev)
## [1] 3
# Diagnostic plots
abvevd(alpha = 0.2, beta = 0.8, model = 'bilog', plot = TRUE,
lwd = 2, ylim = c(0,1)) # true A(w)
abvevd(dep = mean(plog), model = models[1], plot = TRUE, add = TRUE,
lwd = 2, col = 2, lty = 3)
abvevd(alpha = colMeans(pct)[1], beta = colMeans(pct)[2], model = models[2],
plot = TRUE, add = TRUE,
lwd = 2, col = 3, lty = 3)
abvevd(alpha = colMeans(pnb)[1], beta = colMeans(pnb)[2], model = models[3],
plot = TRUE, add = TRUE,
lwd = 2, col = 4, lty = 3)
legend('bottom', c('True', 'log', 'ct', 'negbilog'), col = 1:4,
lty = c(1,3,3,3), lwd = rep(2,4))
Note. The function fbvevd produces estimates not only for the dependence models but also for the marginal distributions.
In this section, we conduct a typical bivariate extremes analysis on observed data, for stationary (Section 6.8.1) and non-stationary (Section 6.8.2) margins.
We use the dataset rainfall from the SpatialExtremes package, which contains summer maximum daily rainfall during the period 1962-2008 at 79 locations in Switzerland. We chose the two locations highlighted in red and green squares in the top plot of Figure 9. The bottom figure shows time series plots for rainfall at both locations.
It seems that \(\chi(u)>0\), although the CIs are rather wide, so we cannot be 100% sure. It also seems that the Pickands' dependence function is not far from symmetric. At the risk of overestimating the joint occurrence probability of extremes, we fit two bivariate extreme value models (one symmetric and the other one asymmetric) and compare their performance:
fit1 = fbvevd(rain.sub, model = 'log')
fit2 = fbvevd(rain.sub, model = 'bilog')
# Compare both models via A(w)
par(mfrow = c(1,1))
abvnonpar(data = rain.sub, plot = T, main = "Dependence function", ylim = c(0,1))
abvevd(dep = fit1$estimate[7], model = 'log', plot = TRUE, add = TRUE,
lwd = 2, col = 2, lty = 3)
abvevd(alpha =fit2$estimate[7], beta =fit2$estimate[8], model = 'bilog',
plot = TRUE, add = TRUE, lwd = 2, col = 3, lty = 3)
legend('bottom', c('empirical','log', 'bilog'), col = 1:3, lty = c(1,2,2),
lwd = rep(2,3))
Remark. Note that the function fbvevd provides GEV fits for each margin. The function also accepts linear models for all marginal parameters.
The parametric approaches introduced in Section 6.1 are based on obtaining a small subset of the class of limiting distributions for \(G_H\) by building parametric sub-families of distributions for \(H\). Although parametric models may offer a good balance between model flexibility and analytical tractability, they are far from covering the general class of bivariate extreme value distributions, which, as we know, is infinite.
Non- and semi-parametric estimators offer a more flexible approach to model \(H\), at the cost of relying (almost) entirely on the data; semi-parametric estimators combine empirical information with modelling assumptions. To understand the origin of these estimators, we first need to recall a very important result related to the spectral distribution.
Consider the pseudo-polar transformation \[W_i = \frac{X_{i}}{X_i + Y_i},\quad R_i = X_i + Y_i,\quad i = 1,\ldots,N.\] Let \(W\) and \(R\) be general terms of \(\{W_i\}_{i=1}^N\) and \(\{R_i\}_{i=1}^N\). de Haan and Resnick (1977) showed that the spectral distribution is the asymptotic distribution of the ratios \(W\) given that the sum \(R\) is large, i.e., \[\text{Pr}(W\in \cdot \mid R>\tau)\to H(\cdot), \quad \tau\to\infty.\]
The above result gives us a straightforward method to estimate \(H\), as \(\{W_i\mid R_i>\tau\}\) can be used as an approximate sample from \(H\), for large \(\tau\). In practice, we do not observe the pseudo-polar coordinates, but we can construct proxies by setting \[\hat{W}_i = \frac{\hat{X}_{i}}{\hat{X}_i + \hat{Y}_i},\quad \hat{R}_i = \hat{X}_i + \hat{Y}_i,\quad i = 1,\ldots,N,\] where \(\hat{X}\) and \(\hat{Y}\) have standard Fréchet margins. In the literature, three estimators of the form \[\tilde{H}_l(w) = \sum_{i\in I_N}\tilde{p}_{li}\mathbf{1}\{\hat{W}_i\leq w\},\qquad w\in[0,1],\quad l\in\{1,2,3\}\] have been proposed; they distinguish themselves in the way the weights \(\tilde{p}_{li}\) are defined (a code sketch follows the list):
Empirical spectral measure estimator (Einmahl et al., 2001): the weights are \(\tilde{p}_{1i} = 1/n,\) where \(n = |I_N|\) is the number of extreme observations among the original \(N\) observations. This estimator does not necessarily satisfy the mean constraint in \(\eqref{eq:meanc}\).
Maximum empirical likelihood estimator (Einmahl and Segers, 2009): the weights \(\tilde{p}_{2i}\) are the solution to the optimisation problem \[\max_{p\in\mathbb{R}_n^+}\sum_{i\in I_N}\log p_{2,i},\qquad\text{s.t. }\sum_{i\in I_N}p_{2,i} = 1\quad\text{and}\quad\sum_{i\in I_N}\hat{W}_ip_{2,i} = \frac{1}{2}.\] This estimator satisfies the mean constraint by construction, but it is not easy to compute since its expression is not free of Lagrange multipliers, which also makes the asymptotic theory less manageable.
Euclidean likelihood estimator (de Carvalho et al., 2013): the weights \(\tilde{p}_{3i}\) are the solution to the optimisation problem \[\max_{p\in\mathbb{R}_n^+}-\frac{1}{2}\sum_{i\in I_N}(np_{3,i} - 1)^2,\qquad\text{s.t. }\sum_{i\in I_N}p_{3,i} = 1\quad\text{and}\quad\sum_{i\in I_N}\hat{W}_ip_{3,i} = \frac{1}{2}.\] This quadratic optimisation problem with linear constraints can be solved explicitly with the method of Lagrange multipliers, yielding \[\tilde{p}_{3,i} = \frac{1}{n}\{1-(\overline{W} - 1/2)S_W^{-2}(\hat{W}_i - \overline{W})\},\quad i\in I_N,\] where \(\overline{W}\) and \(S_W^{2}\) denote the sample mean and sample variance of the \(\hat{W}_i\), \(i\in I_N\).
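A minimal sketch of the empirical and Euclidean-likelihood weights, where the simulated unit Fréchet data and the threshold are illustrative choices (note the divisor-\(n\) variance, under which the mean constraint holds exactly):
set.seed(1)
x = -1/log(runif(5000)); y = -1/log(runif(5000))  # independent unit Fréchet
Wall = x/(x + y); Rall = x + y      # pseudo-polar coordinates
tau = quantile(Rall, 0.95)          # threshold for 'large' radii
W = Wall[Rall > tau]; n = length(W)
p1 = rep(1/n, n)                    # empirical spectral measure weights
S2 = mean((W - mean(W))^2)          # divisor-n sample variance
p3 = (1 - (mean(W) - 0.5) * (W - mean(W))/S2)/n  # Euclidean likelihood weights
Hhat = function(w, p) sum(p * (W <= w))          # spectral cdf estimate
c(sum(W * p1), sum(W * p3))  # only p3 satisfies the mean constraint exactly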
Einmahl and Segers (2009) and de Carvalho et al. (2013) showed that \(\tilde{H}_2\) and \(\tilde{H}_3\) are more efficient than \(\tilde{H}_1\). Moreover, asymptotically, there is no difference between \(\tilde{H}_2\) and \(\tilde{H}_3\). Figure 13 shows typical trajectories of the three nonparametric estimators where the data have been generated from a bivariate Logistic model. We can see that the performance of the empirical spectral measure is better for lower values of \(\alpha\) (i.e., when the model is closer to complete dependence), although in both cases the maximum Euclidean and empirical likelihood estimators perform better. We can also see the closeness of the maximum Euclidean and empirical likelihood estimators.
By construction, the three estimators of the spectral measure are discrete. A smooth version which still obeys the mean constraint can be obtained by smoothing the maximum Euclidean or empirical likelihood estimator using kernel smoothing techniques. For the Euclidean likelihood estimator, de Carvalho et al. (2013) propose a smooth estimator of the spectral density by combining beta distributions with the weights \(\tilde{p}_{3,i}\). The so-called smooth Euclidean spectral density estimator is defined as \[\tilde{h}(w) = \sum_{i=1}^n\tilde{p}_{3,i}\beta\{w;\hat{W}_i\nu,(1-\hat{W}_i)\nu\},\quad w\in(0,1),\] where \(\nu>0\) is a concentration parameter, asymptotically inversely proportional to the variance of the beta kernel, whose main role is to control the amount of smoothing. The corresponding spectral distribution can be easily computed via integration (although it depends on the regularised incomplete beta function). The following code partially reproduces Figure 7 in de Carvalho et al. (2013). In this paper, the authors study the extremal dependence between temperatures observed at two very close locations, one in the open air and the other under forest cover. Extreme radii \(R_i\) are those exceeding the 98% empirical quantile. For simplicity, the smoothness parameter \(\nu\) was set to 163, the value used in the original article, obtained via cross-validation.
library(extremis)
library(evd)
data(beatenberg)
par(mfrow = c(1,2), mar = c(4,4,2,1))
# Fitting spectral density and distribution
fit.h = angdensity(beatenberg, tau = 0.98, nu = 163, raw = FALSE)
fit.H = angcdf(beatenberg, tau = 0.98, raw = FALSE)
plot(fit.H, lwd = 2, col = 2)
title('Spectral distribution')
rug(fit.H$w, lwd = 2)
# Getting the histogram
Y = beatenberg
R = rowSums(Y)
W = Y$x/R
Rtau = quantile(R, 0.98)
W = W[R>Rtau]
hist(W, freq = F, ylim = c(0,3), main = 'Spectral density')
lines(fit.h$grid,fit.h$h, col = 2, lwd = 2)
[Notation disclaimer: in this section, \(x\) represents a covariate, \(x\in\mathcal{X}\)]
Regression models help us analyse relationships between variables, make predictions, and understand trends. Regression models for univariate extremes can be easily constructed by applying the framework of generalised (additive) linear models, whenever possible\(^{\boldsymbol{(3)}}\). In this case, we assume that the GEV parameters \(\mu, \sigma\) and \(\xi\) (location, scale and shape) can be described in terms of covariates and link functions \(g_j\), \(j=1,2,3\): \[\mu(\boldsymbol{x}) = g_1(\boldsymbol{x}),\quad \sigma(\boldsymbol{x}) = g_2 (\boldsymbol{x}), \quad \xi(\boldsymbol{x}) = g_3(\boldsymbol{x}),\] with suitable restrictions on the link functions to match the parameter space \(\mathbb{R}\times \mathbb{R}^+\times \mathbb{R}\).
\(^{\boldsymbol{(3)}}\): Special care needs to be taken when dealing with heavy-tailed distributions. In some cases, the population moments may not be finite, and the techniques for GLMs/GAMs cannot be used for statistical analysis.
A natural way to extend the regression framework to the bivariate setting is to model the spectral distribution/density as a function of a covariate. The object of interest is therefore the family of densities \(\{h_x(w) :w\in[0,1], x\in \mathcal{X}\}\). In the parametric setting, extensions of the models introduced in Section 6.1 are constructed assuming that the parameters are covariate-dependent. For instance, a covariate-dependent Logistic model can be expressed in terms of its so-called spectral surface as \[\begin{equation}\label{eq:spectSurf} h_x(w) = \frac{1}{2}(\alpha_x^{-1}-1)\{w(1-w)\}^{-1-1/\alpha_x}\{w^{-1/\alpha_x} + (1-w)^{-1/\alpha_x}\}^{\alpha_x-2},\quad \alpha_x\in(0,1]\text{ for each }x. \end{equation}\] Spectral surfaces with complex dynamics can be based on \(\alpha_x = \mathcal{A}(x)\), where \(\mathcal{A}:\mathcal{X}\rightarrow(0,1]\) is a continuous function. Figure 15 shows the Logistic spectral surface for \(\alpha_x = \Phi(x), x\in[\Phi^{-1}(0.2),\Phi^{-1}(0.4)]\), where \(\Phi(\cdot)\) is the cdf of the standard Gaussian distribution.
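A sketch of the Logistic spectral surface in \(\eqref{eq:spectSurf}\) (grid sizes and plotting angle are arbitrary choices):
h_log = function(w, a)  # Logistic spectral density
  0.5 * (1/a - 1) * (w * (1 - w))^(-1 - 1/a) *
    (w^(-1/a) + (1 - w)^(-1/a))^(a - 2)
xgrid = seq(qnorm(0.2), qnorm(0.4), length = 50)
wgrid = seq(0.01, 0.99, length = 99)
surf = outer(wgrid, pnorm(xgrid), h_log)  # alpha_x = Phi(x)
persp(wgrid, xgrid, surf, theta = 30, xlab = "w", ylab = "x", zlab = "h_x(w)")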
Although their construction is straightforward, there doesn’t seem to be any implementation of parametric bivariate regression models for extremes. On the contrary, in the semi-parametric setting, different techniques have been proposed to model extremal dependence structures where covariates can be incorporated (see, e.g., de Carvalho and Davison, 2014; Castro-Camilo and de Carvalho, 2017; Mhalla et al., 2019). Here, I will introduce one approach proposed by Castro-Camilo and de Carvalho (2018) based on the smooth Euclidean spectral density introduced in Section 6.9. Assume observations \(\{(X_i, W_i)\}_{i=1}^n\), where the covariates \(X_i\) are continuous and in \(\mathcal{X} \subseteq \mathbb{R}\). Let \(K_b(x)=(1/b)K(x/b)\) be a kernel with bandwidth \(b>0\). For any \(x \in \mathcal{X}\), we define the estimator \[\begin{equation}\label{eq:h*} \widehat{h}_x(w) = \sum_{i=1}^n \pi_{b,i}(x) \beta(w; \nu W_{i} \theta_b(x) + \tau, \nu\{1-W_{i} \theta_b(x)\} + \tau), \quad w \in (0,1), \end{equation}\] where \[\begin{align*} \theta_b(x) = \frac{1/2}{\sum_{i=1}^n \pi_{b,i}(x) W_{i}}, \quad \pi_{b,i}(x) = \frac{K_b(x - X_i)}{\sum_{j=1}^n K_b(x-X_j)}, \quad i = 1,\ldots, n. \end{align*}\] The moment constraint is satisfied, since \[\begin{align*} \int_0^1 w \widehat{h}_x(w) \, \text{d} w = \frac{\sum_{i=1}^n K_b(x-X_i)\{\nu W_{i} \theta_b(x) + \tau\}}{(\nu + 2\tau)\sum_{i=1}^n K_b(x-X_i)} = \frac{\nu/2+\tau}{\nu+2\tau}=1/2, \end{align*}\] for all valid \(\tau \geq 0\), upon substitution of \(\theta_b(x)\).
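A minimal sketch of the estimator in \(\eqref{eq:h*}\), taking \(\tau = 0\) and a Gaussian kernel (both illustrative choices, as are the simulated angles and covariates):
hhat_x = function(w, x, X, W, nu, b){
  K  = dnorm((x - X)/b)/b   # kernel weights K_b(x - X_i)
  pw = K/sum(K)             # normalised weights pi_{b,i}(x)
  theta = 0.5/sum(pw * W)   # theta_b(x), enforces the mean constraint
  sum(pw * dbeta(w, nu * W * theta, nu * (1 - W * theta)))
}
set.seed(1)
X = runif(200); W = rbeta(200, 2 + X, 2 - X)  # hypothetical angles/covariates
wgrid = seq(0.01, 0.99, length = 99)
plot(wgrid, sapply(wgrid, hhat_x, x = 0.5, X = X, W = W, nu = 50, b = 0.1),
     type = "l", xlab = "w", ylab = "estimated h_x(w)")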
Figure 16 shows spectral surface estimates based on the estimator in \(\eqref{eq:h*}\) (left), alongside the true Logistic spectral surface in \(\eqref{eq:spectSurf}\) (right). To finalise, let's return to our motivational example on the extremal dependence of joint losses between the CAC 40 and DAX 30, characterised through their joint negative log returns (see Figure 2). As mentioned in Section 6.8.2, we filtered both time series using GARCH models due to the strong heteroskedasticity. Figure 17 shows the estimated spectral surface fitted to these GARCH-filtered residuals. We can clearly see the change from weaker dependence around 1990 to strong dependence from 2005 onwards, suggesting that in recent decades there has been an increase in the extremal dependence of the losses for these leading European stock markets.
Bivariate extreme value theory is the first step in studying extremes of two or more processes. The theory is well-developed, and statistical models are still being proposed to tackle different challenges. The methods presented here are not comprehensive; first of all, we only focused on componentwise maxima, ignoring other representations, such as bivariate threshold excess models and point process representation. Additionally, not all the models shown in Section 6 can be scaled to higher dimensions (non-parametric methods in higher dimensions are questionable). Still, even very complex multivariate or spatial problems can be approached via a bivariate approach. Sometimes, bivariate is all we can do! Likelihood-based inference for a \(k\)-dimensional max-stable random vector remains challenging. The number of terms in the likelihood is the \(k\)-th Bell number, which grows super-exponentially. This is why methods to approximate the likelihood, such as pairwise likelihood, are so prevalent in extremes even today (a quick Google Scholar search on “pairwise likelihood for extremes” can show you many examples). On top of that, bivariate summary measures (such as \(\chi\), which can also be extended to higher dimensions) are still widely used as exploratory and validation tools.
Statistical modelling of uni/bi/multi-variate and spatial extremes is a very active area of research. New methods are usually accompanied by codes or even R packages. If you are interested in R packages for extremes, Christophe Dutang created this comprehensive list of packages for extreme value analysis on CRAN. Additionally, this paper offers a guide to extreme value software.
More on regression models for bivariate extremes (Section 7) will be available in Chapter 9 of the Handbook on Statistics of Extremes (2024+).
Abdous, B., & Ghoudi, K. (2005). Non-parametric estimators of multivariate extreme dependence functions. Nonparametric Statistics, 17(8), 915-935.
Beirlant, J., Goegebeur, Y., Segers, J., & Teugels, J. L. (2004). Statistics of extremes: theory and applications. John Wiley & Sons.
Castro-Camilo, D., & de Carvalho, M. (2017). Spectral density regression for bivariate extremes. Stochastic Environmental Research and Risk Assessment, 31, 1603-1613.
Castro-Camilo, D., de Carvalho, M., & Wadsworth, J. (2018). Time-varying extreme value dependence with application to leading European stock markets. Annals of Applied Statistics, 12(1), 283-309.
Coles, S. G. (2001). An Introduction to the Statistical Modeling of Extreme Values. London: Springer
De Carvalho, M., Oumow, B., Segers, J., & Warchoł, M. (2013). A Euclidean likelihood estimator for bivariate tail dependence. Communications in Statistics-Theory and Methods, 42(7), 1176-1192.
De Carvalho, M., & Davison, A. C. (2014). Spectral density ratio models for multivariate extremes. Journal of the American Statistical Association, 109(506), 764-776.
De Haan, L., & Resnick, S. I. (1977). Limit theory for multivariate sample extremes. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 40(4), 317-337.
Einmahl, J. H., Piterbarg, V. I., & De Haan, L. (2001). Nonparametric estimation of the spectral measure of an extreme value distribution. The Annals of Statistics, 29(5), 1401-1423.
Einmahl, J. H., & Segers, J. (2009). Maximum empirical likelihood estimation of the spectral measure of an extreme-value distribution. The Annals of Statistics, 37(5B), 2953-2989.
Genest, C., & Segers, J. (2009). Rank-based inference for bivariate extreme-value copulas. The Annals of Statistics, 37(5B), 2990-3022.
Heffernan, J. E., & Tawn, J. A. (2004). A conditional approach for multivariate extreme values. Journal of the Royal Statistical Society. Series B:Statistical Methodology,66(3), 497–546
Mhalla, L., de Carvalho, M., & Chavez‐Demoulin, V. (2019). Regression‐type models for extremal dependence. Scandinavian Journal of Statistics, 46(4), 1141-1167.
Murphy‐Barltrop, C. J. R., Wadsworth, J. L., & Eastoe, E. F. (2023). New estimation methods for extremal bivariate return curves. Environmetrics, e2797.
Schmidt, R., & Stadtmüller, U. (2006). Non‐parametric estimation of tail dependence. Scandinavian journal of statistics, 33(2), 307-335.
Wadsworth, J. L., & Tawn, J. A. (2013). A new representation for multivariate tail probabilities. Bernoulli, 19(5B), 2689–2714.
These notes were created by Daniela Castro-Camilo inspired by many different resources (some of them cited here), including lecture notes by Raphaël Huser for course STAT380 at KAUST.
5.3 Comments on asymptotic dependence and independence
The pair of coefficients \((\chi,\bar\chi)\) and their finite-sample counterparts, \((\chi(u),\bar\chi(u))\), are very useful to determine the type of asymptotic dependence:
\(\chi>0\) and \(\bar\chi=1\): AD \(\Rightarrow\) \(\chi\) reflects strength of extremal dependence.
\(\chi=0\) and \(\bar\chi<1\): AI \(\Rightarrow\) \(\bar\chi\) reflects strength of extremal dependence.
For statistical modelling, the type of asymptotic dependence assumed has a severe impact on extrapolation into the joint upper tail:
If AI is assumed, but the data are in fact AD, the joint occurrence probability of extremes will likely be underestimated.
If AD is assumed, but data are in fact AI, the joint occurrence probability of extremes will likely be overestimated.
In terms of risk management, assuming AD when data are AI might not be as bad as assuming AI when data are AD. In general, AD can be interpreted as a fairly conservative approach, which provides upper bounds on the joint occurrence probabilities of extremes.
Some known cases: a bivariate Gaussian vector with correlation \(\rho<1\) is AI, with \(\bar\chi = \rho\); a bivariate Student t vector with correlation \(\rho>-1\) is AD; and a bivariate max-stable vector is AD unless it is exactly independent.
Exercise 3. The following figures show plots of \(\chi(u)\) and \(\bar\chi(u)\) for bivariate Gaussian, Student t and max-stable vectors. Can you recognise which curve corresponds to each model?
Figure 8. Example of dependence coefficients for different models.
Gaussian is red, Student t is blue and max-stable is black.
Exercise 4. We can use the function chiplot in the library evd to estimate \(\chi(u)\) and \(\bar\chi(u)\) non-parametrically. Check the implementation by generating \(N=2000\) copies of bivariate Gaussian vectors with correlation 0.6. Repeat the above now generating copies of zero-mean bivariate Student t vectors with 3 degrees of freedom and diagonal scale matrix.
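A possible solution sketch (rmvnorm and rmvt are from the mvtnorm package, an assumed choice for the simulation):
library(evd); library(mvtnorm)
set.seed(1)
N = 2000
gauss = rmvnorm(N, sigma = matrix(c(1, 0.6, 0.6, 1), 2))  # correlation 0.6
stud  = rmvt(N, sigma = diag(2), df = 3)                  # Student t, 3 df
par(mfrow = c(2, 2))
chiplot(gauss)  # chi(u) and chibar(u) for the Gaussian sample
chiplot(stud)   # chi(u) and chibar(u) for the Student t sample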