This paper presents a multivariate generalized Pareto distribution (MGPD) method and develops a Monte Carlo simulation procedure for solving the MGPD for marine environmental extreme-value parameters. The simulation method proves feasible in an analysis of the joint probability of wave height and its concomitant wind at a hydrological station in the South China Sea (SCS). The MGPD is the natural distribution of the multivariate peaks-over-threshold (MPOT) sampling method and is based on extreme-value theory. Existing dependence functions can be used in the MGPD, so it can describe several variables with different dependence relationships. The MGPD method makes more efficient use of the extremes in the raw data: for the wave and concomitant wind data covering 23 years (1960–1982), an average of 19 wave and wind pairs per year is selected. For the joint conditional probability of the MGPD, the relative error of the Monte Carlo simulation method is small.

Introduction

Statistical modeling of extreme values (EV) plays a crucial role in design and risk evaluation in ocean engineering, and multivariate extreme-value distributions have been extensively developed over the last decades, as shown by several studies (Morton and Bowers, 1996; Sheng, 2001; Yang and Zhang, 2013). Problems concerning ocean environmental extremes are often multivariate in character. For example, waves, wind and currents all contribute to the forces experienced by offshore systems during typhoons, so the severity of a typhoon event may be described by a function of both the wind speed peak and the concomitant wave height. When the force on a system is dominated by both wind and concomitant wave, it may be sufficient to employ a 50-year return wave and a 50-year return wind as a design criterion. However, the 50-year return wind and the 50-year return wave generally do not occur at the same time, so any simple analysis assuming a perfect correlation between wind and waves is likely to overestimate the design value (Morton and Bowers, 1996). Analyzing the encounter probability among the ocean environmental variables by means of a multivariate distribution can therefore offer a useful reference in evaluating a project's safety and cost.

In multivariate EV theory, two sampling methods have been developed: the block maxima method and the peaks-over-threshold (POT) method. These methods correspond to two natural distributions, the multivariate EV distribution (MEVD) and the multivariate generalized Pareto distribution (MGPD), respectively. MEVD is the natural distribution of the block maxima of all components; a typical example is a block of one year, for which the block maxima are the annual maxima. MGPD is the theoretical distribution of the multivariate peaks-over-threshold (MPOT) method, in which the sample includes all values larger than a suitable threshold. Rootzén and Tajvidi (2006), building on the research of Tajvidi (1996), suggest that MGPD should be characterized by the following properties: (i) exceedances (of suitably coordinated levels) asymptotically have a multivariate GP distribution if and only if componentwise maxima are asymptotically EV distributed; (ii) the multivariate GP distribution is the only one which is preserved under (a suitably coordinated) change in exceedance levels. The MPOT method makes fuller use of the raw data and gives more stable results, and it has therefore recently become widely used. Morton and Bowers (1996) used a response function of wave height and wind speed for an anchored semi-submersible platform to analyze the extreme anchorage force and the corresponding wave height and wind speed with a logistic extreme-value distribution. In that study, the authors did not fit the samples with the MGPD, the natural distribution of the MPOT method, but instead fitted a bivariate extreme-value distribution to the POT samples. Coles and Tawn (1994) follow the same idea. MGPD theory has improved greatly in recent years, but the definition of MGPD still needs further research. Bivariate threshold methods were developed by Joe et al. (1992) and Smith (1994) based on point process theory. MGPD has been the focus of several studies, and further detail can be found in Rootzén and Tajvidi (2005), Tajvidi (1996), Beirlant et al. (2005) and Falk et al. (2004).

However, the application of MGPD in ocean engineering has been restricted by the difficulty of solving it: in general, the amount and complexity of the calculation increase rapidly with the dimension. Monte Carlo simulation is a feasible way to solve these problems because it only changes the inner product operation, and the complexity of the algorithm does not increase as the dimension increases. Liu et al. (1990) use Monte Carlo simulation for the design of offshore platforms, and practical examples demonstrate its fast calculation speed and high precision for the compound extreme-value distribution. Philippe (2000) presents a new parameter estimation method for the bivariate extreme-value distribution that uses Monte Carlo simulation. Shi (1999) presents a Monte Carlo method for a simple trivariate nested logistic model. Stephenson (2003) gives methods for simulating from symmetric and asymmetric versions of the multivariate logistic distribution, and compares many of the Monte Carlo simulation methods for multidimensional extreme distributions.

We have developed a procedure to handle the application of MGPD in marine engineering design. This paper uses the Monte Carlo simulation to solve the MGPD equation, and is structured as follows. The Monte Carlo method is introduced in Sect. 2. Fundamental to the application of MGPD is the choice of the optimal joint threshold and the estimation of the joint density: these aspects, including a case study, are discussed in Sect. 3. Finally, in Sect. 4, the advantages of MGPD and its Monte Carlo simulation are outlined.

Monte Carlo simulation of MGPD

MGPD theory

The MGPD method is based on extreme-value theory and has been widely used. The generalized extreme-value distribution (GEVD) is the theoretical distribution of block maxima. The generalized Pareto distribution (GPD) describes the extremes that exceed a threshold after declustering, the so-called POT distribution. Based on the relationship between the GPD and the GEVD,

$$H(x) = 1 + \log(G(x)), \quad \log(G(x)) > -1,$$

the distribution function of the MGPD can be deduced:

$$W(\mathbf{x}) = 1 + \log G(x_1,\ldots,x_d) = 1 + \left(\sum_{i=1}^{d} x_i\right) D\!\left(\frac{x_1}{\sum_{i=1}^{d} x_i},\ldots,\frac{x_{d-1}}{\sum_{i=1}^{d} x_i}\right), \quad \log G(x_1,\ldots,x_d) > -1,$$

where $\mathbf{x} = (x_1,\ldots,x_d) \in U$, $U$ is a neighborhood of zero in the negative quadrant $(-\infty,0)^d$, $D$ is the Pickands dependence function defined on the unit simplex $R_{d-1}$, with $R_d = \{x \in [0,\infty)^d \mid \sum_{i=1}^{d} x_i = 1\}$, and $G(x_1,\ldots,x_d)$ is a multivariate extreme-value distribution function whose marginal distributions are negative exponential (for details see René, 2007). Different Pickands dependence functions thus give rise to different types of MGPD distribution functions (Coles et al., 1991). The logistic dependence function is simple to use and has favorable statistical properties, and it is widely used in hydrology, finance and other fields. The logistic dependence function and the corresponding logistic GPD are

$$D_r(t_1,\ldots,t_{d-1}) = \left(\sum_{i=1}^{d-1} t_i^{\,r} + \left(1 - \sum_{i=1}^{d-1} t_i\right)^{r}\right)^{1/r}, \qquad W_r(\mathbf{x}) = 1 - \left(\sum_{i=1}^{d} (-x_i)^{r}\right)^{1/r} = 1 - \lVert \mathbf{x} \rVert_r,$$

where $r$ is the correlation parameter of the dependence function and $r > 1$. In the bivariate case, $x$ and $y$ are standardized variables in the interval $(-1, 0)$, and the density function is

$$w(x,y) = \frac{\partial^2 W}{\partial x\,\partial y} = (r-1)(xy)^{r-1}\left((-x)^r + (-y)^r\right)^{1/r-2}, \quad x<0,\; y<0.$$

In this paper, we always consider the MGPD $W$ with uniform margins. The marginal distributions of $x$ and $y$ are always transformed into a GPD with uniform margins by a suitable marginal transformation (René, 2007). Before the transformation, the distributions of $x$ and $y$ are $F(x; r_1)$ and $F(y; r_2)$, where the parameters $r_1$ and $r_2$ also need to be estimated.
The parameters $r$, $r_1$ and $r_2$ can be estimated in two ways. In the stepwise method, $r_1$ and $r_2$ are estimated first and are then introduced into the MGPD $W$ to estimate $r$. In the global method, all parameters are estimated together by maximum likelihood for the density function $w(x, y; r, r_1, r_2)$. The global method gives more reliable results because it works with the final functional form, but the estimation is more complex. The log-likelihood function to be maximized is

$$L(r) = \sum_{i=1}^{n} \ln w_r(x_i, y_i).$$
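The bivariate logistic formulas above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the data are taken to be already standardized to negative values close to zero, only the dependence parameter r is estimated (the stepwise method), and the function names, the search interval [1.001, 20] and the golden-section search are choices made for this sketch, not the procedure used in the paper.

```python
import math

def logistic_gpd_cdf(x, y, r):
    """W_r(x, y) = 1 - ((-x)^r + (-y)^r)^(1/r) for x, y < 0 close to zero."""
    return 1.0 - ((-x) ** r + (-y) ** r) ** (1.0 / r)

def logistic_gpd_density(x, y, r):
    """w_r(x, y) = (r-1)(xy)^(r-1) ((-x)^r + (-y)^r)^(1/r - 2), x, y < 0."""
    s = (-x) ** r + (-y) ** r
    return (r - 1.0) * (x * y) ** (r - 1.0) * s ** (1.0 / r - 2.0)

def log_likelihood(r, sample):
    """L(r) = sum of ln w_r(x_i, y_i) over the standardized sample."""
    return sum(math.log(logistic_gpd_density(x, y, r)) for x, y in sample)

def fit_r(sample, lo=1.001, hi=20.0, tol=1e-6):
    """Golden-section search for the maximizer of L(r) on [lo, hi].

    Assumes L(r) has a single maximum on the interval (hypothetical bounds).
    """
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = log_likelihood(c, sample), log_likelihood(d, sample)
    while b - a > tol:
        if fc < fd:
            # The maximum lies in [c, b]; reuse d as the new left probe.
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = log_likelihood(d, sample)
        else:
            # The maximum lies in [a, d]; reuse c as the new right probe.
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = log_likelihood(c, sample)
    return 0.5 * (a + b)
```

For example, `fit_r([(-0.10, -0.12), (-0.30, -0.25)])` returns the value of r that maximizes the log-likelihood for those two illustrative pairs; the global method would additionally optimize over the marginal parameters.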

Simulation method

The simulation method for the MGPD is most easily described in polar coordinates. Let

$$T_p(x_1,\ldots,x_d) = \left(\frac{x_1}{x_1+\cdots+x_d},\ldots,\frac{x_{d-1}}{x_1+\cdots+x_d},\; x_1+\cdots+x_d\right) = (z_1,\ldots,z_{d-1},c),$$

where $T_p$ transforms the vector $(x_1,\ldots,x_d)$ into polar coordinates. $C = x_1+\cdots+x_d$ and $Z = (x_1/C,\ldots,x_{d-1}/C)$ are the radial component and the angular component, respectively; these are referred to as the Pickands polar coordinates.
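As a small illustration of the Pickands polar coordinates, the transformation and its inverse can be written as follows. The function names are hypothetical, and the sketch assumes a vector with strictly negative components so that the radial component is nonzero.

```python
def to_pickands(x):
    """Map (x_1, ..., x_d) to angular components (z_1, ..., z_{d-1}) and radial component c."""
    c = sum(x)                      # radial component C = x_1 + ... + x_d
    z = [xi / c for xi in x[:-1]]   # angular components x_i / C, i = 1, ..., d-1
    return z, c

def from_pickands(z, c):
    """Inverse map: (c z_1, ..., c z_{d-1}, c - c * sum(z))."""
    return [c * zi for zi in z] + [c - c * sum(z)]
```

Applying `from_pickands(*to_pickands(x))` recovers the original vector up to floating-point rounding.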

In the Pickands polar coordinates, $W(X)$ has different properties. Let us assume that $(X_1,\ldots,X_d)$ follows the MGPD $W(X)$ and that its Pickands dependence function $D$ is $d$ times differentiable. We define the Pickands density of $H(X)$ as

$$\varphi(z,c) = |c|^{d-1}\,\frac{\partial^d}{\partial x_1 \cdots \partial x_d} H\!\left(T_p^{-1}(z,c)\right).$$

If we assume that $\mu = \int_{R_{d-1}} \varphi(z)\,dz > 0$ and that a constant $c_0 < 0$ exists in a neighborhood of zero, then the simulation method for the MGPD is as follows: (1) generate uniform random numbers on the unit simplex $R_{d-1}$; (2) generate a random vector $(z_1,\ldots,z_{d-1})$ from the density function $f(z) = \varphi(z)/\mu$ of $Z = (z_1,\ldots,z_{d-1})$ in the Pickands polar coordinates, using the acceptance–rejection method; (3) generate a uniform random number $c$ on $(c_0, 0)$; and (4) calculate the vector $\left(c z_1,\ldots,c z_{d-1},\; c - c\sum_{i=1}^{d-1} z_i\right)$, which is a random vector that follows the multivariate over-threshold distribution.

The c0 above is the joint threshold in the MGPD method. This paper determines the threshold following the principle of Coles and Tawn (1994).
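The four simulation steps can be sketched for the bivariate logistic case as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: substituting x = cz and y = c(1 - z) into the bivariate logistic density gives an angular density proportional to (r-1)(z(1-z))^(r-1)(z^r + (1-z)^r)^(1/r-2), which does not depend on c and is largest at z = 1/2, so that value is used as the rejection bound and the normalizing constant is never needed. The function names are hypothetical.

```python
import random

def pickands_density(z, r):
    """Unnormalized angular density phi(z) of the bivariate logistic GPD, 0 < z < 1."""
    if z <= 0.0 or z >= 1.0:
        return 0.0
    return (r - 1.0) * (z * (1.0 - z)) ** (r - 1.0) * (z ** r + (1.0 - z) ** r) ** (1.0 / r - 2.0)

def simulate_logistic_gpd(n, r, c0, rng=None):
    """Draw n pairs (x, y) with c0 <= x + y < 0 by acceptance-rejection."""
    if r <= 1.0 or c0 >= 0.0:
        raise ValueError("requires r > 1 and c0 < 0")
    rng = rng or random.Random()
    phi_max = pickands_density(0.5, r)  # phi attains its maximum at z = 1/2 when r > 1
    pairs = []
    while len(pairs) < n:
        z = rng.random()                                    # step 1: uniform proposal on the simplex (0, 1)
        if rng.random() * phi_max > pickands_density(z, r):
            continue                                        # step 2: reject, so accepted z follow phi / mu
        c = rng.uniform(c0, 0.0)                            # step 3: radial component between c0 and 0
        pairs.append((c * z, c - c * z))                    # step 4: back to (x, y) = (cz, c - cz)
    return pairs
```

A call such as `simulate_logistic_gpd(50000, 3.0, -0.7, random.Random(1))` returns pairs in the region above the joint threshold; the value r = 3.0 is only an illustrative choice.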

Joint probability distribution

With the development of offshore engineering, the joint probability of extreme sea environments such as wind, waves, tides and currents is receiving much more attention. API (American Petroleum Institute), DNV (Det Norske Veritas) and other bodies have not proposed an explicit method as a design criterion for marine structures, although they have issued some relevant rules. API (1995) suggests three options, one of which is “Any ‘reasonable’ combination of wind speed, wave height, and current speed that results in the 100-year return period combined platform load”. The joint return period of two variables needs to be considered for the probability of encounter between the variables. Conditional probability can represent the probability of encounter between the extreme value of a main marine environmental element and the extreme value of a simultaneous marine environmental element. For example, the probability of a 50-year return wave and a 50-year return wind speed occurring simultaneously at the same place is very small. It is therefore important to use conditional probability to describe the probability of their joint occurrence and to analyze the effect of the various marine environmental elements on engineering.

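The joint distribution function of the bivariate Pareto distribution is $W(x,y) = \Pr(X < x, Y < y)$, and $W_X(x)$ and $W_Y(y)$ are the marginal distributions of $x$ and $y$, respectively. The conditional extreme-value distributions can be written as follows:

Conditional probability 1: $\Pr(X \geq x \mid Y \geq y) = \dfrac{\Pr(X \geq x, Y \geq y)}{\Pr(Y \geq y)} = \dfrac{1 - W_X(x) - W_Y(y) + W(x,y)}{1 - W_Y(y)}$;
conditional probability 2: $\Pr(X \leq x \mid Y \geq y) = \dfrac{\Pr(X \leq x, Y \geq y)}{\Pr(Y \geq y)} = \dfrac{W_X(x) - W(x,y)}{1 - W_Y(y)}$;
conditional probability 3: $\Pr(X \geq x \mid Y \leq y) = \dfrac{\Pr(X \geq x, Y \leq y)}{\Pr(Y \leq y)} = \dfrac{W_Y(y) - W(x,y)}{W_Y(y)}$;
conditional probability 4: $\Pr(X \leq x \mid Y \leq y) = \dfrac{\Pr(X \leq x, Y \leq y)}{\Pr(Y \leq y)} = \dfrac{W(x,y)}{W_Y(y)}$.
Another four conditional probability distributions can be deduced by swapping the two variables.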
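The four conditional probabilities can be evaluated directly from the joint and marginal distribution functions. The sketch below assumes the bivariate logistic GPD with uniform margins on (-1, 0), so that each marginal distribution function is 1 + x; the function names and the argument values in the usage note are illustrative only.

```python
def logistic_gpd_cdf(x, y, r):
    """Joint distribution function W(x, y) = 1 - ((-x)^r + (-y)^r)^(1/r) for x, y < 0."""
    return 1.0 - ((-x) ** r + (-y) ** r) ** (1.0 / r)

def uniform_margin_cdf(x):
    """Marginal distribution function with uniform margins on (-1, 0): W(x, 0) = 1 + x."""
    return 1.0 + x

def conditional_probabilities(x, y, r):
    """Return conditional probabilities 1 to 4 of X given Y, keyed by their number."""
    wx, wy = uniform_margin_cdf(x), uniform_margin_cdf(y)
    wxy = logistic_gpd_cdf(x, y, r)
    return {
        1: (1.0 - wx - wy + wxy) / (1.0 - wy),  # Pr(X >= x | Y >= y)
        2: (wx - wxy) / (1.0 - wy),             # Pr(X <= x | Y >= y)
        3: (wy - wxy) / wy,                     # Pr(X >= x | Y <= y)
        4: wxy / wy,                            # Pr(X <= x | Y <= y)
    }
```

For instance, `conditional_probabilities(-0.1, -0.2, 2.0)` returns four values in which probabilities 1 and 2 sum to one, as do probabilities 3 and 4, which is a quick consistency check on the formulas.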

Case study

Sample selection of over-threshold values and marginal distribution

The raw data of the paper are wave height and synchronous wind speed observed four times a day over 23 years at an ocean hydrological station in the South China Sea (SCS). In the sample, the maximum wind speed reached 40 m s$^{-1}$ and the maximum wave height was 8.50 m. The extreme wind speed and its corresponding wave height are selected as research samples. The over-threshold sample is drawn from the extreme values of blocks, and the purpose of declustering is to maintain sample independence. In the SCS, typhoons occur frequently and are the cause of almost all extreme wind speeds and wave heights. A typhoon in the SCS generally lasts from several days to 1 week, so this paper declusters by 5-day intervals, taking the maximum value of each block. If the interval between two extremes is less than 2 days, the smaller value is deleted from the sample in order to maintain independence. Except for a few storm processes which last a long time, most of the data meet the requirements of independence.
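The declustering rule described above can be sketched as follows. The record layout (time in days, wind speed, wave height), the function name and the tie-breaking are assumptions made for illustration; the paper does not specify these details.

```python
def decluster(records, block_days=5.0, min_gap_days=2.0):
    """Select block maxima of wind speed with the concomitant wave height.

    records: iterable of (time_in_days, wind_speed, wave_height), sorted by time.
    """
    # Step 1: keep the record with the largest wind speed in each 5-day block.
    blocks = {}
    for t, wind, wave in records:
        k = int(t // block_days)
        if k not in blocks or wind > blocks[k][1]:
            blocks[k] = (t, wind, wave)
    peaks = [blocks[k] for k in sorted(blocks)]

    # Step 2: if two neighbouring peaks are less than 2 days apart, keep only the larger.
    kept = []
    for peak in peaks:
        if kept and peak[0] - kept[-1][0] < min_gap_days:
            if peak[1] > kept[-1][1]:
                kept[-1] = peak
        else:
            kept.append(peak)
    return kept
```

With four observations a day, the time stamps would be 0, 0.25, 0.5 and so on; two peaks that fall in adjacent blocks but belong to the same storm are then reduced to a single record.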

After the sample has been screened according to the requirements of independence, 1436 groups of extreme wind speed and corresponding wave height are selected. Their marginal distributions can be described by means of the GEVD. The GEVD includes three types of extreme-value distribution, and both marginal distributions use the three-parameter GEVD in this paper:

$$F(x) = P(X < x) = \exp\left\{-\left[1 - \xi\,\frac{x-\mu}{\sigma}\right]^{1/\xi}\right\}, \quad \xi \neq 0,$$

where $\xi$, $\sigma$ and $\mu$ are the three parameters of the GEVD; these are estimated by maximum likelihood. Figure 1 shows the probability plot of the marginal distributions. For the annual maxima of wind speed and wave height, a Pearson type III distribution is used to obtain the return period values of wind speed and wave height in one dimension (see Fig. 2). The Pearson type III distribution is

$$F(x) = P(X < x) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} \int_{-\infty}^{x} (t-\mu)^{\alpha-1} \exp[-\beta(t-\mu)]\,dt,$$

where the integrand is taken as zero for $t < \mu$.
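The GEVD formula above, in the sign convention used in the text, can be evaluated and inverted as follows. This is a minimal sketch: the function names are hypothetical, and the parameter values in the usage note are illustrative rather than the fitted values of the paper.

```python
import math

def gev_cdf(x, xi, sigma, mu):
    """F(x) = exp(-(1 - xi (x - mu) / sigma)^(1/xi)) for xi != 0."""
    arg = 1.0 - xi * (x - mu) / sigma
    if arg <= 0.0:
        # Outside the support: above the upper bound if xi > 0, below the lower bound if xi < 0.
        return 1.0 if xi > 0.0 else 0.0
    return math.exp(-arg ** (1.0 / xi))

def gev_quantile(p, xi, sigma, mu):
    """Inverse of gev_cdf for 0 < p < 1: x = mu + sigma (1 - (-ln p)^xi) / xi."""
    return mu + sigma * (1.0 - (-math.log(p)) ** xi) / xi
```

For example, `gev_quantile(0.98, 0.1, 3.0, 15.0)` gives the value that is not exceeded with probability 0.98 under the hypothetical parameters xi = 0.1, sigma = 3.0 and mu = 15.0.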

The joint probability distribution

The bivariate logistic generalized Pareto distribution was selected, and the data were converted to a negative exponential distribution in the interval $(-1, 0)$ because the method is defined on $(-\infty, 0)$. The MGPD model of the paper is based on the multivariate extreme-value distribution, and the joint threshold can be calculated using the method developed by Coles and Tawn (1994). The joint threshold is $c_0 = -0.7$, and there are 450 groups of wind speed and wave height combinations over $c_0$. Figure 3a shows the samples of over-threshold values. In the left-hand panel of Fig. 3a, $c_0 = -0.7$ is a curve, and the right side of the curve shows the values over the threshold. In the right-hand panel of Fig. 3a, $c_0 = -0.7$ is a line, and the area to the top right of the line represents the over-threshold values of the converted data in polar coordinates. The joint distribution is shown in Fig. 3b.
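The selection of over-threshold pairs can be sketched as follows. The sketch assumes that each observation has already been passed through its fitted marginal distribution function F and that the standardized value is F minus one, which lies in (-1, 0); this marginal transformation is an assumption of the sketch, and the function name is hypothetical. A pair is kept when its radial component exceeds the joint threshold.

```python
def select_over_threshold(cdf_pairs, c0=-0.7):
    """Keep pairs whose radial component c = u + v exceeds the joint threshold c0.

    cdf_pairs: iterable of (F_wind, F_wave), marginal distribution function values in (0, 1).
    Returns tuples (u, v, z, c) with standardized values and Pickands coordinates.
    """
    selected = []
    for f_wind, f_wave in cdf_pairs:
        u, v = f_wind - 1.0, f_wave - 1.0   # assumed standardization to (-1, 0)
        c = u + v                           # radial component
        if c0 < c < 0.0:
            selected.append((u, v, u / c, c))
    return selected
```

In the standardized (u, v) plane the threshold is the straight line u + v = c0, which corresponds to the line described for the right-hand panel of Fig. 3a.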

Comparison of stochastic simulation results

Figure 4 shows the over-threshold values together with the stochastically simulated data for N = 50 000 and N = 100 000 simulations, respectively.

The simulation results are in agreement with the observations, showing that the MGPD method was successful. The scatter diagrams show the results directly, but further quantitative analysis is required in order to assess the differences objectively. Two of the conditional probabilities mentioned above are used in this paper: (1) P(H>h|V>v), the probability of the wave height exceeding h given that the wind speed exceeds v, and (4) P(H<h|V<v), the probability of the wave height being less than h given that the wind speed is less than v. Both of these reflect the probability of the joint occurrence of an extreme wave height and its corresponding wind speed.

Figure 5 shows the calculation of the conditional probability P(H>h|V>v) for h = 7.99 m and v = 37 m s$^{-1}$. The Monte Carlo method estimates this conditional probability from the simulated sample, based on the definition of the conditional distribution. As shown in Fig. 5, the difference between the simulation and the model depends on the number of simulations N: the relative difference decreases as the number of simulations increases. When the number of simulations reaches $2\times10^{6}$, the relative error between the simulation results and the calculation results is 0.1 %, which shows that the error of the simulation results is acceptable.

Based on the results with $2\times10^{6}$ simulations, Tables 2 and 3 present the calculation results of two different conditional probabilities.

Tables 2 and 3 show, for five groups, the calculation and stochastic simulation results of conditional probabilities 1 and 4 for different combinations of wave height and wind speed. In the two tables, the probabilities 20, 10, 5, 2 and 1 % represent the 5, 10, 20, 50 and 100-year return wave heights and wind speeds, respectively, which are obtained from the annual maxima and the Pearson type III distribution. The results of the calculation and of the stochastic simulation are similar. For instance, the calculated probability of the wave height exceeding its 10-year return value (7.08 m) when the wind speed exceeds its 10-year return value (37 m s$^{-1}$) is 94.67 %, whereas the stochastic simulation result for the same conditional probability is 94.44 %; the relative error is 0.24 %. For conditional probability 4, the calculated probability of the wave height being less than its 5-year return value (4.65 m) when the wind speed is less than its 50-year return value (43.41 m s$^{-1}$) is 98.78 %, whereas the stochastic simulation result for the same conditional probability is 95.73 %; the relative error is 3.09 %. Synthesizing all conditional probabilities among the different extreme sea environments helps to find a balance between investment and risk in engineering, and can provide a scientific basis for preliminary estimates of risk.
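The arithmetic behind the quoted relative errors and the correspondence between return periods and annual exceedance probabilities can be checked with a short sketch. It assumes that the relative error is taken with respect to the calculated (model) value, which reproduces both figures quoted above; the function names are hypothetical.

```python
def relative_error_percent(calculated, simulated):
    """Relative error of the simulated value with respect to the calculated value, in percent."""
    return abs(calculated - simulated) / calculated * 100.0

def annual_exceedance_percent(return_period_years):
    """Annual exceedance probability, in percent, of a T-year return value."""
    return 100.0 / return_period_years

print(round(relative_error_percent(94.67, 94.44), 2))  # conditional probability 1
print(round(relative_error_percent(98.78, 95.73), 2))  # conditional probability 4
print([annual_exceedance_percent(t) for t in (5, 10, 20, 50, 100)])
```

The two printed relative errors round to 0.24 and 3.09, and the last line lists 20, 10, 5, 2 and 1 %, matching the values stated in the text.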

Conclusions

This paper presents the theoretical method of MGPD, which is based on existing multivariate extreme-value distribution and can describe various dependence relationships among different extreme-value variables. The model is based on the theory of extreme values, which is well founded, and the intrinsic properties of all extreme variables are taken into consideration.

The analysis of conditional probability shows that the Monte Carlo method for MPOT has only small errors. It provides a solution for the analysis of multivariate and complex cases, and the technique therefore shows promise for future use.

Conditional probability includes the probability of extreme events being encountered, and provides the theoretical basis for finding the best balance point between engineering cost and risk.

The MGPD model is able to describe the probability of multivariate extreme events occurring at the same time. Its larger sample size, compared with traditional annual extreme-value methods, allows the extreme features of the raw data to be retained as fully as possible.