An EMD-PSO-LSSVM hybrid model for significant wave height prediction

Accurate significant wave height prediction with a few hours of warning time should offer major safety improvements for coastal and ocean engineering applications. However, significant wave height is a nonlinear and nonstationary phenomenon, which makes its prediction a non-trivial task. The aim of the research presented in this paper is to improve significant wave height prediction via a hybrid algorithm. First, empirical mode decomposition (EMD) is used to preprocess the nonlinear data, which are decomposed into several simple signals. Then, a least squares support vector machine (LSSVM) with nonlinear learning ability is used to predict the significant wave height, and particle swarm optimization (PSO) is implemented to select the LSSVM parameters automatically. The EMD-PSO-LSSVM model is used to predict the significant wave height at lead times of 1, 3 and 6 hours for two stations in the offshore and deep-sea areas of the North Atlantic Ocean. The results show that the EMD-PSO-LSSVM model can remove the prediction timing lag of the single prediction models. Furthermore, compared with the non-optimized EMD-LSSVM model, the prediction accuracy in the deep-sea area is greatly improved, with the coefficient of determination (R²) improving from 0.991, 0.982 and 0.959 to 0.


1 Introduction
Significant wave height prediction has many vital applications. For instance, it can improve the efficiency and safety of operations in marine and offshore environments (Duan et al., 2016a). The installation of offshore wind turbines, cargo transfer between ships, sea rescue, and the take-off and landing of helicopters or aircraft are other significant examples (Richter et al., 2017). Moreover, an accurate estimation of the significant wave height is needed to characterize the energy production of Wave Energy Converter (WEC) facilities (Cornejo-Bueno et al., 2018). Prediction information can also support motion compensation, which may prevent cargo collisions during cargo transfer and improve both the firing accuracy of ship-borne weapon systems and the performance of motion control systems (Ra and Whang, 2006).
Over the past few years, several numerical methods have been developed to predict significant wave height, using classical statistical methods, artificial intelligence techniques based on linear and nonlinear models, or hybrid models (Hwang, 2006; Casas-Prat et al., 2014; Janssen, 2008). However, accurate prediction of significant wave height requires a large amount of sensor-based data, and the computational complexity remains relatively high, requiring high-performance computers. Last but not least, wave height predictions are still not always very accurate (Yoon et al., 2011; Browne et al., 2007). With the development of machine learning, time series analysis offers a simple and computationally efficient solution based mainly on historical wave height data. Such modelling approaches are relatively simple because they rely on previous data and wave patterns, thus avoiding much of the computational cost.
Early data-driven studies on wave prediction employed classical time series models, such as the auto-regressive (AR) model, the auto-regressive moving average (ARMA) model, and the autoregressive integrated moving average (ARIMA) model. Soares et al. (1996) applied AR models to describe time series of significant wave heights at two locations off the Portuguese coast. Later, AR models were generalized from univariate models of long-term significant wave height time series to bivariate series of significant wave height and mean period (Guedes Soares and Cunha, 2000). However, predictions based on a single AR model fail to meet expectations in harsh conditions and at long lead times. To further improve prediction performance, Agrawal and Deo (2002) adopted ARMA and ARIMA models to predict the wave height 3, 6, 12, and 24 hours ahead at an offshore location in India. Despite the high efficiency and adaptiveness of classical time series models, their predictions in severe sea conditions are far from accurate enough. Waves are generally nonstationary, which conflicts with the linearity and stationarity assumptions of classical time series models. Overall, these approaches are not suitable for predicting nonlinear and nonstationary waves.
To address the nonlinear component of ocean waves, nonlinear models based on intelligent techniques, such as artificial neural networks (ANNs), have been extensively studied. Such methods can carry out nonlinear simulations without a deep understanding of the relationships between the input and output variables. Deo and Sridhar Naidu (1998) … term network for the quick prediction of significant wave height with higher accuracy than conventional neural networks.
https://doi.org/10.5194/os-2021-2 Preprint. Discussion started: 28 January 2021. © Author(s) 2021. CC BY 4.0 License.
Significant wave height results from a complicated, nonlinear, dynamic system and is affected by various factors (Valamanesh et al., 2016). Predicting nonstationary time series directly with ANNs tends to homogenize the different characteristics of the original input data, which can reduce prediction accuracy. Accordingly, the nonstationarity of the significant wave height time series and of the input variables should be reduced. To handle nonstationary features, the inputs of the corresponding data-driven models need to be appropriately preprocessed. Hybrid models that combine preprocessing techniques with single prediction models are an alternative for more effective modeling. Wavelet analysis is a useful tool for nonstationary data (Rhif et … suitable (Huang and Wu, 2008). Another issue with wavelets is that a well-suited mother wavelet must be defined a priori (Chen et al., 2012). This remains an unresolved issue and generally requires a lengthy trial-and-error process (Prasad et al., 2017). In hybrid prediction models, a more effective decomposition technique is needed to deal with nonlinearity and nonstationarity simultaneously.
In the study of nonlinear and nonstationary datasets, a data-driven method known as empirical mode decomposition (EMD) is efficient and adaptive (Huang et al., 1998). The multiresolution capability of EMD provides self-adaptability because it does not require any predefined basis function or mother wavelet. It functions as a dyadic filter that divides a complex signal with a large frequency band into … way for the short-term prediction of nonlinear and nonstationary waves.
Based on these findings, the research presented in this paper integrates EMD and PSO with LSSVM models in order to improve prediction accuracy.
EMD is an empirical analysis tool for processing nonlinear and nonstationary datasets, and preprocessing with EMD reduces the difficulty of prediction. LSSVM, with its nonlinear learning ability, is then used for prediction. PSO is a swarm intelligence optimization algorithm that updates each particle according to the distance between its current location and the best locations found so far. The key parameters of LSSVM are tuned by PSO to improve on the prediction accuracy of a single LSSVM.

2 Methodology formulation

EMD-PSO-LSSVM prediction model
An ocean wave time series is a complicated nonlinear and nonstationary signal composed of various oscillation scales. When performing wave predictions, these different oscillation scales create difficulties for LSSVM models. Integrating EMD with an LSSVM model is therefore an important way to enhance wave height prediction. EMD was adopted to decompose the wave height series into one residual series and several intrinsic mode functions (IMFs). The residual series and each IMF were then modeled separately by LSSVM, and the predictions of the subseries were summed to obtain the wave height prediction. In addition, PSO was employed to optimize the LSSVM parameters to increase prediction accuracy. The specific steps of the EMD-PSO-LSSVM prediction algorithm are displayed in Fig. 1.
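The decompose, predict and sum structure described above can be sketched independently of the particular decomposition and regressor. In this minimal sketch, `hybrid_forecast`, `toy_decompose` and `persistence` are hypothetical names, and the moving-average split and the persistence forecaster merely stand in for EMD and the (PSO-)LSSVM so that the example runs on its own.

```python
import numpy as np
from typing import Callable, List

# Type aliases for the two pluggable stages (hypothetical interface).
Decomposer = Callable[[np.ndarray], List[np.ndarray]]
Forecaster = Callable[[np.ndarray, int], np.ndarray]


def hybrid_forecast(series: np.ndarray, decompose: Decomposer,
                    fit_predict: Forecaster, horizon: int) -> np.ndarray:
    """Decompose the series, forecast every component separately, sum the forecasts."""
    components = decompose(series)
    # The decomposition must be additive (IMFs + residue = original series).
    if not np.allclose(np.sum(components, axis=0), series):
        raise ValueError("components do not reconstruct the original series")
    forecasts = [fit_predict(c, horizon) for c in components]
    return np.sum(forecasts, axis=0)


def toy_decompose(series: np.ndarray, window: int = 5) -> List[np.ndarray]:
    """Hypothetical stand-in for EMD: a moving-average trend plus the fluctuation."""
    trend = np.convolve(series, np.ones(window) / window, mode="same")
    return [series - trend, trend]


def persistence(component: np.ndarray, horizon: int) -> np.ndarray:
    """Hypothetical stand-in for a per-component (PSO-)LSSVM: repeat the last value."""
    return np.full(horizon, component[-1])


hs = 1.2 + 0.3 * np.sin(np.arange(48) / 4.0)
print(hybrid_forecast(hs, toy_decompose, persistence, horizon=3))
```

Keeping both stages as plain callables makes it easy to substitute a real EMD implementation and a tuned LSSVM per component; the additivity check guards against a decomposition that does not reconstruct the series.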

Preprocess data by EMD
Empirical mode decomposition (EMD) is an empirical analysis tool for processing nonlinear and nonstationary datasets. The main idea of EMD is to decompose a nonlinear and nonstationary time series into a sum of several simple intrinsic mode function (IMF) components and one residue, each with its own inherent time scale. Each IMF represents a natural oscillatory mode and must satisfy the following two conditions: (1) the number of extrema and the number of zero crossings must be equal or differ at most by one; and (2) the local mean must be zero, i.e., the mean of the upper envelope defined by the local maxima and the lower envelope defined by the local minima is zero. For a given significant wave height time series x(t), the EMD processing steps are summarized as follows.
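The step-by-step listing is not reproduced in this section. As an illustration only, the standard sifting procedure of Huang et al. (1998) can be sketched as follows: repeatedly subtract the mean of the cubic-spline envelopes until the change between siftings is small, store the result as an IMF, and continue on the remainder until it has too few extrema. Assumptions not taken from the paper: SciPy's `CubicSpline` for the envelopes, pinning both envelopes to the end points, a sum-ratio SD stopping threshold of 0.2, caps of 8 IMFs and 50 sifts, and the hypothetical function names `emd`, `_mean_envelope` and `_local_extrema`. Dedicated EMD libraries treat boundaries and stopping rules more carefully.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def _local_extrema(x: np.ndarray):
    """Indices of interior local maxima and minima (simple discrete test)."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    return maxima, minima


def _mean_envelope(x: np.ndarray):
    """Mean of the cubic-spline upper and lower envelopes, or None if too few extrema."""
    maxima, minima = _local_extrema(x)
    if len(maxima) < 2 or len(minima) < 2:
        return None
    t = np.arange(len(x))
    last = len(x) - 1
    # Simplified boundary handling (assumption): pin both envelopes to the end points.
    up = np.r_[0, maxima, last]
    lo = np.r_[0, minima, last]
    upper = CubicSpline(up, x[up])(t)
    lower = CubicSpline(lo, x[lo])(t)
    return 0.5 * (upper + lower)


def emd(x, max_imfs: int = 8, max_sifts: int = 50, sd_tol: float = 0.2):
    """Decompose x into a list of IMFs and a residue by repeated sifting."""
    residue = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        mean = _mean_envelope(residue)
        if mean is None:          # residue is (nearly) monotonic: stop
            break
        h = residue.copy()
        for _ in range(max_sifts):
            h_new = h - mean      # remove the local mean (IMF condition 2)
            # Stopping rule in the spirit of Huang et al. (1998): relative size
            # of the change between two consecutive siftings.
            sd = np.sum((h - h_new) ** 2) / (np.sum(h ** 2) + 1e-12)
            h = h_new
            if sd < sd_tol:
                break
            mean = _mean_envelope(h)
            if mean is None:
                break
        imfs.append(h)
        residue = residue - h     # the next IMF is sifted from what is left
    return imfs, residue


t = np.arange(500)
signal = 1.5 + 0.5 * np.sin(2 * np.pi * t / 12) + 0.3 * np.sin(2 * np.pi * t / 100) + 0.001 * t
imfs, res = emd(signal)
print(len(imfs), np.allclose(np.sum(imfs, axis=0) + res, signal))
```

By construction the IMFs and the residue add back up to the input, which is what allows the component forecasts to be summed later.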

Least squares support vector machine (LSSVM)
Support vector machine (SVM) is a method based on statistical learning theory with a strong capacity to handle nonlinear problems. Its basic idea is to map the nonlinear data into a high-dimensional feature space using a nonlinear mapping function, where linear techniques can be applied. LSSVM is the least squares formulation of the standard SVM. Instead of the inequality constraints of the standard SVM, LSSVM uses equality constraints. This transforms the problem from solving a quadratic program into solving a set of linear equations, known as the linear Karush-Kuhn-Tucker (KKT) system. LSSVM is a nonlinear prediction model based on SVM theory, and it has been widely applied to short-term prediction problems. LSSVM is retained in this paper because of its good generalization ability; it has also been reported to outperform other nonlinear models in prediction problems. The basic idea of the method can be described as follows.
Given a training data set of N points {(x_i, y_i), i = 1, 2, …, N} with input data x_i ∈ R^n and output data y_i ∈ R, define a nonlinear mapping function φ(·) that maps the input data into a high-dimensional feature space. In this feature space there theoretically exists a linear function that expresses the nonlinear relationship between input and output data. This linear function, namely the LSSVM function, can be defined as
$$ y(x) = w^{T}\varphi(x) + b $$
where w and b are adjustable coefficients. The corresponding optimization problem for LSSVM is formulated as
$$ \min_{w,b,e} J(w,e) = \frac{1}{2} w^{T} w + \frac{\gamma}{2}\sum_{i=1}^{N} e_i^{2} \quad \text{s.t.} \quad y_i = w^{T}\varphi(x_i) + b + e_i, \quad i = 1, \dots, N $$
where γ denotes the regularization constant and e_i represents the training error. The Lagrangian is
$$ L(w,b,e,\alpha) = J(w,e) - \sum_{i=1}^{N} \alpha_i \left( w^{T}\varphi(x_i) + b + e_i - y_i \right) \qquad (4) $$
From the KKT conditions, the following equations must be satisfied:
$$ \frac{\partial L}{\partial w} = 0 \Rightarrow w = \sum_{i=1}^{N} \alpha_i \varphi(x_i), \qquad \frac{\partial L}{\partial b} = 0 \Rightarrow \sum_{i=1}^{N} \alpha_i = 0, \qquad \frac{\partial L}{\partial e_i} = 0 \Rightarrow \alpha_i = \gamma e_i, \qquad \frac{\partial L}{\partial \alpha_i} = 0 \Rightarrow w^{T}\varphi(x_i) + b + e_i - y_i = 0 $$
Eliminating w and e, the solution is found by solving the system of linear equations
$$ \begin{bmatrix} 0 & \mathbf{1}^{T} \\ \mathbf{1} & \Omega + \gamma^{-1} I \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix} $$
where \(\mathbf{1} = [1, \dots, 1]^{T}\), \(\alpha = [\alpha_1, \dots, \alpha_N]^{T}\), \(y = [y_1, \dots, y_N]^{T}\) and \(\Omega_{ij} = \varphi(x_i)^{T}\varphi(x_j) = K(x_i, x_j)\). The LSSVM regression model becomes
$$ y(x) = \sum_{i=1}^{N} \alpha_i K(x, x_i) + b $$
where α_i are the Lagrange multipliers obtained by solving the dual problem and K(x_i, x_j) is the kernel function, which equals the inner product of φ(x_i) and φ(x_j).
The most frequently used kernel functions are the polynomial kernel, the sigmoid kernel, and the radial basis function (RBF) kernel. Since the RBF kernel is easy to implement and efficient for nonlinear problems, it is used in this paper. It is defined as
$$ K(x, x_i) = \exp\left( -\frac{\lVert x - x_i \rVert^{2}}{2\sigma^{2}} \right) $$
where σ is the kernel width. It is well known that the generalization performance (prediction accuracy) of LSSVM depends on a good choice of meta-parameters, namely the regularization parameter and the kernel parameters. With the RBF kernel, the parameters γ and σ must be tuned, which is done here with the PSO-LSSVM system. The regularization parameter γ and the kernel parameter σ have a significant influence on prediction accuracy, and their choice governs the complexity of the model.
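The dual solution above reduces LSSVM training to one linear solve. The following minimal NumPy sketch assembles the KKT matrix with the RBF kernel and predicts with y(x) = Σ α_i K(x, x_i) + b. The class name `LSSVMRegressor`, the default γ and σ values and the synthetic sine data are illustrative assumptions, not the settings used in the paper.

```python
import numpy as np


def rbf_kernel(A: np.ndarray, B: np.ndarray, sigma: float) -> np.ndarray:
    """K(a, b) = exp(-||a - b||^2 / (2 sigma^2)) for every row pair of A and B."""
    sq = np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


class LSSVMRegressor:
    """Least squares SVM regression with an RBF kernel (dual formulation)."""

    def __init__(self, gamma: float = 100.0, sigma: float = 1.0):
        self.gamma = gamma    # regularization constant
        self.sigma = sigma    # RBF kernel width

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        n = len(y)
        # Assemble the KKT system [[0, 1^T], [1, Omega + I/gamma]] [b; alpha] = [0; y].
        A = np.zeros((n + 1, n + 1))
        A[0, 1:] = 1.0
        A[1:, 0] = 1.0
        A[1:, 1:] = rbf_kernel(X, X, self.sigma) + np.eye(n) / self.gamma
        solution = np.linalg.solve(A, np.concatenate(([0.0], y)))
        self.b, self.alpha = solution[0], solution[1:]
        self.X_train = X
        return self

    def predict(self, X):
        # y(x) = sum_i alpha_i K(x, x_i) + b
        K = rbf_kernel(np.asarray(X, dtype=float), self.X_train, self.sigma)
        return K @ self.alpha + self.b


X = np.linspace(0.0, 6.0, 40).reshape(-1, 1)
y = np.sin(X).ravel()
model = LSSVMRegressor(gamma=1000.0, sigma=0.5).fit(X, y)
print(model.predict([[1.0], [2.5]]))
```

A dense solve costs O(N³), which remains manageable for the roughly 3000 training samples per station implied by the data description (1500 points for each of two training years); much larger sets would call for iterative solvers.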

LSSVM optimization by PSO
To avoid under-fitting and over-fitting, the hyper-parameters of the LSSVM model should be tuned appropriately. This paper uses the particle swarm optimization (PSO) algorithm to find the best values of γ and σ in LSSVM. The LSSVM fitting process optimized by PSO is shown in Fig. 2. The velocity and position of each particle are updated as follows:
$$ v_{id}^{t+1} = \omega v_{id}^{t} + c_1 r_1 \left( p_{id} - x_{id}^{t} \right) + c_2 r_2 \left( p_{gd} - x_{id}^{t} \right) $$
$$ x_{id}^{t+1} = x_{id}^{t} + v_{id}^{t+1} $$
where ω is the inertia weight; c_1 and c_2 are the cognitive and social learning factors, respectively; r_1 and r_2 are two random numbers; t denotes the t-th iteration; x_{id} is the position of the i-th particle in the d-th dimension of the search space, i.e., the current value of the LSSVM parameters γ and σ; v_{id} is the velocity of the i-th particle in the d-th dimension, which determines the direction and distance of the next update of γ and σ; p_{id} is the best position found by each particle during the PSO run; and p_{gd} is the best position found by the whole swarm during the run.
The parameter settings of the particle swarm algorithm are as follows. The maximum number of iterations is set to 50. The learning factors c_1 and c_2 both default to 1; their relative values determine whether particles are driven more by local (personal) or global (swarm) information. r_1 and r_2 are random numbers in the range [0, 1]. The inertia weight ω controls the influence of the previous velocity on the current one: a large inertia weight facilitates global exploration, while a small one favors local exploitation, and a suitable value balances the two. A linearly decreasing inertia weight starting at 0.9 and ending at 0.4 was used, which can significantly improve the performance of PSO. The inertia weight is expressed as
$$ \omega(t) = \omega_{\max} - \left( \omega_{\max} - \omega_{\min} \right) \frac{t}{T_{\max}} $$
where ω_max = 0.9, ω_min = 0.4 and T_max is the maximum number of iterations.
… were utilized in this study (Fig. 3). Point A is station 41025 at 35°1'30" N 75°21'47" W, in the offshore area, while point B is station 41048 at 31°49'53" N 69°34'23" W, in the deep-sea zone. These stations were selected because they have long, uninterrupted records of significant wave height and meteorological data.
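Returning to the PSO update rules and the linearly decreasing inertia weight given in the LSSVM optimization subsection, a minimal global-best PSO can be sketched as follows. This is an illustration, not the authors' code: the function name `pso_minimize`, the swarm size of 20, clipping positions to the search box and the sphere test function are assumptions; only the 50 iterations, c_1 = c_2 = 1 and the 0.9 to 0.4 inertia schedule come from the text.

```python
import numpy as np


def pso_minimize(f, lower, upper, n_particles=20, n_iter=50,
                 c1=1.0, c2=1.0, w_max=0.9, w_min=0.4, seed=0):
    """Global-best PSO with a linearly decreasing inertia weight."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    x = rng.uniform(lower, upper, size=(n_particles, dim))   # positions x_id
    v = np.zeros_like(x)                                      # velocities v_id
    pbest = x.copy()                                          # personal bests p_id
    pbest_val = np.array([f(p) for p in x])
    gbest = pbest[np.argmin(pbest_val)].copy()                # global best p_gd
    for t in range(1, n_iter + 1):
        w = w_max - (w_max - w_min) * t / n_iter              # decreases linearly to w_min
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lower, upper)                      # keep inside the search box
        vals = np.array([f(p) for p in x])
        improved = vals < pbest_val
        pbest[improved] = x[improved]
        pbest_val[improved] = vals[improved]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest, float(pbest_val.min())


best_x, best_val = pso_minimize(lambda p: float(np.sum(p ** 2)), [-5, -5], [5, 5])
print(best_x, best_val)
```

Clipping positions is one simple way to keep γ and σ inside a valid range; reflecting or damping the velocity at the boundary are common alternatives.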

The data used consist of three parts. For both sites, partial significant wave height records from 2014, 2015 and 2016 were used, with 1500 sample points extracted from each year. Data from 2014 and 2015 were used for training, and data from 2016 were used for testing. Fig. 4 shows the significant wave height (SWH) records at both stations.

As can be seen from Table 1 and Fig. 4, the average significant wave height at station 41025, located near the coast, is around 1.2 m, with a maximum of around 3 m; the sea state there is relatively stable. The average significant wave height at station 41048, located in the deep-sea area, is about 2 m, with a maximum of about 6.5 m; the sea conditions there are relatively rough. It is therefore more difficult to predict the significant wave height in the deep-sea area.
The significant wave height data for 2014 and 2015 were used for model development. Before choosing the input features, the relevance of each candidate feature to the significant wave height must be determined. The correlation coefficient r_{x,y} can be calculated as
$$ r_{x,y} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^{2}} \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^{2}}} $$
where r_{x,y} is the correlation coefficient between data sets x and y, i = 1, 2, …, n is the sample index, and x̄ and ȳ are the sample means. |r_{x,y}| ≥ 0.8 indicates a high correlation between the two features. The correlation coefficients between the input features and the output feature are shown in Table 2. H-i in the table denotes the significant wave height i hours earlier; for example, H-2 is the significant wave height two hours earlier. Table 2 shows that the correlation coefficient falls below 0.8 at H-6, so the data from the previous five hours (H-1 to H-5) are used as inputs in this paper.
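Input selection by lagged correlation can be sketched as follows: compute r between H and H-k for increasing k, keep the lags whose |r| stays at or above 0.8, and build a matrix of lagged inputs. The helper names, the assumption that a lead time of L hours means predicting L steps past the last input, and the synthetic 24-hour cycle are illustrative only; for the buoy data the text reports that the cut-off falls at H-6.

```python
import numpy as np


def pearson(x: np.ndarray, y: np.ndarray) -> float:
    """Correlation coefficient r_{x,y} computed term by term from its definition."""
    xc, yc = x - x.mean(), y - y.mean()
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2)))


def select_n_lags(h: np.ndarray, threshold: float = 0.8, max_lag: int = 24) -> int:
    """Largest k such that H-1 ... H-k all have |r| >= threshold with H."""
    n_lags = 0
    for k in range(1, max_lag + 1):
        if abs(pearson(h[:-k], h[k:])) < threshold:
            break
        n_lags = k
    return n_lags


def make_lagged_dataset(h: np.ndarray, n_lags: int, lead: int):
    """Rows [H(t-n_lags+1), ..., H(t)] paired with the target H(t+lead)."""
    X, y = [], []
    for t in range(n_lags - 1, len(h) - lead):
        X.append(h[t - n_lags + 1 : t + 1])
        y.append(h[t + lead])
    return np.array(X), np.array(y)


hourly = 1.2 + 0.4 * np.sin(2 * np.pi * np.arange(480) / 24)   # synthetic 24 h cycle
k = select_n_lags(hourly)
X, y = make_lagged_dataset(hourly, n_lags=k, lead=1)
print(k, X.shape, y.shape)
```

For this pure 24-hour cycle the lag-k correlation is close to cos(2πk/24), so only two lags pass the 0.8 threshold; real wave records decorrelate more slowly, which is consistent with the five lags retained in the text.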

Model evaluation
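To evaluate the performance of the models, statistical and standardized metrics were used. Their mathematical formulations are given as follows.
1) The root mean square error (RMSE) is expressed as
$$ \mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^{2}} $$
where y_i and ŷ_i are the observed and predicted significant wave heights and N is the number of samples.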
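The later sections report RMSE, MSE, MAE and R². Only the RMSE formula is given above, so in this sketch the remaining definitions are the conventional ones, and R² is assumed to be 1 − SS_res/SS_tot (some studies instead use the squared Pearson correlation, which can differ for biased predictions). The function name `evaluate` is hypothetical.

```python
import numpy as np


def evaluate(observed, predicted) -> dict:
    """RMSE, MSE, MAE and R^2 between observed and predicted series."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    err = obs - pred
    mse = float(np.mean(err ** 2))
    return {
        "RMSE": mse ** 0.5,
        "MSE": mse,
        "MAE": float(np.mean(np.abs(err))),
        # Assumed definition: R^2 = 1 - SS_res / SS_tot.
        "R2": float(1.0 - np.sum(err ** 2) / np.sum((obs - obs.mean()) ** 2)),
    }


print(evaluate([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]))
# {'RMSE': 0.5, 'MSE': 0.25, 'MAE': 0.25, 'R2': 0.8}
```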

Single models
We first consider single models for predicting the significant wave height. The single models used here are LSSVM, the extreme learning machine (ELM), and an ANN, and the significant wave height is predicted 1 hour and 3 hours ahead. The specific parameters of each model are shown in Table 3.

IN is the number of input-layer units, H is the number of hidden-layer units, and O is the number of output-layer units; the table also lists the confidence and the penalty coefficient.
Fig. 5 and Fig. 6 show the significant wave height predictions of the three single models at stations 41025 and 41048, and Table 4 gives the corresponding evaluation indicators. Table 4 shows that for station 41025 near the coast, R² stays above 0.8 for the 3-hour prediction. For station 41048 in the deep-sea area, R² stays above 0.9 for the 3-hour prediction, indicating a high correlation between the predicted and observed significant wave heights. In general, all three algorithms achieve satisfactory results, but LSSVM has higher prediction accuracy than the other two, which makes it the best wave height predictor among the single models considered.
Fig. 5 and Fig. 6 also show that the observed and predicted wave heights are slightly misaligned along the time axis. The enlarged views in the figures show that the predictions of all three single models are shifted by one time step. These forecasting models thus exhibit a lag in prediction timing, which severely limits the usefulness of univariate time series forecasting. As the lead time increases, the lag becomes larger. This lag is a type of prediction error that has also been found in other work on wave forecasting with single models. The lag mainly results from the nonstationarity hidden …
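The one-step lag visible in Figs. 5 and 6 can be quantified with a simple diagnostic that is not part of the paper: find the non-negative shift s that maximizes the correlation between the observed series at time t and the prediction at time t + s. A persistence forecast, which simply repeats the last observation, gives s = 1, the same signature as the lag described above. The function name `timing_lag` and the synthetic series are hypothetical.

```python
import numpy as np


def timing_lag(observed, predicted, max_shift: int = 6):
    """Shift s >= 0 maximizing corr(observed[t], predicted[t + s])."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    best_shift, best_r = 0, -np.inf
    for s in range(max_shift + 1):
        r = np.corrcoef(obs[: len(obs) - s], pred[s:])[0, 1]
        if r > best_r:
            best_shift, best_r = s, r
    return best_shift, float(best_r)


t = np.arange(200)
obs = 1.5 + 0.5 * np.sin(t / 5.0) + 0.2 * np.sin(t / 1.7)
naive = np.concatenate(([obs[0]], obs[:-1]))   # persistence forecast: H(t) ~ H(t-1)
print(timing_lag(obs, naive))
```

A well-timed model should return a shift of 0; a shift that grows with the lead time reproduces the behavior reported for the single models.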

Hybrid models
The time series of ocean waves is a complicated nonlinear and nonstationary signal consisting of different oscillation scales. Predicting nonstationary data with single models alone homogenizes the various characteristics of the original input data, which can reduce prediction accuracy and cause the lag phenomenon. Accordingly, the nonstationarity of the significant wave height time series and input variables should be reduced. Combining EMD with an LSSVM model provides an effective way to improve wave prediction. EMD was adopted to decompose the significant wave height series into one residual series and several IMFs. The residual series and the IMFs were then each modeled by LSSVM, and the predictions of the subseries were summed to obtain the significant wave height prediction. In addition, PSO was employed to optimize the LSSVM parameters to increase prediction accuracy.
In the first step, the wave height time series is decomposed by EMD into a few meaningful and simple IMFs and a residual (Fig. 7). … Significant wave data sets are decomposed into IMFs and residuals when implementing the EMD-based prediction models. Fig. 7 displays the decomposition results for the wave height time series measured at station 41025: EMD decomposes the nonlinear significant wave height into 7 IMFs and 1 residual, showing that a few simple components can represent the complex wave height time series. This enables the single model to extract features effectively when modeling significant wave height. Next, an EMD-based hybrid model is used to predict the significant wave height.
As can be seen in Fig. 8, the single LSSVM model shows a prediction timing lag (red dotted line). The other two models, which use the EMD technique, overcome the lag; combining EMD with the single model mainly improved the prediction results for the nonlinear and nonstationary waves.
It appears that the EMD-based hybrid model solves the lag phenomenon and improves prediction accuracy. As shown in Table 4 … It can be seen from Fig. 8 that EMD preprocessing has solved the lag phenomenon. However, the significant wave height prediction of the EMD-LSSVM model is still not entirely satisfactory; for example, there are errors in predicting the peaks and troughs. The next step is therefore to optimize the LSSVM parameters to improve prediction accuracy.
Changing the parameter values of a prediction system can significantly affect its performance, so optimal parameter values should be found. Conventionally, human experts have performed this task, typically using a priori knowledge to specify the parameter values; however, this approach can be subject to human bias. PSO has emerged as a practical tool for high-quality parameter selection in prediction systems.
PSO is used to optimize the LSSVM parameters; the methodological steps are described in Section 2. Fig. 8 shows that the EMD-PSO-LSSVM model predicts the peaks and troughs of the significant wave height very well, significantly improving prediction accuracy. Table 4 presents the significant wave height prediction results of the EMD-PSO-LSSVM method.
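The PSO step that turns EMD-LSSVM into EMD-PSO-LSSVM needs a fitness value for each candidate (γ, σ). The text does not specify it here, so this minimal sketch assumes a hold-out validation RMSE, searches log10 γ in [−1, 4] and log10 σ in [−1, 1] (assumed ranges), and uses a synthetic noisy 24-hour cycle with five lagged inputs instead of the buoy data. The swarm size of 15 and the 30 iterations are reduced for speed (the text uses 50 iterations), and all function names are hypothetical. In the hybrid model this tuning would be repeated for each IMF and for the residue.

```python
import numpy as np


def lssvm_fit_predict(X_tr, y_tr, X_te, gamma, sigma):
    """Train an RBF LSSVM on (X_tr, y_tr) and predict X_te via a dense KKT solve."""
    def kernel(A, B):
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-sq / (2.0 * sigma ** 2))
    n = len(y_tr)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = kernel(X_tr, X_tr) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y_tr)))
    return kernel(X_te, X_tr) @ sol[1:] + sol[0]


def validation_rmse(log_params, X_tr, y_tr, X_va, y_va):
    """PSO fitness (assumed): hold-out RMSE for (log10 gamma, log10 sigma)."""
    gamma, sigma = 10.0 ** np.asarray(log_params)
    pred = lssvm_fit_predict(X_tr, y_tr, X_va, gamma, sigma)
    return float(np.sqrt(np.mean((y_va - pred) ** 2)))


def pso(f, lo, hi, n=15, iters=30, c1=1.0, c2=1.0, seed=0):
    """Compact global-best PSO with inertia weight decreasing linearly to 0.4."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(lo, hi, size=(n, len(lo)))
    v = np.zeros_like(x)
    pbest, pval = x.copy(), np.array([f(p) for p in x])
    for t in range(1, iters + 1):
        w = 0.9 - 0.5 * t / iters
        g = pbest[np.argmin(pval)].copy()
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        fx = np.array([f(p) for p in x])
        better = fx < pval
        pbest[better], pval[better] = x[better], fx[better]
    return pbest[np.argmin(pval)], float(pval.min())


rng = np.random.default_rng(1)
h = 1.5 + 0.5 * np.sin(2 * np.pi * np.arange(300) / 24) + 0.05 * rng.standard_normal(300)
X = np.array([h[i:i + 5] for i in range(len(h) - 5)])   # inputs H-5 ... H-1
y = h[5:]                                                # 1-step-ahead target H
X_tr, y_tr, X_va, y_va = X[:200], y[:200], X[200:], y[200:]
lo, hi = np.array([-1.0, -1.0]), np.array([4.0, 1.0])   # log10 search ranges (assumed)
best, best_rmse = pso(lambda p: validation_rmse(p, X_tr, y_tr, X_va, y_va), lo, hi)
print("gamma=%.3g sigma=%.3g RMSE=%.4f" % (10 ** best[0], 10 ** best[1], best_rmse))
```

Searching in log space is a common choice because γ and σ act multiplicatively; a validation set kept separate from the 2016 test year avoids tuning on the test data.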
As can be seen, the proposed method predicts the significant wave height accurately, and the benefit of using PSO to optimize the LSSVM parameters is reflected in the improvement of prediction accuracy from … It can be seen from Table 4 that the prediction performance at station 41048 is better than that at station 41025. One reason is that 41025 is a station near the coast, where the significant wave height is relatively stable and strong winds are rare, so few training data are available for extreme cases. Fig. 9 and Fig. 10 show that the significant wave height is relatively stable and that, when it is higher, the fitted data points are more scattered. Station 41048 is in the deep-sea area, where the range of significant wave height is relatively large and the training data also cover a wider range. Fig. 11 and Fig. 12 show that the fit remains good even when the significant wave height is high.
… is more than 0.902 (see Table 4), while the best-fit line slopes of the scatter plots are better than 0.8973 (Fig. 9 and Fig. 11). Correspondingly, the RMSE, MAE, and MSE of the EMD-PSO-LSSVM predictions at the two sites are also the lowest.
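The best-fit line slope quoted for the scatter plots can be computed by an ordinary least-squares fit of predicted against observed values. This sketch assumes observed values on the horizontal axis (the axis convention of the figures is not stated in the text), and the function name `scatter_fit` is hypothetical. The squared correlation it returns equals 1 − SS_res/SS_tot only for the fitted line, not for the raw predictions.

```python
import numpy as np


def scatter_fit(observed, predicted):
    """Least-squares line predicted ~ slope * observed + intercept, plus r^2."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    slope, intercept = np.polyfit(obs, pred, deg=1)   # coefficients, highest degree first
    r = np.corrcoef(obs, pred)[0, 1]
    return float(slope), float(intercept), float(r ** 2)


obs = np.linspace(0.5, 6.0, 50)
print(scatter_fit(obs, 0.9 * obs + 0.1))
```

A slope below 1 with a perfect correlation, as in this example, indicates systematic under-prediction of high waves even though r² is 1, which is why slope and R² are worth reporting together.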

Conclusions
This paper introduces a new prediction method, EMD-PSO-LSSVM, for nonlinear and nonstationary significant wave height prediction. It offers high adaptability and accuracy for random time series wave prediction problems.
We carried out forecasting experiments on significant wave heights in the offshore and deep-sea areas of the North Atlantic Ocean, using both single models and hybrid models. Several statistical indices were used to evaluate the accuracy of the predictions of the proposed models. The results show that, because of the nonlinearity and nonstationarity of the significant wave height, the traditional single models produce lagging predictions, and the lag becomes more serious as the lead time increases. This lag reduces prediction accuracy and will affect practical engineering applications. Therefore, EMD is added to preprocess the significant wave height time series, and the EMD-LSSVM hybrid model with preprocessing solves the prediction lag problem well. However, its predictions are still not fully satisfactory; for example, the peaks and troughs of the significant wave height are not predicted accurately, which reduces prediction accuracy. Therefore, the PSO algorithm is added to the EMD-LSSVM hybrid method to optimize the critical parameters of LSSVM, yielding the new hybrid model EMD-PSO-LSSVM. Significant wave height data from two NDBC buoys with different geographical and statistical properties were used in the comparison studies. Various data from Table 4