⚠ This website is for the 2022-23 academic year and is no longer maintained. ⚠

Extreme Value Analysis

2. Extreme Value Analysis#

Homework 2: start in Week 1, complete at the end of Week 3. The code and analysis steps will be used directly in the workshop during Week 3, so make sure you get through all of it! You are welcome to collaborate with others, but you should write your own code.

In this homework, you will jump into an engineering problem to apply Extreme Value Analysis (EVA) to the design of a structure against wave loading. To do so, we will use wave data in the North sea extracted from an ERA5 database, which is based on reanalysis (a combination of observations and physics-based model results), enabling us to have data for more than 60 years.

Note

This page is generated from an *.ipynb file and is accompanied by a *.txt file, all of which can be downloaded using the link below.

Download file 1 for this exercise.

Download file 2 for this exercise.

First things first, let’s import the packages we will need to address this workshop. Here is our suggestion, but feel free to add any additional packages you may need.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from scipy import stats
from scipy.signal import find_peaks
import datetime
%matplotlib inline

The first step in the design is to determine the severity of the load that our structure needs to withstand according to the applicable regulations and recommendations. For this assignment, we will assume our structure needs have a design life of 50 years and a failure probability during the design life of 0.1.

Task 1: Calculate the design return period using: (1) the Binomial approximation, and (2) the Poisson approximation.
Comment on the the results (differences and reasons, if any…) and choose the design return period.

'''You can do your calculations here'''

Now that we know the design return period, let’s start with the analysis of our wave data.

pandas = pd.read_csv('Time_Series_DEN_lon_8_lat_56.5_ERA5.txt', delimiter=r"\s+", 
                     names=['date_&_time', 
                            'significant_wave_height_(m)', 
                            'mean_wave_period_(s)', 
                            'Peak_wave_Period_(s)', 
                            'mean_wave_direction_(deg_N)', 
                            '10_meter_wind_speed_(m/s)', 
                            'Wind_direction_(deg_N)'],  # custom header names
                     skiprows=1)  # Skip the initial row (header)

We will change the format of the time stamp and start taking a look at our data. Ensure you know what it is in each column of the dataframe.

pandas['date_&_time'] = pd.to_datetime(pandas['date_&_time']-719529, unit='D')

# The value 719529 is the datenum value of the Unix epoch start (1970-01-01), 
# which is the default origin for pd.to_datetime().

pandas.head()

Task 2: Plot the time series of wave height.

'''You can write your code here'''

It is always good practice to have a look to our data visually, since it is one of the most simple ways to detect possible outliers or error measurements. For instance, when working with data coming from buoys, an extremely high or low number (such as 9999 or -9999) is recorded when the buoy is not able to measure. That way error measurements can be easily removed prior to our analysis.

Task 3: Calculate the basic descriptive statistics. What can you conclude from them?

'''You can write your code here'''

In the following code, we are selecting the dominant wave direction. To do so, we select the values of the wave height above the 99% percentile (we are interested in the highest values)and select the range of directions corresponding to those observations. The resulting selection is stored in a new pandas dataframe names ‘pandas_angle’. The process is illustrated in the plots.

plt.figure(2, figsize = (15,10), facecolor='white')

plt.subplot(2,3,1)
plt.scatter(pandas['mean_wave_direction_(deg_N)'], pandas['significant_wave_height_(m)'], s = 2)
plt.ylabel('${H_s (m)}$')

print(pandas['significant_wave_height_(m)'].quantile(0.99))
pandas_99 = pandas[pandas['significant_wave_height_(m)']>=pandas['significant_wave_height_(m)'].quantile(0.99)]

plt.subplot(2,3,2)
plt.title('Dominant Wave Direction')
plt.scatter(pandas['mean_wave_direction_(deg_N)'], pandas['significant_wave_height_(m)'], s = 2)
plt.scatter(pandas_99['mean_wave_direction_(deg_N)'], pandas_99['significant_wave_height_(m)'], color='orange', s = 2)
plt.axvline(x = 200, color = 'r', linestyle = 'dashed')
plt.axvline(x = 320, color = 'r', linestyle = 'dashed')
plt.xlabel('$Mean Wave Direction (ºN)$')


pandas_angle = pandas[(pandas['mean_wave_direction_(deg_N)'].between(200, 320))]

plt.subplot(2,3,3)
plt.scatter(pandas_angle['mean_wave_direction_(deg_N)'], pandas_angle['significant_wave_height_(m)'], s = 2)
plt.axvline(x = 200, color = 'r', linestyle = 'dashed')
plt.axvline(x = 320, color = 'r', linestyle = 'dashed')
plt.xlim([0, 350])

In the following code, first, we compute the wave length and wave steepness of each observation. After that, we plot the wave height against the wave period highlighting those values of the wave height above the 99% percentile, and draw the lines of constant wave steepness (quotient between the wave height and the wave length).

#Calculate theoretical wave steepness lines following the wave dispersion relationship.
N = 15
iterations = 20
Depth = 35
T_p = np.linspace(0,N,N+1)
L0 = 9.81*(T_p**2)/(2*np.pi) # Deep water wave length
L = np.zeros((iterations,len(T_p)))
L[0,:] = L0                # Initial guess for wave length = deep water wave length
L[0,0] = 0.1

# Calculate the wave periods using an iterative approach
for iL in np.arange(1,(len(L[:,0]))):
    for jL in np.arange(0,len(T_p)):
        L[iL,jL] =  L0[jL]*np.tanh(2*np.pi*(Depth/(L[iL-1,jL])))

# Compute theoretical significant wave heights for different steepnesses
Hs005 = L[-1,:]*0.05;
Hs002 = L[-1,:]*0.02;
Hs001 = L[-1,:]*0.01;

plt.figure(3, figsize = (5,5), facecolor='white')
plt.scatter(pandas['mean_wave_period_(s)'], pandas['significant_wave_height_(m)'], s = 2)
plt.scatter(pandas_99['mean_wave_period_(s)'], pandas_99['significant_wave_height_(m)'], color='orange', s = 2)
plt.plot(T_p[:-2], Hs005[:-2], linestyle = 'dashed', color = 'black', label = 0.05)
plt.plot(T_p[:-2], Hs002[:-2], linestyle = 'dashed', color = 'black', label = 0.02)
plt.plot(T_p[:-2], Hs001[:-2], linestyle = 'dashed', color = 'black', label = 0.01)
plt.xlabel('${T_{m-1,0} (s)}$')
plt.ylabel('${H_s (m)}$')
plt.title('Wave Steepness Plot')
plt.legend()

Task 4: Based on the results of the two previous analyses, which data should we consider for our EVA? Why?

'''You can write your answer here'''

Use the dataframe ‘pandas_angle’ for the following steps.

Task 5: Use the Block Maxima method to sample the (yearly) extreme observations. Plot the results.

'''You can write your code here'''

Task 6: Fit the sampled extremes to fit a Generalized Extreme Value distribution.

'''You can write your code here'''

Task 7: Assess the goodness of fit of the distribution using a QQplot. Comment on the fit of the distribution.

'''You can write your code here'''

Task 8: Plot the return level plot and determine the value of the significant wave height that you need for design according to your calculated return period. Remember that return level plot presents in the x-axis the values of the variable (wave height, here) and in the y-axis the corresponding values of the return period.

Hint: check the definition of return period, it is very easy!

'''You can write your code here'''

Task 9: Select two out of the four techniques listed below and prepare a function for them: (1) Peak Over Threshold (POT), (2) Dispersion Index plot (DI), (3) Parameter stability plots, and (4) Mean residual life plot (MRL).

'''You can write your first function here'''
'''You can write your second function here'''