5.3. Extreme Value Analysis 1 (Solution)#
Sampling extremes#
Objective#
Apply sampling techniques to identify extremes within an historical database.
Introduction#
The first step to model the extreme observations of our loading variable is to extract those from our database. In this notebook, we will use the wave height dataset seen in the videos to put sampling techniques into practise. We will follow the next steps:
Load libraries and data
Sampling extremes with Block Maxima
Sampling extremes with Peak Over Threshold
Let’s talk about threshold and declustering time selection
0. Load libraries and data#
Install dependences if needed
!pip install pandas numpy scipy statsmodels matplotlib datetime wget
Import necessary libraries (include more if needed).
import pandas as pd
import numpy as np
from scipy import stats
from scipy.signal import find_peaks
import os
import matplotlib.pyplot as plt
import matplotlib
import datetime
import wget
Load the database obtained from a buoy from surfdrive.
data_name = wget.download('https://surfdrive.surf.nl/files/index.php/s/2Z1OHYg5Gv9PIAK/download')
data = pd.read_csv(data_name, sep = ';')
data.head()
0% [ ] 0 / 2462003
0% [ ] 8192 / 2462003
0% [ ] 16384 / 2462003
0% [ ] 24576 / 2462003
1% [ ] 32768 / 2462003
1% [ ] 40960 / 2462003
1% [. ] 49152 / 2462003
2% [. ] 57344 / 2462003
2% [. ] 65536 / 2462003
2% [. ] 73728 / 2462003
3% [. ] 81920 / 2462003
3% [. ] 90112 / 2462003
3% [.. ] 98304 / 2462003
4% [.. ] 106496 / 2462003
4% [.. ] 114688 / 2462003
4% [.. ] 122880 / 2462003
5% [.. ] 131072 / 2462003
5% [... ] 139264 / 2462003
5% [... ] 147456 / 2462003
6% [... ] 155648 / 2462003
6% [... ] 163840 / 2462003
6% [... ] 172032 / 2462003
7% [... ] 180224 / 2462003
7% [.... ] 188416 / 2462003
7% [.... ] 196608 / 2462003
8% [.... ] 204800 / 2462003
8% [.... ] 212992 / 2462003
8% [.... ] 221184 / 2462003
9% [..... ] 229376 / 2462003
9% [..... ] 237568 / 2462003
9% [..... ] 245760 / 2462003
10% [..... ] 253952 / 2462003
10% [..... ] 262144 / 2462003
10% [..... ] 270336 / 2462003
11% [...... ] 278528 / 2462003
11% [...... ] 286720 / 2462003
11% [...... ] 294912 / 2462003
12% [...... ] 303104 / 2462003
12% [...... ] 311296 / 2462003
12% [....... ] 319488 / 2462003
13% [....... ] 327680 / 2462003
13% [....... ] 335872 / 2462003
13% [....... ] 344064 / 2462003
14% [....... ] 352256 / 2462003
14% [....... ] 360448 / 2462003
14% [........ ] 368640 / 2462003
15% [........ ] 376832 / 2462003
15% [........ ] 385024 / 2462003
15% [........ ] 393216 / 2462003
16% [........ ] 401408 / 2462003
16% [........ ] 409600 / 2462003
16% [......... ] 417792 / 2462003
17% [......... ] 425984 / 2462003
17% [......... ] 434176 / 2462003
17% [......... ] 442368 / 2462003
18% [......... ] 450560 / 2462003
18% [.......... ] 458752 / 2462003
18% [.......... ] 466944 / 2462003
19% [.......... ] 475136 / 2462003
19% [.......... ] 483328 / 2462003
19% [.......... ] 491520 / 2462003
20% [.......... ] 499712 / 2462003
20% [........... ] 507904 / 2462003
20% [........... ] 516096 / 2462003
21% [........... ] 524288 / 2462003
21% [........... ] 532480 / 2462003
21% [........... ] 540672 / 2462003
22% [............ ] 548864 / 2462003
22% [............ ] 557056 / 2462003
22% [............ ] 565248 / 2462003
23% [............ ] 573440 / 2462003
23% [............ ] 581632 / 2462003
23% [............ ] 589824 / 2462003
24% [............. ] 598016 / 2462003
24% [............. ] 606208 / 2462003
24% [............. ] 614400 / 2462003
25% [............. ] 622592 / 2462003
25% [............. ] 630784 / 2462003
25% [.............. ] 638976 / 2462003
26% [.............. ] 647168 / 2462003
26% [.............. ] 655360 / 2462003
26% [.............. ] 663552 / 2462003
27% [.............. ] 671744 / 2462003
27% [.............. ] 679936 / 2462003
27% [............... ] 688128 / 2462003
28% [............... ] 696320 / 2462003
28% [............... ] 704512 / 2462003
28% [............... ] 712704 / 2462003
29% [............... ] 720896 / 2462003
29% [............... ] 729088 / 2462003
29% [................ ] 737280 / 2462003
30% [................ ] 745472 / 2462003
30% [................ ] 753664 / 2462003
30% [................ ] 761856 / 2462003
31% [................ ] 770048 / 2462003
31% [................. ] 778240 / 2462003
31% [................. ] 786432 / 2462003
32% [................. ] 794624 / 2462003
32% [................. ] 802816 / 2462003
32% [................. ] 811008 / 2462003
33% [................. ] 819200 / 2462003
33% [.................. ] 827392 / 2462003
33% [.................. ] 835584 / 2462003
34% [.................. ] 843776 / 2462003
34% [.................. ] 851968 / 2462003
34% [.................. ] 860160 / 2462003
35% [................... ] 868352 / 2462003
35% [................... ] 876544 / 2462003
35% [................... ] 884736 / 2462003
36% [................... ] 892928 / 2462003
36% [................... ] 901120 / 2462003
36% [................... ] 909312 / 2462003
37% [.................... ] 917504 / 2462003
37% [.................... ] 925696 / 2462003
37% [.................... ] 933888 / 2462003
38% [.................... ] 942080 / 2462003
38% [.................... ] 950272 / 2462003
38% [..................... ] 958464 / 2462003
39% [..................... ] 966656 / 2462003
39% [..................... ] 974848 / 2462003
39% [..................... ] 983040 / 2462003
40% [..................... ] 991232 / 2462003
40% [..................... ] 999424 / 2462003
40% [...................... ] 1007616 / 2462003
41% [...................... ] 1015808 / 2462003
41% [...................... ] 1024000 / 2462003
41% [...................... ] 1032192 / 2462003
42% [...................... ] 1040384 / 2462003
42% [...................... ] 1048576 / 2462003
42% [....................... ] 1056768 / 2462003
43% [....................... ] 1064960 / 2462003
43% [....................... ] 1073152 / 2462003
43% [....................... ] 1081344 / 2462003
44% [....................... ] 1089536 / 2462003
44% [........................ ] 1097728 / 2462003
44% [........................ ] 1105920 / 2462003
45% [........................ ] 1114112 / 2462003
45% [........................ ] 1122304 / 2462003
45% [........................ ] 1130496 / 2462003
46% [........................ ] 1138688 / 2462003
46% [......................... ] 1146880 / 2462003
46% [......................... ] 1155072 / 2462003
47% [......................... ] 1163264 / 2462003
47% [......................... ] 1171456 / 2462003
47% [......................... ] 1179648 / 2462003
48% [.......................... ] 1187840 / 2462003
48% [.......................... ] 1196032 / 2462003
48% [.......................... ] 1204224 / 2462003
49% [.......................... ] 1212416 / 2462003
49% [.......................... ] 1220608 / 2462003
49% [.......................... ] 1228800 / 2462003
50% [........................... ] 1236992 / 2462003
50% [........................... ] 1245184 / 2462003
50% [........................... ] 1253376 / 2462003
51% [........................... ] 1261568 / 2462003
51% [........................... ] 1269760 / 2462003
51% [............................ ] 1277952 / 2462003
52% [............................ ] 1286144 / 2462003
52% [............................ ] 1294336 / 2462003
52% [............................ ] 1302528 / 2462003
53% [............................ ] 1310720 / 2462003
53% [............................ ] 1318912 / 2462003
53% [............................. ] 1327104 / 2462003
54% [............................. ] 1335296 / 2462003
54% [............................. ] 1343488 / 2462003
54% [............................. ] 1351680 / 2462003
55% [............................. ] 1359872 / 2462003
55% [.............................. ] 1368064 / 2462003
55% [.............................. ] 1376256 / 2462003
56% [.............................. ] 1384448 / 2462003
56% [.............................. ] 1392640 / 2462003
56% [.............................. ] 1400832 / 2462003
57% [.............................. ] 1409024 / 2462003
57% [............................... ] 1417216 / 2462003
57% [............................... ] 1425408 / 2462003
58% [............................... ] 1433600 / 2462003
58% [............................... ] 1441792 / 2462003
58% [............................... ] 1449984 / 2462003
59% [............................... ] 1458176 / 2462003
59% [................................ ] 1466368 / 2462003
59% [................................ ] 1474560 / 2462003
60% [................................ ] 1482752 / 2462003
60% [................................ ] 1490944 / 2462003
60% [................................ ] 1499136 / 2462003
61% [................................. ] 1507328 / 2462003
61% [................................. ] 1515520 / 2462003
61% [................................. ] 1523712 / 2462003
62% [................................. ] 1531904 / 2462003
62% [................................. ] 1540096 / 2462003
62% [................................. ] 1548288 / 2462003
63% [.................................. ] 1556480 / 2462003
63% [.................................. ] 1564672 / 2462003
63% [.................................. ] 1572864 / 2462003
64% [.................................. ] 1581056 / 2462003
64% [.................................. ] 1589248 / 2462003
64% [................................... ] 1597440 / 2462003
65% [................................... ] 1605632 / 2462003
65% [................................... ] 1613824 / 2462003
65% [................................... ] 1622016 / 2462003
66% [................................... ] 1630208 / 2462003
66% [................................... ] 1638400 / 2462003
66% [.................................... ] 1646592 / 2462003
67% [.................................... ] 1654784 / 2462003
67% [.................................... ] 1662976 / 2462003
67% [.................................... ] 1671168 / 2462003
68% [.................................... ] 1679360 / 2462003
68% [..................................... ] 1687552 / 2462003
68% [..................................... ] 1695744 / 2462003
69% [..................................... ] 1703936 / 2462003
69% [..................................... ] 1712128 / 2462003
69% [..................................... ] 1720320 / 2462003
70% [..................................... ] 1728512 / 2462003
70% [...................................... ] 1736704 / 2462003
70% [...................................... ] 1744896 / 2462003
71% [...................................... ] 1753088 / 2462003
71% [...................................... ] 1761280 / 2462003
71% [...................................... ] 1769472 / 2462003
72% [...................................... ] 1777664 / 2462003
72% [....................................... ] 1785856 / 2462003
72% [....................................... ] 1794048 / 2462003
73% [....................................... ] 1802240 / 2462003
73% [....................................... ] 1810432 / 2462003
73% [....................................... ] 1818624 / 2462003
74% [........................................ ] 1826816 / 2462003
74% [........................................ ] 1835008 / 2462003
74% [........................................ ] 1843200 / 2462003
75% [........................................ ] 1851392 / 2462003
75% [........................................ ] 1859584 / 2462003
75% [........................................ ] 1867776 / 2462003
76% [......................................... ] 1875968 / 2462003
76% [......................................... ] 1884160 / 2462003
76% [......................................... ] 1892352 / 2462003
77% [......................................... ] 1900544 / 2462003
77% [......................................... ] 1908736 / 2462003
77% [.......................................... ] 1916928 / 2462003
78% [.......................................... ] 1925120 / 2462003
78% [.......................................... ] 1933312 / 2462003
78% [.......................................... ] 1941504 / 2462003
79% [.......................................... ] 1949696 / 2462003
79% [.......................................... ] 1957888 / 2462003
79% [........................................... ] 1966080 / 2462003
80% [........................................... ] 1974272 / 2462003
80% [........................................... ] 1982464 / 2462003
80% [........................................... ] 1990656 / 2462003
81% [........................................... ] 1998848 / 2462003
81% [............................................ ] 2007040 / 2462003
81% [............................................ ] 2015232 / 2462003
82% [............................................ ] 2023424 / 2462003
82% [............................................ ] 2031616 / 2462003
82% [............................................ ] 2039808 / 2462003
83% [............................................ ] 2048000 / 2462003
83% [............................................. ] 2056192 / 2462003
83% [............................................. ] 2064384 / 2462003
84% [............................................. ] 2072576 / 2462003
84% [............................................. ] 2080768 / 2462003
84% [............................................. ] 2088960 / 2462003
85% [............................................. ] 2097152 / 2462003
85% [.............................................. ] 2105344 / 2462003
85% [.............................................. ] 2113536 / 2462003
86% [.............................................. ] 2121728 / 2462003
86% [.............................................. ] 2129920 / 2462003
86% [.............................................. ] 2138112 / 2462003
87% [............................................... ] 2146304 / 2462003
87% [............................................... ] 2154496 / 2462003
87% [............................................... ] 2162688 / 2462003
88% [............................................... ] 2170880 / 2462003
88% [............................................... ] 2179072 / 2462003
88% [............................................... ] 2187264 / 2462003
89% [................................................ ] 2195456 / 2462003
89% [................................................ ] 2203648 / 2462003
89% [................................................ ] 2211840 / 2462003
90% [................................................ ] 2220032 / 2462003
90% [................................................ ] 2228224 / 2462003
90% [................................................. ] 2236416 / 2462003
91% [................................................. ] 2244608 / 2462003
91% [................................................. ] 2252800 / 2462003
91% [................................................. ] 2260992 / 2462003
92% [................................................. ] 2269184 / 2462003
92% [................................................. ] 2277376 / 2462003
92% [.................................................. ] 2285568 / 2462003
93% [.................................................. ] 2293760 / 2462003
93% [.................................................. ] 2301952 / 2462003
93% [.................................................. ] 2310144 / 2462003
94% [.................................................. ] 2318336 / 2462003
94% [................................................... ] 2326528 / 2462003
94% [................................................... ] 2334720 / 2462003
95% [................................................... ] 2342912 / 2462003
95% [................................................... ] 2351104 / 2462003
95% [................................................... ] 2359296 / 2462003
96% [................................................... ] 2367488 / 2462003
96% [.................................................... ] 2375680 / 2462003
96% [.................................................... ] 2383872 / 2462003
97% [.................................................... ] 2392064 / 2462003
97% [.................................................... ] 2400256 / 2462003
97% [.................................................... ] 2408448 / 2462003
98% [..................................................... ] 2416640 / 2462003
98% [..................................................... ] 2424832 / 2462003
98% [..................................................... ] 2433024 / 2462003
99% [..................................................... ] 2441216 / 2462003
99% [..................................................... ] 2449408 / 2462003
99% [..................................................... ] 2457600 / 2462003
100% [......................................................] 2462003 / 2462003
Date (GMT) | Hs(m) | |
---|---|---|
0 | 1985 09 24 21 | 0.22 |
1 | 1985 09 25 21 | 0.46 |
2 | 1985 09 26 00 | 0.73 |
3 | 1985 09 26 03 | 0.72 |
4 | 1985 09 26 06 | 0.75 |
The first column of the database corresponds to the date and time of each measurement, so we are going to give it the right format. The second column corresponds to the hourly measured significant wave height (\(H_s\)).
data['Date (GMT)'] = pd.to_datetime(data['Date (GMT)'])
data.head()
Date (GMT) | Hs(m) | |
---|---|---|
0 | 1985-09-24 21:00:00 | 0.22 |
1 | 1985-09-25 21:00:00 | 0.46 |
2 | 1985-09-26 00:00:00 | 0.73 |
3 | 1985-09-26 03:00:00 | 0.72 |
4 | 1985-09-26 06:00:00 | 0.75 |
We can start the analysis of the data by plotting the time series.
fig, axs = plt.subplots(1, 1)
axs.plot(data['Date (GMT)'], data['Hs(m)'], 'k', label='${H_s(m)}$')
axs.set_title('Significant wave height time series')
axs.set_xlabel('Date')
axs.set_ylabel('${H_s (m)}$')
fig.set_size_inches(15, 4)
axs.grid()
plt.show()
In the figure, you can see some outliers. When the buoy is not able to measure, it records -9,999. Before going ahead with the analysis, we need to remove these values. Afterwards, we can plot the data again.
data = data[data['Hs(m)'] >= 0]
data.set_axis([np.linspace(0, len(data)-1, len(data), dtype = int)], axis = 'index', inplace = True)
fig, axs = plt.subplots(1, 1)
axs.plot(data['Date (GMT)'], data['Hs(m)'], 'k', label = '${H_s(m)}$')
axs.set_title('Significant wave height time series')
axs.set_xlabel('Date')
axs.set_ylabel('${H_s (m)}$')
fig.set_size_inches(15, 4)
axs.grid()
Now we have a clean database to work! As you can see, the buoy has been measuring from 1985 until 2005, so we have almost 20 years of measurements. However, these data are not enough to totally represent the right tail of the distribution (high return periods), so we need to model it.
1. Sampling extremes with Block Maxima#
This first sampling technique consists of defining a time block and selecting the maximum observation within that time block.
Task #1: identify extreme events using Block Maxima. Define a function which implements Block Maxima with a time block equal to a year. The function must take as input the buoy data. The output of the function must be a dataframe with the two columns of the original data.
Hint: using pd.DatetimeIndex(data[‘Date (GMT)’]).year, you can extract the year of each observation
def yearly_maxima(data):
min_year = min(pd.DatetimeIndex(data['Date (GMT)']).year)
max_year = max(pd.DatetimeIndex(data['Date (GMT)']).year)
max_list = pd.DataFrame(columns = data.columns)
for i in range(min_year, max_year+1):
block = data.loc[pd.DatetimeIndex(data['Date (GMT)']).year == i]
idx_max = block['Hs(m)'].idxmax(axis = 0)
max_list = pd.concat([max_list, block.loc[[idx_max]]])
return max_list
def yearly_maxima(data):
idx_max = data.groupby(pd.DatetimeIndex(data['Date (GMT)']).year)['Hs(m)'].idxmax()
max_list = data.loc[idx_max]
return max_list
Let’s apply your function to our data!
yearly_maxima_list = yearly_maxima(data)
yearly_maxima_list
Date (GMT) | Hs(m) | |
---|---|---|
230 | 1985-10-28 02:00:00 | 3.57 |
2481 | 1986-09-30 18:00:00 | 3.75 |
4724 | 1987-11-03 21:00:00 | 2.64 |
6168 | 1988-10-16 19:00:00 | 2.64 |
7957 | 1989-09-05 17:00:00 | 3.08 |
10237 | 1990-03-03 22:00:00 | 3.22 |
16649 | 1991-01-26 14:00:00 | 2.99 |
22659 | 1992-02-20 02:00:00 | 3.53 |
35113 | 1993-10-27 04:00:00 | 2.99 |
37723 | 1994-02-16 19:00:00 | 2.41 |
46351 | 1995-03-12 13:00:00 | 3.67 |
56141 | 1996-09-12 01:00:00 | 2.97 |
64539 | 1997-10-29 10:00:00 | 4.03 |
73844 | 1998-12-02 19:00:00 | 3.24 |
81860 | 1999-11-12 03:00:00 | 2.90 |
85911 | 2000-05-04 09:00:00 | 1.95 |
96838 | 2001-11-11 03:00:00 | 4.49 |
99392 | 2002-03-28 22:00:00 | 3.63 |
111696 | 2003-10-16 15:00:00 | 3.39 |
115468 | 2004-03-29 12:00:00 | 3.50 |
123477 | 2005-02-28 20:00:00 | 3.00 |
We have obtained one maximum value per year, so 21 extreme values. We can also plot it on our timeseries to see how it looks!
fig, axs = plt.subplots(1, 1)
axs.plot(data['Date (GMT)'], data['Hs(m)'], 'k', label='${H_s(m)}$ time series')
axs.scatter(yearly_maxima_list['Date (GMT)'], yearly_maxima_list['Hs(m)'], 40, 'r', label = 'Yearly Maxima')
axs.set_title('Time series and block maxima')
axs.set_xlabel('Date')
axs.set_ylabel('${H_s (m)}$')
fig.set_size_inches(15, 4)
axs.grid()
axs.legend()
<matplotlib.legend.Legend at 0x7fcc28292520>
2. Sampling extremes with Peak Over Threshold#
Let’s go with the second technique: Peak Over Threshold (POT). This sampling technique involves defining two parameters:
A threshold: value above which an observation is considered extreme.
Declustering time: minimum time distance between two extreme observations.
Based on these parameters, we can select those observations which are above the threshold and present a minimum distance equal to the declustering time between events.
Task #2: identify extreme events using the POT method. Define a function which implements the POT method. The function must take as input (1) the buoy data, (2) the threshold, and (3) the declustering time. The output of the function must be a dataframe with the same columns as the data.
Hint: You can use the function find_peaks from SciPy library.
def pot_method(data, threshold, dtime):
idx_pot, _ = find_peaks(data['Hs(m)'], height = threshold, distance = dtime)
pot_list = data.loc[idx_pot]
return pot_list
Let’s apply your function to your data! Do it with a threshold=2.5m and declustering time=48h.
pot_maxima = pot_method(data, threshold = 2.5, dtime = 48)
pot_maxima
Date (GMT) | Hs(m) | |
---|---|---|
230 | 1985-10-28 02:00:00 | 3.57 |
320 | 1985-11-16 16:00:00 | 3.42 |
593 | 1985-12-30 00:00:00 | 2.94 |
992 | 1986-03-01 03:00:00 | 2.85 |
2481 | 1986-09-30 18:00:00 | 3.75 |
4724 | 1987-11-03 21:00:00 | 2.64 |
6168 | 1988-10-16 19:00:00 | 2.64 |
7140 | 1989-03-18 17:00:00 | 2.69 |
7957 | 1989-09-05 17:00:00 | 3.08 |
8005 | 1989-09-07 21:00:00 | 2.53 |
8516 | 1989-11-15 08:00:00 | 2.62 |
10237 | 1990-03-03 22:00:00 | 3.22 |
11530 | 1990-05-01 15:00:00 | 2.58 |
16584 | 1991-01-23 16:00:00 | 2.85 |
16649 | 1991-01-26 14:00:00 | 2.99 |
16802 | 1991-02-03 10:00:00 | 2.50 |
17107 | 1991-02-19 12:00:00 | 2.83 |
18231 | 1991-04-15 08:00:00 | 2.67 |
22551 | 1992-01-28 16:00:00 | 2.55 |
22659 | 1992-02-20 02:00:00 | 3.53 |
23913 | 1992-05-03 12:00:00 | 2.84 |
24892 | 1992-06-14 19:00:00 | 2.72 |
28858 | 1993-02-02 00:00:00 | 2.98 |
35113 | 1993-10-27 04:00:00 | 2.99 |
46067 | 1995-02-28 15:00:00 | 3.37 |
46351 | 1995-03-12 13:00:00 | 3.67 |
48097 | 1995-06-26 13:00:00 | 2.50 |
50290 | 1995-09-26 00:00:00 | 3.14 |
56141 | 1996-09-12 01:00:00 | 2.97 |
58399 | 1996-12-30 03:00:00 | 2.79 |
60356 | 1997-04-08 21:00:00 | 3.05 |
64539 | 1997-10-29 10:00:00 | 4.03 |
65416 | 1997-12-05 01:00:00 | 3.76 |
73844 | 1998-12-02 19:00:00 | 3.24 |
75218 | 1999-01-31 13:00:00 | 2.63 |
81860 | 1999-11-12 03:00:00 | 2.90 |
91405 | 2001-02-15 17:00:00 | 2.64 |
96838 | 2001-11-11 03:00:00 | 4.49 |
96946 | 2001-11-15 15:00:00 | 3.75 |
97345 | 2001-12-15 22:00:00 | 3.30 |
99392 | 2002-03-28 22:00:00 | 3.63 |
99493 | 2002-04-02 17:00:00 | 2.97 |
100274 | 2002-05-07 16:00:00 | 3.27 |
106125 | 2003-02-16 00:00:00 | 2.77 |
107537 | 2003-04-16 02:00:00 | 2.96 |
111696 | 2003-10-16 15:00:00 | 3.39 |
111745 | 2003-10-18 16:00:00 | 2.67 |
112796 | 2003-12-08 16:00:00 | 2.89 |
114565 | 2004-02-20 21:00:00 | 2.68 |
115468 | 2004-03-29 12:00:00 | 3.50 |
115893 | 2004-04-16 06:00:00 | 3.31 |
120893 | 2004-11-12 14:00:00 | 2.63 |
121425 | 2004-12-04 21:00:00 | 2.52 |
123477 | 2005-02-28 20:00:00 | 3.00 |
Let’s see how much extreme values we got.
len(pot_maxima)
54
That is much more than the 21 extreme observations we got with Yearly Maxima, right?
Let’s compare both methods visually!
fig, axs = plt.subplots(1, 1)
axs.plot(data['Date (GMT)'], data['Hs(m)'], 'k', label = '${H_s(m)}$ time series')
axs.scatter(yearly_maxima_list['Date (GMT)'], yearly_maxima_list['Hs(m)'], 40, 'r', label = 'Yearly Maxima')
axs.scatter(pot_maxima['Date (GMT)'], pot_maxima['Hs(m)']+0.25, 40, 'b', label = 'POT Maxima')
axs.set_title('Yearly Maxima vs. POT')
axs.set_xlabel('Date')
axs.set_ylabel('${H_s (m)}$')
fig.set_size_inches(15, 4)
axs.grid()
axs.legend()
<matplotlib.legend.Legend at 0x7fcc3af6db50>
Note that for the following phases of Extreme Value Analysis, you need the excesses over the threshold when applying POT method, not the value of the peak itself. This is, if I have identified an extreme observation of 3.5m with a threshold of 2m, I will model an excess of 3.5-2=1.5m.
3. Let’s talk about threshold and declustering time selection#
Now, we are going to discuss about the appropriate selection of the parameters of the POT sampling technique.
Let’s sample again our time series using POT but with different parameters. Let’s use now threshold=1.5 and declustering time=12h.
pot_maxima_2 = pot_method(data, threshold = 1.5, dtime = 12)
len(pot_maxima_2)
465
Now, we have obtained much more extreme values. That’s better for our analysis. Let’s compare the selected extremes in a plot.
fig, axs = plt.subplots(1, 1)
axs.plot(data['Date (GMT)'], data['Hs(m)'], 'k', label = '${H_s(m)}$ time series')
axs.scatter(pot_maxima['Date (GMT)'], pot_maxima['Hs(m)'], 40, 'b', label = 'threshold=2.5 and dtime=48h')
axs.scatter(pot_maxima_2['Date (GMT)'], pot_maxima_2['Hs(m)']+0.25, 40, 'g', label = 'threshold=1.5 and dtime=12h')
axs.set_title('Comparing different parameters for POT')
axs.set_xlabel('Date')
axs.set_ylabel('${H_s (m)}$')
fig.set_size_inches(15, 4)
axs.grid()
axs.legend()
<matplotlib.legend.Legend at 0x7fcc3d58e2e0>
Let’s zoom in and analyze one wave storm.
fig, axs = plt.subplots(1, 1)
axs.plot(data['Date (GMT)'], data['Hs(m)'], 'k', label='${H_s(m)}$ time series')
axs.scatter(pot_maxima['Date (GMT)'], pot_maxima['Hs(m)'], 40, 'b', label = 'threshold=2.5 and dtime=48h')
axs.scatter(pot_maxima_2['Date (GMT)'], pot_maxima_2['Hs(m)']+0.25, 40, 'g', label = 'threshold=1.5 and dtime=12h')
axs.set_title('Significant wave height time series')
axs.set_xlabel('Date')
axs.set_ylabel('${H_s (m)}$')
fig.set_size_inches(15, 4)
axs.grid()
axs.set_xlim(datetime.datetime(2002, 3, 22, 0, 0), datetime.datetime(2002, 4, 9, 0, 0))
axs.legend()
<matplotlib.legend.Legend at 0x7fcbf806a130>
In the previous plot we can see that there was a time storm between the 28th March 2002 and 1st April 2022. Using threshold=2.5 and declustering time=48h, one extreme value is sampled from the wave storm. Using threshold=1.5 and declustering time=12h, three values are selected.
Task #3. What do you think is best? Explain your answer.
As explained in the theory, Extreme Value Analysis is based on the hypothesis that extremes are independent and identically distributed (iid conditions). Thus, the threshold and the declustering time need to be selected so this hypothesis is fulfilled. In this example, it means that only one value, the most extreme, per storm event should be selected. If more than one value per storm event is selected, such values will not be independent.
But what is independence?
Two events, namely A and B, are independent if P(A AND B)=P(A)xP(B).
You will go back to this formal definition later in the MUDE. Here, we are going to address it in a qualitative manner, so you gain some intuition into it.
As you can see in the previous plot, there is a series of large observations between 28th March 2002 and 1st April 2002. Thus, it is reasonable to assume that all of it is one wave storm. During a wave storm, if I observe an extreme value, it is more likely that I observe another one afterwards. Also, these extreme values are caused by the same drivers. Therefore, the observations within a wave storm are related to each other and we cannot assume that they are independent anymore. We will be interested in sampling the largest observation within each wave storm, so we will fulfill the independence hypothesis.