⚠ This website is for the 2022-23 academic year and is no longer maintained. ⚠

5.3. Extreme Value Analysis 1 (Solution)#

Sampling extremes#

Objective#

Apply sampling techniques to identify extremes within an historical database.

Introduction#

The first step to model the extreme observations of our loading variable is to extract those from our database. In this notebook, we will use the wave height dataset seen in the videos to put sampling techniques into practise. We will follow the next steps:

  1. Load libraries and data

  2. Sampling extremes with Block Maxima

  3. Sampling extremes with Peak Over Threshold

  4. Let’s talk about threshold and declustering time selection

0. Load libraries and data#

Install dependences if needed

!pip install pandas numpy scipy statsmodels matplotlib datetime wget

Import necessary libraries (include more if needed).

import pandas as pd
import numpy as np
from scipy import stats
from scipy.signal import find_peaks
import os
import matplotlib.pyplot as plt
import matplotlib
import datetime
import wget

Load the database obtained from a buoy from surfdrive.

data_name = wget.download('https://surfdrive.surf.nl/files/index.php/s/2Z1OHYg5Gv9PIAK/download')
data = pd.read_csv(data_name, sep = ';')
data.head()
  0% [                                                      ]       0 / 2462003
  0% [                                                      ]    8192 / 2462003
  0% [                                                      ]   16384 / 2462003
  0% [                                                      ]   24576 / 2462003
  1% [                                                      ]   32768 / 2462003
  1% [                                                      ]   40960 / 2462003
  1% [.                                                     ]   49152 / 2462003
  2% [.                                                     ]   57344 / 2462003
  2% [.                                                     ]   65536 / 2462003
  2% [.                                                     ]   73728 / 2462003
  3% [.                                                     ]   81920 / 2462003
  3% [.                                                     ]   90112 / 2462003
  3% [..                                                    ]   98304 / 2462003
  4% [..                                                    ]  106496 / 2462003
  4% [..                                                    ]  114688 / 2462003
  4% [..                                                    ]  122880 / 2462003
  5% [..                                                    ]  131072 / 2462003
  5% [...                                                   ]  139264 / 2462003
  5% [...                                                   ]  147456 / 2462003
  6% [...                                                   ]  155648 / 2462003
  6% [...                                                   ]  163840 / 2462003
  6% [...                                                   ]  172032 / 2462003
  7% [...                                                   ]  180224 / 2462003
  7% [....                                                  ]  188416 / 2462003
  7% [....                                                  ]  196608 / 2462003
  8% [....                                                  ]  204800 / 2462003
  8% [....                                                  ]  212992 / 2462003
  8% [....                                                  ]  221184 / 2462003
  9% [.....                                                 ]  229376 / 2462003
  9% [.....                                                 ]  237568 / 2462003
  9% [.....                                                 ]  245760 / 2462003
 10% [.....                                                 ]  253952 / 2462003
 10% [.....                                                 ]  262144 / 2462003
 10% [.....                                                 ]  270336 / 2462003
 11% [......                                                ]  278528 / 2462003
 11% [......                                                ]  286720 / 2462003
 11% [......                                                ]  294912 / 2462003
 12% [......                                                ]  303104 / 2462003
 12% [......                                                ]  311296 / 2462003
 12% [.......                                               ]  319488 / 2462003
 13% [.......                                               ]  327680 / 2462003
 13% [.......                                               ]  335872 / 2462003
 13% [.......                                               ]  344064 / 2462003
 14% [.......                                               ]  352256 / 2462003
 14% [.......                                               ]  360448 / 2462003
 14% [........                                              ]  368640 / 2462003
 15% [........                                              ]  376832 / 2462003
 15% [........                                              ]  385024 / 2462003
 15% [........                                              ]  393216 / 2462003
 16% [........                                              ]  401408 / 2462003
 16% [........                                              ]  409600 / 2462003
 16% [.........                                             ]  417792 / 2462003
 17% [.........                                             ]  425984 / 2462003
 17% [.........                                             ]  434176 / 2462003
 17% [.........                                             ]  442368 / 2462003
 18% [.........                                             ]  450560 / 2462003
 18% [..........                                            ]  458752 / 2462003
 18% [..........                                            ]  466944 / 2462003
 19% [..........                                            ]  475136 / 2462003
 19% [..........                                            ]  483328 / 2462003
 19% [..........                                            ]  491520 / 2462003
 20% [..........                                            ]  499712 / 2462003
 20% [...........                                           ]  507904 / 2462003
 20% [...........                                           ]  516096 / 2462003
 21% [...........                                           ]  524288 / 2462003
 21% [...........                                           ]  532480 / 2462003
 21% [...........                                           ]  540672 / 2462003
 22% [............                                          ]  548864 / 2462003
 22% [............                                          ]  557056 / 2462003
 22% [............                                          ]  565248 / 2462003
 23% [............                                          ]  573440 / 2462003
 23% [............                                          ]  581632 / 2462003
 23% [............                                          ]  589824 / 2462003
 24% [.............                                         ]  598016 / 2462003
 24% [.............                                         ]  606208 / 2462003
 24% [.............                                         ]  614400 / 2462003
 25% [.............                                         ]  622592 / 2462003
 25% [.............                                         ]  630784 / 2462003
 25% [..............                                        ]  638976 / 2462003
 26% [..............                                        ]  647168 / 2462003
 26% [..............                                        ]  655360 / 2462003
 26% [..............                                        ]  663552 / 2462003
 27% [..............                                        ]  671744 / 2462003
 27% [..............                                        ]  679936 / 2462003
 27% [...............                                       ]  688128 / 2462003
 28% [...............                                       ]  696320 / 2462003
 28% [...............                                       ]  704512 / 2462003
 28% [...............                                       ]  712704 / 2462003
 29% [...............                                       ]  720896 / 2462003
 29% [...............                                       ]  729088 / 2462003
 29% [................                                      ]  737280 / 2462003
 30% [................                                      ]  745472 / 2462003
 30% [................                                      ]  753664 / 2462003
 30% [................                                      ]  761856 / 2462003
 31% [................                                      ]  770048 / 2462003
 31% [.................                                     ]  778240 / 2462003
 31% [.................                                     ]  786432 / 2462003
 32% [.................                                     ]  794624 / 2462003
 32% [.................                                     ]  802816 / 2462003
 32% [.................                                     ]  811008 / 2462003
 33% [.................                                     ]  819200 / 2462003
 33% [..................                                    ]  827392 / 2462003
 33% [..................                                    ]  835584 / 2462003
 34% [..................                                    ]  843776 / 2462003
 34% [..................                                    ]  851968 / 2462003
 34% [..................                                    ]  860160 / 2462003
 35% [...................                                   ]  868352 / 2462003
 35% [...................                                   ]  876544 / 2462003
 35% [...................                                   ]  884736 / 2462003
 36% [...................                                   ]  892928 / 2462003
 36% [...................                                   ]  901120 / 2462003
 36% [...................                                   ]  909312 / 2462003
 37% [....................                                  ]  917504 / 2462003
 37% [....................                                  ]  925696 / 2462003
 37% [....................                                  ]  933888 / 2462003
 38% [....................                                  ]  942080 / 2462003
 38% [....................                                  ]  950272 / 2462003
 38% [.....................                                 ]  958464 / 2462003
 39% [.....................                                 ]  966656 / 2462003
 39% [.....................                                 ]  974848 / 2462003
 39% [.....................                                 ]  983040 / 2462003
 40% [.....................                                 ]  991232 / 2462003
 40% [.....................                                 ]  999424 / 2462003
 40% [......................                                ] 1007616 / 2462003
 41% [......................                                ] 1015808 / 2462003
 41% [......................                                ] 1024000 / 2462003
 41% [......................                                ] 1032192 / 2462003
 42% [......................                                ] 1040384 / 2462003
 42% [......................                                ] 1048576 / 2462003
 42% [.......................                               ] 1056768 / 2462003
 43% [.......................                               ] 1064960 / 2462003
 43% [.......................                               ] 1073152 / 2462003
 43% [.......................                               ] 1081344 / 2462003
 44% [.......................                               ] 1089536 / 2462003
 44% [........................                              ] 1097728 / 2462003
 44% [........................                              ] 1105920 / 2462003
 45% [........................                              ] 1114112 / 2462003
 45% [........................                              ] 1122304 / 2462003
 45% [........................                              ] 1130496 / 2462003
 46% [........................                              ] 1138688 / 2462003
 46% [.........................                             ] 1146880 / 2462003
 46% [.........................                             ] 1155072 / 2462003
 47% [.........................                             ] 1163264 / 2462003
 47% [.........................                             ] 1171456 / 2462003
 47% [.........................                             ] 1179648 / 2462003
 48% [..........................                            ] 1187840 / 2462003
 48% [..........................                            ] 1196032 / 2462003
 48% [..........................                            ] 1204224 / 2462003
 49% [..........................                            ] 1212416 / 2462003
 49% [..........................                            ] 1220608 / 2462003
 49% [..........................                            ] 1228800 / 2462003
 50% [...........................                           ] 1236992 / 2462003
 50% [...........................                           ] 1245184 / 2462003
 50% [...........................                           ] 1253376 / 2462003
 51% [...........................                           ] 1261568 / 2462003
 51% [...........................                           ] 1269760 / 2462003
 51% [............................                          ] 1277952 / 2462003
 52% [............................                          ] 1286144 / 2462003
 52% [............................                          ] 1294336 / 2462003
 52% [............................                          ] 1302528 / 2462003
 53% [............................                          ] 1310720 / 2462003
 53% [............................                          ] 1318912 / 2462003
 53% [.............................                         ] 1327104 / 2462003
 54% [.............................                         ] 1335296 / 2462003
 54% [.............................                         ] 1343488 / 2462003
 54% [.............................                         ] 1351680 / 2462003
 55% [.............................                         ] 1359872 / 2462003
 55% [..............................                        ] 1368064 / 2462003
 55% [..............................                        ] 1376256 / 2462003
 56% [..............................                        ] 1384448 / 2462003
 56% [..............................                        ] 1392640 / 2462003
 56% [..............................                        ] 1400832 / 2462003
 57% [..............................                        ] 1409024 / 2462003
 57% [...............................                       ] 1417216 / 2462003
 57% [...............................                       ] 1425408 / 2462003
 58% [...............................                       ] 1433600 / 2462003
 58% [...............................                       ] 1441792 / 2462003
 58% [...............................                       ] 1449984 / 2462003
 59% [...............................                       ] 1458176 / 2462003
 59% [................................                      ] 1466368 / 2462003
 59% [................................                      ] 1474560 / 2462003
 60% [................................                      ] 1482752 / 2462003
 60% [................................                      ] 1490944 / 2462003
 60% [................................                      ] 1499136 / 2462003
 61% [.................................                     ] 1507328 / 2462003
 61% [.................................                     ] 1515520 / 2462003
 61% [.................................                     ] 1523712 / 2462003
 62% [.................................                     ] 1531904 / 2462003
 62% [.................................                     ] 1540096 / 2462003
 62% [.................................                     ] 1548288 / 2462003
 63% [..................................                    ] 1556480 / 2462003
 63% [..................................                    ] 1564672 / 2462003
 63% [..................................                    ] 1572864 / 2462003
 64% [..................................                    ] 1581056 / 2462003
 64% [..................................                    ] 1589248 / 2462003
 64% [...................................                   ] 1597440 / 2462003
 65% [...................................                   ] 1605632 / 2462003
 65% [...................................                   ] 1613824 / 2462003
 65% [...................................                   ] 1622016 / 2462003
 66% [...................................                   ] 1630208 / 2462003
 66% [...................................                   ] 1638400 / 2462003
 66% [....................................                  ] 1646592 / 2462003
 67% [....................................                  ] 1654784 / 2462003
 67% [....................................                  ] 1662976 / 2462003
 67% [....................................                  ] 1671168 / 2462003
 68% [....................................                  ] 1679360 / 2462003
 68% [.....................................                 ] 1687552 / 2462003
 68% [.....................................                 ] 1695744 / 2462003
 69% [.....................................                 ] 1703936 / 2462003
 69% [.....................................                 ] 1712128 / 2462003
 69% [.....................................                 ] 1720320 / 2462003
 70% [.....................................                 ] 1728512 / 2462003
 70% [......................................                ] 1736704 / 2462003
 70% [......................................                ] 1744896 / 2462003
 71% [......................................                ] 1753088 / 2462003
 71% [......................................                ] 1761280 / 2462003
 71% [......................................                ] 1769472 / 2462003
 72% [......................................                ] 1777664 / 2462003
 72% [.......................................               ] 1785856 / 2462003
 72% [.......................................               ] 1794048 / 2462003
 73% [.......................................               ] 1802240 / 2462003
 73% [.......................................               ] 1810432 / 2462003
 73% [.......................................               ] 1818624 / 2462003
 74% [........................................              ] 1826816 / 2462003
 74% [........................................              ] 1835008 / 2462003
 74% [........................................              ] 1843200 / 2462003
 75% [........................................              ] 1851392 / 2462003
 75% [........................................              ] 1859584 / 2462003
 75% [........................................              ] 1867776 / 2462003
 76% [.........................................             ] 1875968 / 2462003
 76% [.........................................             ] 1884160 / 2462003
 76% [.........................................             ] 1892352 / 2462003
 77% [.........................................             ] 1900544 / 2462003
 77% [.........................................             ] 1908736 / 2462003
 77% [..........................................            ] 1916928 / 2462003
 78% [..........................................            ] 1925120 / 2462003
 78% [..........................................            ] 1933312 / 2462003
 78% [..........................................            ] 1941504 / 2462003
 79% [..........................................            ] 1949696 / 2462003
 79% [..........................................            ] 1957888 / 2462003
 79% [...........................................           ] 1966080 / 2462003
 80% [...........................................           ] 1974272 / 2462003
 80% [...........................................           ] 1982464 / 2462003
 80% [...........................................           ] 1990656 / 2462003
 81% [...........................................           ] 1998848 / 2462003
 81% [............................................          ] 2007040 / 2462003
 81% [............................................          ] 2015232 / 2462003
 82% [............................................          ] 2023424 / 2462003
 82% [............................................          ] 2031616 / 2462003
 82% [............................................          ] 2039808 / 2462003
 83% [............................................          ] 2048000 / 2462003
 83% [.............................................         ] 2056192 / 2462003
 83% [.............................................         ] 2064384 / 2462003
 84% [.............................................         ] 2072576 / 2462003
 84% [.............................................         ] 2080768 / 2462003
 84% [.............................................         ] 2088960 / 2462003
 85% [.............................................         ] 2097152 / 2462003
 85% [..............................................        ] 2105344 / 2462003
 85% [..............................................        ] 2113536 / 2462003
 86% [..............................................        ] 2121728 / 2462003
 86% [..............................................        ] 2129920 / 2462003
 86% [..............................................        ] 2138112 / 2462003
 87% [...............................................       ] 2146304 / 2462003
 87% [...............................................       ] 2154496 / 2462003
 87% [...............................................       ] 2162688 / 2462003
 88% [...............................................       ] 2170880 / 2462003
 88% [...............................................       ] 2179072 / 2462003
 88% [...............................................       ] 2187264 / 2462003
 89% [................................................      ] 2195456 / 2462003
 89% [................................................      ] 2203648 / 2462003
 89% [................................................      ] 2211840 / 2462003
 90% [................................................      ] 2220032 / 2462003
 90% [................................................      ] 2228224 / 2462003
 90% [.................................................     ] 2236416 / 2462003
 91% [.................................................     ] 2244608 / 2462003
 91% [.................................................     ] 2252800 / 2462003
 91% [.................................................     ] 2260992 / 2462003
 92% [.................................................     ] 2269184 / 2462003
 92% [.................................................     ] 2277376 / 2462003
 92% [..................................................    ] 2285568 / 2462003
 93% [..................................................    ] 2293760 / 2462003
 93% [..................................................    ] 2301952 / 2462003
 93% [..................................................    ] 2310144 / 2462003
 94% [..................................................    ] 2318336 / 2462003
 94% [...................................................   ] 2326528 / 2462003
 94% [...................................................   ] 2334720 / 2462003
 95% [...................................................   ] 2342912 / 2462003
 95% [...................................................   ] 2351104 / 2462003
 95% [...................................................   ] 2359296 / 2462003
 96% [...................................................   ] 2367488 / 2462003
 96% [....................................................  ] 2375680 / 2462003
 96% [....................................................  ] 2383872 / 2462003
 97% [....................................................  ] 2392064 / 2462003
 97% [....................................................  ] 2400256 / 2462003
 97% [....................................................  ] 2408448 / 2462003
 98% [..................................................... ] 2416640 / 2462003
 98% [..................................................... ] 2424832 / 2462003
 98% [..................................................... ] 2433024 / 2462003
 99% [..................................................... ] 2441216 / 2462003
 99% [..................................................... ] 2449408 / 2462003
 99% [..................................................... ] 2457600 / 2462003
100% [......................................................] 2462003 / 2462003
Date (GMT) Hs(m)
0 1985 09 24 21 0.22
1 1985 09 25 21 0.46
2 1985 09 26 00 0.73
3 1985 09 26 03 0.72
4 1985 09 26 06 0.75

The first column of the database corresponds to the date and time of each measurement, so we are going to give it the right format. The second column corresponds to the hourly measured significant wave height (\(H_s\)).

data['Date (GMT)'] = pd.to_datetime(data['Date (GMT)'])
data.head()
Date (GMT) Hs(m)
0 1985-09-24 21:00:00 0.22
1 1985-09-25 21:00:00 0.46
2 1985-09-26 00:00:00 0.73
3 1985-09-26 03:00:00 0.72
4 1985-09-26 06:00:00 0.75

We can start the analysis of the data by plotting the time series.

fig, axs = plt.subplots(1, 1)
axs.plot(data['Date (GMT)'], data['Hs(m)'], 'k', label='${H_s(m)}$')
axs.set_title('Significant wave height time series')
axs.set_xlabel('Date')
axs.set_ylabel('${H_s (m)}$')
fig.set_size_inches(15, 4)
axs.grid()
plt.show()
../../_images/dc3f9de12aa3b89db6c364e325b18b0d068bfb8b75acc071992d3dfc20f8f122.png

In the figure, you can see some outliers. When the buoy is not able to measure, it records -9,999. Before going ahead with the analysis, we need to remove these values. Afterwards, we can plot the data again.

data = data[data['Hs(m)'] >= 0]
data.set_axis([np.linspace(0, len(data)-1, len(data), dtype = int)], axis = 'index', inplace = True)
fig, axs = plt.subplots(1, 1)
axs.plot(data['Date (GMT)'], data['Hs(m)'], 'k', label = '${H_s(m)}$')
axs.set_title('Significant wave height time series')
axs.set_xlabel('Date')
axs.set_ylabel('${H_s (m)}$')
fig.set_size_inches(15, 4)
axs.grid()
../../_images/1555f4d06de14db2324b0c48fc0f52a7a793973caa68d52d3248abde702b5555.png

Now we have a clean database to work! As you can see, the buoy has been measuring from 1985 until 2005, so we have almost 20 years of measurements. However, these data are not enough to totally represent the right tail of the distribution (high return periods), so we need to model it.

1. Sampling extremes with Block Maxima#

This first sampling technique consists of defining a time block and selecting the maximum observation within that time block.

Task #1: identify extreme events using Block Maxima. Define a function which implements Block Maxima with a time block equal to a year. The function must take as input the buoy data. The output of the function must be a dataframe with the two columns of the original data.

Hint: using pd.DatetimeIndex(data[‘Date (GMT)’]).year, you can extract the year of each observation

def yearly_maxima(data):
    min_year = min(pd.DatetimeIndex(data['Date (GMT)']).year)
    max_year = max(pd.DatetimeIndex(data['Date (GMT)']).year)
    max_list = pd.DataFrame(columns = data.columns)
    for i in range(min_year, max_year+1):
        block = data.loc[pd.DatetimeIndex(data['Date (GMT)']).year == i]
        idx_max = block['Hs(m)'].idxmax(axis = 0)
        max_list = pd.concat([max_list, block.loc[[idx_max]]])
    return max_list
def yearly_maxima(data):
    idx_max = data.groupby(pd.DatetimeIndex(data['Date (GMT)']).year)['Hs(m)'].idxmax()
    max_list = data.loc[idx_max]
    return max_list

Let’s apply your function to our data!

yearly_maxima_list = yearly_maxima(data)
yearly_maxima_list
Date (GMT) Hs(m)
230 1985-10-28 02:00:00 3.57
2481 1986-09-30 18:00:00 3.75
4724 1987-11-03 21:00:00 2.64
6168 1988-10-16 19:00:00 2.64
7957 1989-09-05 17:00:00 3.08
10237 1990-03-03 22:00:00 3.22
16649 1991-01-26 14:00:00 2.99
22659 1992-02-20 02:00:00 3.53
35113 1993-10-27 04:00:00 2.99
37723 1994-02-16 19:00:00 2.41
46351 1995-03-12 13:00:00 3.67
56141 1996-09-12 01:00:00 2.97
64539 1997-10-29 10:00:00 4.03
73844 1998-12-02 19:00:00 3.24
81860 1999-11-12 03:00:00 2.90
85911 2000-05-04 09:00:00 1.95
96838 2001-11-11 03:00:00 4.49
99392 2002-03-28 22:00:00 3.63
111696 2003-10-16 15:00:00 3.39
115468 2004-03-29 12:00:00 3.50
123477 2005-02-28 20:00:00 3.00

We have obtained one maximum value per year, so 21 extreme values. We can also plot it on our timeseries to see how it looks!

fig, axs = plt.subplots(1, 1)
axs.plot(data['Date (GMT)'], data['Hs(m)'], 'k', label='${H_s(m)}$ time series')
axs.scatter(yearly_maxima_list['Date (GMT)'], yearly_maxima_list['Hs(m)'], 40, 'r', label = 'Yearly Maxima')
axs.set_title('Time series and block maxima')
axs.set_xlabel('Date')
axs.set_ylabel('${H_s (m)}$')
fig.set_size_inches(15, 4)
axs.grid()
axs.legend()
<matplotlib.legend.Legend at 0x7fcc28292520>
../../_images/752813c10fb3f0335b8d2d94b19799646bdd869b2476f54d9a3f1a6935ca8f62.png

2. Sampling extremes with Peak Over Threshold#

Let’s go with the second technique: Peak Over Threshold (POT). This sampling technique involves defining two parameters:

  • A threshold: value above which an observation is considered extreme.

  • Declustering time: minimum time distance between two extreme observations.

Based on these parameters, we can select those observations which are above the threshold and present a minimum distance equal to the declustering time between events.

Task #2: identify extreme events using the POT method. Define a function which implements the POT method. The function must take as input (1) the buoy data, (2) the threshold, and (3) the declustering time. The output of the function must be a dataframe with the same columns as the data.

Hint: You can use the function find_peaks from SciPy library.

def pot_method(data, threshold, dtime):
    idx_pot, _ = find_peaks(data['Hs(m)'], height = threshold, distance = dtime)
    pot_list = data.loc[idx_pot]
    return pot_list

Let’s apply your function to your data! Do it with a threshold=2.5m and declustering time=48h.

pot_maxima = pot_method(data, threshold = 2.5, dtime = 48)
pot_maxima
Date (GMT) Hs(m)
230 1985-10-28 02:00:00 3.57
320 1985-11-16 16:00:00 3.42
593 1985-12-30 00:00:00 2.94
992 1986-03-01 03:00:00 2.85
2481 1986-09-30 18:00:00 3.75
4724 1987-11-03 21:00:00 2.64
6168 1988-10-16 19:00:00 2.64
7140 1989-03-18 17:00:00 2.69
7957 1989-09-05 17:00:00 3.08
8005 1989-09-07 21:00:00 2.53
8516 1989-11-15 08:00:00 2.62
10237 1990-03-03 22:00:00 3.22
11530 1990-05-01 15:00:00 2.58
16584 1991-01-23 16:00:00 2.85
16649 1991-01-26 14:00:00 2.99
16802 1991-02-03 10:00:00 2.50
17107 1991-02-19 12:00:00 2.83
18231 1991-04-15 08:00:00 2.67
22551 1992-01-28 16:00:00 2.55
22659 1992-02-20 02:00:00 3.53
23913 1992-05-03 12:00:00 2.84
24892 1992-06-14 19:00:00 2.72
28858 1993-02-02 00:00:00 2.98
35113 1993-10-27 04:00:00 2.99
46067 1995-02-28 15:00:00 3.37
46351 1995-03-12 13:00:00 3.67
48097 1995-06-26 13:00:00 2.50
50290 1995-09-26 00:00:00 3.14
56141 1996-09-12 01:00:00 2.97
58399 1996-12-30 03:00:00 2.79
60356 1997-04-08 21:00:00 3.05
64539 1997-10-29 10:00:00 4.03
65416 1997-12-05 01:00:00 3.76
73844 1998-12-02 19:00:00 3.24
75218 1999-01-31 13:00:00 2.63
81860 1999-11-12 03:00:00 2.90
91405 2001-02-15 17:00:00 2.64
96838 2001-11-11 03:00:00 4.49
96946 2001-11-15 15:00:00 3.75
97345 2001-12-15 22:00:00 3.30
99392 2002-03-28 22:00:00 3.63
99493 2002-04-02 17:00:00 2.97
100274 2002-05-07 16:00:00 3.27
106125 2003-02-16 00:00:00 2.77
107537 2003-04-16 02:00:00 2.96
111696 2003-10-16 15:00:00 3.39
111745 2003-10-18 16:00:00 2.67
112796 2003-12-08 16:00:00 2.89
114565 2004-02-20 21:00:00 2.68
115468 2004-03-29 12:00:00 3.50
115893 2004-04-16 06:00:00 3.31
120893 2004-11-12 14:00:00 2.63
121425 2004-12-04 21:00:00 2.52
123477 2005-02-28 20:00:00 3.00

Let’s see how much extreme values we got.

len(pot_maxima)
54

That is much more than the 21 extreme observations we got with Yearly Maxima, right?

Let’s compare both methods visually!

fig, axs = plt.subplots(1, 1)
axs.plot(data['Date (GMT)'], data['Hs(m)'], 'k', label = '${H_s(m)}$ time series')
axs.scatter(yearly_maxima_list['Date (GMT)'], yearly_maxima_list['Hs(m)'], 40, 'r', label = 'Yearly Maxima')
axs.scatter(pot_maxima['Date (GMT)'], pot_maxima['Hs(m)']+0.25, 40, 'b', label = 'POT Maxima')
axs.set_title('Yearly Maxima vs. POT')
axs.set_xlabel('Date')
axs.set_ylabel('${H_s (m)}$')
fig.set_size_inches(15, 4)
axs.grid()
axs.legend()
<matplotlib.legend.Legend at 0x7fcc3af6db50>
../../_images/764f3b42166d6c4f9c6eff1dfd70b0d587739d693154391ac5625227d3a5705e.png

Note that for the following phases of Extreme Value Analysis, you need the excesses over the threshold when applying POT method, not the value of the peak itself. This is, if I have identified an extreme observation of 3.5m with a threshold of 2m, I will model an excess of 3.5-2=1.5m.

3. Let’s talk about threshold and declustering time selection#

Now, we are going to discuss about the appropriate selection of the parameters of the POT sampling technique.

Let’s sample again our time series using POT but with different parameters. Let’s use now threshold=1.5 and declustering time=12h.

pot_maxima_2 = pot_method(data, threshold = 1.5, dtime = 12)
len(pot_maxima_2)
465

Now, we have obtained much more extreme values. That’s better for our analysis. Let’s compare the selected extremes in a plot.

fig, axs = plt.subplots(1, 1)
axs.plot(data['Date (GMT)'], data['Hs(m)'], 'k', label = '${H_s(m)}$ time series')
axs.scatter(pot_maxima['Date (GMT)'], pot_maxima['Hs(m)'], 40, 'b', label = 'threshold=2.5 and dtime=48h')
axs.scatter(pot_maxima_2['Date (GMT)'], pot_maxima_2['Hs(m)']+0.25, 40, 'g', label = 'threshold=1.5 and dtime=12h')
axs.set_title('Comparing different parameters for POT')
axs.set_xlabel('Date')
axs.set_ylabel('${H_s (m)}$')
fig.set_size_inches(15, 4)
axs.grid()
axs.legend()
<matplotlib.legend.Legend at 0x7fcc3d58e2e0>
../../_images/d5a1fdd204690b963cfaea51fddad9eb0cc5a29bf775889a13bf3b86874d990f.png

Let’s zoom in and analyze one wave storm.

fig, axs = plt.subplots(1, 1)
axs.plot(data['Date (GMT)'], data['Hs(m)'], 'k', label='${H_s(m)}$ time series')
axs.scatter(pot_maxima['Date (GMT)'], pot_maxima['Hs(m)'], 40, 'b', label = 'threshold=2.5 and dtime=48h')
axs.scatter(pot_maxima_2['Date (GMT)'], pot_maxima_2['Hs(m)']+0.25, 40, 'g', label = 'threshold=1.5 and dtime=12h')
axs.set_title('Significant wave height time series')
axs.set_xlabel('Date')
axs.set_ylabel('${H_s (m)}$')
fig.set_size_inches(15, 4)
axs.grid()
axs.set_xlim(datetime.datetime(2002, 3, 22, 0, 0), datetime.datetime(2002, 4, 9, 0, 0))
axs.legend()
<matplotlib.legend.Legend at 0x7fcbf806a130>
../../_images/673214168c8f3bb141118784ae6c69fb26dc0e1572666918a5dc6b5aea967e3b.png

In the previous plot we can see that there was a time storm between the 28th March 2002 and 1st April 2022. Using threshold=2.5 and declustering time=48h, one extreme value is sampled from the wave storm. Using threshold=1.5 and declustering time=12h, three values are selected.

Task #3. What do you think is best? Explain your answer.

As explained in the theory, Extreme Value Analysis is based on the hypothesis that extremes are independent and identically distributed (iid conditions). Thus, the threshold and the declustering time need to be selected so this hypothesis is fulfilled. In this example, it means that only one value, the most extreme, per storm event should be selected. If more than one value per storm event is selected, such values will not be independent.

But what is independence?

Two events, namely A and B, are independent if P(A AND B)=P(A)xP(B).

You will go back to this formal definition later in the MUDE. Here, we are going to address it in a qualitative manner, so you gain some intuition into it.

As you can see in the previous plot, there is a series of large observations between 28th March 2002 and 1st April 2002. Thus, it is reasonable to assume that all of it is one wave storm. During a wave storm, if I observe an extreme value, it is more likely that I observe another one afterwards. Also, these extreme values are caused by the same drivers. Therefore, the observations within a wave storm are related to each other and we cannot assume that they are independent anymore. We will be interested in sampling the largest observation within each wave storm, so we will fulfill the independence hypothesis.