{ "cells": [ { "cell_type": "markdown", "id": "7fa5cb12", "metadata": {}, "source": [ "# Extreme Value Analysis" ] }, { "cell_type": "markdown", "id": "eaa08a9d", "metadata": {}, "source": [ "
\n", " \n", "Solution\n", " \n", "This is the full solution, which you should use for studying. There was also a partial solution that was shared during the workshop which did not include the answers to the interpretation questions.\n", "
\n", "\n", " \n", "This workshop uses the same file as in HW2: `Time_Series_DEN_lon_8_lat_56.5_ERA5.txt`\n", "
\n", "\n", "| | date_&_time | significant_wave_height_(m) | mean_wave_period_(s) | Peak_wave_Period_(s) | mean_wave_direction_(deg_N) | 10_meter_wind_speed_(m/s) | Wind_direction_(deg_N) |\n",
"|---|---|---|---|---|---|---|---|\n",
"| 0 | 1950-01-01 00:00:00.000000000 | 1.274487 | 4.493986 | 5.177955 | 199.731575 | 8.582743 | 211.166241 |\n",
"| 1 | 1950-01-01 04:00:00.000026880 | 1.338850 | 4.609748 | 5.255064 | 214.679306 | 8.867638 | 226.280409 |\n",
"| 2 | 1950-01-01 07:59:59.999973120 | 1.407454 | 4.775651 | 5.390620 | 225.182820 | 9.423382 | 230.283209 |\n",
"| 3 | 1950-01-01 12:00:00.000000000 | 1.387721 | 4.800286 | 5.451532 | 227.100041 | 9.037646 | 238.879880 |\n",
"| 4 | 1950-01-01 16:00:00.000026880 | 1.660848 | 5.112471 | 5.772289 | 244.821975 | 10.187995 | 242.554054 |\n", "
\n", " \n", "Task 1:\n", "Apply POT to sample the extreme observations (you should have already written this function as homework!). Plot the results.\n", " \n", "Use a threshold of 5 meters and a declustering time of 72h.\n", "
\n", "\n", "Here the solutions are provided using the PyExtremes package. You are not required to know how to use it, but you should be able to interpret the results of these analyses.\n", "
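For reference, the sampling step can also be sketched without PyExtremes. The snippet below is a minimal sketch, assuming a NumPy array of wave heights on a regular 4-hour grid and the peak-to-peak definition of the declustering time. The function name `pot_peaks` and the synthetic data are illustrative assumptions, not part of the workshop files, and this is only one possible way to decluster.

```python
import numpy as np

def pot_peaks(values, times_h, threshold, dl_h):
    '''Return indices of declustered peaks over threshold.

    values    : 1-D array of observations (e.g. wave height in m)
    times_h   : 1-D array of observation times in hours, increasing
    threshold : keep only observations strictly above this value
    dl_h      : minimum spacing between two retained peaks, in hours
    '''
    values = np.asarray(values, dtype=float)
    times_h = np.asarray(times_h, dtype=float)
    candidates = np.flatnonzero(values > threshold)
    # Visit candidates from largest to smallest so the biggest peak of
    # each storm is retained and its close neighbours are discarded.
    order = candidates[np.argsort(values[candidates])[::-1]]
    kept = []
    for i in order:
        if all(abs(times_h[i] - times_h[j]) >= dl_h for j in kept):
            kept.append(i)
    return np.sort(np.array(kept, dtype=int))

# Synthetic example (assumed data): two exceedances 8 h apart, one much later.
t = np.arange(0, 400, 4.0)
h = np.full(t.size, 2.0)
h[[10, 12, 60]] = [6.0, 7.5, 5.5]
peaks = pot_peaks(h, t, threshold=5.0, dl_h=72.0)
print(t[peaks], h[peaks])
```

With a threshold of 5 m and a declustering time of 72 h, the two exceedances that are 8 h apart collapse into a single peak, while the later one is kept as a separate event.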
\n", "\n",
" \n",
"Task 2:\n",
"Fit a Generalized Pareto distribution (GPD) to the sampled extremes. Print the shape parameter. What type of GPD do you obtain?
\n",
"Hint: what kind of tail is implied by the parameter value?\n",
"
\n", "The obtained shape parameter is close to 0, so the fitted GPD is close to an Exponential distribution, which implies an exponentially decaying tail.\n", "
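The link between the shape parameter and the type of tail can be checked numerically. The following is a minimal sketch, assuming SciPy is available and using synthetic exponential excesses instead of the workshop data. Note that `scipy.stats.genpareto` calls the shape parameter `c`, and that the helper `tail_type` and its tolerance are illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Synthetic excesses over a threshold (assumed data, exponential by construction).
excesses = rng.exponential(scale=1.2, size=5000)

# Fit a GPD to the excesses; the location is fixed at 0 because the
# threshold has already been subtracted.
shape, loc, scale = stats.genpareto.fit(excesses, floc=0)

def tail_type(xi, tol=0.05):
    # tol is an arbitrary illustrative tolerance, not a statistical test.
    if xi > tol:
        return 'heavy tail (Pareto type)'
    if xi < -tol:
        return 'bounded tail (upper limit)'
    return 'exponential-like tail'

print(round(shape, 3), round(scale, 3), tail_type(shape))
```

A positive shape gives a heavy tail, a negative shape gives a tail with a finite upper bound, and a shape near zero reduces the GPD to the Exponential case discussed above.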
\n", " \n", "Task 3:\n", "Assess the goodness of fit of the distribution using a QQplot. Comment about the results of the fitting and compare it to those obtained using BM and GEV. Which one would you choose?\n", "
\n", "\n", "A QQ-plot compares the observed quantiles with the quantiles predicted by our fit. Therefore, a perfect fit would lie on the 45-degree line. In the plot, we can see that the fit is very close to that line even for high values of the variable, suggesting that our model describes the tail well.\n", "\n", "Compared with the fit provided by BM + GEV, the POT + GPD fit is slightly better, since the points fluctuate a bit less around the 45-degree line.\n", "
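To make the construction of a QQ-plot concrete, the sketch below builds the pairs of theoretical and observed quantiles for a GPD. It is a minimal illustration under stated assumptions: the plotting position i / (n + 1) is one common choice, the function names are hypothetical, and the data are synthetic rather than the workshop series.

```python
import numpy as np

def gpd_quantile(p, xi, sigma):
    # Quantile function of a GPD for excesses over the threshold.
    p = np.asarray(p, dtype=float)
    if abs(xi) < 1e-9:
        return -sigma * np.log1p(-p)  # exponential limit for xi = 0
    return sigma / xi * ((1.0 - p) ** (-xi) - 1.0)

def qq_points(excesses, xi, sigma):
    # Pair each sorted observation with the model quantile at the
    # plotting position i / (n + 1).
    observed = np.sort(np.asarray(excesses, dtype=float))
    n = observed.size
    p = np.arange(1, n + 1) / (n + 1)
    return gpd_quantile(p, xi, sigma), observed

# Synthetic check (assumed data): exponential excesses with scale 1.5.
rng = np.random.default_rng(0)
sample = rng.exponential(scale=1.5, size=2000)
theoretical, observed = qq_points(sample, xi=0.0, sigma=1.5)
# A correlation close to 1 means the points lie near a straight line;
# for a good fit that line is also the 45-degree line.
print(np.corrcoef(theoretical, observed)[0, 1])
```

Plotting `observed` against `theoretical` together with the 45-degree line gives the QQ-plot; the largest values on the right-hand side show how well the tail is captured.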
\n", " \n", "Task 4:\n", "Plot the return level plot and determine the value of the significant wave height that you need for design according to your calculated return period. Remember that the return level plot presents on the x-axis the values of the variable (wave height, here) and on the y-axis the corresponding values of the return period. \n", "\n", "Compare it to the results obtained using BM + GEV.\n", "
\n", "\n", "The obtained design value with BM + GEV was 9.74m.\n", "The obtained design value with POT + GPD is 10.37m.\n", "In this case, POT + GPD is a bit more conservative.\n", "However, this conclusion is case specific and it also depends on the selected threshold and declustering time for the POT.\n", "
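The dependence of the design value on the POT settings can be seen from the return-level formula itself. Below is a minimal sketch of the standard POT + GPD return level, assuming `lam` is the average number of sampled peaks per year. The function name and all numeric parameters are hypothetical and are not the workshop results.

```python
import math

def gpd_return_level(T, threshold, xi, sigma, lam):
    '''Value exceeded on average once every T years.

    lam is the mean number of sampled peaks per year; the formula
    inverts lam * (1 - F(x)) = 1 / T for a GPD F fitted to the excesses.
    '''
    m = lam * T  # expected number of peaks in T years
    if m < 1:
        raise ValueError('return period too short: lam * T must be at least 1')
    if abs(xi) < 1e-9:
        return threshold + sigma * math.log(m)  # exponential limit
    return threshold + sigma / xi * (m ** xi - 1.0)

# Hypothetical parameters, for illustration only.
for T in (10, 100, 1000):
    level = gpd_return_level(T, threshold=5.0, xi=-0.05, sigma=1.1, lam=2.5)
    print(T, round(level, 2))
```

Both the threshold and the declustering time change the sampled peaks, and therefore `lam`, `xi` and `sigma`, which is why the design value is sensitive to them.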
\n", " \n", "Task 5:\n", "Apply two methods to justify why a threshold=5m and a declustering time=72h are reasonable or not. Write your conclusions.\n", "
\n", "\n", "Threshold should be selected so the parameters of the GPD remain stable. In this case, thresholds up to 5.5m seem reasonable.\n", "You can perform this analysis with several values of the declustering time.
\n", "\n", "Threshold should be selected so the mean excesses follow a linear trend. In this case, thresholds up to 6m seem reasonable.\n", "You can perform this analysis with several values of the declustering time.
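For a GPD with shape below 1, the mean excess is a linear function of the threshold, which is what this check looks for. A minimal sketch of the computation is given here, assuming a NumPy array of observations; the function name `mean_excess` and the synthetic data are illustrative assumptions.

```python
import numpy as np

def mean_excess(values, thresholds):
    # For each threshold u, average (x - u) over the observations with x > u.
    values = np.asarray(values, dtype=float)
    out = []
    for u in thresholds:
        exc = values[values > u] - u
        out.append(exc.mean() if exc.size else np.nan)
    return np.array(out)

# Synthetic data (assumed): exponential values have a constant mean excess,
# which is the linear (here flat) trend expected for a shape of 0.
rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=50000)
u_grid = np.array([0.5, 1.0, 2.0, 3.0])
me = mean_excess(x, u_grid)
print(me)
```

Plotting `me` against `u_grid` gives the mean residual life plot; the lowest threshold above which the curve looks linear is a candidate for the POT threshold.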
\n", "\n", "Threshold and declustering time should be selected so the Dispersion Index is approximately 1 (within the confidence band) to ensure that the number of excesses per year follows a Poisson distribution. Therefore, thresholds between 5.5 and 6m would be reasonable.
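The Dispersion Index is the ratio between the variance and the mean of the number of peaks per year, which equals 1 for a Poisson distribution. The sketch below shows the point estimate only, without the confidence band; the function name, its arguments and the synthetic record are illustrative assumptions.

```python
import numpy as np

def dispersion_index(peak_years, first_year, last_year):
    # Count peaks in every year of the record, including years with none.
    peak_years = np.asarray(peak_years)
    years = np.arange(first_year, last_year + 1)
    counts = np.array([(peak_years == y).sum() for y in years])
    return counts.var(ddof=1) / counts.mean()

# Synthetic example (assumed data): a long artificial record with
# Poisson-distributed yearly counts, on average 3 peaks per year.
rng = np.random.default_rng(7)
yearly_counts = rng.poisson(lam=3.0, size=500)
peak_years = np.repeat(np.arange(1950, 2450), yearly_counts)
di = dispersion_index(peak_years, 1950, 2449)
print(di)
```

In practice the index is recomputed for each combination of threshold and declustering time, and the combinations for which it stays within the confidence band around 1 are retained.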
\n", "\n", "Note that there will be some differences between your fit and the one provided by pyExtremes. You have probably defined the declustering time as the time between two extremes (two peaks). PyExtremes defines the declustering time as the time between the points where the series crosses the threshold. The figure below illustrates the difference.\n", "
\n", "\n", "Based on the above, would you expect more or fewer sampled extremes using pyExtremes? How would it affect the calculated lambda?
\n", "\n", "We hope this workshop gave you a chance to practice EVA more so that you are able to easily use the code you have already created in your HOS assignments. In addition, we hope that you have more insight into the process of determining whether or not your chosen probability distribution is a good fit, and some tools to justify it quantitatively.
\n", " \n", "For your HOS assignments and exams you are expected to be able to describe and explain the EVA process, including a justification for your chosen distribution. One thing that is not covered in this workshop is evaluating the effect of the distribution on your design: when working on a specific structure (e.g., your design exercises), it would be a good idea to consider the range of acceptable distributions. For POT, for example, this could mean choosing a few combinations of threshold and declustering time to see what the impact is on a specific design variable (e.g., rock size; dike height or slope; or size of a structural element in an offshore structure).\n", " \n", "Keep in mind that depending on your application, the standard of practice can be different. For example, POT is widely used for offshore structures and breakwater design, whereas for the design of river dikes empirically derived EVA distributions based on river discharge simulations are used.\n", "