
Monte Carlo simulation#

What approximating integrals has to do with stochastic modeling#

Many applications of stochastic modeling in engineering require evaluating integrals of probability density functions. For example, in a temperature forecast, you might want the expected value or variance of the predicted temperature; in flood modeling, you may need the exceedance probability for a certain water level. All of these quantities are defined by integrals. Even obtaining marginal distributions from a joint distribution of two variables involves integration.

If the variables of interest follow simple, well-characterized distributions (e.g., Gaussian), these integrals can often be computed analytically. Real-world models, however, are usually more complex: the variables may not follow simple distributions, the model outputs may be highly non-linear, and the models are often multi-dimensional. In such cases, analytical solutions are not feasible, and we must evaluate the integrals numerically.

In one dimension, conventional numerical integration techniques such as quadrature rules work well. For instance, we could evaluate the function on a regular grid and integrate using the trapezoidal rule. In higher dimensions, however, these approaches quickly become inefficient.

An alternative is to approximate the integral using random samples — this is called Monte Carlo integration. It is intuitive because it mirrors how we are used to thinking about probability distributions: to approximate the expected value of a variable, we calculate the mean of a set of samples. A Monte Carlo estimate works in the same way: generate random samples from the probability density function, then compute their average.

The next section examines how to approximate integrals with Monte Carlo estimators in more detail. The rest of this chapter focuses on the first step: generating samples from the probability density function of interest.

Monte Carlo estimator#

Monte Carlo methods are particularly suitable for evaluating expected values of random variables. That is, we can approximate integrals of the following form:

\[F = \int_{-\infty}^{\infty} f(x) p(x) dx\]

where \(p(x)\) is the probability density function (PDF) associated with a random variable \(X\), and \(f(x)\) is a function. Note that most of the integrals we are interested in (e.g., variance, expectation, cumulative probabilities) can be written in this form when we choose an appropriate function \(f\).

The Monte Carlo estimator \(\hat{F}\) of the integral \(F\) is:

\[\hat{F} = \frac{1}{N} \sum_{i=1}^{N} f(X_i)\]

where \(X_i\) are independent random variables distributed according to \(p(x)\). In practice, we generate realizations \(x_i\) from this distribution and evaluate the estimator using these sample values. That is, we evaluate the function \(f\) for each of the random samples and average the result.
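As a minimal sketch in Python (the choice \(f(x) = x^2\) with a standard normal \(X\) is illustrative, not fixed by the text; for this case the true value is \(\mathbb{E}[X^2] = 1\)):

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(42)

# Estimate F = E[f(X)] for f(x) = x^2 with X ~ N(0, 1).
# The exact value is Var[X] + E[X]^2 = 1.
N = 100_000
x = stats.norm(0, 1).rvs(size=N, random_state=rng)  # realizations x_i
F_hat = np.mean(x**2)                               # average of f(x_i)
print(F_hat)  # close to 1 for large N
```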

Unbiasedness#

A nice property of the Monte Carlo estimator is that it is unbiased, i.e., the expectation of the Monte Carlo estimator equals the true integral:

\[\mathbb{E}[\hat{F}] = F\]
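This follows from the linearity of expectation and the fact that each \(X_i\) is distributed according to \(p(x)\):

\[\mathbb{E}[\hat{F}] = \frac{1}{N} \sum_{i=1}^{N} \mathbb{E}[f(X_i)] = \frac{1}{N} \cdot N \int_{-\infty}^{\infty} f(x) p(x) dx = F\]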

Error and convergence#

Even though the Monte Carlo estimator is unbiased, it always has an error because we can only generate a finite number of samples. The variance of the Monte Carlo estimator is:

\[\operatorname{Var}[\hat{F}] = \frac{1}{N} \operatorname{Var}[f(X)]\]

The corresponding standard error (i.e., the standard deviation of the estimator) is:

\[\operatorname{SE}[\hat{F}] = \frac{1}{\sqrt{N}} \sqrt{\operatorname{Var}[f(X)]}\]

That is, the standard error decreases with the square root of the number of samples. For example, improving the accuracy of the estimator by a factor of 10 requires 100 times more samples, meaning that convergence is relatively slow. Notice that the standard error formula does not depend on the dimension of \(X\). This is why Monte Carlo integration avoids the exponential increase in computational cost that conventional quadrature methods suffer from in high-dimensional problems.
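A quick empirical check of the \(1/\sqrt{N}\) rate (an illustrative setup, repeatedly estimating \(\mathbb{E}[X^2]\) for a standard normal \(X\)):

```python
import numpy as np

rng = np.random.default_rng(0)

# Estimate E[X^2] for X ~ N(0, 1) many times at two sample sizes
# and compare the spread (i.e., the standard error) of the estimates.
def estimates(N, reps=500):
    x = rng.standard_normal((reps, N))
    return (x**2).mean(axis=1)

se_small = estimates(100).std()
se_large = estimates(10_000).std()

# 100x more samples should shrink the standard error by roughly 10x
print(se_small / se_large)
```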

Of course, to apply Monte Carlo integration we first need to be able to generate samples from the distributions of interest — the next sections show how this can be done in practice.

Generating samples#

Transformations of a random variable#

Often, we do not sample directly from the distribution of interest, but from another, simpler one. If the variable we want to simulate can be expressed as a transformation of another random variable, we can use the transformation method. This is often the case in modeling, where a model maps uncertain inputs to uncertain outputs.

Let \(X\) be a random variable with known distribution \(p_X(x)\) – you could think of it as an input parameter to a model in an engineering application. Then

\[Y = g(X)\]

is a new random variable obtained by applying a deterministic transformation \(g\), representing our model, to \(X\).

Now we are interested in the distribution of \(Y\), \(p_Y(y)\). For simple cases, it can be derived analytically using the change of variables formula, which requires that \(g\) is monotonic and invertible. However, in Monte Carlo simulation we typically do not need to derive \(p_Y\) explicitly, since sampling through the transformation is enough.
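For reference, when \(g\) is monotonic and invertible, the change of variables formula reads:

\[p_Y(y) = p_X\left(g^{-1}(y)\right) \left| \frac{d}{dy} g^{-1}(y) \right|\]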

If we already have samples \(x_i\) drawn from \(X\), then obtaining samples of \(Y\) is straightforward:

\[y_i = g(x_i)\]

That is, we simply apply the transformation \(g\) (our model) to each sample. Each transformed value \(y_i=g(x_i)\) is one realization of the random variable \(Y=g(X)\).

This simple principle — generating new random variables by transforming samples — underlies many more advanced sampling methods, including inverse transform sampling, which we will explore next.

Example#

Suppose \(X \sim N(0, 1)\), and define \(Y = X^2\). We can easily generate samples of \(Y\) by squaring the samples of \(X\):

import scipy.stats as stats

samples = stats.norm(0, 1).rvs(10000)
transformed = samples**2

The resulting distribution of \(Y\) follows a chi-square distribution with one degree of freedom (see Fig. 9), but we never needed to compute that analytically to obtain valid samples.


Fig. 9 Transforming Monte Carlo samples. The upper plot shows a histogram of samples from \(X\). Squaring samples of \(X\) gives the samples shown in the lower histogram. For comparison, the PDF of a chi-square distribution with one degree of freedom is shown in orange.#

Inverse transform sampling#

We can use this trick of transforming samples to generate samples from a large class of parametric distributions by considering a special type of transformation: the inverse cumulative distribution function (inverse CDF). In doing so, we leverage the fact that applying the cumulative distribution function of a random variable to its own samples produces values that are uniformly distributed on \((0, 1)\).

Consider that you want to generate samples of a random variable \(X\) with probability density function \(f(x)\). Further assume that the cumulative distribution function \(F(x)\) has a closed form solution and is invertible. Last but not least, you are also able to generate samples from a uniform distribution.

In that case, we can generate samples from \(X\) as follows:

  1. Derive an expression for the inverse CDF (also called quantile function), \(F^{-1}(u)\).

  2. Draw random samples \(u_i\) from a uniform distribution on the interval \((0, 1)\).

  3. Generate samples \(x_i\) as \(x_i = F^{-1}(u_i)\).

This works because, for any continuous random variable \(X\), the transformation \(U = F(X)\) produces a uniform random variable on \((0, 1)\), and conversely, \(X = F^{-1}(U)\) produces samples distributed as \(X\).

This method is simple and useful but also limited: it requires the CDF to be invertible, which generally restricts it to one-dimensional distributions. For multivariate cases, inverse transform sampling can only be applied if the variables are independent, allowing sampling from each marginal separately.

Example#

Consider the Gumbel distribution (see Fig. 10). Its CDF is:

\[u = F(x) = \mathrm{e}^{-\mathrm{e}^{-\frac{x-\mu}{\beta}}}\]

We can solve this expression for \(x\) to obtain the inverse distribution function:

\[x = F^{-1}(u) = \mu - \beta \ln(-\ln(u))\]

Now we draw samples from a uniform distribution:

import numpy as np
import scipy.stats as stats

u = stats.uniform.rvs(size=5)
print(f"u = {u}")
u = [0.96153003 0.27164858 0.75059759 0.75973624 0.82089546]

Then we plug in the values \(u_i\) into the inverse distribution function (here using \(\mu=0\) and \(\beta=1\)):

x = -np.log(-np.log(u))
print(f"x = {x}")
x = [ 3.23832684 -0.26485809  1.24867174  1.29177008  1.62272829]

The generated samples are from the Gumbel distribution.
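As a sanity check (using scipy's `gumbel_r` as the reference distribution, an assumption beyond the text), we can compare a larger batch of inverse-transform samples against the Gumbel CDF with a Kolmogorov–Smirnov test:

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(1)

# Inverse transform sampling for the Gumbel distribution (mu=0, beta=1)
u = stats.uniform.rvs(size=10_000, random_state=rng)
x = -np.log(-np.log(u))

# Kolmogorov-Smirnov test against scipy's reference Gumbel CDF:
# a large p-value means the samples are consistent with the distribution.
result = stats.kstest(x, stats.gumbel_r.cdf)
print(result.pvalue)
```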


Fig. 10 Inverse transform sampling illustrated with the Gumbel distribution. The CDF (lower panel) maps values from \(x\) to probabilities in the interval \((0, 1)\). We can invert this process by generating samples from a uniform distribution and applying the inverse CDF \(F^{-1}(u)\).#


Rejection sampling#

The methods for generating samples that we have covered so far are limited to special cases in which we have considerable prior knowledge about the distribution we want to sample from. For many other applications, however, we need more general methods with fewer requirements.

A simple but more general method is rejection sampling. The general idea of rejection sampling is to first generate samples from a known distribution (the so-called proposal distribution) and then discard samples that are inconsistent with the distribution we are interested in (the target distribution). This “correction” is done by comparing the PDF of the proposal distribution, \(q(x)\), to the target PDF \(p(x)\).


Fig. 11 Proposal and target distribution.#

Algorithm#

  1. Define a proposal distribution with density \(q(x)\) (Fig. 11).

    In principle, we can freely choose this density, but there are two important requirements: It should be defined over the same range of \(x\) values as \(p(x)\), and we need to be able to generate samples from this distribution. Later on, you will also see that it is beneficial if \(q(x)\) resembles \(p(x)\) as much as possible.


Fig. 12 Scaling the proposal distribution.#

  2. Scale the proposal density such that it is always greater than or equal to the target density (Fig. 12).

    That is, we define a scaling constant \(M\) so that

    \[\quad Mq(x) \geq p(x) \quad \forall x\]

    In practice, determining the scaling constant \(M\) is a bit tricky because we do not know the target PDF globally. We can only evaluate it at individual points. Therefore, in practice, this condition is loosened, only requiring that it is fulfilled for all samples \(x_i\) that we generate in the next step.


Fig. 13 Sampling the proposal distribution.#

  3. Generate samples from the proposal distribution.

    That is, we obtain samples \(x_i\) from \(X \sim q(x)\). In (Fig. 13), these samples are indicated as small blue bars on the \(x\)-axis.


Fig. 14 Sampling a uniform distribution.#

  4. Draw random values \(u_i\) uniformly between \(0\) and \(Mq(x_i)\).

    For each sample \(x_i\), we draw one sample \(u_i\) from a uniform distribution \(U_i \sim \text{Uniform}(0, M q(x_i))\). In (Fig. 14), each \(u_i\) is plotted against the corresponding \(x_i\). In this way, we generate dots that cover the area below the proposal distribution.


Fig. 15 Rejecting samples.#

  5. Accept \(x_i\) if \(u_i \leq p(x_i)\). Otherwise, reject.

    Intuitively, this means that points falling under the curve of the target density are kept (colored orange in Fig. 15), while those above it are discarded (rejected).
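The steps above can be sketched in a few lines of Python. The target Beta(2, 5) and the uniform proposal are illustrative choices, not prescribed by the text:

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(2)

# Illustrative choices: target p(x) = Beta(2, 5) on (0, 1),
# proposal q(x) = Uniform(0, 1), so q(x) = 1 on the support.
target = stats.beta(2, 5)
M = 2.5  # Beta(2, 5) peaks at about 2.46, so M q(x) >= p(x) everywhere

N = 50_000
x = rng.uniform(0, 1, size=N)      # sample the proposal distribution
u = rng.uniform(0, M, size=N)      # u_i ~ Uniform(0, M q(x_i)), with q = 1
accepted = x[u <= target.pdf(x)]   # keep points falling under the target

# The accepted samples follow Beta(2, 5); their mean approaches 2/7
print(accepted.mean())
```

Note that the acceptance rate here is \(1/M = 0.4\): the closer \(Mq(x)\) hugs \(p(x)\), the fewer samples are wasted.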

Limitations#

Rejection sampling is simple and conceptually elegant, but it can become very inefficient if the proposal distribution \(q(x)\) differs greatly from the target \(p(x)\). Visually, this means that most sampled points fall in regions where \(Mq(x)\) greatly exceeds \(p(x)\), so most samples are rejected.

Efficiency also drops sharply in higher dimensions. In one dimension, the acceptance rate corresponds to the ratio of the area under \(p(x)\) to that under \(Mq(x)\); in higher dimensions, this ratio involves volumes, which shrink rapidly as the number of dimensions increases. This phenomenon is known as the curse of dimensionality.

Importance sampling#

Importance sampling follows a similar idea as rejection sampling: we draw samples from an easier proposal distribution \(q(x)\) instead of the target distribution \(p(x)\). Rather than rejecting samples based on the ratio \(\frac{p(x)}{q(x)}\), we use this ratio as a weight in the estimator. All samples are kept, but their influence is adjusted according to how well they represent the target distribution. This approach is often more efficient when \(q(x)\) resembles \(p(x)\) in the regions that contribute most to the integral. We will return to importance sampling later in the course, in the context of data assimilation.
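A hedged sketch of the idea (the target, proposal, and \(f\) below are illustrative choices): estimate \(\mathbb{E}_p[X^2] = 1\) for \(p = N(0, 1)\) using samples drawn from a wider proposal \(q = N(0, 2)\):

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(3)

# Illustrative setup: target p = N(0, 1), proposal q = N(0, 2)
p = stats.norm(0, 1)
q = stats.norm(0, 2)

N = 100_000
x = q.rvs(size=N, random_state=rng)  # sample the proposal, not the target
w = p.pdf(x) / q.pdf(x)              # importance weights p(x) / q(x)
F_hat = np.mean(w * x**2)            # weighted Monte Carlo estimate of E_p[X^2]
print(F_hat)  # close to 1
```

Every sample is kept; those in regions where \(q\) oversamples relative to \(p\) simply receive a small weight.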

The next chapter introduces Markov chain Monte Carlo (MCMC) methods, which build on the same idea of using a simpler distribution to explore a more complex one — but instead of assigning weights, they generate a chain of dependent samples that collectively follow the target distribution.