Naturally bounded normal processes

Here’s a quick overview of a naturally bounded Wiener process (I’ll explain what that is momentarily) and an application to election prediction. In particular, we will see why it isn’t really surprising that Donald Trump won the 2016 presidential election.

Recall that a Wiener process is a type of continuous-time stochastic process that can be characterized in many ways. I am going to use the informal characterization (the physicist’s description) and merely say that a Wiener process is a stochastic process \(W_t\) whose increments satisfy \(dW_t \sim \mathcal{N}(\mu=0, \sigma=\sqrt{dt})\). Brownian motion \(X_t\) with drift \(\mu(x,t)\) and volatility \(\sigma(x,t)\) is defined by

\[ dX_t = \mu(X_t, t)\,dt + \sigma(X_t, t)\,dW_t \]

and is fundamental to many areas of science.
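To make the definition concrete, here is a minimal sketch of simulating such a process on a time grid with the Euler-Maruyama scheme. The function name `euler_maruyama` and the constant drift and volatility in the usage lines are illustrative assumptions, not anything used later in this post.

```python
import math
import random


def euler_maruyama(x0, mu, sigma, dt, n_steps, rng):
    """Simulate dX_t = mu(x, t) dt + sigma(x, t) dW_t on a uniform time grid."""
    x, t = x0, 0.0
    path = [x]
    for _ in range(n_steps):
        # Wiener increment: Gaussian with mean 0 and standard deviation sqrt(dt)
        dw = rng.gauss(0.0, math.sqrt(dt))
        x = x + mu(x, t) * dt + sigma(x, t) * dw
        t += dt
        path.append(x)
    return path


rng = random.Random(0)
# Hypothetical constant drift and volatility, chosen only for illustration.
path = euler_maruyama(0.0, lambda x, t: 0.5, lambda x, t: 0.2,
                      dt=0.01, n_steps=1000, rng=rng)
print(len(path), path[-1])
```

Nothing here stops the path from wandering outside any given interval, which is the property the rest of the post sets out to change.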

Now, we will consider a modified Wiener process with natural bound \([0,1]\). By natural bound it is meant that we are not imposing boundary conditions on the process \(X_t\) but rather constructing a process with Gaussian (well, almost Gaussian) increments that is still unable to exit the interval \([0,1]\). Here is how we will do this. Recall the definition of the truncated-Gaussian distribution \(\text{Trunc}-\mathcal{N}(\mu, \sigma; a, b)\) as a continuous random variable with PDF given by

\[ f(x; \mu, \sigma, a, b) = \frac{1}{\sigma}\,\frac{\phi\left(\frac{x - \mu}{\sigma}\right)}{\Phi\left(\frac{b - \mu}{\sigma}\right) - \Phi\left(\frac{a - \mu}{\sigma}\right)}, \qquad a \leq x \leq b, \]

and zero otherwise. We have denoted by \(\Phi(z)\) the CDF of the standard Gaussian and by \(\phi(z)\) its derivative. Define the truncated (or naturally bounded) Wiener process as

and its corresponding Brownian motion with drift and volatility is

The reader will note that we have normalized the process to the interval \([0,1]\) for convenience of analysis and for use in the application below.
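As a small sketch of the truncated-Gaussian building block, the following evaluates its PDF and draws samples by inverting the CDF. The function names are my own, and the inverse-CDF sampler is one convenient choice among several; it loses accuracy when the interval sits far out in a tail of the untruncated Gaussian.

```python
import random
from statistics import NormalDist

_STD = NormalDist()  # standard Gaussian: .pdf is phi, .cdf is Phi


def truncnorm_pdf(x, mu, sigma, a, b):
    """PDF of a Gaussian(mu, sigma) truncated to the interval [a, b]."""
    if x < a or x > b:
        return 0.0
    mass = _STD.cdf((b - mu) / sigma) - _STD.cdf((a - mu) / sigma)
    return _STD.pdf((x - mu) / sigma) / (sigma * mass)


def truncnorm_sample(mu, sigma, a, b, rng):
    """Draw one sample by mapping a uniform variate through the inverse CDF."""
    lo = _STD.cdf((a - mu) / sigma)
    hi = _STD.cdf((b - mu) / sigma)
    u = lo + (hi - lo) * rng.random()
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # inv_cdf needs 0 < u < 1
    x = mu + sigma * _STD.inv_cdf(u)
    return min(max(x, a), b)  # guard against floating-point overshoot


rng = random.Random(0)
print(truncnorm_pdf(0.95, 0.9, 0.2, 0.0, 1.0))
print([round(truncnorm_sample(0.9, 0.2, 0.0, 1.0, rng), 3) for _ in range(5)])
```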

It is not too hard to see that, with sufficient conditions on \(\mu\) (you can derive these yourself), we have \(\Pr(X_t \in [0,1]) = 1\) for all \(t\). We should note that, although this process looks very similar in notation to a standard Wiener process, it only really behaves like one near \(\frac{1}{2}\), where, for suitably small volatility \(\sigma\), even the standard (unbounded) Gaussian puts very little probability density near zero and one. If \(X_t\) is close to the natural boundaries zero and one, however, the process becomes heavily skewed back toward the center of the interval and its behavior diverges substantially from normality. We refer to this process as naturally bounded because we have not imposed reflecting or absorbing boundary conditions, but rather constructed the collection of underlying random variables to have zero probability of crossing the boundaries.
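Here is a rough sketch of how a path of such a process could be simulated. It rests on an assumption of mine that may differ from the construction above: each step draws the next state from a Gaussian centered at \(X_t + \mu\,dt\) with standard deviation \(\sigma\sqrt{dt}\), truncated to \([0,1]\). Folding the drift into the truncated step keeps the path in the interval for any drift. The names `truncnorm_sample` and `bounded_path` are hypothetical.

```python
import math
import random
from statistics import NormalDist

_STD = NormalDist()


def truncnorm_sample(mu, sigma, a, b, rng):
    """Inverse-CDF draw from a Gaussian(mu, sigma) truncated to [a, b]."""
    lo = _STD.cdf((a - mu) / sigma)
    hi = _STD.cdf((b - mu) / sigma)
    u = lo + (hi - lo) * rng.random()
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # inv_cdf needs 0 < u < 1
    return min(max(mu + sigma * _STD.inv_cdf(u), a), b)


def bounded_path(x0, mu, sigma, dt, n_steps, rng):
    """Assumed discretization: the next state is a truncated-Gaussian draw on [0, 1]."""
    x, t = x0, 0.0
    path = [x]
    for _ in range(n_steps):
        mean = x + mu(x, t) * dt
        x = truncnorm_sample(mean, sigma(x, t) * math.sqrt(dt), 0.0, 1.0, rng)
        t += dt
        path.append(x)
    return path


rng = random.Random(0)
# Start close to the upper boundary to see the skew back toward the interior.
path = bounded_path(0.98, lambda x, t: 0.0, lambda x, t: 0.1,
                    dt=1.0, n_steps=200, rng=rng)
print(min(path), max(path))
```

Starting near a boundary makes the skew visible: steps that would have crossed the boundary simply never get drawn, so the path is pulled back into the interior without any reflection rule.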

Here’s an example of what this process can look like. We will set \(\mu(x,t) \sim \text{Laplace}(\text{mean }=0.01)_t \), where we draw a new Laplace random variable with mean 0.01 for each \(t\), and set \(\sigma = 0.1\). We draw ten processes from the naturally bounded Wiener process distribution, each with initial condition \(X_0 = 0.48\).

[Figure: ten sample paths of the naturally bounded process starting from \(X_0 = 0.48\)]

At each \(t\), we compute the spatial mean \(\mathbb{E}_W^{[0,1]}[X_t]\), which is plotted as the thick black line. We may also be interested in the spatial probability distributions \(p(X_t)\) at each point in time, which are shown below.

[Figure: spatial probability distributions \(p(X_t)\) at each point in time]

We see that, for this specific process, the probability that \(X_T\) (the value of the naturally bounded Wiener process at the last time point) is greater than 0.5 is a little more than one half.
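The experiment above can be sketched roughly as follows. Several things here are assumptions on my part: the Laplace scale (0.01), the unit time step, the number of steps (50), and the discretization that draws each new state from a Gaussian truncated to \([0,1]\). With only ten paths, the fraction ending above 0.5 is a very noisy estimate, so a much larger ensemble would be needed for a stable number.

```python
import math
import random
from statistics import NormalDist, fmean

_STD = NormalDist()


def truncnorm_sample(mu, sigma, a, b, rng):
    """Inverse-CDF draw from a Gaussian(mu, sigma) truncated to [a, b]."""
    lo = _STD.cdf((a - mu) / sigma)
    hi = _STD.cdf((b - mu) / sigma)
    u = min(max(lo + (hi - lo) * rng.random(), 1e-12), 1.0 - 1e-12)
    return min(max(mu + sigma * _STD.inv_cdf(u), a), b)


def laplace(mean, scale, rng):
    """The difference of two iid exponentials with rate 1/scale is Laplace(0, scale)."""
    return mean + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def simulate_ensemble(n_paths, n_steps, x0, sigma, dt, drift_mean, drift_scale, rng):
    paths = []
    for _ in range(n_paths):
        x = x0
        path = [x]
        for _ in range(n_steps):
            mu = laplace(drift_mean, drift_scale, rng)  # fresh drift at every step
            x = truncnorm_sample(x + mu * dt, sigma * math.sqrt(dt), 0.0, 1.0, rng)
            path.append(x)
        paths.append(path)
    return paths


rng = random.Random(0)
n_steps = 50  # hypothetical horizon
paths = simulate_ensemble(10, n_steps, x0=0.48, sigma=0.1, dt=1.0,
                          drift_mean=0.01, drift_scale=0.01, rng=rng)
# Spatial (ensemble) mean at each time point, i.e. the thick black line.
mean_path = [fmean(p[k] for p in paths) for k in range(n_steps + 1)]
frac_above_half = sum(p[-1] > 0.5 for p in paths) / len(paths)
print(mean_path[-1], frac_above_half)
```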

Application to elections

This process is not only analytically interesting, but it can be useful in understanding the behavior of random processes that are naturally constrained to lie in some compact domain. Consider the case of some politicians running for higher office. If the voting system used is purely a first-past-the-post plurality vote, we can model it pretty well with this process. We start by denoting each candidate \(i\)’s polling popularity at time \(t\) by \(X^{(i)}_t\) and noting that, if \(C\) candidates are in the race, we have

\[ \Pr(\text{candidate } i \text{ wins}) = \Pr\left(X^{(i)}_T > X^{(j)}_T \text{ for all } j \neq i\right). \]

If there are only two candidates in the race, \(X\) and \(Y\), this simplifies to \( \Pr(X_T > Y_T) = \Pr(X_T - Y_T > 0) \).
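Given simulated final values for each candidate, these win probabilities can be estimated by counting how often each candidate comes out on top. This is a small sketch; the function name `win_probabilities` and the toy numbers are made up for illustration and are not polling data.

```python
def win_probabilities(final_samples):
    """Estimate each candidate's chance of holding the largest final share.

    final_samples is a list of simulated outcomes, each a list holding one
    final value per candidate. Ties go to the lowest-indexed candidate.
    """
    n_candidates = len(final_samples[0])
    wins = [0] * n_candidates
    for draw in final_samples:
        winner = max(range(n_candidates), key=lambda i: draw[i])
        wins[winner] += 1
    return [w / len(final_samples) for w in wins]


# Toy three-candidate outcomes (hypothetical values).
toy = [
    [0.50, 0.30, 0.20],
    [0.20, 0.60, 0.20],
    [0.40, 0.35, 0.25],
    [0.45, 0.40, 0.15],
]
print(win_probabilities(toy))
```

With two candidates this reduces to counting the draws in which the difference of the final values is positive.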

Let us take a (by now) classic case of an election that “no one saw coming”, the Hillary Clinton versus Donald Trump 2016 presidential election. I collected data from Real Clear Politics that gives the result of every head-to-head polling contest between Clinton and Trump up until the day before the election. When multiple poll results occurred on the same day, I averaged the results. (I did not take into account the repute of the pollster à la 538.) Here’s what the data looks like; again, this is up until the day before the election.

[Figure: daily averaged head-to-head polling results for Clinton and Trump]

Let’s denote these processes \(X\) for Clinton and \(Y\) for Trump. We can sample from the naturally constrained processes that generated these mean-field estimates as follows, assuming that the randomness is, in fact, distributed as a truncated-normal process on \([0,1]\). There is no reason to assume that the randomness is multiplicative, so we will assume that the noise is additive and sample from the underlying truncated-normal process. We set \(\sigma\) to be the volatility of each process; e.g., we have

and similarly for \(Y\). Below are the resulting simulations.

[Figure: simulated polling processes for Clinton and Trump]

Compared to the mean estimates given above, we see a lot more confusion over who exactly is going to win the popular vote—the result isn’t clear at all! Looking at the PDF for \(Z_T = X_T - Y_T\), it’s even more of a toss-up.

[Figure: estimated PDF of \(Z_T = X_T - Y_T\)]

Wow! This estimate gives that Clinton has only a 58% chance of winning the popular vote on November 8th, 2016. Since Republicans seem to be better at gerrymandering than Democrats, it shouldn’t really be surprising that Trump might be able to secure a majority of electoral votes.

By the way, if you haven’t been paying attention, that’s what happened.

Looking at this sequence of distributions in time kind of tells the story of the election.

[Figure: sequence of estimated distributions of \(Z_t\) over time]

We see the distribution start off heavily favoring Clinton, swing over toward favoring Trump for a short while, and then finally swing back over to favoring Clinton, albeit less than before, until it makes a sharp run for the center during the final days of the contest.

By the way, if you’re wondering why the estimate \(\Pr(Z_T \geq 0)\) is changing in the fourth decimal place every time I calculate it, it’s because I’m calculating a Monte Carlo estimate for this probability:

\[ \Pr(Z_T \geq 0) \approx \frac{1}{N}\sum_{n=1}^{N} \mathbb{1}\left[z_n \geq 0\right], \qquad z_n \sim p(Z_T), \]

where I have made \(N = 10{,}000\) draws from the estimated distribution \(p(Z_T)\).
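Here is a minimal sketch of that kind of Monte Carlo estimate and of why it moves from run to run. The Gaussian used for \(Z_T\) below is a hypothetical stand-in, not the distribution estimated from the polling data, so the printed numbers say nothing about the election.

```python
import random


def mc_prob_nonneg(draws):
    """Monte Carlo estimate of Pr(Z >= 0): the fraction of draws that are >= 0."""
    return sum(1 for z in draws if z >= 0.0) / len(draws)


def draw_z(n, rng):
    # Hypothetical stand-in for sampling from the estimated p(Z_T).
    return [rng.gauss(0.02, 0.05) for _ in range(n)]


# Repeat the estimate with different seeds to see the run-to-run variation.
estimates = [mc_prob_nonneg(draw_z(10_000, random.Random(seed))) for seed in range(5)]
print(estimates)
```

The standard error of such an estimate is \(\sqrt{p(1-p)/N}\), so the wobble shrinks like \(1/\sqrt{N}\) as more draws are made.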

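About that estimation. One reasonable criticism could be that my estimate is sensitive to the method by which I estimate \(p(Z_T)\). So far I have been using a Gaussian kernel, so that the PDF is estimated as

\[ \hat{p}(z) = \frac{1}{Mh}\sum_{m=1}^{M} K\left(\frac{z - z^{(m)}}{h}\right), \qquad K(u) = \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}, \]

where the \(z^{(m)}\) are the \(M\) simulated values of \(Z_T\) and \(h\) is the bandwidth of the kernel. To alleviate these concerns, I’ll re-estimate \(p\) using the top hat kernel, defined as

\[ K(u) = \begin{cases} \frac{1}{2}, & |u| \leq 1, \\ 0, & \text{otherwise.} \end{cases} \]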

We will now estimate \(\Pr(Z_T \geq 0)\) again, using this kernel in place of the Gaussian one.

Here are the same quantities as calculated above.

[Figures: the same quantities re-estimated with the top hat kernel]

We give a 2% higher probability that Clinton wins the election with this method, but still a very far cry from a certainty.
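To see how the kernel choice feeds into the probability, here is a small sketch of a kernel density estimate with both kernels. Instead of drawing from the estimated density, it integrates each kernel in closed form, which is a substitution of mine rather than the procedure used above. The sample values and the bandwidth are hypothetical.

```python
import math
from statistics import NormalDist

_STD = NormalDist()


def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def tophat_kernel(u):
    return 0.5 if abs(u) <= 1.0 else 0.0


def kde_pdf(z, samples, h, kernel):
    """Kernel density estimate at z with bandwidth h."""
    return sum(kernel((z - s) / h) for s in samples) / (len(samples) * h)


def prob_nonneg_gaussian(samples, h):
    """Mass of the Gaussian KDE on [0, inf): each component contributes Phi(s / h)."""
    return sum(_STD.cdf(s / h) for s in samples) / len(samples)


def prob_nonneg_tophat(samples, h):
    """Mass of the top hat KDE on [0, inf): each component is uniform on [s - h, s + h]."""
    return sum(min(max((s + h) / (2.0 * h), 0.0), 1.0) for s in samples) / len(samples)


samples = [-0.04, -0.01, 0.0, 0.02, 0.03, 0.05]  # hypothetical values of Z_T
h = 0.02                                         # hypothetical bandwidth
print(kde_pdf(0.0, samples, h, gaussian_kernel), kde_pdf(0.0, samples, h, tophat_kernel))
print(prob_nonneg_gaussian(samples, h), prob_nonneg_tophat(samples, h))
```

The two kernels only disagree about sample points within a bandwidth or so of zero, which is why the resulting probabilities tend to be close but not identical.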

Written on May 19, 2018