# Brief and (somewhat) intuitive description of the Kolmogorov equations

I have never taken a course in probability, let alone one in stochastic differential equations, so I am sort of winging it here. I think the following is one of the shorter “derivations” of the backward and forward Kolmogorov equations (BKE and FKE) that can be performed, but the disclaimer above is just in case there’s a well-known shorter derivation of which I’m unaware!

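Suppose \( (\mathcal{F}_t)_{t\geq 0}\) is a filtration that carries all known information about the system under study. This simply means that \(\mathcal{F}_t\) carries all information about the system from time \(t=0\) up to, and including, time \(t\). Let \(X_t\) be a diffusion process such that the following conditions are satisfied to leading order in \(dt\):

\[ \mathbb{E}[X_{t+dt} - X_t \mid \mathcal{F}_t] = \mu(X_t, t)\,dt, \qquad \mathbb{E}[(X_{t+dt} - X_t)^2 \mid \mathcal{F}_t] = \sigma^2(X_t, t)\,dt. \]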

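Since \(\mathcal{F}_t\) carries all information about the system up to time \(t\), we can rewrite the above as expectations conditioned on the space and time variables:

\[ \mathbb{E}_{x,t}[dX_t] = \mu(x,t)\,dt, \qquad \mathbb{E}_{x,t}[dX_t^2] = \sigma^2(x,t)\,dt, \]

where \(dX_t = X_{t+dt} - X_t\) and \(\mathbb{E}_{x,t}[\,\cdot\,]\) denotes the expectation given \(X_t = x\).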
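As a quick sanity check on the drift and diffusion conditions, here is a minimal Python sketch that samples one small step of a process and estimates the two moments. The Euler-Maruyama step, the particular `mu` and `sigma`, and the helper name `increment_moments` are all my own assumptions for illustration; the check is also somewhat circular, since the step is built to have these moments, so treat it as a picture of what the conditions mean rather than a proof of anything.

```python
import math
import random


def mu(x, t):
    # Hypothetical drift: mean reversion towards zero.
    return -0.5 * x


def sigma(x, t):
    # Hypothetical diffusion coefficient, constant here.
    return 0.8


def increment_moments(x, t, dt, n_samples, seed=0):
    """Estimate E[dX]/dt and E[dX^2]/dt for one Euler-Maruyama step from (x, t)."""
    rng = random.Random(seed)
    first = 0.0
    second = 0.0
    for _ in range(n_samples):
        # One step: dX = mu dt + sigma sqrt(dt) Z, with Z standard normal.
        dx = mu(x, t) * dt + sigma(x, t) * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        first += dx
        second += dx * dx
    return first / (n_samples * dt), second / (n_samples * dt)


drift_est, diff_est = increment_moments(x=1.0, t=0.0, dt=0.01, n_samples=200_000)
print("drift estimate:", drift_est, "vs mu(x, t) =", mu(1.0, 0.0))
print("diffusion estimate:", diff_est, "vs sigma(x, t)^2 =", sigma(1.0, 0.0) ** 2)
```

The second estimate carries a small bias of order \(\mu^2\,dt\), which is exactly the kind of higher-order term that gets dropped in the derivation.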

### BKE

We will first derive the BKE; it is fundamental as it defines the state evolution operator. The FKE (or Fokker-Planck equation) can be derived from the BKE. Let us define \(V(X_T)\) as the function that gives the value of a payoff at a terminal time \(T\), and \(f(x,t)\) as the value of that payoff seen from the earlier point \((x,t)\). The value at time \(T\) is given by \(f(X_T, T) = V(X_T)\) (the payoff is known!), and hence, moving backward in time, the function \(f\) just gives the expectation of the final payoff: \(f(x,t) = \mathbb{E}_{x,t}[V(X_T)]\). Suppose we have \(t < t' \leq T\). Conditioning first on the information at time \(t'\) and then on the information at time \(t\), we can write

\[ f(x,t) = \mathbb{E}_{x,t}[f(X_{t'}, t')]. \]

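Let us set \(t' = t + dt \) and \(X_t = x\), and write \(dX_t = X_{t+dt} - x\). Expanding \(f(X_{t + dt}, t + dt)\) in a Taylor series about \((x,t)\) gives

\[ f(X_{t+dt}, t+dt) = f(x,t) + \partial_t f\,dt + \partial_x f\,dX_t + \frac{1}{2}\,\partial_x^2 f\,dX_t^2 + \cdots \]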

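(We will truncate terms of the expansion at \(\mathcal{O}(dt^2)\); the \(dX_t^2\) term is kept because its expectation is of order \(dt\).) Taking the expectation \(\mathbb{E}_{x,t}\) of the expansion and using the drift and diffusion conditions, we have

\[ \mathbb{E}_{x,t}[f(X_{t+dt}, t+dt)] = f(x,t) + \left( \partial_t f + \mu(x,t)\,\partial_x f + \frac{\sigma^2(x,t)}{2}\,\partial_x^2 f \right) dt + \mathcal{O}(dt^2). \]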

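Now, since \( f(x,t) = \mathbb{E}_{x,t}[ f(X_{t + dt}, t + dt) ] \), we subtract \(f(x,t)\) from both sides of the equation, divide by \(dt\), and let \(dt \to 0\) to find the BKE:

\[ \partial_t f + \mu(x,t)\,\partial_x f + \frac{\sigma^2(x,t)}{2}\,\partial_x^2 f = 0, \qquad f(x,T) = V(x). \]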
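To see the BKE hold in a case where everything is known in closed form, here is a small Python sketch. I am assuming constant drift and diffusion (so \(X_T = x + \mu (T-t) + \sigma W_{T-t}\)) and the payoff \(V(x) = x^2\), for which \(\mathbb{E}_{x,t}[V(X_T)] = (x + \mu (T-t))^2 + \sigma^2 (T-t)\). The constants `MU`, `SIGMA`, `T` and the function names are hypothetical choices of mine; the residual \(\partial_t f + \mu\,\partial_x f + \frac{\sigma^2}{2}\partial_x^2 f\) is estimated with central differences and should vanish up to rounding.

```python
# Assumed constant coefficients and terminal time (illustrative values only).
MU, SIGMA, T = 0.3, 0.7, 1.0


def f(x, t):
    """E[V(X_T) | X_t = x] for dX = MU dt + SIGMA dW and payoff V(x) = x**2."""
    tau = T - t
    return (x + MU * tau) ** 2 + SIGMA ** 2 * tau


def bke_residual(x, t, h=1e-3):
    """Central-difference estimate of f_t + MU f_x + (SIGMA^2 / 2) f_xx."""
    f_t = (f(x, t + h) - f(x, t - h)) / (2 * h)
    f_x = (f(x + h, t) - f(x - h, t)) / (2 * h)
    f_xx = (f(x + h, t) - 2 * f(x, t) + f(x - h, t)) / h ** 2
    return f_t + MU * f_x + 0.5 * SIGMA ** 2 * f_xx


points = [(-1.0, 0.2), (0.0, 0.5), (2.0, 0.9)]
for x, t in points:
    print("residual at", (x, t), "=", bke_residual(x, t))

# Terminal condition: at t = T the function reduces to the payoff itself.
print("f(1.5, T) =", f(1.5, T), "and V(1.5) =", 1.5 ** 2)
```

Because this \(f\) is quadratic in both \(x\) and \(t\), the central differences are exact apart from floating-point error, so the printed residuals are a direct check of the equation rather than of the discretisation.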

### FKE

Clearly, the BKE can be rewritten as an operator equation:

\[ -\partial_t f = \mathcal{L} f, \]

where we have defined the linear operator \(\mathcal{L} = \mu(x,t) \partial_x + \frac{\sigma^2(x,t)}{2}\partial_x^2 \). Heuristically speaking, what the BKE tells us is how the probability distribution of the process \(X_t\) depends on the initial point \( (x,t) \) from which it started; this is why it’s the more fundamental of the two equations. What we would like is an equation that describes the probability density \(p(x', \tau)\) of the process going forward, as a function of the current point \( (x', \tau) \). It is intuitive that this equation is given by

\[ \partial_\tau p = \mathcal{L}^\dagger p, \]

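where the dagger denotes the adjoint; we want the operator to work “backwards” in some way. Our task is thus to find the adjoint operator of \(\mathcal{L}\). Since we have been implicitly assuming (by use of the double expectations above) that \(f \in L^2\), we must solve the operator equation

\[ \langle \mathcal{L} f, p \rangle = \langle f, \mathcal{L}^\dagger p \rangle, \qquad \langle u, v \rangle = \int_{-\infty}^{\infty} u(x)\,v(x)\,dx, \]

for all \(f, p \in L^2\).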

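To do this we will integrate by parts, assuming the boundary terms vanish at infinity:

\[ \langle \mathcal{L} f, p \rangle = \int_{-\infty}^{\infty} \left( \mu\,\partial_x f + \frac{\sigma^2}{2}\,\partial_x^2 f \right) p\,dx = \int_{-\infty}^{\infty} f \left( -\partial_x[\mu\,p] + \frac{1}{2}\,\partial_x^2[\sigma^2 p] \right) dx. \]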

Thus the adjoint operator must be

\[ \mathcal{L}^\dagger p = -\partial_x[\mu\,p] + \frac{1}{2}\,\partial_x^2[\sigma^2 p], \]

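giving the familiar Fokker-Planck equation as

\[ \partial_\tau p(x',\tau) = -\partial_{x'}[\mu(x',\tau)\,p(x',\tau)] + \frac{1}{2}\,\partial_{x'}^2[\sigma^2(x',\tau)\,p(x',\tau)]. \]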
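The adjoint relation \(\langle \mathcal{L} f, p \rangle = \langle f, \mathcal{L}^\dagger p \rangle\) can also be checked on a grid. The following is a rough Python sketch under my own assumptions: time-independent coefficients \(\mu(x) = \sin x\) and \(\sigma^2(x) = 1 + \frac{1}{2}\cos x\), two Gaussian-shaped test functions that are negligible at the grid edges (so the boundary terms drop out), and central differences for the derivatives. None of these choices come from the derivation itself.

```python
import math

N = 2001   # number of grid points
H = 0.01   # grid spacing, so the grid covers [-10, 10]
xs = [-10.0 + i * H for i in range(N)]

# Hypothetical, time-independent drift and squared diffusion coefficient.
mu = [math.sin(x) for x in xs]
sig2 = [1.0 + 0.5 * math.cos(x) for x in xs]

# Test functions that decay fast enough for the boundary terms to vanish.
f = [math.exp(-x * x) for x in xs]
p = [math.exp(-0.5 * (x - 1.0) ** 2) for x in xs]


def first_diff(u):
    """Central first difference; the two end points are set to zero."""
    out = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        out[i] = (u[i + 1] - u[i - 1]) / (2 * H)
    return out


def second_diff(u):
    """Central second difference; the two end points are set to zero."""
    out = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        out[i] = (u[i + 1] - 2 * u[i] + u[i - 1]) / H ** 2
    return out


def inner(u, v):
    """Riemann-sum approximation of the L^2 inner product."""
    return H * sum(a * b for a, b in zip(u, v))


def apply_L(u):
    """L u = mu u' + (sigma^2 / 2) u''."""
    du, d2u = first_diff(u), second_diff(u)
    return [m * a + 0.5 * s * b for m, a, s, b in zip(mu, du, sig2, d2u)]


def apply_L_adjoint(u):
    """L^dagger u = -(mu u)' + (1/2) (sigma^2 u)''."""
    d_flux = first_diff([m * v for m, v in zip(mu, u)])
    d2_diff = second_diff([s * v for s, v in zip(sig2, u)])
    return [-a + 0.5 * b for a, b in zip(d_flux, d2_diff)]


lhs = inner(apply_L(f), p)
rhs = inner(f, apply_L_adjoint(p))
print("<L f, p>        =", lhs)
print("<f, L^dagger p> =", rhs)
print("difference      =", abs(lhs - rhs))
```

One caveat: central differences satisfy a discrete version of integration by parts, so the two numbers agree almost to rounding on any grid. The sketch therefore confirms the structure of \(\mathcal{L}^\dagger\) (the sign flip and the coefficients moving inside the derivatives), not the accuracy of the discretisation.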

The FKE can be derived formally (it takes a little longer), but this treatment is intuitive. We can understand taking the adjoint of the operator as performing time reversal on the system; note that the sign of the drift term changes in the time reversal. Note also that the system does not exhibit time symmetry, as the drift and diffusion functions are acted upon by the derivatives in the adjoint but not in the BKE.
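The last point, that the derivatives act on the drift and diffusion functions in the adjoint, is easy to see numerically. Here is a minimal Python sketch using the Ornstein-Uhlenbeck process \(dX = -\theta X\,dt + \sigma\,dW\), whose transition density from \(x_0\) is Gaussian with mean \(x_0 e^{-\theta\tau}\) and variance \(\sigma^2 (1 - e^{-2\theta\tau}) / (2\theta)\). The process, the parameter values, and the function names are my own choices for illustration. The first residual uses \(-\partial_x[\mu p]\) as in the Fokker-Planck equation; the second, deliberately wrong one uses \(-\mu\,\partial_x p\), leaving the drift outside the derivative.

```python
import math

# Assumed Ornstein-Uhlenbeck parameters and starting point (illustrative).
THETA, SIGMA, X0 = 1.2, 0.6, 1.0


def mu(x):
    """State-dependent drift of the OU process."""
    return -THETA * x


def p(x, tau):
    """OU transition density from X0 at time 0 to x at time tau."""
    mean = X0 * math.exp(-THETA * tau)
    var = SIGMA ** 2 * (1.0 - math.exp(-2.0 * THETA * tau)) / (2.0 * THETA)
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fke_residual(x, tau, h=1e-3):
    """p_tau - ( -(mu p)_x + (SIGMA^2 / 2) p_xx ), via central differences."""
    p_tau = (p(x, tau + h) - p(x, tau - h)) / (2 * h)
    flux_x = (mu(x + h) * p(x + h, tau) - mu(x - h) * p(x - h, tau)) / (2 * h)
    p_xx = (p(x + h, tau) - 2 * p(x, tau) + p(x - h, tau)) / h ** 2
    return p_tau - (-flux_x + 0.5 * SIGMA ** 2 * p_xx)


def naive_residual(x, tau, h=1e-3):
    """Same as above, but with the drift wrongly left outside the derivative."""
    p_tau = (p(x, tau + h) - p(x, tau - h)) / (2 * h)
    p_x = (p(x + h, tau) - p(x - h, tau)) / (2 * h)
    p_xx = (p(x + h, tau) - 2 * p(x, tau) + p(x - h, tau)) / h ** 2
    return p_tau - (-mu(x) * p_x + 0.5 * SIGMA ** 2 * p_xx)


points = [(0.2, 0.5), (0.5, 0.5), (0.9, 0.5), (0.5, 1.0)]
for x, tau in points:
    print((x, tau), "FKE residual:", fke_residual(x, tau),
          "naive residual:", naive_residual(x, tau))
```

The naive residual differs from the correct one by \(-\mu'(x)\,p = \theta\,p\), which is far from zero near the peak of the density; that missing term is precisely what the adjoint supplies.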