# Making decisions heuristically

Many economic models of decision-making assume rational behavior; that is, agents use the information at their disposal as efficiently as possible, optimizing their behavior to maximize their utility. This view has recently (well, in the past sixty years) come under fire from the psychology, sociology, and behavioral economics communities as being overly optimistic about humans’ ability to behave truly optimally. Proponents of this behavioral approach suggest instead a model of “bounded rationality” (a term due to Herbert Simon) in which agents try to make optimal decisions but are limited by their own cognitive abilities and resource constraints.

What if it were possible for agents to reach nearly optimal outcomes (i.e., almost Nash equilibria) by optimizing heuristically, without even knowing the form of the function they’re trying to optimize? What if this were possible even if the agents don’t know of each other’s existence?

Consider a two-agent game with a continuous utility (or fitness) function \(\pi: \mathbb{R}^2 \rightarrow \mathbb{R}\), and suppose that agents play actions \(x \in \mathbb{R}\). When the agents play actions, they observe the resulting output of \(\pi\), whose functional form they don’t know. They’ll soon realize that one of two basic cases holds, at least locally:

1) More is better, meaning that for some constant \(c \in \mathbb{R}\) and \(x_1 < x_1’\), \(\pi(x_1, c) \leq \pi(x_1’, c)\) (and likewise for agent 2)

2) More is worse, meaning that if \(x_1 < x_1’\), then \(\pi(x_1, c) \geq \pi(x_1’, c)\) (again, likewise for agent 2).

With enough sampling in this manner, the agents should be able to figure out where the local maxima are and alter their actions accordingly. As time goes on, agents want to keep the gains they’ve made (i.e., they’re risk-averse), and so they move their actions less and less according to a discount function \(\beta: [0, \infty) \rightarrow (0, 1)\). For now, we’ll just consider exponential discounting, \(\beta(t) = b^t\), and hyperbolic discounting, \(\beta(t) = \frac{1}{1+kt}\).
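
For concreteness, here is a minimal sketch of the two discount functions; the parameter values \(b = 0.95\) and \(k = 0.1\) are illustrative choices, not values the model fixes:

```python
import numpy as np

def exponential_discount(t, b=0.95):
    """Exponential discounting: beta(t) = b ** t for 0 < b < 1."""
    return b ** t

def hyperbolic_discount(t, k=0.1):
    """Hyperbolic discounting: beta(t) = 1 / (1 + k t) for k > 0."""
    return 1.0 / (1.0 + k * t)

# both start near 1 and decay toward 0, so step sizes shrink over time;
# hyperbolic discounting decays much more slowly in the tail
ts = np.arange(100)
print(exponential_discount(ts)[-1], hyperbolic_discount(ts)[-1])
```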

There are many algorithms the agents could follow, but two seem reasonably close to human-like heuristic behavior:

  • Adaptive step: if \(\text{sgn}(x_t - x_{t-1}) = -\text{sgn}(\pi_t - \pi_{t-1})\), set \(\mu_t = \min(x_t, x_{t-1}) - \beta(t)(x_t - x_{t-1})\). Otherwise, set \(\mu_t = \max(x_t, x_{t-1}) + \beta(t)(x_t - x_{t-1})\). Then, draw \(x_{t+1} \sim \mathcal{N}(\mu_t, \sigma_t^2)\). We normally choose \(\sigma_t = \beta(t)\), but a more thoughtful decision-maker could take his current level of fitness into account and choose \(\sigma_t = \exp(\frac{-\nu t}{1 + \max(0, F_t)})\), where \(F_t\) is his fitness at time \(t\) and \(\nu \sim \mathcal{U}(a, b)\) for \(0 < a < b < 1\). We’ll call this second method the cooling method for its similarity to the Boltzmann distribution.

  • Jump: if \(\text{sgn}(x_t - x_{t-1}) = -\text{sgn}(\pi_t - \pi_{t-1})\), set \(x_{t+1} = \min(x_t, x_{t-1}) + a\eta\), where \(a = -\text{sgn}(x_t - x_{t-1})\). Otherwise, set \(x_{t+1} = \max(x_t, x_{t-1}) + a\eta\), where \(a = \text{sgn}(x_t - x_{t-1})\). Here, \(\eta \sim \text{Exponential}(\beta(t))\).

In both algorithms, agents try to figure out whether “more is better” or “more is worse” and then act accordingly. Note that the agents never actually see the action of the other agent or the functional form of \(\pi\); they just see the numerical values resulting (in part) from their own actions.
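
To make the two rules concrete, here is a minimal one-step sketch of each. This is my paraphrase of the rules above rather than the actual agents-module code, and treating \(\text{Exponential}(\beta(t))\) as a rate (so numpy’s scale parameter is \(1/\beta(t)\)) is an assumption:

```python
import numpy as np

def adaptive_step_update(x_prev, x_curr, pi_prev, pi_curr, beta_t):
    """One adaptive-step update: shift the sampling mean against or with
    the last move depending on whether it hurt or helped fitness, then
    draw the next action around that mean with sigma_t = beta(t)."""
    dx, dpi = x_curr - x_prev, pi_curr - pi_prev
    if np.sign(dx) == -np.sign(dpi):   # last move hurt fitness: back off
        mu = min(x_curr, x_prev) - beta_t * dx
    else:                              # last move helped: push further
        mu = max(x_curr, x_prev) + beta_t * dx
    return np.random.normal(mu, beta_t)

def jump_update(x_prev, x_curr, pi_prev, pi_curr, beta_t):
    """One jump update: leap an Exponential-distributed distance in the
    promising direction. Exponential(beta(t)) is read here as a rate."""
    dx, dpi = x_curr - x_prev, pi_curr - pi_prev
    eta = np.random.exponential(1.0 / beta_t)
    if np.sign(dx) == -np.sign(dpi):   # a = -sgn(dx): reverse direction
        return min(x_curr, x_prev) - np.sign(dx) * eta
    else:                              # a = sgn(dx): keep going
        return max(x_curr, x_prev) + np.sign(dx) * eta
```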

## Example: coordination game

We’ll first just watch two agents play the continuous version of the coordination game. Their utility (or fitness) functions are given by

\[ \pi(x_i, x_{-i}) = \lvert x_i + x_{-i} \rvert, \]

the absolute value of the sum of their actions (the abs_sum fitness in the code below). Clearly there is no upper limit on utility, but the agents must coordinate to play actions with the same sign. We’ll check out all three optimization methods, and use both exponential and hyperbolic discounting.
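
For reference, here is a hypothetical one-line reconstruction of the abs_sum fitness that the runner below consumes; the real function lives in the agents module and may differ in signature:

```python
def abs_sum(x_own, x_other):
    """Coordination-game fitness: the absolute value of the summed actions.
    Unbounded above when the agents' actions share a sign; partially
    self-cancelling when they don't."""
    return abs(x_own + x_other)
```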

```python
%matplotlib inline

from agents import *
from itertools import product
import numpy as np
import matplotlib.pyplot as plt

RUNS = 10
timesteps = 100
methods = ['adaptive_step', 'adaptive_step_cooling', 'jump']
discounts = ['exponential', 'hyperbolic']

for method, discount in product(methods, discounts):
    fig, axes = plt.subplots(1, 2, figsize=(10, 4))
    for run in range(RUNS):
        agent_1, agent_2 = equal_info_runner([abs_sum, abs_sum], method=method,
                                             timesteps=timesteps, discount=discount)
        # left panel: fitness over time; right panel: actions over time
        a1f, = axes[0].plot(range(timesteps), agent_1.fitnesses, 'r', label='Agent 1')
        a2f, = axes[0].plot(range(timesteps), agent_2.fitnesses, 'b', label='Agent 2')
        a1a, = axes[1].plot(range(timesteps), agent_1.actions, 'r', label='Agent 1')
        a2a, = axes[1].plot(range(timesteps), agent_2.actions, 'b', label='Agent 2')

    axes[0].legend(handles=[a1f, a2f], loc='best')
    axes[0].set_xlabel('Time')
    axes[0].set_ylabel('Fitness')
    axes[1].legend(handles=[a1a, a2a], loc='best')
    axes[1].set_xlabel('Time')
    axes[1].set_ylabel('Action')
    fig.suptitle(method + ' ' + discount, fontsize=20)
```

*[Six figures: fitness (left panel) and action (right panel) trajectories for both agents, one figure per method/discount pair: adaptive_step, adaptive_step_cooling, and jump, each under exponential and hyperbolic discounting.]*

At first glance, it seems like jump really underperforms; its maximum attained values are nowhere near what adaptive_step comes up with. But we’d expect that in the context of this problem, since adaptive_step’s moves scale with its previous change in action, while jump’s don’t. (At least, not yet. This is going to be the subject of ongoing development.) Note that the agents are definitely coordinating; they’re always moving in the same direction. With jump, the agents also seem to share the load much more equitably than with adaptive_step.

Now, let’s consider a more interesting problem: Cournot quantity competition. We imagine that agents one and two produce an identical good, and each seeks to maximize his profit. The profit function is given by

\[ \pi(x_i, x_{-i}) = p(x_i, x_{-i}) \, x_i - c \, x_i, \]

where \(p(x_i, x_{-i}) = k - x_i - x_{-i}\) is the market price and \(c\) is the constant marginal cost. In this example, we’ll choose \(k = 50\) and \(c = 1\), so that the Nash equilibrium strategies are \(x_1^* = x_2^* = \frac{49}{3}\), with equilibrium profit for each agent \(\pi_1^* = \pi_2^* \approx 267\).
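
Here is a hypothetical version of the cournot fitness consistent with the formulas above (again, the real agents-module function may differ), plus a quick sanity check of the equilibrium numbers:

```python
def cournot(x_own, x_other, k=50.0, c=1.0):
    """Cournot profit: market price (k - x_own - x_other) times own
    quantity, minus a constant marginal cost c per unit."""
    return (k - x_own - x_other) * x_own - c * x_own

# symmetric Nash equilibrium quantity: x* = (k - c) / 3
x_star = (50.0 - 1.0) / 3
print(x_star, cournot(x_star, x_star))  # ~16.33, ~266.8
```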

```python
# let's increase the number of timesteps and runs
RUNS = 25
timesteps = 500

for method, discount in product(methods, discounts):
    fig, axes = plt.subplots(1, 2, figsize=(10, 4))
    for run in range(RUNS):
        agent_1, agent_2 = equal_info_runner([cournot, cournot], method=method,
                                             timesteps=timesteps, discount=discount)
        a1f, = axes[0].plot(range(timesteps), agent_1.fitnesses, 'r', label='Agent 1')
        a2f, = axes[0].plot(range(timesteps), agent_2.fitnesses, 'b', label='Agent 2')
        a1a, = axes[1].plot(range(timesteps), agent_1.actions, 'r', label='Agent 1')
        a2a, = axes[1].plot(range(timesteps), agent_2.actions, 'b', label='Agent 2')

    # plot the Nash eq actions and fitness
    nash_f, = axes[0].plot(range(timesteps), [267. for _ in range(timesteps)], 'g', linewidth=3, label='Nash eq')
    nash_a, = axes[1].plot(range(timesteps), [49./3 for _ in range(timesteps)], 'g', linewidth=3, label='Nash eq')

    axes[0].legend(handles=[a1f, a2f, nash_f], loc='best')
    axes[0].set_xlabel('Time')
    axes[0].set_ylabel('Fitness')
    axes[1].legend(handles=[a1a, a2a, nash_a], loc='best')
    axes[1].set_xlabel('Time')
    axes[1].set_ylabel('Action')
    fig.suptitle(method + ' ' + discount, fontsize=20)
```

*[Six figures: fitness and action trajectories with the Nash equilibrium level marked in green, one per method/discount pair.]*

There’s a lot to unpack here. First, we note that, in the case of adaptive_step and jump, the agents converge on the Nash equilibrium without either knowing the other agent’s true action or the functional form of the profit function. That’s pretty interesting. Second, there’s some unusual behavior happening with adaptive_step_cooling: agents who initially have low fitness essentially give up on optimizing (they’re the losers in the dynamic game), whereas agents who initially have higher fitness are motivated to produce more and more, eventually capturing the entirety of the market share. This makes sense mathematically when we consider the term \(\sigma_t = \exp(\frac{-\nu t}{1 + \max(0, F_t)})\): low fitness means a small denominator, so the exploration scale collapses quickly, while high fitness keeps it large. It also corresponds with our experience of reality; firms that can’t start off strong often fail, while firms that make a splash and have initial success (Apple, Google, etc.) become titans of industry. (Also note the “success stories” in adaptive_step_cooling, where a very few firms manage to break out of the low tier, climbing ambitiously into the top bracket.)
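
A quick numerical illustration of that collapse (the values of \(\nu\), \(t\), and the two fitness levels are arbitrary choices for demonstration):

```python
import numpy as np

nu, t = 0.5, 50                # a draw from U(a, b) and a mid-run timestep
for F_t in (0.0, 100.0):       # a low-fitness 'loser' and a high-fitness 'winner'
    sigma_t = np.exp(-nu * t / (1 + max(0.0, F_t)))
    print(F_t, sigma_t)
# F_t = 0   -> sigma_t ~ 1.4e-11: exploration has effectively stopped
# F_t = 100 -> sigma_t ~ 0.78: the agent is still searching widely
```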

Also, note that when the agents use jump, they both appear to produce below the Nash equilibrium production level, resulting in a higher realized utility for both: an example of cartel formation. Let’s see whether this difference in production levels is statistically distinguishable from the Nash equilibrium.

```python
from scipy.stats import bayes_mvs

method = 'jump'  # the discussion above concerns the jump method

for discount in discounts:
    calc = {'a1_fit': [], 'a2_fit': [], 'a1_act': [], 'a2_act': []}

    for run in range(RUNS):
        agent_1, agent_2 = equal_info_runner([cournot, cournot], method=method,
                                             timesteps=timesteps, discount=discount)
        calc['a1_fit'].append(agent_1.fitnesses[-1])
        calc['a2_fit'].append(agent_2.fitnesses[-1])
        calc['a1_act'].append(agent_1.actions[-1])
        calc['a2_act'].append(agent_2.actions[-1])

    fig, axes = plt.subplots(1, 4, figsize=(14, 6))
    nash = {'a1_fit': 267., 'a2_fit': 267., 'a1_act': 49./3, 'a2_act': 49./3}

    for i, entry in enumerate(calc):
        mean, _, _ = bayes_mvs(calc[entry], alpha=0.95)
        axes[i].hist(calc[entry], density=True, label=entry)
        # estimated mean, its 95% interval, and the Nash equilibrium value
        axes[i].axvline(mean.statistic, color='k', linewidth=2,
                        label=entry + ' est mean')
        axes[i].axvspan(mean.minmax[0], mean.minmax[1], facecolor='r',
                        alpha=0.3, label=entry + ' est mean confidence')
        axes[i].axvline(nash[entry], color='g', linewidth=2,
                        label=entry + ' Nash eq')
        axes[i].legend(loc='upper center', bbox_to_anchor=(0.5, 1.05))
        axes[i].set_xlabel('parameter')
        axes[i].set_ylabel('$P($ parameter $)$')
```

*[Two figures: histograms of final-timestep fitnesses and actions across runs under jump, with estimated means, 95% intervals, and Nash equilibrium values, for exponential and hyperbolic discounting.]*

Recall that the Nash equilibrium action was \(\frac{49}{3} \approx 16.3\) and the Nash equilibrium profit was about 267. Both of those numbers fall outside the Bayesian 95% confidence intervals for the means we observe, so there’s significant evidence that the jump method provides some mechanism for agents to cooperate and form a cartel, earning better than the Nash equilibrium profit after a certain number of timesteps.
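
One way to make that check explicit, using the calc and nash dictionaries left in scope by the cell above (so, for the last discount computed):

```python
for entry in calc:
    mean, _, _ = bayes_mvs(calc[entry], alpha=0.95)
    lo, hi = mean.minmax
    print(entry, 'Nash value outside 95% interval:', nash[entry] < lo or nash[entry] > hi)
```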

## More to come

This project is in its infancy. Source code is here.

Copyright David Dewhurst 2017- . Permission is given to distribute and use this material for educational purposes only; in particular, no usage of this material for commercial purposes or usage in the creation of academic papers is permitted.

Written on February 18, 2017