Inspection Game

A poacher vs. ranger game with limited patrols and adaptive poachers.

Level: Intermediate

game-theory · reinforcement · monitoring

  • Probes: catch_rate, ranger_coverage, poacher_payoff
FAQ
How do poachers learn which sites to pick?
Each poacher keeps a per-site Q-value and updates it after every attempt with q[site] = (1 - lr) * q[site] + lr * payoff, so sites that keep paying off get picked more often (a standalone sketch of this rule follows the FAQ).
How are patrol sites chosen?
Each day the ranger draws patrols sites uniformly at random, without replacement, from the num_sites possible locations.
What do the probes measure?
catch_rate is the cumulative fraction of poaching attempts that ended in a catch, ranger_coverage is the fraction of sites patrolled each day (patrols / num_sites), and poacher_payoff is the cumulative average payoff (reward or penalty) per attempt.
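
The epsilon-greedy site choice and the reinforcement update can also be read in isolation. The sketch below is a minimal, self-contained illustration for a single poacher; the constants and the set of patrolled sites are made up for the example and are not part of simulation.py.

import random

rng = random.Random(0)
n_sites, lr, eps = 5, 0.2, 0.1
q = [0.0] * n_sites                          # one Q-value per site for a single poacher

def pick_site():
    # Explore a random site with probability eps, otherwise exploit the best-known one.
    if rng.random() < eps:
        return rng.randrange(n_sites)
    best = max(q)
    return rng.choice([i for i, v in enumerate(q) if v == best])

site = pick_site()
payoff = -50 if site in {1, 3} else 10       # hypothetical: suppose sites 1 and 3 are patrolled today
q[site] = (1 - lr) * q[site] + lr * payoff   # exponential moving average toward the latest payoff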
simulation.py

Inspection game: poachers vs. the lone ranger

A ranger has limited bandwidth to patrol m sites out of n each day. Multiple poachers simultaneously pick where to trespass. Anyone caught pays a heavy penalty, while a successful raid yields a reward. Poachers adapt using a simple reinforcement rule so hot spots shift over time.
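
As a rough sanity check against the default configuration below (patrols 2 of num_sites 5, reward 10, penalty 50): a poacher picking sites uniformly at random faces a 0.4 chance of being caught, so its expected payoff is 0.6 × 10 − 0.4 × 50 = −14 per attempt, close to the avg_payoff reported in the Final Results.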


import random
from tys import probe, progress


def simulate(cfg: dict):
    """Simulate repeated patrols and adaptive poachers."""

    import simpy

    env = simpy.Environment()

    n_sites = cfg["num_sites"]       # total locations that could be patrolled
    patrols = cfg["patrols"]         # how many sites the ranger covers each day
    num_poachers = cfg["num_poachers"]
    reward = cfg["reward"]           # gain for an uncaught poacher
    penalty = cfg["penalty"]         # cost if caught red-handed
    lr = cfg.get("learning_rate", 0.1)   # update weight for reinforcement
    eps = cfg.get("epsilon", 0.1)        # exploration probability
    sim_time = cfg["sim_time"]

    rng = random.Random(cfg.get("seed", 123))

    # Each poacher tracks an estimated value per site.
    q_values = [[0.0 for _ in range(n_sites)] for _ in range(num_poachers)]

    catch_count = 0
    attempts = 0
    total_payoff = 0.0

    done = env.event()

    # One simulation step represents a day of patrols and poaching.
    def day():
        nonlocal catch_count, attempts, total_payoff
        for t in range(sim_time):
            patrol_sites = rng.sample(range(n_sites), k=patrols)
            probe("ranger_coverage", env.now, patrols / n_sites)

            for p in range(num_poachers):
                q = q_values[p]
                if rng.random() < eps:
                    site = rng.randrange(n_sites)
                else:
                    best = max(q)
                    best_sites = [i for i, v in enumerate(q) if v == best]
                    site = rng.choice(best_sites)

                attempts += 1
                if site in patrol_sites:
                    catch_count += 1
                    payoff = -penalty
                else:
                    payoff = reward

                # Simple reinforcement update toward the observed payoff.
                q[site] = (1 - lr) * q[site] + lr * payoff
                total_payoff += payoff

            catch_rate = catch_count / attempts
            avg_payoff = total_payoff / attempts
            probe("catch_rate", env.now, catch_rate)
            probe("poacher_payoff", env.now, avg_payoff)
            progress(100 * (t + 1) / sim_time)
            yield env.timeout(1)

        done.succeed({"catch_rate": catch_rate, "avg_payoff": avg_payoff})

    env.process(day())
    env.run(until=done)
    return done.value


def requirements():
    return {
        "builtin": ["micropip", "pyyaml"],
        "external": ["simpy==4.1.1"],
    }
Default.yaml
num_sites: 5
patrols: 2
num_poachers: 3
reward: 10
penalty: 50
learning_rate: 0.2
epsilon: 0.1
sim_time: 100
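
For reference, here is a minimal sketch of how this configuration could drive a run. It assumes the YAML is saved as default.yaml and the module above as simulation.py (both filenames are taken from the section headings), and that simpy and the platform's tys module (probe, progress) are importable.

import yaml                          # pyyaml, listed under requirements()
from simulation import simulate      # assumes the code above lives in simulation.py

with open("default.yaml") as f:      # assumed filename, matching the Default.yaml section
    cfg = yaml.safe_load(f)

print(simulate(cfg))                 # e.g. {"catch_rate": ..., "avg_payoff": ...}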
Charts (Default)

ranger_coverage
Samples: 100 @ 0.00–99.00
Values: min 0.40, mean 0.40, median 0.40, max 0.40, σ 0.00

catch_rate
Samples: 100 @ 0.00–99.00
Values: min 0.25, mean 0.35, median 0.37, max 0.44, σ 0.04

poacher_payoff
Samples: 100 @ 0.00–99.00
Values: min -16.67, mean -11.25, median -12.29, max -5.29, σ 2.69
Final Results (Default)
Metric        Value
catch_rate    0.41
avg_payoff    -14.60