Inspection Game
A poacher vs. ranger game with limited patrols and adaptive poachers.
Level: Intermediate
FAQ
- How do poachers learn which sites to pick?
- Each day they update a per-site Q-value table with a reinforcement rule: q[site] = (1 - lr) * q[site] + lr * payoff (see the worked sketch after this FAQ).
- How are patrol sites chosen?
- Each day the ranger draws a uniform random sample of patrols sites out of the num_sites locations (random.sample), independently of previous days.
- What do the probes measure?
- catch_rate is cumulative catches divided by poaching attempts, while poacher_payoff is the running average payoff per attempt (the reward when a poacher escapes, minus the penalty when caught).
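The snippet below is a minimal standalone sketch of that epsilon-greedy choice and reinforcement update, mirroring the logic in simulation.py; the three-site table, seed, and payoff value are illustrative only.

```python
import random

rng = random.Random(0)
q = [0.0, 0.0, 0.0]   # one poacher's estimated value per site
lr, eps = 0.2, 0.1    # learning rate and exploration probability

# Epsilon-greedy site choice: explore occasionally, otherwise pick among the best sites.
if rng.random() < eps:
    site = rng.randrange(len(q))
else:
    best = max(q)
    site = rng.choice([i for i, v in enumerate(q) if v == best])

# Suppose the poacher escapes and collects a reward of 10.
payoff = 10
q[site] = (1 - lr) * q[site] + lr * payoff
print(site, q)  # the chosen site's estimate moves from 0.0 toward 10, landing at 2.0
```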
simulation.py
"""Inspection game: poachers vs. the lone ranger.

A ranger has limited bandwidth to patrol m sites out of n each day. Multiple poachers simultaneously pick where to trespass. Anyone caught pays a heavy penalty, while a successful raid yields a reward. Poachers adapt using a simple reinforcement rule, so hot spots shift over time.
"""
import random
from tys import probe, progress
def simulate(cfg: dict):
"""Simulate repeated patrols and adaptive poachers."""
import simpy
env = simpy.Environment()
n_sites = cfg["num_sites"] # total locations that could be patrolled
patrols = cfg["patrols"] # how many sites the ranger covers each day
num_poachers = cfg["num_poachers"]
reward = cfg["reward"] # gain for an uncaught poacher
penalty = cfg["penalty"] # cost if caught red-handed
lr = cfg.get("learning_rate", 0.1) # update weight for reinforcement
eps = cfg.get("epsilon", 0.1) # exploration probability
sim_time = cfg["sim_time"]
rng = random.Random(cfg.get("seed", 123))
    # Each poacher tracks an estimated value per site.
q_values = [[0.0 for _ in range(n_sites)] for _ in range(num_poachers)]
catch_count = 0
attempts = 0
total_payoff = 0.0
done = env.event()
    # One step represents a day of patrols and poaching.
def day():
nonlocal catch_count, attempts, total_payoff
for t in range(sim_time):
patrol_sites = rng.sample(range(n_sites), k=patrols)
probe("ranger_coverage", env.now, patrols / n_sites)
for p in range(num_poachers):
q = q_values[p]
if rng.random() < eps:
site = rng.randrange(n_sites)
else:
best = max(q)
best_sites = [i for i, v in enumerate(q) if v == best]
site = rng.choice(best_sites)
attempts += 1
if site in patrol_sites:
catch_count += 1
payoff = -penalty
else:
payoff = reward
                # simple reinforcement update
q[site] = (1 - lr) * q[site] + lr * payoff
total_payoff += payoff
catch_rate = catch_count / attempts
avg_payoff = total_payoff / attempts
probe("catch_rate", env.now, catch_rate)
probe("poacher_payoff", env.now, avg_payoff)
progress(100 * (t + 1) / sim_time)
yield env.timeout(1)
done.succeed({"catch_rate": catch_rate, "avg_payoff": avg_payoff})
env.process(day())
env.run(until=done)
return done.value
def requirements():
return {
"builtin": ["micropip", "pyyaml"],
"external": ["simpy==4.1.1"],
}
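For local experiments outside the hosted runner, one option is a small harness that installs a no-op stub for the tys module before importing simulation.py. This is a hypothetical setup, not part of the project: the file layout, the stub behaviour, and a locally installed simpy are all assumptions.

```python
# run_local.py: hypothetical harness; assumes simulation.py sits alongside it
# and that simpy is installed locally (pip install simpy).
import sys
import types

# Replace the platform-provided tys module with no-op probe/progress hooks.
stub = types.ModuleType("tys")
stub.probe = lambda name, t, value: None
stub.progress = lambda pct: None
sys.modules["tys"] = stub

import simulation  # imported after the stub so `from tys import ...` resolves

cfg = {
    "num_sites": 5, "patrols": 2, "num_poachers": 3,
    "reward": 10, "penalty": 50,
    "learning_rate": 0.2, "epsilon": 0.1,
    "sim_time": 100, "seed": 123,
}
print(simulation.simulate(cfg))  # {'catch_rate': ..., 'avg_payoff': ...}
```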
Default.yaml
num_sites: 5
patrols: 2
num_poachers: 3
reward: 10
penalty: 50
learning_rate: 0.2
epsilon: 0.1
sim_time: 100
Charts (Default)
ranger_coverage
| Samples | 100 @ 0.00–99.00 |
| --- | --- |
| Values | min 0.40, mean 0.40, median 0.40, max 0.40, σ 0.00 |
catch_rate
| Samples | 100 @ 0.00–99.00 |
| --- | --- |
| Values | min 0.25, mean 0.35, median 0.37, max 0.44, σ 0.04 |
poacher_payoff
| Samples | 100 @ 0.00–99.00 |
| --- | --- |
| Values | min -16.67, mean -11.25, median -12.29, max -5.29, σ 2.69 |
Final Results (Default)
| Metric | Value |
| --- | --- |
| catch_rate | 0.41 |
| avg_payoff | -14.60 |
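These numbers are easy to sanity-check: because the patrol sample is uniform and independent of the poachers' choices, any chosen site is covered with probability patrols / num_sites = 2/5 = 0.4, so the expected payoff per attempt is 0.4 * (-50) + 0.6 * 10 = -14 no matter how the poachers adapt. The observed 0.41 and -14.60 sit close to that baseline. A tiny helper (hypothetical, not part of simulation.py) reproduces the arithmetic:

```python
def expected_per_attempt(num_sites, patrols, reward, penalty):
    """Catch probability and expected payoff for one attempt against a uniform patrol."""
    p_catch = patrols / num_sites
    return p_catch, p_catch * (-penalty) + (1 - p_catch) * reward

print(expected_per_attempt(num_sites=5, patrols=2, reward=10, penalty=50))  # (0.4, -14.0)
```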