A little forager wakes up in a maze it doesn't understand — walls, hazards, and one patch of food. At first it just flails. Then, episode by episode, it figures the world out: a policy crystallizes, every arrow turns toward the food, and a random twitch becomes a confident beeline. This is real reinforcement learning — tabular Q-learning, no libraries, no GPU. $0, runs anywhere.
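For a feel of what "tabular Q-learning, no libraries" can look like, here is a minimal sketch in plain Python. It is an illustration only, not this project's actual code: the maze layout, the reward values (+10 food, -10 hazard, -0.1 per step), the hyperparameters, and names such as `MAZE`, `step`, `train`, `greedy_path`, and `render_policy` are all assumptions made for the example.

```python
import random

# Hypothetical 5x5 maze: '#' wall, 'X' hazard, 'F' food, 'S' start, '.' floor.
MAZE = [
    "S..#.",
    ".#.X.",
    ".#...",
    "...#.",
    ".X..F",
]
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
ARROWS = "^v<>"                               # same order as ACTIONS


def step(state, action):
    """Apply one move and return (next_state, reward, done)."""
    r, c = state
    dr, dc = ACTIONS[action]
    nr, nc = r + dr, c + dc
    # Bumping into a wall or the maze edge leaves the forager where it was.
    if not (0 <= nr < len(MAZE) and 0 <= nc < len(MAZE[0])) or MAZE[nr][nc] == "#":
        nr, nc = r, c
    cell = MAZE[nr][nc]
    if cell == "F":
        return (nr, nc), 10.0, True    # assumed reward for reaching food
    if cell == "X":
        return (nr, nc), -10.0, True   # assumed penalty for a hazard
    return (nr, nc), -0.1, False       # small step cost favours short routes


def train(episodes=2000, alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    """Epsilon-greedy tabular Q-learning; returns the Q-table as a dict."""
    rng = random.Random(seed)
    q = {(r, c): [0.0] * 4 for r in range(len(MAZE)) for c in range(len(MAZE[0]))}
    for _ in range(episodes):
        state = (0, 0)
        for _ in range(200):  # cap the episode length
            if rng.random() < epsilon:
                action = rng.randrange(4)  # explore
            else:
                best = max(q[state])       # exploit, breaking ties at random
                action = rng.choice([a for a in range(4) if q[state][a] == best])
            nxt, reward, done = step(state, action)
            # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q


def greedy_path(q, start=(0, 0), limit=50):
    """Follow the highest-valued action from start until the episode ends."""
    state, path = start, [start]
    for _ in range(limit):
        action = max(range(4), key=lambda a: q[state][a])
        state, _, done = step(state, action)
        path.append(state)
        if done:
            break
    return path


def render_policy(q):
    """Draw one arrow per floor cell, pointing along the greedy action."""
    rows = []
    for r, line in enumerate(MAZE):
        rows.append("".join(
            ARROWS[max(range(4), key=lambda a: q[(r, c)][a])] if ch in ".S" else ch
            for c, ch in enumerate(line)))
    return rows


if __name__ == "__main__":
    table = train()
    print("\n".join(render_policy(table)))
    print(greedy_path(table))
```

The whole "brain" is the dictionary `q`: one row of four numbers per cell. The arrows in `render_policy` are just the index of the largest number in each row, which is why the picture sharpens as the values settle.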