In prior work, we introduced adaptive plan-view height and occupancy templates, derived from stereo camera data, for person tracking and activity recognition. These templates efficiently capture current details of each tracked person's body pose, thereby enabling good tracking performance even when multiple people occlude and interact with each other. However, the templates ignore useful color information, and their rapid evolution makes them poorly suited for recognizing the same person at well-separated times. In this paper, we seek to remedy both of these shortcomings by 1) adding novel plan-view color templates to our short-term, template-based models of person appearance, and 2) augmenting our person descriptions with longer-term models that describe invariants of each person's shape and color. We demonstrate how each of these improves our real-time tracking performance on challenging, multi-person sequences.

Notes: Copyright IEEE. Presented at the IEEE International Conference on Advanced Video and Signal-Based Surveillance, 15-16 September 2005, Como, Italy. 6 pages.