This proposal describes a Hybrid, Model-Based Motion Capture System that uses a unique, low-cost, credit-card-sized stereo range-image detector to augment an image-understanding system suitable for tracking the body movements of cooperative users. The resulting system combines the best features of aided and unaided optical tracking systems into a single new product with broad application in military and commercial markets for distributed simulation. Robust tracking of key body parts will be achieved through the use of minimally invasive optical targets integrated into standard microphone/headsets. The remaining body states will be estimated using visual understanding algorithms working from the range-image data generated by the stereo vision module. A human model will be at the heart of the image-understanding system and state estimation filter, combining all sources of information to track, predict, understand, and render the state of the human operator. A demonstration will be developed that allows a speaker's voice and graphical representation to be transmitted using no more than 10% more bandwidth than that required for speech alone. Such a communication system will enable distributed simulation applications and could become a workhorse for the military and commercial motion capture markets.