D. Roy and B. Schiele and A. Pentland
ABSTRACT
This paper addresses the problem of finding useful associations between audio and visual input signals. The proposed approach is based on the maximization of mutual information of audio-visual clusters. This approach results in segmentation of continuous speech signals, and finds visual categories which correspond to segmented spoken words. Such audio-visual associations may be used for modeling infant language acquisition and to dynamically personalize speech-based human-computer interfaces for various applications including catalog browsing and wearable computing. This paper describes an implemented system for learning shape names from camera and microphone input. We present results in an evaluation of the system for the domain of modeling language learning. 
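As an annotation on the criterion named in the abstract, the mutual information between two sets of cluster assignments can be estimated from co-occurrence counts. The sketch below is illustrative only and is not taken from the paper: the function name `mutual_information` and the inputs `audio_labels` and `visual_labels` are hypothetical, and it assumes each observation has already been assigned one discrete audio cluster label and one discrete visual cluster label.

```python
import math
from collections import Counter


def mutual_information(audio_labels, visual_labels):
    """Estimate I(A;V) in bits from paired discrete cluster labels.

    Hypothetical helper for illustration; the paper's own estimator
    and clustering procedure are not reproduced here.
    """
    if len(audio_labels) != len(visual_labels):
        raise ValueError("label sequences must have equal length")
    n = len(audio_labels)
    if n == 0:
        return 0.0

    # Count joint and marginal occurrences of the cluster labels.
    joint = Counter(zip(audio_labels, visual_labels))
    audio_counts = Counter(audio_labels)
    visual_counts = Counter(visual_labels)

    mi = 0.0
    for (a, v), c in joint.items():
        # p(a,v) * log2( p(a,v) / (p(a) * p(v)) ), written with raw counts.
        mi += (c / n) * math.log2(c * n / (audio_counts[a] * visual_counts[v]))
    return mi
```

Under these assumptions, a pairing in which each audio cluster always co-occurs with the same visual cluster scores higher than one in which the two labelings are unrelated, which is the property an association-finding procedure would try to maximize.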
ECVision indexed and annotated bibliography of cognitive computer vision publications
This bibliography was created by Hilary Buxton and Benoit Gaillard, University of Sussex, as part of ECVision Specific Action 8-1
The complete text version of this BibTeX file is available here: ECVision_bibliography.bib
Learning Audio-Visual Associations Using Mutual Information
Site generated on Friday, 06 January 2006