Humans create internal models of an environment (i.e., cognitive maps) through subjective sensorimotor experiences, and they can also understand spatial locations by reading an external map as a symbolic representation of an environment. We simulate both the development of a cognitive map from sensorimotor experiences and the grounding of an external map in a single deep neural network model. Our proposed network has a shared module that processes the features of multiple modalities (i.e., vision, hearing, and touch) and even external maps in the same manner. Each modality is encoded into a feature vector by a modality-specific encoder, and the encoded features are all processed by the same shared module. The proposed network was trained to predict the sensory inputs of a simulated mobile robot. After this predictive learning, a spatial representation developed in the internal states of the shared module, and the same representation was used to predict all modalities, including the external map. The network can also perform spatial navigation by associating the external map with the cognitive map. This implies that external maps become grounded in subjective sensorimotor experiences, bridged through the internal spatial representation developed in the shared module.
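The architecture described above — modality-specific encoders whose outputs all pass through one shared module, with modality-specific decoders predicting the next sensory input — can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual model: the dimensionalities, the tanh nonlinearities, the untrained random weights, and all names (`encoders`, `W_shared`, `predict_next`, etc.) are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(in_dim, out_dim):
    # Random projection standing in for a trained weight matrix (assumption).
    return rng.standard_normal((in_dim, out_dim)) * 0.1

# Hypothetical raw-input sizes for each modality, including the external map.
DIMS = {"vision": 64, "hearing": 16, "touch": 8, "map": 32}
FEAT = 24    # common feature dimension produced by every encoder
HIDDEN = 48  # internal state size of the shared module

# Modality-specific encoders map each raw input into the common feature space.
encoders = {m: linear(d, FEAT) for m, d in DIMS.items()}
# A single shared module processes every modality's features in the same manner.
W_shared = linear(FEAT, HIDDEN)
# Modality-specific decoders predict the next sensory input of that modality.
decoders = {m: linear(HIDDEN, d) for m, d in DIMS.items()}

def predict_next(modality, x):
    """Encode -> shared module -> decode, predicting the next input."""
    feat = np.tanh(x @ encoders[modality])
    state = np.tanh(feat @ W_shared)  # internal (spatial) representation
    return state @ decoders[modality]

# Every modality, external map included, flows through the same shared module,
# so the internal state is a single representation shared across modalities.
for m, d in DIMS.items():
    prediction = predict_next(m, rng.standard_normal(d))
    assert prediction.shape == (d,)
```

The point of the sketch is the weight sharing: only the encoders and decoders differ per modality, so any spatial structure the shared state learns during predictive learning is necessarily common to vision, hearing, touch, and the external map.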