In spoken document retrieval, the main factor affecting retrieval performance is speech recognition errors. Refining speech recognition technology can improve recognition accuracy. However, if a query contains out-of-vocabulary words, the spoken documents related to that query cannot be retrieved. This paper describes spoken document retrieval using document expansion based on Web pages whose contents are similar to the spoken documents to be retrieved. Most spoken documents cover several topics, so each spoken document is automatically divided into segments by topic. Web pages similar to the spoken document are then collected using a query derived from each segment. Document expansion using the Web improved spoken document retrieval performance from 0.364 to 0.401 on the interpolated 11-point average precision metric.
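For readers unfamiliar with the evaluation metric, the following is a minimal sketch of interpolated 11-point average precision for a single query, assuming a ranked list of unique document IDs and a known set of relevant IDs. The function name and the document IDs are illustrative and do not come from the paper, which does not state which evaluation tool was used. A score reported over a test collection, such as the 0.364 and 0.401 above, would typically be the mean of this per-query value over all queries.

```python
def eleven_point_average_precision(ranked_ids, relevant_ids):
    """Interpolated 11-point average precision for one query.

    ranked_ids: document IDs in retrieval order (assumed unique).
    relevant_ids: IDs judged relevant to the query.
    """
    relevant = set(relevant_ids)
    n_rel = len(relevant)
    if n_rel == 0:
        return 0.0

    # Record (relevant docs found so far, precision) at each rank
    # where a relevant document appears.
    hits = 0
    points = []
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            points.append((hits, hits / rank))

    # Interpolated precision at recall level i/10 is the maximum precision
    # observed at any recall >= i/10 (0.0 if that recall is never reached).
    # The recall comparison uses integers to avoid floating-point error.
    total = 0.0
    for i in range(11):
        reachable = [p for h, p in points if h * 10 >= i * n_rel]
        total += max(reachable, default=0.0)
    return total / 11


# Hypothetical example: two relevant documents, found at ranks 1 and 3.
score = eleven_point_average_precision(["d1", "d2", "d3", "d4"], {"d1", "d3"})
print(round(score, 3))
```

Taking the maximum precision at or beyond each recall level is what makes the curve "interpolated": precision is never allowed to rise as recall increases, which smooths the sawtooth shape of the raw precision-recall curve.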