The automatic recognition of Chinese place names, a special case of Chinese proper noun recognition, is an important task in Chinese information processing. In this paper, we propose an approach that combines statistical and rule-based techniques. The approach first discovers candidates in Chinese texts based on the probability that a character is part of a Chinese place name, and then confirms or eliminates those candidates by applying rules obtained through human summarization and transformation-based machine learning. We employ a statistical measure, weight of likelihood (WOL), to estimate the likelihood that a character is part of a Chinese place name in real corpora. To the authors' knowledge, this is the first time WOL has been used to capture a character's capability to form Chinese place names in real corpora. We evaluate the approach on a real data set, where it achieves a recall of 97% and a precision of 90.92%.
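The two-stage idea described above (statistical candidate discovery followed by rule-based filtering) can be illustrated with a minimal Python sketch. The abstract does not give the formula for WOL, so the score used here is only an assumed stand-in: the fraction of a character's occurrences that fall inside a known place name. The function names, the threshold, and the single stop-list rule are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def char_likelihood(corpus, place_names):
    """Assumed stand-in for WOL (not the paper's formula): for each
    character, the share of its occurrences that lie inside a known
    place name in the corpus."""
    total = Counter()
    inside = Counter()
    for text in corpus:
        total.update(text)
        # Mark every position covered by some known place name.
        covered = [False] * len(text)
        for name in place_names:
            start = text.find(name)
            while start != -1:
                for i in range(start, start + len(name)):
                    covered[i] = True
                start = text.find(name, start + 1)
        for ch, is_covered in zip(text, covered):
            if is_covered:
                inside[ch] += 1
    return {ch: inside[ch] / total[ch] for ch in total}


def find_candidates(text, scores, threshold=0.5, min_len=2):
    """Collect maximal runs of characters whose score reaches the
    (hypothetical) threshold; unseen characters score 0."""
    candidates = []
    run = ""
    for ch in text:
        if scores.get(ch, 0.0) >= threshold:
            run += ch
        else:
            if len(run) >= min_len:
                candidates.append(run)
            run = ""
    if len(run) >= min_len:  # flush a run that ends the text
        candidates.append(run)
    return candidates


def apply_rules(candidates, rejected):
    """Hypothetical rule stage: drop candidates listed as known
    non-place strings. Real rules would be far richer."""
    return [c for c in candidates if c not in rejected]


corpus = ["我在北京工作", "他去北京和南京", "我去工作"]
scores = char_likelihood(corpus, ["北京", "南京"])
candidates = find_candidates("她住在南京", scores)
confirmed = apply_rules(candidates, rejected={"京北"})
```

In this toy setup the run-based discovery would merge adjacent place names into one candidate, which is the kind of error a rule stage is meant to correct.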