Source journal: 《情报学报》

The Role of Punctuation in the Identification and Retrieval of Online Chinese Academic Literature

Abstract

With the ever-growing number of academic papers published online, it is important to explore more efficient ways of identifying them with general search engines. To inform the automatic identification of online Chinese academic papers, this study compares punctuation use in Chinese academic papers with that in news reports, the main type of interfering document in such searches. Two corpora were built and analyzed: one of 6,906 academic papers and one of 16,316 news reports. For each corpus, the total punctuation count, relative usage rate, average usage, and amount of difference were computed and compared. The comparison reveals both similarities and differences. At the macro, relative level the two document types are similar and share two stable sequences of punctuation usage. At the micro, absolute level they differ: independent-sample non-parametric tests show that Chinese academic papers and news reports differ significantly in their use of all 14 kinds of punctuation analyzed in this study. The findings were tested in NSIRS, a retrieval system for online Chinese academic literature previously developed by the authors, to which a punctuation analysis module was added. Retrieval experiments show that the punctuation characteristics of academic papers have a clear identifying effect and can improve retrieval precision for online academic articles: the system's average relative precision rose by about 6%.
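The corpus statistics named in the abstract (total count, relative usage rate, average usage) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not list the 14 punctuation marks, so the `PUNCTUATION` set below is an assumption. The function name `punctuation_profile` is hypothetical, and "average usage" is read here as occurrences per document.

```python
from collections import Counter

# Assumed set of 14 common Chinese punctuation marks.
# The abstract does not say which 14 marks were analyzed.
PUNCTUATION = ",。、;:?!“”()《》…"


def punctuation_profile(docs):
    """Return per-mark statistics for a list of document strings.

    For each mark: the total count, the relative usage rate (share of
    all counted punctuation), and the average number of occurrences
    per document (an assumed reading of "average usage").
    """
    totals = Counter()
    for text in docs:
        totals.update(ch for ch in text if ch in PUNCTUATION)

    grand_total = sum(totals.values())
    n_docs = len(docs)
    return {
        mark: {
            "count": totals[mark],
            "relative_rate": totals[mark] / grand_total if grand_total else 0.0,
            "avg_per_doc": totals[mark] / n_docs if n_docs else 0.0,
        }
        for mark in PUNCTUATION
    }


if __name__ == "__main__":
    # Toy stand-ins for the two corpora; not data from the study.
    papers = ["本文提出一种方法(见图1)。", "实验结果表明:该方法有效。"]
    news = ["他说:“今天天气很好!”", "发生了什么?记者正在调查。"]

    paper_stats = punctuation_profile(papers)
    news_stats = punctuation_profile(news)
    for mark in PUNCTUATION:
        diff = paper_stats[mark]["avg_per_doc"] - news_stats[mark]["avg_per_doc"]
        print(mark, paper_stats[mark]["count"], news_stats[mark]["count"], diff)
```

Per-document counts like these could then feed an independent-sample non-parametric test of the kind the abstract mentions. That step is left out here because the abstract does not name the specific test.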
