Journal: International Journal of Computer Processing of Oriental Languages

A Query Engine for Retrieving Information from Chinese HTML Documents

Abstract

The amount of online information in Chinese and the number of Chinese Internet users have increased tremendously over the past decade. Because the Chinese language differs significantly from English, techniques developed for retrieving information from English Web documents cannot be directly applied to Chinese Web documents. To provide high-performance access to Chinese information on the Web, we have developed a Chinese Web query engine that (ⅰ) extracts (hierarchical) data of interest from Chinese HTML tables using an information extraction tool called semantic hierarchy, (ⅱ) allows the user to submit queries in Chinese through a menu-driven user interface, and (ⅲ) processes the user's queries (as Boolean expressions) to generate the correct results. Our query engine supports groups of information categorized into subject areas such as car ads, house rentals, job ads, stocks, and university catalogs. We have tested our information extraction tool on two application domains, car ads and house rentals. The average F-measure for extracting Chinese data from these two domains is above 90%. More importantly, our query engine can easily be configured and internationalized to become a worldwide, multilingual query engine with minor changes to system settings on PCs running Windows operating systems.
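
For readers unfamiliar with the F-measure cited above, the following is a minimal sketch of the usual balanced form (the harmonic mean of precision and recall). The abstract does not say which variant or weighting the authors used, so the balanced form is an assumption, and the function name and the counts below are hypothetical, chosen only for illustration.

```python
def f_measure(correct: int, extracted: int, expected: int) -> float:
    """Balanced F-measure (F1) computed from raw extraction counts.

    correct   -- extracted values that match the ground truth
    extracted -- all values the extraction tool returned
    expected  -- all values present in the ground truth
    """
    # With no correct values, precision or recall is zero (or undefined),
    # so the score is reported as 0.0.
    if correct == 0 or extracted == 0 or expected == 0:
        return 0.0
    precision = correct / extracted
    recall = correct / expected
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only; these are not results from the paper.
score = f_measure(correct=9, extracted=10, expected=10)
```

Under these made-up counts, precision and recall are both 0.9, so the balanced F-measure is also 0.9.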
