Conference: 2011 3rd International Conference on Computer Research and Development

A document comparison approach using hybrid keyword and structured full text vocabulary searches

Abstract

This paper proposes a systematic full-text document search that combines keyword matching with the structural similarity of the documents under consideration. The approach operates in two steps. The first step uses a set of designated keywords to acquire potentially relevant documents by means of an open-source tool. The second step builds a suffix tree of frequently used vocabulary to retrieve the most similar documents from those acquired. In this way, variations in the contextual matching of full-text search can be mitigated, and the resulting performance turns out to be quite acceptable. The ultimate goal is a realizable, platform-independent full-text search technique. The scheme offers two benefits. First, retrieved documents can be as close to the desired document as possible. Second, suspected plagiarism can be identified to some extent; how well depends on the effectiveness of the proposed approach, which leaves plenty of room for future improvement. The proposed work will eventually be put to real use for database retrieval in a small business enterprise.
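
The two-step pipeline described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not name the open-source tool used in step one, so a plain "contains every keyword" filter stands in for it. "Frequently used vocabulary" is assumed to mean words occurring at least `min_count` times across the acquired documents. The suffix tree is approximated by an uncompressed, depth-limited suffix trie over word tokens. The similarity score (the average longest phrase match per position, normalized to [0, 1]) is our own choice. All function names (`keyword_filter`, `frequent_vocabulary`, `build_suffix_trie`, `rank_documents`, and so on) and the parameters `min_count` and `max_depth` are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case word tokens; punctuation is discarded."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_filter(docs, keywords):
    """Step 1 (hypothetical stand-in for the unnamed open-source tool):
    keep only documents that contain every designated keyword."""
    wanted = {k.lower() for k in keywords}
    return {name: text for name, text in docs.items()
            if wanted <= set(tokenize(text))}


def frequent_vocabulary(texts, min_count):
    """Assumed meaning of 'frequently used vocabulary': words occurring
    at least `min_count` times across the given texts."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    return {tok for tok, n in counts.items() if n >= min_count}


def build_suffix_trie(tokens, max_depth):
    """Uncompressed suffix trie over a token sequence (a simplification
    of a suffix tree), truncated at `max_depth` tokens per suffix."""
    root = {}
    for i in range(len(tokens)):
        node = root
        for tok in tokens[i:i + max_depth]:
            node = node.setdefault(tok, {})
    return root


def longest_match(trie, tokens, start):
    """Length of the longest prefix of tokens[start:] present in the trie."""
    node, length = trie, 0
    for tok in tokens[start:]:
        if tok not in node:
            break
        node = node[tok]
        length += 1
    return length


def similarity(trie, tokens, max_depth):
    """Total matched phrase length over all positions, divided by the
    maximum achievable total, so an identical document scores 1.0."""
    if not tokens:
        return 0.0
    best = sum(min(max_depth, len(tokens) - i) for i in range(len(tokens)))
    got = sum(longest_match(trie, tokens, i) for i in range(len(tokens)))
    return got / best


def rank_documents(docs, query_text, keywords, min_count=2, max_depth=4):
    """Step 2: rank the keyword-filtered documents by suffix-trie
    similarity to the query document, most similar first."""
    candidates = keyword_filter(docs, keywords)
    vocab = frequent_vocabulary(list(candidates.values()) + [query_text],
                                min_count)

    def keep(text):
        # Restrict each document to the frequent vocabulary, preserving order.
        return [t for t in tokenize(text) if t in vocab]

    trie = build_suffix_trie(keep(query_text), max_depth)
    scores = {name: similarity(trie, keep(text), max_depth)
              for name, text in candidates.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    docs = {
        "a": "Suffix trees support fast full text search over documents.",
        "b": "Full text search over documents can use suffix trees for speed.",
        "c": "Keyword lists help filter documents before search.",
        "d": "Unrelated text about gardening.",
    }
    query = "Suffix trees support fast full text search over documents."
    for name, score in rank_documents(docs, query, ["search", "documents"]):
        print(name, round(score, 3))
```

In this toy corpus, document `d` is discarded by the keyword step. Among the rest, the verbatim copy `a` scores highest and the reordered paraphrase `b` scores next, which mirrors the plagiarism-detection use case mentioned in the abstract. A production version would build a compressed suffix tree (for example, with Ukkonen's algorithm) instead of a depth-limited trie, which keeps memory linear in document length.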
