Published in: Journal of Computers
Lexical-semantic SLVM for XML Document Classification

Abstract

The structured link vector model (SLVM) and its improved version depend on statistical term measures to represent XML documents. As a result, they ignore the lexical semantics of terms and their mutual information, leading to text classification errors. This paper proposes an XML document representation method, WordNet-based lexical-semantic SLVM, to solve this problem. Using WordNet, the method constructs a data structure that characterizes the lexical-semantic content of an XML document and adjusts EM modeling to disambiguate word stems. A synset matrix of the lexical-semantic content is then built in the lexical-semantic feature space to represent the XML document, and lexical-semantic relations are marked on it to construct the feature matrix of the lexical-semantic SLVM. On a categorized Wikipedia XML dataset, using the NWKNN classification algorithm, experimental results show that the feature matrix of this method achieves a better F1 measure than the original SLVM and the frequent sub-tree SLVM based on TF-IDF.
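
To illustrate why purely statistical term features can miss lexical semantics, the following minimal sketch compares a bag-of-terms vector with a bag-of-synsets vector. It is not the paper's method: the `TOY_SYNSETS` table is a hypothetical, hand-written stand-in for WordNet, the function names are invented for this example, and no disambiguation, SLVM structure, or NWKNN step is modeled.

```python
import math
from collections import Counter

# Hypothetical stand-in for WordNet (an assumption for illustration only):
# each word stem maps to a single synset identifier. A real system would
# query WordNet and disambiguate among several candidate senses.
TOY_SYNSETS = {
    "car": "syn:automobile",
    "automobile": "syn:automobile",
    "film": "syn:movie",
    "movie": "syn:movie",
}


def term_vector(tokens):
    """Bag-of-terms counts: the purely statistical view."""
    return Counter(tokens)


def synset_vector(tokens, synsets=TOY_SYNSETS):
    """Bag-of-synsets counts; terms without a synset keep their own key."""
    return Counter(synsets.get(t, t) for t in tokens)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Two toy documents that use different words for the same concepts.
doc_a = ["car", "film"]
doc_b = ["automobile", "movie"]

term_sim = cosine(term_vector(doc_a), term_vector(doc_b))
synset_sim = cosine(synset_vector(doc_a), synset_vector(doc_b))

print("term-based similarity:  ", term_sim)
print("synset-based similarity:", synset_sim)
```

In this toy case the two documents share no terms, so the term-based similarity is zero, while the synset-based similarity is close to one because both documents map to the same two synsets.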
