Semantic Entity Identification in Large Scale Data via Statistical Features and DT-SVM

Abstract

Semantic entities carry the most important semantics of text data. However, traditional approaches such as named entity recognition and new word identification can detect only certain specific types of entities. In addition, they generally adopt sequence labeling algorithms such as the Hidden Markov Model (HMM) and Conditional Random Field (CRF), which can utilize only limited context information. As a result, they are ineffective at extracting semantic entities that never appeared in the training data. In this paper we propose a strategy to extract unknown semantic entities from text by integrating statistical features with Decision Tree (DT) and Support Vector Machine (SVM) algorithms. With the proposed statistical features and novel classification approach, our strategy can detect more semantic entities than traditional approaches such as CRF and Bootstrapping-SVM methods. It is also very sensitive to new entities that have only just appeared in fresh data. Our experimental results show that the precision, recall, and F1 score of our strategy are on average about 23.6%, 21.5%, and 25.8% higher, respectively, than those of the representative approaches.
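The abstract reports its gains in terms of precision, recall, and the F-One (F1) measure. As a reminder of how these three numbers are usually computed for an entity extraction task, here is a minimal sketch using the standard set-based definitions. The function name `precision_recall_f1` and the toy entity sets are hypothetical and are not taken from the paper; the paper's own evaluation protocol is not described in this record.

```python
def precision_recall_f1(predicted, gold):
    """Return (precision, recall, F1) for two collections of entity strings.

    Standard definitions: precision = TP / |predicted|,
    recall = TP / |gold|, F1 = harmonic mean of the two.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy data for illustration only (not from the paper).
gold_entities = {
    "Hidden Markov Model",
    "Conditional Random Field",
    "Decision Tree",
    "Support Vector Machine",
}
predicted_entities = {"Hidden Markov Model", "Decision Tree", "Random Field"}

p, r, f1 = precision_recall_f1(predicted_entities, gold_entities)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

With these toy sets, two of the three predicted entities are correct and two of the four gold entities are found, so precision and recall differ and F1 falls between them.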

Bibliographic record

Source
International Conference on Web Information Systems Engineering | 2013 | 14 pages
Authors
Dingxian Wang; Xiao Liu; Hangzai Luo; Jianping Fan
Original format: PDF
Chinese Library Classification: Computer networks
Keywords
Semantic Entity Identification; New Word Identification; Decision Tree; SVM
