tax2vec: Constructing Interpretable Features from Taxonomies for Short Text Classification

Abstract

The use of background knowledge is largely unexploited in text classification tasks. This paper explores word taxonomies as a means of constructing new semantic features, which may improve the performance and robustness of the learned classifiers. We propose tax2vec, a parallel algorithm for constructing taxonomy-based features, and demonstrate its use on six short text classification problems: prediction of gender, personality type, age, news topics, drug side effects and drug effectiveness. The constructed semantic features, in combination with fast linear classifiers and tested against strong baselines such as hierarchical attention neural networks, achieve comparable classification results on short text documents. The algorithm's performance is also tested in a few-shot learning setting, indicating that the inclusion of semantic features can improve performance in data-scarce situations. The capability of tax2vec to extract corpus-specific semantic keywords is also demonstrated. Finally, we investigate the semantic space of potential features, where we observe a similarity with the well-known Zipf's law.

Bibliographic details

  • Source
    Computer Speech and Language | 2021, Issue 1 | 101104.1-101104.21 | 21 pages
  • Author affiliations

    Jozef Stefan Institute, Jamova 39, Ljubljana 1000, Slovenia; Jozef Stefan International Postgraduate School, Jamova 39, Ljubljana 1000, Slovenia;

    Jozef Stefan Institute, Jamova 39, Ljubljana 1000, Slovenia; Jozef Stefan International Postgraduate School, Jamova 39, Ljubljana 1000, Slovenia;

    Jozef Stefan Institute, Jamova 39, Ljubljana 1000, Slovenia;

    Jozef Stefan Institute, Jamova 39, Ljubljana 1000, Slovenia; University of Nova Gorica, Glavni trg 8, Vipava 5271, Slovenia;

    Jozef Stefan Institute, Jamova 39, Ljubljana 1000, Slovenia;

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI)
  • Original format: PDF
  • Language of text: English
  • Keywords

    taxonomies; vectorization; text classification; short documents; feature construction; semantic enrichment;

