Towards Unsupervised Text Classification Leveraging Experts and Word Embeddings

Abstract

Text classification aims at mapping documents into a set of predefined categories. Supervised machine learning models have shown great success in this area but they require a large number of labeled documents to reach adequate accuracy. This is particularly true when the number of target categories is in the tens or the hundreds. In this work, we explore an unsupervised approach to classify documents into categories simply described by a label. The proposed method is inspired by the way a human proceeds in this situation: it draws on textual similarity between the most relevant words in each document and a dictionary of keywords for each category reflecting its semantics and lexical field. The novelty of our method hinges on the enrichment of the category labels through a combination of human expertise and language models, both generic and domain specific. Our experiments on 5 standard corpora show that the proposed method increases F1-score over relying solely on human expertise and can also be on par with simple supervised approaches. It thus provides a practical alternative for situations where low-cost text categorization is needed, as we illustrate with our application to operational risk incident classification.

Bibliographic details

Source
Annual meeting of the Association for Computational Linguistics | 2019 | cxxxiv, 659 p. | 9 pages
Authors
Zied Haj-Yahia; Adrien Sieg; Lea A. Deleris
Original format: PDF
Chinese Library Classification: Programming, software engineering

Similar literature

1. Enhancing unsupervised neural networks based text summarization with word embedding and ensemble learning [J]. Alami Nabil, Meknassi Mohammed, En-nahnahi Noureddine. Expert Systems with Applications. 2019, issue JUNa.
2. Machine learning for financial transaction classification across companies using character-level word embeddings of text fields [J]. Jorgensen Rasmus Kaer, Igel Christian. International journal of intelligent systems in accounting, finance & management. 2021, issue 3.
3. Word embedding and text classification based on deep learning methods [J]. Saihan Li, Bing Gong. MATEC Web of Conferences. 2021, issue a.
4. Towards Unsupervised Text Classification Leveraging Experts and Word Embeddings [C]. Zied Haj-Yahia, Adrien Sieg, Lea A. Deleris. Annual meeting of the Association for Computational Linguistics. 2019.
5. ANSWER: A Cognitively-Inspired System for the Unsupervised Detection of Semantically Salient Words in Texts [D]. Candadai Vasu, Madhavun. 2015.
6. Leveraging word embeddings and medical entity extraction for biomedical dataset retrieval using unstructured texts [O]. Yanshan Wang, Majid Rastegar-Mojarad, Ravikumar Komandur-Elayavilli. 2017.
7. Unsupervised Context-Sensitive Spelling Correction of Clinical Free-Text with Word and Character N-Gram Embeddings [O]. Pieter Fivez, Simon Suster, Walter Daelemans. 2017.
