IEEE Pacific Visualization Symposium

An Interactive Visual Analytics System for Incremental Classification Based on Semi-supervised Topic Modeling


Abstract

Labeling text for classification is time-consuming and unintuitive. Given an unannotated text collection, users find it difficult to decide which labels to create and how to label the initial training set. We therefore present an interactive visual analytics system for incremental text classification based on a semi-supervised topic modeling method, a modified Gibbs sampling maximum entropy discrimination latent Dirichlet allocation (Gibbs MedLDA). Given a text collection, Gibbs MedLDA generates topics that summarize it. A scatter plot displays documents and topics together, which helps users explore the structure of the collection and identify labels to create. After documents are labeled, Gibbs MedLDA is applied again to the collection together with the labels and produces both topic and classification information. The system also provides a scatter plot showing the classifier boundary and a matrix view presenting the classifier weights. Users can iteratively label documents to refine each classifier. We evaluate the system through a user study on a benchmark text classification corpus and through case studies on two unannotated text collections.
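The incremental labeling loop described in the abstract (label a few documents, retrain, inspect, label more) can be illustrated with a minimal sketch. This is not Gibbs MedLDA: a simple nearest-centroid classifier over bag-of-words vectors stands in for the semi-supervised topic model, and the margin between the two best scores stands in for distance to the classifier boundary. All function names, the toy documents, and the labels are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def bow(text):
    """Lowercased bag-of-words term counts for one document."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroids(docs, labels):
    """Sum the term counts of the labeled documents per label.

    Stand-in for retraining the model on the labeled collection.
    """
    out = {}
    for idx, lab in labels.items():
        out.setdefault(lab, Counter()).update(bow(docs[idx]))
    return out


def classify(docs, labels):
    """Return {doc index: (predicted label, margin)} for unlabeled documents."""
    cents = centroids(docs, labels)
    preds = {}
    for i, d in enumerate(docs):
        if i in labels:
            continue
        scores = sorted(
            ((cosine(bow(d), c), lab) for lab, c in cents.items()),
            reverse=True,
        )
        runner_up = scores[1][0] if len(scores) > 1 else 0.0
        preds[i] = (scores[0][1], scores[0][0] - runner_up)
    return preds


def most_uncertain(preds):
    """Index of the unlabeled document with the smallest margin."""
    return min(preds, key=lambda i: preds[i][1])


# Toy collection (hypothetical data).
docs = [
    "stock market shares fall",
    "team wins football match",
    "market investors buy shares",
    "football team signs player",
    "player buys shares in football team",
]

# Round 1: the user has labeled two documents.
labels = {0: "finance", 1: "sports"}
preds = classify(docs, labels)
candidate = most_uncertain(preds)  # document to show the user next

# Round 2: the user labels the suggested document and the model is refit.
labels[candidate] = "sports"
preds_after = classify(docs, labels)
```

In this sketch the document with the smallest margin plays the role of a point near the classifier boundary in the scatter plot: it is the one where an extra label is most likely to change the classifier.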
