Journal: Communications of the Association for Information Systems

Text Mining For Information Systems Researchers: An Annotated Topic Modeling Tutorial


Abstract

Analysts have estimated that more than 80 percent of today’s data is stored in unstructured form (e.g., text, audio, image, video), much of it expressed in rich and ambiguous natural language. Traditionally, researchers have analyzed natural language with qualitative data-analysis approaches such as manual coding. Yet the size of text data sets obtained from the Internet makes manual analysis virtually impossible. In this tutorial, we discuss the challenges encountered when applying automated text-mining techniques in information systems research. In particular, we showcase how to combine probabilistic topic modeling via latent Dirichlet allocation (LDA), an unsupervised text-mining technique, with a LASSO multinomial logistic regression to explain user satisfaction with an IT artifact by automatically analyzing more than 12,000 online customer reviews. For fellow information systems researchers, this tutorial provides guidance for conducting text-mining studies of their own and for evaluating the quality of others’ studies.
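
To make the topic-modeling step concrete, the following is a minimal sketch of latent Dirichlet allocation fitted with a collapsed Gibbs sampler, written in plain Python. It is not the authors' implementation; the abstract does not say which software or inference method the tutorial uses. The function name `fit_lda`, the hyperparameter values, and the four toy reviews are all hypothetical and chosen only for illustration. The regression step is omitted: the per-document topic proportions returned here are the kind of features that an L1-penalized (LASSO) multinomial logistic regression would take as input.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Toy collapsed Gibbs sampler for LDA (illustrative, not optimized).

    docs is a list of token lists. Returns (theta, topic_word), where theta
    holds smoothed per-document topic proportions and topic_word holds
    per-topic word counts.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]          # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(num_topics)]   # count of word w in topic k
    topic_total = [0] * num_topics                        # tokens assigned to topic k
    assign = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                # Resample the topic and restore the counts.
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Smoothed topic proportions per document; each row sums to 1.
    theta = [
        [(n + alpha) / (len(doc) + num_topics * alpha) for n in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    return theta, topic_word


# Hypothetical toy reviews, invented for this sketch (not data from the paper).
reviews = [
    "battery life great battery lasts long".split(),
    "screen bright screen sharp display".split(),
    "battery drains fast poor battery life".split(),
    "display sharp bright screen colors".split(),
]
theta, topic_word = fit_lda(reviews)
for row in theta:
    print([round(p, 2) for p in row])
```

Each row printed at the end is one review's estimated mix of topics. In a study like the one described, those rows would form the predictor matrix and the review's satisfaction rating would be the outcome of the penalized regression.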
