Journal: Information Processing & Management
Automatic thematic classification of election manifestos

Abstract

We digitized three years of Dutch election manifestos annotated by the Dutch political scientist Isaac Lipschits. We used these data to train a classifier that can automatically label new, unseen election manifestos with themes. Having the manifestos in a uniform XML format, with every paragraph annotated with its themes, has advantages for both electronic publishing of the data and diachronic comparative data analysis. The data we created will be made publicly available through a search interface, so that they can be queried and filtered by theme and party. We optimized the Lipschits classifier on the task of classifying election manifestos using models trained on earlier years. We built a classifier that is suited for classifying election manifestos from 2002 onwards using the data from the 1980s and 1990s. We evaluated the results by having a domain expert manually assess a sample of the classified data. We found that our automatic classifier obtains the same precision as a human classifier on unseen data. Its recall could be improved by extending the set of themes with newly emerged themes. Thus, when using old political texts to classify new texts, work is needed to link and extend the set of themes to cover newer topics.
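The precision and recall claims can be made concrete with a small worked example. The sketch below is not the authors' evaluation code; it assumes each paragraph carries a set of themes, and the paragraph IDs, theme names, and the function `micro_precision_recall` are hypothetical. It shows how a classifier restricted to an older theme set can keep high precision while losing recall on a theme it has never seen.

```python
from typing import Dict, Set, Tuple


def micro_precision_recall(
    gold: Dict[str, Set[str]], predicted: Dict[str, Set[str]]
) -> Tuple[float, float]:
    """Micro-averaged precision and recall over per-paragraph theme sets.

    Only paragraphs present in `gold` are scored.
    """
    true_pos = pred_total = gold_total = 0
    for par_id, gold_themes in gold.items():
        pred_themes = predicted.get(par_id, set())
        true_pos += len(gold_themes & pred_themes)  # themes both agree on
        pred_total += len(pred_themes)              # themes the classifier assigned
        gold_total += len(gold_themes)              # themes the expert assigned
    precision = true_pos / pred_total if pred_total else 0.0
    recall = true_pos / gold_total if gold_total else 0.0
    return precision, recall


# Hypothetical expert judgments for three paragraphs of a newer manifesto.
gold = {
    "p1": {"education"},
    "p2": {"defence", "foreign policy"},
    "p3": {"integration"},  # assumed newly emerged theme, absent from the old theme set
}

# Hypothetical classifier output, limited to themes seen in the older data.
predicted = {
    "p1": {"education"},
    "p2": {"defence"},
    "p3": set(),  # no matching old theme, so nothing is assigned
}

precision, recall = micro_precision_recall(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
# prints "precision=1.00 recall=0.50"
```

In this toy case every assigned theme is correct, so precision is 1.0, but two of the four expert-assigned themes are missed, so recall is 0.5. Adding the missing theme to the label set is the only way to recover the paragraph "p3".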
