Workshop on Creating, Using and Linking of Parliamentary Corpora with Other Types of Political Discourse

Unfinished Business: Construction and Maintenance of a Semantically Tagged Historical Parliamentary Corpus, UK Hansard from 1803 to the present day

Abstract

Creating, curating and maintaining modern political corpora is becoming an ever more involved task. As interest in political discourse grows among various social bodies and the general public, so too does the need to enrich such datasets with metadata and linguistic annotations. Beyond this, such corpora must be easy for linguists, social scientists, digital humanists and the general public to browse and search. We present our efforts to compile a linguistically annotated and semantically tagged version of the Hansard corpus from 1803 right up to the present day. This involves combining documents and transcripts from multiple sources. We describe our tagging toolchain, which uses several existing tools that provide tokenisation, part-of-speech tagging and semantic annotation. We also provide an overview of our bespoke web-based search interface, built on LexiDB. Finally, we examine the completed corpus through four case studies that make use of the semantic categories made available by our toolchain.
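To illustrate the layered annotation the abstract describes (each token carrying a part-of-speech tag and a semantic category, alongside document metadata such as the sitting year), here is a minimal sketch. It is not the authors' toolchain: the toy lexicon, the tag strings (`NN`, `MONEY`, `WARFARE`, etc.) and the helpers `tokenise`, `annotate` and `category_counts_by_decade` are hypothetical stand-ins for the external taggers, and plain in-memory counting stands in for queries against LexiDB.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    word: str
    pos: str  # part-of-speech tag (hypothetical placeholder values)
    sem: str  # semantic category (hypothetical placeholder values)


@dataclass
class Speech:
    year: int  # year of the sitting, used for diachronic queries
    tokens: list[Token]


# Hypothetical toy lexicon mapping a word to (POS tag, semantic category).
# A real pipeline would call external tokenisers and taggers instead.
LEXICON = {
    "tax": ("NN", "MONEY"),
    "taxes": ("NNS", "MONEY"),
    "war": ("NN", "WARFARE"),
    "the": ("AT", "GRAMMAR"),
}

PUNCT = ".,;:!?"


def tokenise(text: str) -> list[str]:
    """Naive tokeniser: split on whitespace, strip edge punctuation, lowercase."""
    return [w.strip(PUNCT).lower() for w in text.split() if w.strip(PUNCT)]


def annotate(text: str) -> list[Token]:
    """Attach a POS tag and a semantic category to every token."""
    tokens = []
    for word in tokenise(text):
        pos, sem = LEXICON.get(word, ("UNK", "UNMATCHED"))
        tokens.append(Token(word, pos, sem))
    return tokens


def category_counts_by_decade(speeches: list[Speech], category: str) -> Counter:
    """Count tokens in one semantic category, grouped by decade."""
    counts = Counter()
    for speech in speeches:
        decade = speech.year // 10 * 10
        counts[decade] += sum(1 for t in speech.tokens if t.sem == category)
    return counts


if __name__ == "__main__":
    speeches = [
        Speech(1805, annotate("The war, the war.")),
        Speech(1812, annotate("Taxes and war")),
        Speech(1998, annotate("Tax the rich")),
    ]
    print(dict(category_counts_by_decade(speeches, "WARFARE")))
    # prints {1800: 2, 1810: 1, 1990: 0}
```

Keeping the semantic category on each token, rather than in a separate index, is what lets a case-study query filter by category and group by date in a single pass; a dedicated corpus database would index these layers for speed at Hansard scale.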