Literary & Linguistic Computing

ANNIS3: A new architecture for generic corpus query and visualization

Abstract

This article is concerned with the data structures, properties of query languages, and visualization facilities required for the generic representation of richly annotated, heterogeneous linguistic corpora. We propose that above and beyond a general graph-based data model, which is becoming increasingly popular in many complex annotation formats, a well-defined concept of multiple, potentially conflicting segmentation layers must be introduced to deal with different sources and applications of corpus data flexibly. We also propose a generic solution for specialized corpus visualizations in a Web interface using annotation-triggered style sheets, which leverage the power of modern browsers and CSS for multiple and highly customizable views of primary data. We offer an implementation and evaluation of our architecture in ANNIS3, an open-source browser-based architecture for corpus search and visualization. We present three case studies to test the coverage of the system, encompassing core linguistic and digital humanities use-cases including richly annotated newspaper treebanks, multilingual diplomatic and normalized manuscript materials edited in TEI, and analysis of multimodal recordings of spoken language.
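The abstract argues that a graph-based data model also needs several independent, possibly conflicting segmentation layers over the same primary data. The sketch below illustrates that idea under stated assumptions. All names are hypothetical and do not describe ANNIS3's actual data model or API: `SegmentedText`, `Unit`, `overlapping`, and the layer names `dipl` (diplomatic) and `norm` (normalized). A shared sequence of minimal base positions stands in for whatever common anchoring a real graph model uses. In the example, each layer tokenizes the same utterance differently. Diplomatic "wanna" spans two normalized units, and normalized "tomorrow" spans two diplomatic units. As a result, neither layer can serve as the token layer for the other, yet both can be aligned through the base.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    """One segment in a segmentation layer, covering base positions [start, end)."""
    layer: str
    start: int
    end: int
    text: str
    annotations: dict[str, str] = field(default_factory=dict)


@dataclass
class SegmentedText:
    """Hypothetical model: shared base positions plus named segmentation layers.

    Each layer segments the same primary data independently, so layers may
    disagree about where unit boundaries fall.
    """
    size: int  # number of minimal base positions
    layers: dict[str, list[Unit]] = field(default_factory=dict)

    def add(self, layer: str, start: int, end: int, text: str, **annotations: str) -> Unit:
        if not 0 <= start < end <= self.size:
            raise ValueError(f"span [{start}, {end}) outside base of size {self.size}")
        unit = Unit(layer, start, end, text, dict(annotations))
        self.layers.setdefault(layer, []).append(unit)
        self.layers[layer].sort(key=lambda u: u.start)  # keep units in base order
        return unit

    def text(self, layer: str) -> str:
        """Render one layer as running text."""
        return " ".join(u.text for u in self.layers.get(layer, []))

    def overlapping(self, unit: Unit, layer: str) -> list[Unit]:
        """Units in `layer` that share at least one base position with `unit`."""
        return [u for u in self.layers.get(layer, [])
                if u.start < unit.end and unit.start < u.end]


# Base positions (illustrative): I | wan | na | go | to | morrow
doc = SegmentedText(size=6)
for start, end, text in [(0, 1, "I"), (1, 3, "wanna"), (3, 4, "go"),
                         (4, 5, "to"), (5, 6, "morrow")]:
    doc.add("dipl", start, end, text)
for start, end, text, pos in [(0, 1, "I", "PRP"), (1, 2, "want", "VBP"), (2, 3, "to", "TO"),
                              (3, 4, "go", "VB"), (4, 6, "tomorrow", "NN")]:
    doc.add("norm", start, end, text, pos=pos)

print(doc.text("dipl"))  # I wanna go to morrow
print(doc.text("norm"))  # I want to go tomorrow

wanna = doc.layers["dipl"][1]
print([u.text for u in doc.overlapping(wanna, "norm")])     # ['want', 'to']
tomorrow = doc.layers["norm"][4]
print([u.text for u in doc.overlapping(tomorrow, "dipl")])  # ['to', 'morrow']
```

The only relation the layers share is coverage of base positions. Annotations such as `pos` can therefore attach to whichever segmentation suits them, and a query can still relate units across layers, for example by finding every diplomatic unit that overlaps a normalized noun.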