The anatomy of a search and mining system for digital humanities

Abstract

Samtla (Search And Mining Tools with Linguistic Analysis) is an online integrated research environment, designed in collaboration with historians and linguists, that facilitates the study of digitised texts written in any language. It currently supports research on two corpora: the Genizah collection held by the Taylor-Schechter Genizah Research Unit at Cambridge University, and a collection of Aramaic incantation texts from late antiquity. In contrast to standard search engines and text-mining systems, which rely on the bag-of-words representation of text, Samtla supports the retrieval and discovery of fuzzy text patterns or motifs (known to historians as "formulae"). It achieves this by applying a character-based n-gram statistical language model built on top of a generalised suffix tree data structure. This paper briefly describes the major components of Samtla and their underlying techniques.
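The abstract names the ingredients (a character-based n-gram language model over a generalised suffix tree) without showing how they fit together. The Python sketch below is one possible way to combine them, under our own assumptions rather than Samtla's actual implementation. It indexes all documents in a naive, depth-bounded generalised suffix trie (not a compacted suffix tree built with, e.g., Ukkonen's algorithm). It then ranks documents by query likelihood, using Jelinek-Mercer interpolation between a per-document model and an add-one-smoothed collection model. The class and method names (`CharNgramIndex`, `score`, `rank`), the smoothing scheme, `order=3`, `lam=0.5` and the toy documents are all hypothetical choices, not taken from the paper.

```python
import math
from collections import defaultdict


class TrieNode:
    """Node of a depth-bounded generalised suffix trie (hypothetical helper)."""

    __slots__ = ("children", "doc_counts")

    def __init__(self):
        self.children = {}
        # doc_id -> occurrences of this node's path string in that document
        self.doc_counts = defaultdict(int)


class CharNgramIndex:
    """Hypothetical character n-gram index over a whole collection.

    Every suffix of every document, truncated to `order` characters, is
    inserted into one shared trie, so each root-to-node path spells a
    substring and the node records how often it occurs in each document.
    """

    def __init__(self, docs, order=3):
        self.order = order
        self.doc_ids = list(docs)
        self.root = TrieNode()
        alphabet = set()
        for doc_id, text in docs.items():
            alphabet.update(text)
            for start in range(len(text)):
                node = self.root
                for ch in text[start:start + order]:
                    node = node.children.setdefault(ch, TrieNode())
                    node.doc_counts[doc_id] += 1
        # The +1 reserves probability mass for characters unseen in the collection
        self.vocab_size = len(alphabet) + 1

    def _node(self, s):
        node = self.root
        for ch in s:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    @staticmethod
    def _freq(node, doc_id):
        # doc_id=None means "count across the whole collection"
        if node is None:
            return 0
        if doc_id is None:
            return sum(node.doc_counts.values())
        return node.doc_counts.get(doc_id, 0)

    def count(self, s, doc_id=None):
        """Occurrences of substring s, where 1 <= len(s) <= order."""
        return self._freq(self._node(s), doc_id)

    def _cond(self, context, ch, doc_id):
        """Return (count of context+ch, count of context followed by any char)."""
        node = self._node(context)
        if node is None:
            return 0, 0
        joint = self._freq(node.children.get(ch), doc_id)
        total = sum(self._freq(c, doc_id) for c in node.children.values())
        return joint, total

    def score(self, query, doc_id, lam=0.5):
        """log P(query | doc); lam < 1 keeps every factor strictly positive."""
        log_p = 0.0
        for i, ch in enumerate(query):
            # Condition on at most order-1 preceding query characters
            context = query[max(0, i - self.order + 1):i]
            joint_d, total_d = self._cond(context, ch, doc_id)
            p_doc = joint_d / total_d if total_d else 0.0
            joint_c, total_c = self._cond(context, ch, None)
            p_coll = (joint_c + 1) / (total_c + self.vocab_size)  # add-one
            log_p += math.log(lam * p_doc + (1 - lam) * p_coll)
        return log_p

    def rank(self, query, lam=0.5):
        """Document ids sorted by descending query likelihood."""
        return sorted(self.doc_ids, key=lambda d: self.score(query, d, lam),
                      reverse=True)


# Toy, invented documents (not Genizah or incantation data)
docs = {
    "d1": "in the name of the lord",
    "d2": "a bowl for the healing of this house",
    "d3": "the lord is great",
}
index = CharNgramIndex(docs, order=3)
print(index.count("the"))                 # 4
print(index.rank("name of the lord")[0])  # d1
```

Because the score is accumulated character by character, a document that shares only part of a query (here `d3`, which contains "the lord" but not "name of") still earns credit for the overlapping substrings instead of scoring zero. This is the sense in which a character-level model can match variant forms of a formula that a bag-of-words index would treat as unrelated. A production system would replace the quadratic-size trie with a compacted suffix tree.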

Bibliographic record

Source
2014 IEEE/ACM Joint Conference on Digital Libraries | 2014 | pp. 165-168 | 4 pages
Conference location: London (GB)
Authors
Harris Martyn; Levene Mark; Zhang Dell; Levene Dan
Affiliation

Department of Computer Science, Birkbeck, University of London, WC1E 7HX, UK
Original format: PDF
Text language: English
Keywords
Collaboration; Communities; Computational modeling; Data models; Educational institutions; Mathematical model; Text mining; Collaborative Search; Digital Humanities; Sequence Alignment; Statistical Language Model; Suffix Tree

Similar documents

1. Critical Digital Humanities: The Search for a Methodology [J]. John Rodzvilla. Journal of Web Librarianship. 2019, no. 4

2. Humanity Database Compilation and Digital Library Service by Cooperating ODB and Fulltext Search Engine [J]. Katsumi Maruyama. IPSJ Journal (情報処理学会論文誌). 1999, no. 3

3. Text Mining Digital Humanities Projects: Assessing Content Analysis Capabilities of Voyant Tools [J]. A. Miller. Journal of Web Librarianship. 2018, no. 3

4. The anatomy of a search and mining system for digital humanities [C]. Harris Martyn, Levene Mark, Zhang Dell. IEEE/ACM Joint Conference on Digital Libraries. 2014

5. Digital Advances in Triggering and Data Acquisition Systems for Large Scale Dark Matter Search Experiments [D]. Druszkiewicz, Eryk Filip. 2017

6. Next-Generation Digital Ecosystem for Climate Data Mining and Knowledge Discovery: A Review of Digital Data Collection Technologies [O]. Angel Hsu, Willie Khoo, Nihit Goyal. 2020

7. The anatomy of a search and mining system for digital humanities: Search And Mining Tools for Language Archives (SAMTLA) [O]. Harris Martyn. 2017

8. Human Factors in Mining Search System [R]. Fowkes, R. S., Aiken, E. G. 1990
