Journal: Code4Lib Journal

Using Amazon Mechanical Turk to Transcribe Historical Handwritten Documents.


Abstract

The developing “information age” is continually unraveling new ways of discovering, presenting and sharing information. Most new academic material is digitally formatted upon its creation and is thus easy to find and query. However, there remains a good deal of material from times prior to the “information age” that has yet to be converted to digital form. Much of this material can be found in library collections, whether academic, public or private, and thus remains available only to a limited number of locals or willing-and-able sojourners. Using OCR technology, most typeset documents can be digitized and made available online; and there are several projects underway to do exactly this. However, there remains little to be done for handwritten materials. Those who own collections of handwritten documents are increasingly wanting to make the content thereof available to the general public. Unfortunately, traditional transcription models typically prove to be expensive or inefficient and PDF snapshots are not searchable. We have developed a model for digital transcription using Google Docs and Amazon’s Mechanical Turk. Using this model, one can use an online workforce to efficiently transcribe handwritten texts and perform quality control at a cost much lower than professional transcription services. To illustrate the model we used Amazon’s Mechanical Turk to transcribe and then proofread the Frederick Douglass Diary which we have made available on a public searchable wiki. The total cost of transcription and proofreading for the 72 page diary was less than $25.00 with some pages being transcribed and proofread for as little as $0.04. Our results show that using Amazon’s Mechanical Turk holds great promise for providing an affordable transcription method for hand-written historical documents making them easily sharable and fully searchable.
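
The cost figures in the abstract imply a ceiling on the average per-page cost. The short sketch below works through that arithmetic only; it is not the authors' cost model. The function name `average_cost_ceiling` is a hypothetical helper introduced for illustration, and the only inputs are the two numbers the abstract reports (72 pages, under $25.00 in total).

```python
from decimal import Decimal, ROUND_CEILING

# Figures taken from the abstract: total cost was "less than $25.00"
# for a 72-page diary. The true total is not stated, so $25.00 is
# treated here as an upper bound, not an exact value.
TOTAL_COST_CEILING = Decimal("25.00")
PAGE_COUNT = 72


def average_cost_ceiling(total_ceiling: Decimal, pages: int) -> Decimal:
    """Return an upper bound on the mean cost per page, rounded up to the cent.

    Hypothetical helper for illustration; rounding up keeps the result
    a valid upper bound.
    """
    if pages <= 0:
        raise ValueError("pages must be positive")
    return (total_ceiling / pages).quantize(Decimal("0.01"), rounding=ROUND_CEILING)


if __name__ == "__main__":
    bound = average_cost_ceiling(TOTAL_COST_CEILING, PAGE_COUNT)
    print(f"Average cost per page is below ${bound}")
```

Under these assumptions, the average cost of transcribing and proofreading a page stays below roughly 35 cents, while the abstract reports that the cheapest pages cost as little as $0.04.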
