Published in: 9th International Conference on Language Resources and Evaluation

Distributed Distributional Similarities of Google Books over the Centuries

Abstract

This paper introduces a distributional thesaurus and sense clusters computed on the complete Google Syntactic N-grams dataset, which is extracted from Google Books, a very large corpus of digitized books published between 1520 and 2008. We show that a thesaurus computed on such a large text basis yields much better results than one computed on smaller corpora such as Wikipedia. We also provide distributional thesauri for equal-sized time slices of the corpus. While distributional thesauri can be used as lexical resources in NLP tasks, comparing word similarities over time can reveal sense changes of terms across decades or centuries and can serve as a resource for diachronic lexicography. The thesauri and clusters are available for download.
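To make the idea of comparing word similarities across time slices concrete, here is a minimal sketch in Python. It is not the paper's pipeline: the similarity measure (number of shared context features), the function names `build_thesaurus` and `neighbour_overlap`, and the toy word/context pairs are all assumptions chosen for illustration. It only shows how one thesaurus per time slice could be built and how a change in a word's nearest neighbours might hint at a sense change.

```python
from collections import Counter, defaultdict


def build_thesaurus(pairs, top_n=3):
    """Build a toy distributional thesaurus.

    pairs: iterable of (word, context_feature) tuples.
    Similarity is ASSUMED to be the number of shared context features;
    this is an illustrative stand-in, not the measure used in the paper.
    """
    features = defaultdict(set)
    for word, ctx in pairs:
        features[word].add(ctx)

    thesaurus = {}
    for w, fw in features.items():
        scores = Counter()
        for v, fv in features.items():
            if v == w:
                continue
            shared = len(fw & fv)
            if shared:
                scores[v] = shared
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        thesaurus[w] = [v for v, _ in ranked[:top_n]]
    return thesaurus


def neighbour_overlap(th_a, th_b, word):
    """Jaccard overlap of a word's neighbour sets in two thesauri."""
    a = set(th_a.get(word, []))
    b = set(th_b.get(word, []))
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical toy data for two time slices (invented for this sketch).
early_slice = [
    ("mouse", "subj_of:squeak"), ("rat", "subj_of:squeak"),
    ("mouse", "obj_of:catch"), ("rat", "obj_of:catch"),
]
late_slice = [
    ("mouse", "obj_of:click"), ("button", "obj_of:click"),
    ("mouse", "mod:wireless"), ("keyboard", "mod:wireless"),
]

early = build_thesaurus(early_slice)
late = build_thesaurus(late_slice)
print(early["mouse"], late["mouse"], neighbour_overlap(early, late, "mouse"))
```

A low overlap between the two neighbour lists is only a signal worth inspecting, not proof of a sense change; on real data it could equally reflect sparse counts in one of the slices.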
