Unsupervised learning of natural languages

Abstract

We address the problem, fundamental to linguistics, bioinformatics, and certain other disciplines, of using corpora of raw symbolic sequential data to infer underlying rules that govern their production. Given a corpus of strings (such as text, transcribed speech, chromosome or protein sequence data, sheet music, etc.), our unsupervised algorithm recursively distills from it hierarchically structured patterns. The ADIOS (automatic distillation of structure) algorithm relies on a statistical method for pattern extraction and on structured generalization, two processes that have been implicated in language acquisition. It has been evaluated on artificial context-free grammars with thousands of rules, on natural languages as diverse as English and Chinese, and on protein data correlating sequence with function. This unsupervised algorithm is capable of learning complex syntax, generating grammatical novel sentences, and proving useful in other fields that call for structure discovery from raw data, such as bioinformatics.
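As a rough illustration of the two ingredients named above, statistical pattern extraction and generalization, the following toy sketch counts recurring token sequences in a tiny corpus and groups words that fill the same slot. It is not the ADIOS algorithm: the function names `frequent_patterns` and `equivalence_classes`, the example corpus, and the thresholds are all assumptions made for this example, and the simple counting used here stands in for the statistical criterion of the actual method, which the abstract does not specify.

```python
from collections import Counter, defaultdict


def frequent_patterns(corpus, n=2, min_count=2):
    """Count contiguous n-token sequences; keep those seen at least min_count times.

    Hypothetical helper: a plain frequency threshold, not the ADIOS criterion.
    """
    counts = Counter()
    for sentence in corpus:
        tokens = sentence.split()
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return {pattern: c for pattern, c in counts.items() if c >= min_count}


def equivalence_classes(corpus):
    """Group tokens that occur between the same left and right neighbours.

    Hypothetical helper: tokens sharing a (left, right) context are treated
    as interchangeable, a very crude form of generalization.
    """
    slots = defaultdict(set)
    for sentence in corpus:
        # Sentence boundary markers let the first and last words have a context.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for left, mid, right in zip(tokens, tokens[1:], tokens[2:]):
            slots[(left, right)].add(mid)
    # Only contexts filled by more than one distinct token suggest a class.
    return {ctx: words for ctx, words in slots.items() if len(words) > 1}


# Made-up example corpus, for illustration only.
corpus = [
    "the cat sees the dog",
    "the cat hears the dog",
    "the dog sees the cat",
]
patterns = frequent_patterns(corpus)
classes = equivalence_classes(corpus)
```

On this corpus, `patterns` retains the bigrams that recur, such as `("the", "cat")`, and `classes` groups `sees` with `hears` because both appear between `cat` and `the`. A hierarchical learner would go further and apply such steps recursively, treating each discovered pattern or class as a new unit.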