
Preparing, restructuring, and augmenting a French treebank: lexicalised parsers or coherent treebanks?


Abstract

We present the Modified French Treebank (MFT), a completely revamped French Treebank derived from the Paris 7 Treebank (P7T), which is cleaner, more coherent, has several transformed structures, and introduces new linguistic analyses. To determine the effect of these changes, we investigate how the MFT fares in statistical parsing. Probabilistic parsers trained on the MFT training set (currently 3800 trees) already perform better than their counterparts trained on five times as much P7T data (18,548 trees), providing an extreme example of the importance of data quality over quantity in statistical parsing. Moreover, regression analysis on the learning curve of parsers trained on the MFT leads to the prediction that parsers trained on the full projected 18,548-tree MFT training set will far outscore their counterparts trained on the full P7T. These analyses also show how problematic data can lead to problematic conclusions; in particular, we find that lexicalisation in the probabilistic parsing of French is probably not as crucial as was once thought (Arun and Keller, 2005).
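The prediction for the full 18,548-tree MFT training set rests on fitting and extrapolating a learning curve. A minimal sketch of that kind of extrapolation is shown below, assuming a hypothetical inverse power-law functional form and placeholder F-scores; the abstract does not specify the regression model or report these data points, so both are illustrative assumptions.

```python
# Sketch of learning-curve extrapolation; data points and functional form are hypothetical.
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical (training-set size, labelled F-score) pairs for parsers trained on MFT subsets.
sizes = np.array([500, 1000, 2000, 3000, 3800], dtype=float)
fscores = np.array([74.0, 77.5, 80.2, 81.6, 82.3])

def learning_curve(n, a, b, c):
    """Inverse power law: performance approaches the asymptote a as the data size n grows."""
    return a - b * n ** (-c)

# Fit the curve to the observed points, starting from a rough initial guess.
params, _ = curve_fit(learning_curve, sizes, fscores, p0=[90.0, 100.0, 0.5], maxfev=10000)

# Extrapolate to the full projected MFT training set of 18,548 trees.
predicted = learning_curve(18548.0, *params)
print(f"Fitted asymptote: {params[0]:.1f}, predicted F-score at 18,548 trees: {predicted:.1f}")
```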
