Journal: Computer Speech and Language

IXIR: A statistical information distillation system

Abstract

The task of information distillation is to extract snippets from massive multilingual audio and textual document sources that are relevant for a given templated query. We present an approach that focuses on the sentence extraction phase of the distillation process. It selects document sentences with respect to their relevance to a query via statistical classification with support vector machines. The distinguishing contribution of the approach is a novel method to generate classification features. The features are extracted from charts, which are compilations of elements from various annotation layers, such as word transcriptions, syntactic and semantic parses, and information extraction (IE) annotations. We describe a procedure for creating charts from documents and queries, paying special attention to query slots (free-text descriptions of names, organizations, topics, events and so on, around which templates are centered), and suggest various types of classification features that can be extracted from these charts. We observe a 30% relative improvement due to non-lexical annotation layers and perform a detailed analysis of the contribution of each of these layers to classification performance.
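
The sentence extraction step described in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the IXIR system: a "chart" is reduced to a dictionary mapping a layer name to a list of elements, the two layers (`words`, `entities`), the overlap features, the example sentences, and their labels are all hypothetical, and the support vector machine is a small linear one trained by sub-gradient descent on the hinge loss rather than whatever SVM implementation the authors used.

```python
# Illustrative sketch only: layer names, features, data, and labels are
# assumptions, not the charts or features of the IXIR system.

LAYERS = ("words", "entities")  # hypothetical annotation layers


def chart_features(sentence_chart, slot_chart):
    """One feature per layer: fraction of the slot's elements found in the sentence."""
    feats = []
    for layer in LAYERS:
        slot_items = set(slot_chart.get(layer, ()))
        sent_items = set(sentence_chart.get(layer, ()))
        feats.append(len(slot_items & sent_items) / len(slot_items) if slot_items else 0.0)
    return feats


def score(w, b, x):
    """Signed distance-like score of feature vector x under the linear model (w, b)."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Full-batch sub-gradient descent on the L2-regularized hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            if yi * score(w, b, xi) < 1:  # example violates the margin
                gw = [g + yi * xj for g, xj in zip(gw, xi)]
                gb += yi
        w = [(1 - lr * lam) * wj + lr * g / n for wj, g in zip(w, gw)]
        b += lr * gb / n
    return w, b


# Hypothetical query slot and toy sentences (label +1 = relevant, -1 = not relevant).
slot = {"words": ["earthquake", "turkey"], "entities": ["GPE:Turkey"]}
sentences = [
    {"words": ["an", "earthquake", "struck", "turkey"], "entities": ["GPE:Turkey"]},
    {"words": ["turkey", "requested", "aid"], "entities": ["GPE:Turkey"]},
    {"words": ["markets", "closed", "higher"], "entities": []},
    {"words": ["roast", "turkey", "recipe"], "entities": []},
]
y = [1, 1, -1, -1]

X = [chart_features(s, slot) for s in sentences]
w, b = train_linear_svm(X, y)

# Rank sentences by classifier score; higher means more likely relevant.
ranked = sorted(range(len(sentences)), key=lambda i: score(w, b, X[i]), reverse=True)
```

In this toy data the fourth sentence shares a word with the slot but no entity, so only the non-lexical feature separates it from the second sentence. That mirrors, in a very reduced form, the abstract's point that annotation layers beyond the words can help classification.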
