Identification of Tasks, Datasets, Evaluation Metrics, and Numeric Scores for Scientific Leaderboards Construction


Abstract

While the fast-paced inception of novel tasks and new datasets helps foster active research in a community towards interesting directions, keeping track of the abundance of research activity in different areas on different datasets is likely to become increasingly difficult. The community could greatly benefit from an automatic system able to summarize scientific results, e.g., in the form of a leaderboard. In this paper we build two datasets and develop a framework (TDMS-IE) aimed at automatically extracting task, dataset, metric and score from NLP papers, towards the automatic construction of leaderboards. Experiments show that our model outperforms several baselines by a large margin. Our model is a first step towards automatic leaderboard construction, e.g., in the NLP domain.
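The abstract describes extracting (task, dataset, metric, score) tuples as a step toward building leaderboards. The Python sketch below illustrates only the downstream aggregation step under our own assumptions. It is not TDMS-IE's method. The `TDMS` record type and the `build_leaderboards` function are hypothetical, and because the ranking direction of a metric (higher or lower is better) is not specified in the abstract, the caller has to supply it.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional, Tuple


class TDMS(NamedTuple):
    """Hypothetical extracted record: a paper ID plus its (task, dataset, metric, score)."""
    paper: str
    task: str
    dataset: str
    metric: str
    score: float


def build_leaderboards(
    records: List[TDMS],
    higher_is_better: Optional[Dict[str, bool]] = None,
) -> Dict[Tuple[str, str, str], List[Tuple[str, float]]]:
    """Group records into one leaderboard per (task, dataset, metric) and rank papers.

    Assumption: metrics missing from `higher_is_better` are treated as
    higher-is-better (e.g. accuracy); pass False for error-style metrics.
    """
    higher_is_better = higher_is_better or {}
    boards: Dict[Tuple[str, str, str], List[Tuple[str, float]]] = defaultdict(list)
    for r in records:
        # Each distinct (task, dataset, metric) triple defines one leaderboard.
        boards[(r.task, r.dataset, r.metric)].append((r.paper, r.score))

    ranked = {}
    for key, entries in boards.items():
        metric = key[2]
        descending = higher_is_better.get(metric, True)
        # Best score first; sorted() is stable, so ties keep input order.
        ranked[key] = sorted(entries, key=lambda e: e[1], reverse=descending)
    return ranked
```

For example, with toy placeholder records (not results from the paper), `build_leaderboards(records, {"error_rate": False})` ranks error rates ascending and every other metric descending. Keying on the full triple keeps scores from different datasets or metrics out of the same table.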

Bibliographic record

Source: Annual Meeting of the Association for Computational Linguistics | 2019 | pp. 5203-5213 | 11 pages
Authors: Yufang Hou; Charles Jochim; Martin Gleize; Francesca Bonin; Debasis Ganguly
Original format: PDF

Similar literature


1. A Systematic Evaluation and Benchmark for Person Re-Identification: Features, Metrics, and Datasets [J]. Karanam Srikrishna, Gou Mengran, Wu Ziyan. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2019, No. 3
2. Scoring of senescence signalling in multiple human tumour gene expression datasets, identification of a correlation between senescence score and drug toxicity in the NCI60 panel and a pro-inflammatory signature correlating with survival advantage in peritoneal mesothelioma [J]. Kyle Lafferty-Whyte, Alan Bilsland, Claire J Cairney. BMC Genomics. 2010, No. 1
3. Evaluation of copy-move forgery detection: datasets and evaluation metrics [J]. Al-Qershi Osamah M., Khoo Bee Ee. Multimedia Tools and Applications. 2018, No. 24
4. Identification of Tasks, Datasets, Evaluation Metrics, and Numeric Scores for Scientific Leaderboards Construction [C]. Yufang Hou, Charles Jochim, Martin Gleize. Annual Meeting of the Association for Computational Linguistics. 2019
5. Image Captioning: A Survey of Existing Issues on Datasets, Evaluation Metrics and Methods [D]. Zhou, Liwan. 2020
6. Scoring of senescence signalling in multiple human tumour gene expression datasets identification of a correlation between senescence score and drug toxicity in the NCI60 panel and a pro-inflammatory signature correlating with survival advantage in peritoneal mesothelioma [O]. Kyle Lafferty-Whyte, Alan Bilsland, Claire J Cairney. 2010
7. A Systematic Evaluation and Benchmark for Person Re-Identification: Features, Metrics, and Datasets [O]. Karanam, Srikrishna, Gou, Mengran, Wu, Ziyan. 2017