IEEE International Conference on Tools with Artificial Intelligence

Building Document Treatment Chains Using Reinforcement Learning and Intuitive Feedback



Abstract

We model a document treatment chain as a Markov Decision Process, and use reinforcement learning to allow the agent to learn to construct and continuously improve custom-made chains "on the fly". We build a platform which enables us to measure the impact on the learning of various models, web services, algorithms, parameters, etc. We apply this in an industrial setting, specifically to an open-source document treatment chain which extracts events from massive volumes of web pages and other open-source documents. Our emphasis is on minimising the burden on the human analysts, whose feedback on the extracted events guides the agent as it learns to improve. For this, we investigate different types of feedback, from numerical feedback, which requires a lot of tuning, to partially and even fully qualitative feedback, which is much more intuitive and demands little to no user calibration. We carry out experiments, first with numerical feedback, then demonstrate that intuitive feedback still allows the agent to learn effectively.
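The core idea of the abstract — treating chain construction as a sequential decision problem where the agent picks a processing module at each stage and learns from analyst feedback — can be sketched with simple tabular reinforcement learning. This is a minimal illustration, not the authors' implementation: the stage names, candidate modules, and the `analyst_feedback` function standing in for human judgement are all hypothetical.

```python
import random

# Hypothetical sketch of a document treatment chain as an MDP:
# each stage of the chain is a state, and the actions are the
# candidate processing modules available at that stage.
STAGES = ["fetch", "clean", "extract"]
ACTIONS = {
    "fetch": ["crawler_a", "crawler_b"],
    "clean": ["boilerplate_strip", "html_to_text"],
    "extract": ["event_ner", "pattern_rules"],
}

def analyst_feedback(stage, action):
    """Simulated feedback mapped to a numeric reward.

    In the paper's setting this signal would come from human
    analysts judging the extracted events; here a fixed preference
    table plays that role for illustration only.
    """
    preferred = {"fetch": "crawler_a",
                 "clean": "html_to_text",
                 "extract": "event_ner"}
    return 1.0 if action == preferred[stage] else 0.0

def train(episodes=2000, alpha=0.1, epsilon=0.1):
    """Epsilon-greedy value learning over module choices per stage."""
    q = {(s, a): 0.0 for s in STAGES for a in ACTIONS[s]}
    for _ in range(episodes):
        for stage in STAGES:
            acts = ACTIONS[stage]
            if random.random() < epsilon:
                action = random.choice(acts)       # explore
            else:                                   # exploit best so far
                action = max(acts, key=lambda a: q[(stage, a)])
            reward = analyst_feedback(stage, action)
            q[(stage, action)] += alpha * (reward - q[(stage, action)])
    return q

def best_chain(q):
    """Assemble the chain from the highest-valued module per stage."""
    return [max(ACTIONS[s], key=lambda a: q[(s, a)]) for s in STAGES]
```

With enough episodes, the learned values converge so that `best_chain` reproduces the modules the feedback rewards, which is the "continuously improve custom-made chains" behaviour the abstract describes in miniature.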


