Conference: International Conference on Data and Software Engineering

Handling Out of Vocabulary in supervised event extraction on Indonesian tweets: Using word representation, word list, word context and document level features

Abstract

Extracting event information from Twitter remains promising because many Twitter accounts exist solely to broadcast event information. The hardest part of extracting event information is the Out of Vocabulary (OOV) problem, especially for event names. Here, we enhance the features used in our supervised event extraction. These features include word representations (skip-gram model and Brown clusters), word lists (event names and event locations), word context, and a document-level feature. Using a CRF as the classification algorithm, 4-fold cross-validation, and 1,300 tweets, the best F-measure achieved for OOV cases was 0.6, a significant improvement over the baseline of 0.445. The enhanced features also improved the F-measure across all vocabulary cases from 0.693 (baseline) to 0.814 (proposed).
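
The feature groups named in the abstract can be illustrated with a minimal sketch of per-token feature dictionaries of the kind a CRF tagger typically consumes. This is not the authors' implementation: the function names, feature keys, the `clusters` lookup table (standing in for a Brown cluster or skip-gram-derived class), and the particular document-level feature are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Union

Feature = Union[str, bool]


def token_features(
    tokens: List[str],
    i: int,
    event_names: Set[str],       # hypothetical word list of event-name words
    event_locations: Set[str],   # hypothetical word list of location words
    clusters: Dict[str, str],    # hypothetical word -> cluster id lookup
    window: int = 1,
) -> Dict[str, Feature]:
    """Build features for tokens[i]; every key name here is illustrative."""
    lower = tokens[i].lower()
    feats: Dict[str, Feature] = {
        "word.lower": lower,
        # Word-list features: still informative when the exact event name
        # was never seen in training.
        "in_event_name_list": lower in event_names,
        "in_event_location_list": lower in event_locations,
        # Word-representation feature: words unseen in the labelled data
        # can still share a cluster with words that were seen.
        "cluster": clusters.get(lower, "UNK"),
    }
    # Word-context features: neighbouring tokens inside the window.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        inside = 0 <= j < len(tokens)
        feats[f"ctx[{offset:+d}]"] = tokens[j].lower() if inside else "<PAD>"
    return feats


def tweet_features(
    tokens: List[str],
    event_names: Set[str],
    event_locations: Set[str],
    clusters: Dict[str, str],
) -> List[Dict[str, Feature]]:
    """Return one feature dict per token, plus an assumed document-level flag."""
    # Assumed document-level feature: does any token of the tweet appear
    # in the event-name word list?
    has_event_word = any(t.lower() in event_names for t in tokens)
    rows = []
    for i in range(len(tokens)):
        feats = token_features(tokens, i, event_names, event_locations, clusters)
        feats["doc.has_event_name_word"] = has_event_word
        rows.append(feats)
    return rows
```

A list of such dictionaries per tweet, paired with per-token labels, is the usual input shape for CRF toolkits. The features actually used in the paper may differ from the ones chosen here.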
