Conference: International Conference on Applications of Natural Language to Information Systems

What You Use, Not What You Do: Automatic Classification of Recipes



Abstract

Social media data is notoriously noisy and unclean, and user-built recipe collections are no exception, particularly when it comes to cataloging them. However, consistent and transparent categorization is vital to users searching for a specific entry. Curators of a large existing recipe collection face the same challenge: they first need to understand the data before they can build a clean system of categories. This paper presents an empirical study on the automatic classification of recipes from the German cooking website Chefkoch. The central question we aim to answer is: what information is necessary to perform well at this task? In particular, we compare features extracted from the free-text instructions of a recipe with features taken from its list of ingredients. On a sample of 5,000 recipes with 87 classes, our feature analysis shows that combining nouns from the recipe's textual description with ingredient features performs best (48% F1). Nouns alone achieve 45% F1 and ingredients alone 46% F1. However, other word classes do not complement the information from nouns. On a larger training set of 50,000 instances, the best configuration improves to 57% F1, highlighting the importance of a sizeable data set.
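The abstract does not state which classifier, feature weighting, or F1 averaging the authors used. As a rough illustration of the comparison it describes (nouns from the instructions vs. the ingredient list vs. both), here is a minimal pure-Python sketch under explicit assumptions: a multinomial Naive Bayes classifier with add-one smoothing (a swapped-in choice, not necessarily the paper's), bag-of-words counts, nouns already extracted upstream by some part-of-speech tagger, and macro-averaged F1. The recipe schema, the names `features`, `NaiveBayes` and `macro_f1`, and the toy German recipes and class labels are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def features(recipe, use_nouns=True, use_ingredients=True):
    """Turn one recipe into a bag of prefixed, lower-cased feature counts.

    Hypothetical schema: recipe["nouns"] holds nouns already extracted from the
    free-text instructions by a POS tagger (assumed, not shown here);
    recipe["ingredients"] holds ingredient names. Prefixes keep sources apart.
    """
    feats = Counter()
    if use_nouns:
        feats.update("noun=" + n.lower() for n in recipe["nouns"])
    if use_ingredients:
        feats.update("ing=" + i.lower() for i in recipe["ingredients"])
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, X, y):
        self.class_counts = Counter(y)
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for feats, label in zip(X, y):
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        self.totals = {c: sum(self.feat_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, feats):
        n = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c, count in self.class_counts.items():
            # Log prior plus smoothed log likelihoods; features unseen in training are skipped.
            score = math.log(count / n)
            for f, k in feats.items():
                if f in self.vocab:
                    score += k * math.log((self.feat_counts[c][f] + 1) / (self.totals[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    scores = []
    for c in sorted(set(gold) | set(pred)):
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)


# Made-up toy data with two hypothetical classes; not taken from Chefkoch.
TRAIN = [
    {"nouns": ["Teig", "Ofen"], "ingredients": ["Mehl", "Zucker", "Ei"], "label": "Backen"},
    {"nouns": ["Pfanne", "Sauce"], "ingredients": ["Nudeln", "Tomate"], "label": "Pasta"},
    {"nouns": ["Teig", "Form"], "ingredients": ["Mehl", "Butter"], "label": "Backen"},
    {"nouns": ["Topf", "Wasser"], "ingredients": ["Nudeln", "Basilikum"], "label": "Pasta"},
]
TEST = [
    {"nouns": ["Teig"], "ingredients": ["Mehl"], "label": "Backen"},
    {"nouns": ["Sauce"], "ingredients": ["Nudeln"], "label": "Pasta"},
]

# The three feature configurations compared in the abstract.
CONFIGS = {
    "nouns only": dict(use_ingredients=False),
    "ingredients only": dict(use_nouns=False),
    "nouns + ingredients": dict(),
}

if __name__ == "__main__":
    gold = [r["label"] for r in TEST]
    for name, opts in CONFIGS.items():
        model = NaiveBayes().fit([features(r, **opts) for r in TRAIN],
                                 [r["label"] for r in TRAIN])
        pred = [model.predict(features(r, **opts)) for r in TEST]
        print(f"{name}: macro-F1 = {macro_f1(gold, pred):.2f}")
```

Toggling the two keyword options mirrors the shape of the ablation (nouns vs. ingredients vs. both). On four toy recipes every configuration trivially scores 1.00, so these numbers say nothing about the 45 to 57% F1 reported in the abstract.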
