Conference: International Conference on Natural Language Processing

Bundeli Folk-Song Genre Classification with kNN and SVM


Abstract

While techniques that depend on large datasets have advanced between-genre classification, the identification of subtypes within a genre has largely been overlooked. In this paper, we approach the automatic within-genre classification of Bundeli folk music into its subgenres: Gaari, Rai and Phag. Bundeli, a dominant dialect spoken in a large belt of Uttar Pradesh and Madhya Pradesh, has a rich resource of folk songs and an attendant folk tradition. First, we successfully demonstrate that a set of common stopwords in Bundeli can be used to perform broad genre classification between standard Bundeli text (newspaper corpus) and lyrics. We then establish the problem of structural and lexical similarity in within-genre classification using n-grams. Finally, we classify the lyrics data into the three genres using popular machine-learning classifiers: Support Vector Machine (SVM) and kNN classifiers, which achieve accuracies of 91.3% and 85%, respectively. We also use a Naive Bayes classifier, which returns an accuracy of 75%. Our results underscore the need to extend popular classification techniques to sparse and small corpora so as to perform hitherto neglected within-genre classification, and they also show that well-known classifiers can be employed in classifying 'small' data.
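To make the final step of the abstract concrete, here is a minimal sketch of kNN classification over word n-gram counts, written in plain Python. It is not the authors' implementation: the abstract does not state their features, distance measure, or value of k, so the unigram-plus-bigram features, cosine similarity, and k=3 are all assumptions. The toy "lyrics" are invented placeholder tokens, not real Bundeli text, and the helper names (`ngram_counts`, `cosine`, `knn_predict`) are hypothetical.

```python
from collections import Counter
from math import sqrt


def ngram_counts(text, n_max=2):
    """Count word n-grams of length 1..n_max in a whitespace-tokenized text."""
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(train, text, k=3):
    """Return the majority label among the k training items most similar to text."""
    query = ngram_counts(text)
    ranked = sorted(train, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Placeholder "lyrics": invented tokens, NOT real Bundeli text (assumption).
toy_corpus = [
    ("aa bb aa cc", "Gaari"),
    ("aa bb dd aa", "Gaari"),
    ("ee ff ee gg", "Rai"),
    ("ee ff hh ee", "Rai"),
    ("ii jj ii kk", "Phag"),
    ("ii jj ll ii", "Phag"),
]
train = [(ngram_counts(text), label) for text, label in toy_corpus]

prediction = knn_predict(train, "ee ff ee", k=3)
print(prediction)
```

On a real corpus the subgenres share vocabulary and structure, which is the similarity problem the abstract raises, so a held-out evaluation would be needed before trusting any choice of k or n-gram length.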
