Published in: International Conference on Informatics in Control, Automation and Robotics

An experimentation line for underlying graphemic properties acquiring knowledge from text data with Self Organizing Maps

Abstract

We present an experimentation line that encompasses the stages of research on grapheme distribution and unsupervised classification. We aim to help close the gap between recent research results, which show that unsupervised learning and clustering algorithms can detect underlying properties of phonemes, and the present possibilities of Unicode textual representation. Our procedures need to ensure repeatability and guarantee that no information is implicitly introduced by the preprocessing of data. Our approach is able to categorize potential graphemes correctly, thus showing not only that phonemic properties are indeed present in textual data, but also that they can be automatically retrieved from raw Unicode text data and translated into phonemic representations. Incidentally, we observe that the SOM algorithm copes well with very sparse vectors.
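The abstract does not specify the features or the SOM configuration, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes that each character of a raw Unicode string is described by the relative frequencies of the characters that follow it (a sparse vector for any realistic alphabet), and that a small rectangular Self Organizing Map is trained on those vectors. The names `context_vectors`, `train_som` and `best_matching_unit`, the grid size, and the decay schedules are all hypothetical choices made for illustration.

```python
import math
import random


def context_vectors(text):
    """Describe each character by the relative frequencies of its successors.

    Assumption: next-character counts stand in for the (unspecified)
    features of the paper. No linguistic knowledge is used; the alphabet
    is simply the set of code points found in the text.
    """
    symbols = sorted(set(text))
    index = {ch: i for i, ch in enumerate(symbols)}
    counts = {ch: [0.0] * len(symbols) for ch in symbols}
    for cur, nxt in zip(text, text[1:]):
        counts[cur][index[nxt]] += 1.0
    vectors = {}
    for ch, row in counts.items():
        total = sum(row)
        vectors[ch] = [v / total for v in row] if total else row
    return vectors


def best_matching_unit(weights, x):
    """Return the grid coordinate whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)),
    )


def train_som(data, rows=3, cols=3, epochs=200, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small rectangular SOM with linearly decaying rate and radius."""
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    dim = len(data[0])
    weights = {
        (r, c): [rng.random() / dim for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)
        sigma = max(sigma0 * (1.0 - frac), 0.3)
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = lr * math.exp(-d2 / (2.0 * sigma * sigma))
                for i in range(dim):
                    w[i] += h * (x[i] - w[i])
    return weights


# Toy input only; a real experiment would read a large raw Unicode corpus.
text = "banana bandana cabana " * 20
vectors = context_vectors(text)
symbols = sorted(vectors)
som = train_som([vectors[s] for s in symbols])
placement = {s: best_matching_unit(som, vectors[s]) for s in symbols}
```

Here `placement` maps each character to the map cell that best matches it; characters with similar successor distributions tend to land in the same or neighbouring cells. The fixed seed is one simple way to get the repeatability the abstract asks for.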