World Conference on Information Systems and Technologies

Gender Classification of Twitter Data Based on Textual Meta-Attributes Extraction


Abstract

With the growth of social media in recent years, there has been increasing interest in automatically characterizing users based on the informal content they generate. In this context, labeling users by demographic categories such as age, ethnicity, origin, and race, along with investigating other user attributes such as political preferences, personality, and gender expression, has received a great deal of attention, especially using Twitter data. This paper focuses on gender classification, using 60 textual meta-attributes commonly employed in text attribution tasks to extract linguistic cues of gender expression from tweets written in Portuguese. To this end, it considers the characters, syntax, words, structure, and morphology of short, multi-genre, content-free texts posted on Twitter to classify each author's gender with three different machine-learning algorithms, and it evaluates the influence of the proposed meta-attributes on this process.
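The abstract does not list the 60 meta-attributes or name the three algorithms, so the following is only a minimal sketch of what character- and word-level meta-attribute extraction from a tweet can look like. The function name `extract_meta_attributes` and every feature in it are hypothetical illustrations, not the paper's feature set. It uses only the Python standard library; `\w` in Python's `re` matches Unicode letters, so accented Portuguese words such as "Ação" count as single words.

```python
import re
import string

# Hypothetical, illustrative feature set: NOT the paper's 60 meta-attributes.
WORD_RE = re.compile(r"\w+")  # Unicode-aware for str patterns by default


def extract_meta_attributes(tweet: str) -> dict[str, float]:
    """Compute a few illustrative character- and word-level meta-attributes."""

    def ratio(count: int, total: int) -> float:
        # Guard against division by zero for empty inputs.
        return count / total if total else 0.0

    n_chars = len(tweet)
    words = WORD_RE.findall(tweet.lower())
    n_words = len(words)

    return {
        # Character-level attributes
        "n_chars": float(n_chars),
        "upper_ratio": ratio(sum(c.isupper() for c in tweet), n_chars),
        "digit_ratio": ratio(sum(c.isdigit() for c in tweet), n_chars),
        "punct_ratio": ratio(sum(c in string.punctuation for c in tweet), n_chars),
        # Word-level attributes
        "n_words": float(n_words),
        "avg_word_len": ratio(sum(len(w) for w in words), n_words),
        "type_token_ratio": ratio(len(set(words)), n_words),
        # Twitter-specific structural attributes
        "n_hashtags": float(len(re.findall(r"#\w+", tweet))),
        "n_mentions": float(len(re.findall(r"@\w+", tweet))),
    }


if __name__ == "__main__":
    print(extract_meta_attributes("Oi @ana, tudo bem? #Sextou 2024"))
```

Each tweet becomes a fixed set of numeric values, which could then be arranged as a feature vector for any off-the-shelf classifier; which classifiers the authors compared is not stated here.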
