(Male, Bachelor) and (Female, Ph.D) have different connotations: Parallelly Annotated Stylistic Language Dataset with Multiple Personas

Abstract

Stylistic variation in text needs to be studied along several aspects, including the writer's personal traits, interpersonal relations, rhetoric, and more. Despite recent attempts at computational modeling of this variation, the lack of parallel corpora of stylistic language makes it difficult both to systematically control stylistic change and to evaluate such models. We release PASTEL, the parallel and annotated stylistic language dataset, which contains ≈41K parallel sentences (8.3K parallel stories) annotated across different personas. Each persona combines several styles: gender, age, country, political view, education, ethnicity, and time of writing. The dataset is collected from human annotators with tight control over input denotation: the original meaning is preserved across texts, while annotators are encouraged toward stylistic diversity. We test the dataset on two applications of stylistic language in which PASTEL helps design appropriate experiments and evaluation. First, when predicting a target style (e.g., male or female for gender) from a text, PASTEL's multiple style annotations allow the other, external style variables to be controlled (fixed), which yields a more accurate experimental design. Second, in style transfer, a simple supervised model trained on our parallel text outperforms unsupervised models that use non-parallel text. Our dataset is publicly available.
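The sketch below illustrates the two uses of multi-persona parallel annotation described above. The first is grouping texts so that only the target style varies while every other style is held fixed. The second is pairing texts written for the same input as (source, target) examples for supervised style transfer.

This is a minimal illustration under stated assumptions, not the authors' code. The following are all hypothetical:

- the `Record` layout,
- the `ref_id` field,
- the attribute key names in `STYLE_KEYS`,
- the functions `controlled_groups` and `parallel_pairs`.

PASTEL's released files may be organized differently.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical key names for the seven style dimensions listed in the abstract;
# the released dataset may use different keys and value formats.
STYLE_KEYS = ("gender", "age", "country", "political_view",
              "education", "ethnicity", "time_of_writing")


@dataclass
class Record:
    ref_id: str    # hypothetical: id of the shared input; equal ref_id means parallel texts
    persona: dict  # maps every key in STYLE_KEYS to this annotator's value
    text: str


def controlled_groups(records, target):
    """Group records that agree on every style except `target`.

    Inside one group only the target style varies, so any signal a classifier
    picks up cannot come from the other (held-fixed) style variables.
    """
    groups = defaultdict(list)
    for rec in records:
        fixed = tuple(rec.persona[k] for k in STYLE_KEYS if k != target)
        groups[fixed].append(rec)
    # Drop groups where the target style takes only one value (nothing to contrast).
    return {fixed: recs for fixed, recs in groups.items()
            if len({r.persona[target] for r in recs}) > 1}


def parallel_pairs(records, target, src_value, tgt_value):
    """Pair texts written for the same input (same ref_id) by personas whose
    `target` style is src_value and tgt_value, respectively, as
    (source, target) examples for a supervised style-transfer model."""
    by_ref = defaultdict(list)
    for rec in records:
        by_ref[rec.ref_id].append(rec)
    pairs = []
    for recs in by_ref.values():
        sources = [r for r in recs if r.persona[target] == src_value]
        targets = [r for r in recs if r.persona[target] == tgt_value]
        pairs.extend((s.text, t.text) for s in sources for t in targets)
    return pairs
```

Keying each group on the tuple of non-target values is what "controls" the other styles. A record whose persona differs in any second attribute lands in a different group. Such a record is therefore never compared against the target contrast.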

Bibliographic details

Source: International Joint Conference on Natural Language Processing; Conference on Empirical Methods in Natural Language Processing, 2019, pp. 1696-1706 (11 pages)
Authors: Dongyeop Kang; Varun Gangal; Eduard Hovy
Original format: PDF

Similar literature

1. Design and target-value determination of networks with multiple resources of different qualities [J]. Liu Zhiyong, Li Yanmei, Zhang Guanglin. Chinese Journal of Chemical Engineering (English Edition), 2009, No. 3.
2. Face recognition from multiple stylistic sketches: Scenarios, datasets, and evaluation [J]. Peng Chunlei, Gao Xinbo, Wang Nannan. Pattern Recognition: The Journal of the Pattern Recognition Society, 2018.
3. SenseDefs: a multilingual corpus of semantically annotated textual definitions Exploiting multiple languages and resources jointly for high-quality Word Sense Disambiguation and Entity Linking [J]. Camacho-Collados Jose, Bovi Claudio Delli, Raganato Alessandro. Language Resources and Evaluation, 2019, No. 2.
4. (Male, Bachelor) and (Female, Ph.D) have different connotations: Parallelly Annotated Stylistic Language Dataset with Multiple Personas [C]. Dongyeop Kang, Varun Gangal, Eduard Hovy. International Joint Conference on Natural Language Processing, 2019.
5. Parallel Feature Selection of Multiple Class Datasets Using Apache Spark [D]. Sankineni, Rishi. 2017.
6. Are females more variable than males in gene expression? Meta-analysis of microarray datasets [O]. Yuichiro Itoh, Arthur P. Arnold. 2015.
7. (Male, Bachelor) and (Female, Ph.D) have different connotations: Parallelly Annotated Stylistic Language Dataset with Multiple Personas [O]. Dongyeop Kang, Varun Gangal, Eduard Hovy. 2019.
