Biased Embeddings from Wild Data: Measuring, Understanding and Removing

Abstract

Many modern Artificial Intelligence (AI) systems make use of data embeddings, particularly in the domain of Natural Language Processing (NLP). These embeddings are learnt from data that has been gathered "from the wild" and have been found to contain unwanted biases. In this paper we make three contributions towards measuring, understanding and removing this problem. We present a rigorous way to measure some of these biases, based on the use of word lists created for social psychology applications; we observe how gender bias in occupations reflects actual gender bias in the same occupations in the real world; and finally we demonstrate how a simple projection can significantly reduce the effects of embedding bias. All this is part of an ongoing effort to understand how trust can be built into AI systems.
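The "simple projection" mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the bias direction is obtained, so this example assumes it is estimated from the difference between paired word vectors; the helper names (`bias_direction`, `project_out`, `cosine`) and the toy 3-dimensional vectors are hypothetical and are not taken from the paper.

```python
import numpy as np

def bias_direction(emb, pairs):
    """Average the difference vectors of the given word pairs and normalise the result.

    Assumption: the bias direction is estimated from word pairs; the paper's
    own construction may differ.
    """
    diffs = [emb[a] - emb[b] for a, b in pairs]
    d = np.mean(diffs, axis=0)
    return d / np.linalg.norm(d)

def project_out(v, d):
    """Remove the component of v that lies along the unit vector d."""
    return v - np.dot(v, d) * d

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy embeddings, invented for illustration only (not real data).
emb = {
    "he": np.array([1.0, 0.2, 0.0]),
    "she": np.array([-1.0, 0.2, 0.0]),
    "engineer": np.array([0.5, 0.1, 0.8]),
}

d = bias_direction(emb, [("he", "she")])
before = cosine(emb["engineer"], d)                 # association before projection
after = cosine(project_out(emb["engineer"], d), d)  # association after projection
print(f"before: {before:.3f}, after: {after:.3f}")
```

In this toy setting the occupation vector keeps its other components, while its association with the assumed bias direction drops to zero after projection.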

Bibliographic details

Source
International Symposium on Intelligent Data Analysis | 2017 | pp. 328-339 | 12 pages
Authors
Adam Sutton; Thomas Lansdall-Welfare; Nello Cristianini;
Original format: PDF
Keywords
Fairness in AI; Bias in data; Artificial intelligence; Natural language processing; Word embeddings;

Similar literature

1. Investigation of the geothermal state of sedimentary basins using oil industry thermal data: Case study from Northern Alberta exhibiting the need to systematically remove biased data [J]. Allan Gray D., Majorowicz J., Unsworth M. Journal of geophysics and engineering. 2012, No. 5
2. Dr Charlton et al., Commentary and Complementary Data to Add to "Compliance with Cancer Quality Measures Over Time and Their Association with Survival Outcomes: The Commission on Cancer's Experience with the Quality Measure Requiring at Least 12 Regional Lymph Nodes to be Removed and Analyzed with Colon Cancer Resections" [J]. Shulman Lawrence N. Annals of surgical oncology. 2020, No. 4
3. Commentary and Complementary Data to Add to "Compliance with Cancer Quality Measures Over Time and Their Association with Survival Outcomes: The Commission on Cancer's Experience with the Quality Measure Requiring at Least 12 Regional Lymph Nodes to be Removed and Analyzed with Colon Cancer Resections" [J]. Charlton Mary, Kahl Amanda, Gao Xiang. Annals of surgical oncology. 2020, No. 4
4. Biased Embeddings from Wild Data: Measuring, Understanding and Removing [C]. Adam Sutton, Thomas Lansdall-Welfare, Nello Cristianini. International Symposium on Intelligent Data Analysis. 2018
5. Effect of Removing the Wild-Type VHL Tumor Suppressor Gene using CRISPR-Cas 9 in PC12 Neuroendocrine Cells [D]. Perrotto, David. 2017
6. Design Implementation and Data Analysis of an Embedded System for Measuring Environmental Quantities [O]. Martin Pieš, Radovan Hájovský, Jan Velička. 2020
