Journal: Explorations in Economic History

Combining family history and machine learning to link historical records: The Census Tree data set


Abstract

A key challenge for research on many questions in the social sciences is that it is difficult to link records in a way that allows investigators to observe people at different points in their life or across generations. In this paper, we contribute to recent efforts to create these links with a new approach that relies on millions of record links created by individual contributors to a large, public, wiki-style family tree. We use these "true" links both to inform the decisions one needs to make when using automated methods to link records and as a training data set for use in a supervised machine learning approach. We describe our procedure and illustrate its potential by linking individuals across the 100% samples of the US censuses from 1900, 1910, and 1920. When linking adjacent censuses, we obtain an overall match rate of 62-65 percent (for over 88.9 million matches), with a false positive rate that is around 6-7 percent and with links that are similar to the population along observable characteristics. Thus, our method allows us to link records with a combination of a high match rate, precision, and representativeness that is beyond the current frontier. Finally, we demonstrate the potential of the data by estimating the degree of intergenerational transmission of literacy between father-son and mother-daughter pairs.
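The abstract reports a match rate and a false positive rate but does not define them here. As a minimal sketch, assuming the match rate is the share of source records that receive a link and the false positive rate is the share of predicted links that disagree with a family-tree link among records where one exists, the two figures could be computed as below. The function name `link_metrics` and the dictionary layout are hypothetical and are not taken from the paper.

```python
import math


def link_metrics(predicted, truth, n_records):
    """Return (match_rate, false_positive_rate) for a set of record links.

    predicted: dict mapping source record id -> linked target record id
    truth:     dict mapping source record id -> target id from the family tree
    n_records: total number of source records we attempted to link

    Assumed definitions (not confirmed by the abstract):
      match rate          = predicted links / source records
      false positive rate = wrong links / links that can be checked against truth
    """
    match_rate = len(predicted) / n_records

    # Only links whose source record also has a family-tree link can be checked.
    checkable = [src for src in predicted if src in truth]
    wrong = sum(1 for src in checkable if predicted[src] != truth[src])
    false_positive_rate = wrong / len(checkable) if checkable else math.nan

    return match_rate, false_positive_rate


# Toy example: 4 source records, 3 predicted links, 2 of them checkable.
predicted = {"a": "x", "b": "y", "c": "z"}
truth = {"a": "x", "b": "q"}
match_rate, fpr = link_metrics(predicted, truth, n_records=4)
```

In this toy input three of four records are linked, and one of the two checkable links disagrees with the family-tree link.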
