Using Anticlustering to Partition Data Sets Into Equivalent Parts

Papenberg Martin; Klau Gunnar W.

Abstract

Numerous applications in psychological research require that a pool of elements be partitioned into multiple parts. While many applications seek groups that are well separated, that is, dissimilar from each other, others require the different groups to be as similar as possible. Examples include assigning students to parallel courses, assembling stimulus sets in experimental psychology, splitting achievement tests into parts of equal difficulty, and dividing a data set for cross-validation. We present anticlust, an easy-to-use, free software package that solves these problems quickly and automatically. The package anticlust is an open-source extension to the R programming language and implements the methodology of anticlustering. Anticlustering divides elements into similar parts, ensuring similarity between groups by enforcing heterogeneity within groups. Anticlustering is thus the direct reversal of cluster analysis, which aims to maximize homogeneity within groups and dissimilarity between groups. Our package anticlust implements two anticlustering criteria, which reverse the clustering methods k-means and cluster editing, respectively. In a simulation study, we show that anticlustering returns excellent results and outperforms alternative approaches such as random assignment and matching. In three example applications, we illustrate how to apply anticlust to real data sets. We demonstrate how to assign experimental stimuli to equivalent sets based on norming data, how to divide a large data set for cross-validation, and how to split a test into parts of equal item difficulty and discrimination.
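To make the two criteria concrete, here is a minimal sketch in Python. It is not the anticlust package (which is written for R) and does not reproduce its API or internals. It assumes numeric feature vectors compared by Euclidean distance, and the function names `kmeans_objective`, `diversity_objective`, and `anticluster` are hypothetical. The k-means criterion sums squared distances of elements to their group centroid, which k-means clustering minimizes and anticlustering maximizes. The diversity criterion sums all pairwise distances within groups, the quantity maximized when reversing cluster editing. Both are optimized here with a simple exchange-style local search, chosen only for illustration.

```python
import random
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_objective(data, groups):
    """Within-group sum of squared distances to each group's centroid.
    k-means clustering minimizes this; k-means anticlustering maximizes it."""
    total = 0.0
    for g in set(groups):
        members = [data[i] for i, label in enumerate(groups) if label == g]
        centroid = [sum(col) / len(members) for col in zip(*members)]
        total += sum(sq_dist(m, centroid) for m in members)
    return total


def diversity_objective(data, groups):
    """Sum of Euclidean distances over all pairs in the same group
    (the criterion maximized when reversing cluster editing)."""
    return sum(
        sq_dist(data[i], data[j]) ** 0.5
        for i, j in combinations(range(len(data)), 2)
        if groups[i] == groups[j]
    )


def anticluster(data, k, objective, seed=0):
    """Hypothetical exchange heuristic (illustration only): start from a random
    assignment with near-equal group sizes, then, for each element in turn,
    apply the swap with an element of another group that most increases the
    objective. Swapping two labels never changes the group sizes."""
    n = len(data)
    rng = random.Random(seed)
    groups = [i % k for i in range(n)]  # group sizes differ by at most one
    rng.shuffle(groups)
    best = objective(data, groups)
    for i in range(n):
        best_j = None
        for j in range(n):
            if groups[i] == groups[j]:
                continue
            groups[i], groups[j] = groups[j], groups[i]  # try the swap
            value = objective(data, groups)
            groups[i], groups[j] = groups[j], groups[i]  # undo it
            if value > best:
                best, best_j = value, j
        if best_j is not None:  # keep only a strictly improving swap
            groups[i], groups[best_j] = groups[best_j], groups[i]
    return groups


if __name__ == "__main__":
    rng = random.Random(42)
    items = [[rng.gauss(0, 1), rng.gauss(5, 2)] for _ in range(12)]
    labels = anticluster(items, 3, kmeans_objective)
    for g in range(3):
        group = [items[i] for i, lab in enumerate(labels) if lab == g]
        means = [round(sum(col) / len(col), 2) for col in zip(*group)]
        print(f"group {g}: n={len(group)}, feature means={means}")
```

On the one-dimensional items 1, 2, 3, 4 split into two groups, the k-means criterion favors {1, 4} and {2, 3}. Each group then spans the full range and both share the overall mean of 2.5, which is the "heterogeneity within, similarity between" idea described above. Because the sketch recomputes the full objective for every candidate swap, it is only practical for small data sets.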


Bibliographic Information

Source
Psychological Methods, 2021, Issue 2, pp. 161-+ (15 pages)
Authors
Papenberg Martin; Klau Gunnar W.

Affiliations

Heinrich Heine Univ Dusseldorf, Dept Expt Psychol, Univ Str 1, D-40225 Dusseldorf, Germany;

Heinrich Heine Univ Dusseldorf, Dept Comp Sci, Dusseldorf, Germany;

Record Information
Original format: PDF
Language: English
Keywords
Stimuli; cross-validation; Randomisation; encapsulating; Difficulty Level; behavioral research; partitions; Software packages; EQUIPMENT; PART; Dataset
