Records Management Journal

Algorithmic methods to explore the automation of the appraisal of structured and unstructured digital data



Abstract

Purpose - This paper describes interdisciplinary and innovative research conducted in Switzerland at the Geneva School of Business Administration HES-SO and supported by the State Archives of Neuchatel (Office des archives de l'Etat de Neuchatel, OAEN). The problem addressed is a classical one: how to extract and discriminate relevant data within a huge volume of diverse and complex record formats and contents. The goal of the study is to provide a framework and a proof of concept for software that helps in taking defensible decisions on the retention and disposal of records and data proposed to the OAEN. To this end, the authors defined two axes: the archival axis, which proposes archival metrics for the appraisal of structured and unstructured data, and the data mining axis, which proposes algorithmic methods as complementary and/or additional metrics for the appraisal process.

Design/methodology/approach - Based on these two axes, this exploratory study designs archival metrics paired with data mining metrics and tests their feasibility, in order to advance the digital appraisal process as far as possible in a systematic or even automatic way. Under Axis 1, the authors carried out three steps. First, they designed a conceptual framework for the appraisal of records and data, with a detailed three-dimensional approach (trustworthiness, exploitability, representativeness), and defined the main principles and postulates that guide the operationalization of these conceptual dimensions. Second, the operationalization produced metrics expressed as variables, supported by a quantitative method for their measurement and scoring. Third, the authors shared the conceptual framework, with its dimensions and operationalized variables (metrics), with experienced professionals in order to validate them. The experts' feedback gave the authors an indication of the relevance and the feasibility of these metrics. Those two aspects may demonstrate the acceptability of such a method in real-life archival practice. In parallel, Axis 2 proposes functionalities that cover not only macro analysis of data but also the algorithmic methods needed to compute digital archival and data mining metrics. On that basis, three use cases were proposed as plausible and illustrative scenarios for the application of such a solution.

Findings - The main results demonstrate the feasibility of measuring the value of data and records with a reproducible method. More specifically, for Axis 1, the authors applied the metrics in a flexible and modular way and also defined the main principles needed to enable a computational scoring method. The results of the expert consultation on the relevance of 42 metrics indicate an acceptance rate above 80%. In addition, the results show that 60% of all metrics can be automated. Regarding Axis 2, 33 functionalities were developed and proposed under six main types: macro analysis, microanalysis, statistics, retrieval, administration and, finally, decision modeling and machine learning. The relevance of the metrics and functionalities rests on the theoretical validity and computational character of their method. These results are largely satisfactory and promising.

Originality/value - This study offers a valuable aid for improving the validity and performance of archival appraisal processes and decision-making. The transferability and applicability of these archival and data mining metrics could be considered for other types of data. An adaptation of this method and its metrics could be tested on research data, medical data or banking data.
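As an illustration only, the idea of scoring metrics grouped under the three dimensions (trustworthiness, exploitability, representativeness) could be sketched as below. The abstract does not state the scoring formula, so the weighted mean used here is an assumption, and every metric name, weight and value is hypothetical rather than taken from the study.

```python
from collections import defaultdict
from dataclasses import dataclass

# The three appraisal dimensions named in the abstract.
DIMENSIONS = ("trustworthiness", "exploitability", "representativeness")


@dataclass(frozen=True)
class Metric:
    name: str          # hypothetical metric name, not from the paper
    dimension: str     # one of DIMENSIONS
    weight: float      # hypothetical relative weight
    automatable: bool  # True if the value could be computed without a human


def appraisal_scores(metrics, values):
    """Weighted mean score per dimension (an assumed aggregation rule).

    `values` maps metric names to scores in [0, 1]. Metrics without a
    value are skipped, so the metric set can be applied modularly.
    """
    totals = defaultdict(float)
    weights = defaultdict(float)
    for m in metrics:
        if m.dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {m.dimension}")
        if m.name not in values:
            continue
        v = values[m.name]
        if not 0.0 <= v <= 1.0:
            raise ValueError(f"score out of range for {m.name}: {v}")
        totals[m.dimension] += m.weight * v
        weights[m.dimension] += m.weight
    return {d: totals[d] / weights[d] for d in DIMENSIONS if weights[d] > 0}


def automatable_share(metrics):
    """Fraction of metrics flagged as automatable."""
    return sum(m.automatable for m in metrics) / len(metrics)


# Hypothetical example data for one set of records.
metrics = [
    Metric("checksum_valid", "trustworthiness", 2.0, True),
    Metric("provenance_documented", "trustworthiness", 1.0, False),
    Metric("open_format", "exploitability", 1.0, True),
    Metric("sample_coverage", "representativeness", 1.0, True),
]
values = {"checksum_valid": 1.0, "provenance_documented": 0.5, "open_format": 0.0}

scores = appraisal_scores(metrics, values)
share = automatable_share(metrics)
```

In this sketch a dimension with no measured metric is simply absent from the result, which is one possible reading of applying metrics "in a flexible and modular way"; a real implementation would follow the principles and postulates defined in the full paper.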
