Developing Machine Learning Framework to Classify Harmonized System Code. Case Study: Indonesian Customs

Abstract

The Directorate General of Customs and Excise (DGCE), an Indonesian government agency under the Ministry of Finance, is responsible for ensuring that importers and exporters classify their declared goods according to the Harmonized System Code (HS Code). This study aims to find an optimal machine learning framework for classifying goods into their HS Code, given the challenges DGCE faces: goods descriptions written in mixed languages with inconsistent patterns, imbalanced multiclass HS Code labels, and several additional categorical variables. Previous studies have proposed machine learning models that predict the HS Code from goods descriptions; this study makes improvements and adjustments to address the challenges listed above. Several preprocessing tasks were performed, including handling abbreviations, misspellings, and the varying patterns of goods descriptions, and translating Indonesian words into English. One Hot Coding (OHC) is applied to encode nominal and categorical variables. To build features from the goods descriptions, we use Term Frequency - Inverse Document Frequency (TF-IDF) combined with bigrams. Our results show that Random Forest achieved an F1-score of 79.60% when classifying the first four digits of the HS Code, and Multinomial NB achieved an F1-score of 72.74% when classifying the full HS Code. Compared to the baseline paper, those scores are 11.26% and 11.36% higher, respectively.
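The feature construction described in the abstract (TF-IDF over unigrams and bigrams, plus one-hot encoding of a categorical variable) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation: the sample records, the choice of country of origin as the categorical variable, the `cat=` key prefix, the smoothed IDF formula, and all function names are assumptions made for the example.

```python
import math
from collections import Counter


def tokens_with_bigrams(text):
    """Lowercase, split on whitespace, and return unigrams plus adjacent-word bigrams."""
    words = text.lower().split()
    bigrams = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + bigrams


def fit_idf(docs):
    """Smoothed inverse document frequency: ln((1 + N) / (1 + df)) + 1 (an assumed variant)."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokens_with_bigrams(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf_vector(doc, idf):
    """Sparse TF-IDF vector as a dict, L2-normalised; terms unseen during fitting are dropped."""
    tf = Counter(t for t in tokens_with_bigrams(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else vec


def one_hot(value, categories):
    """One-hot encode a nominal value against a fixed category list."""
    return {f"cat={c}": 1.0 if c == value else 0.0 for c in categories}


# Hypothetical records: (goods description, country of origin, illustrative 8-digit HS code).
records = [
    ("frozen beef meat", "AU", "02023000"),
    ("fresh beef meat boneless", "NZ", "02013000"),
    ("cotton t-shirt men", "CN", "61091010"),
]

descriptions = [r[0] for r in records]
idf = fit_idf(descriptions)
countries = sorted({r[1] for r in records})

# One feature row per record: text features and one-hot features side by side.
features = []
for desc, country, _ in records:
    row = tfidf_vector(desc, idf)
    row.update(one_hot(country, countries))
    features.append(row)

# Two label granularities, mirroring the two classification tasks in the abstract.
labels_heading = [hs[:4] for _, _, hs in records]  # first four digits
labels_full = [hs for _, _, hs in records]         # full code
```

Each row in `features` could then be passed to a classifier such as Random Forest or Multinomial Naive Bayes. In practice a library like scikit-learn (`TfidfVectorizer` with `ngram_range=(1, 2)` and `OneHotEncoder`) would likely replace the hand-written helpers above.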


Bibliographic record

Source
East Indonesia Conference on Computer and Information Technology | 2021 | pp. 254-259 | 6 pages
Authors
I Gede Yudi Paramartha; Igi Ardiyanto; Risanuri Hidayat
Original format
PDF
Keywords
Government; Finance; Predictive models; Encoding; Task analysis; Information technology; Random forests


