Venue: International Joint Conference on Neural Networks

Legal Document Classification: An Application to Law Area Prediction of Petitions to Public Prosecution Service



Abstract

In recent years, there has been growing interest in applying Natural Language Processing (NLP) to legal documents. Convolutional and recurrent neural networks combined with word embedding techniques have shown promising results on text classification problems such as sentiment analysis and topic segmentation of documents. This paper proposes using NLP techniques for text classification in order to assign each description of a service that the Public Prosecutor’s Office of the State of Paraná provides to the population to one of the areas of law covered by the institution. Our main goal is to automate the assignment of petitions to their respective areas of law, reducing the associated cost and time while freeing human resources for more complex tasks. We compare different word representations for this task, including document-term matrices and several word embeddings. For the classification models, we evaluated three families: linear models, boosted trees, and neural networks. The best results were obtained by combining Word2Vec trained on a domain-specific corpus with a Recurrent Neural Network (RNN) architecture, specifically an LSTM, which reached an accuracy of 90% and an F1-Score of 85% in classifying eighteen categories (law areas).
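The abstract contrasts document-term matrices with word embeddings and reports both accuracy (90%) and F1-Score (85%). The sketch below, in plain Python with no third-party libraries, illustrates two of these ingredients under stated assumptions: a raw-count document-term matrix built with a naive whitespace tokenizer, and a macro-averaged F1. The abstract does not say which F1 average the authors used; macro averaging is an assumption here, chosen because it shows how F1 can fall below accuracy when some of the eighteen law areas are rare. All function names, petitions, and labels are hypothetical, not the authors' code or data.

```python
from collections import Counter


def tokenize(text):
    # Hypothetical, deliberately naive tokenizer: lowercase, split on whitespace.
    # Real petitions (in Portuguese) would need proper normalization.
    return text.lower().split()


def build_vocabulary(docs):
    # Assign each distinct term a column index, in first-seen order.
    vocab = {}
    for doc in docs:
        for term in tokenize(doc):
            if term not in vocab:
                vocab[term] = len(vocab)
    return vocab


def document_term_matrix(docs, vocab):
    # One row per document, one column per vocabulary term; cells hold raw counts.
    # Terms missing from the vocabulary (e.g. unseen at training time) are dropped.
    matrix = []
    for doc in docs:
        row = [0] * len(vocab)
        for term, count in Counter(tokenize(doc)).items():
            if term in vocab:
                row[vocab[term]] = count
        matrix.append(row)
    return matrix


def accuracy(y_true, y_pred):
    # Fraction of documents whose predicted law area matches the true one.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    # Per-class F1 averaged with equal weight (macro averaging, an assumption),
    # so a rare law area counts as much as a frequent one.
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical toy petitions, not data from the paper.
    docs = ["water supply complaint", "noise complaint noise"]
    vocab = build_vocabulary(docs)
    print(document_term_matrix(docs, vocab))  # [[1, 1, 1, 0], [0, 0, 1, 2]]

    # Hypothetical labels: the single rare-class ("health") petition is missed.
    y_true = ["civil"] * 8 + ["criminal", "health"]
    y_pred = ["civil"] * 8 + ["criminal", "civil"]
    print(accuracy(y_true, y_pred))            # 0.9
    print(round(macro_f1(y_true, y_pred), 3))  # 0.647
```

In this toy run, accuracy is 0.9 while macro-F1 is 11/17 ≈ 0.647: the missed "health" class scores an F1 of zero and is weighted equally with the majority class. Whether the paper's reported F1 uses this averaging is not stated in the abstract.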

