Journal: Artificial Intelligence and Law

Unsupervised approaches for measuring textual similarity between legal court case reports

Abstract

In legal information retrieval, an important challenge is computing the similarity between two legal documents. Precedents (statements from prior cases) play an important role in the common law system, where lawyers frequently need to refer to relevant prior cases. Measuring document similarity is one of the most crucial aspects of any document retrieval system, since it determines the system's speed, scalability, and accuracy. Text-based and network-based methods for computing similarity among case reports have been proposed in prior work, but they are not without pitfalls. Because legal citation networks are generally highly disconnected, network-based metrics are ill-suited to them. To date, only a few text-based methods, predominantly embedding-based ones, have been employed: for instance, approaches based on TF-IDF, Word2Vec (Mikolov et al. 2013), and Doc2Vec (Le and Mikolov 2014). We investigate the performance of 56 different methods for computing textual similarity across court case statements on a dataset of Indian Supreme Court cases. Of these 56 methods, thirty are adaptations of existing methods and twenty-six are our proposed methods. The methods studied include models such as BERT (Devlin et al. 2018) and Law2Vec (Ilias 2019). We observe that the more traditional methods that rely on a bag-of-words representation (such as TF-IDF and LDA) perform better than the more advanced context-aware methods (such as BERT and Law2Vec) for computing document-level similarity. Finally, via empirical validation, we nominate five of our best-performing methods as appropriate for measuring similarity between case reports. Of these five, two are adaptations of existing methods and the other three are our proposed methods.
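The abstract reports that bag-of-words methods such as TF-IDF outperform context-aware models for document-level similarity. To make concrete what a TF-IDF-based similarity measure computes, here is a minimal standard-library Python sketch. It is not the authors' implementation: the naive tokenizer, the smoothed IDF variant, and the toy case texts are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumeric characters (a deliberately naive tokenizer)."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Build one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        total = len(toks) or 1
        # Assumed smoothed IDF variant: log((1 + N) / (1 + df)) + 1.
        vectors.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Hypothetical toy "case reports", not drawn from the paper's dataset.
    cases = [
        "The appellant challenged the land acquisition notice under the Land Acquisition Act.",
        "The land acquisition proceedings were quashed as the notice was defective.",
        "The accused was convicted of murder and the appeal against conviction was dismissed.",
    ]
    vecs = tfidf_vectors(cases)
    for i in range(len(cases)):
        for j in range(i + 1, len(cases)):
            print(f"sim(case {i}, case {j}) = {cosine(vecs[i], vecs[j]):.3f}")
```

Because documents are compared by weighted term overlap, two reports that share rarer, subject-specific terms (here "land", "acquisition", "notice") should score higher than reports that share only ubiquitous words such as "the", which the IDF factor down-weights.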