Unsupervised approaches for measuring textual similarity between legal court case reports

Arpan Mandal; Kripabandhu Ghosh; Saptarshi Ghosh; Sekhar Mandal

Abstract

In the domain of legal information retrieval, an important challenge is to compute the similarity between two legal documents. Precedents (statements from prior cases) play an important role in the Common Law system, where lawyers frequently need to refer to relevant prior cases. Measuring document similarity is one of the most crucial aspects of any document retrieval system, since it determines the speed, scalability and accuracy of the system. Text-based and network-based methods for computing similarity among case reports have been proposed in prior work, but they have certain pitfalls. Since legal citation networks are generally highly disconnected, network-based metrics are not well suited to them. To date, only a few text-based methods, predominantly embedding-based ones, have been employed, for instance TF-IDF based approaches and approaches based on Word2Vec (Mikolov et al. 2013) and Doc2Vec (Le and Mikolov 2014). We investigate the performance of 56 different methodologies for computing textual similarity across court case statements when applied to a dataset of Indian Supreme Court cases. Of these 56 methods, thirty are adaptations of existing methods and twenty-six are our proposed methods. The methods studied include models such as BERT (Devlin et al. 2018) and Law2Vec (Ilias 2019). We observe that the more traditional methods that rely on a bag-of-words representation (such as TF-IDF and LDA) perform better than the more advanced context-aware methods (such as BERT and Law2Vec) for computing document-level similarity. Finally, via empirical validation, we nominate five of our best-performing methods as appropriate for measuring similarity between case reports. Of these five, two are adaptations of existing methods and the other three are our proposed methods.
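The abstract reports that bag-of-words methods such as TF-IDF outperform context-aware models for document-level similarity. As a rough illustration of what a TF-IDF similarity baseline computes, here is a minimal Python sketch. It assumes a naive lowercase whitespace tokenizer, raw term counts, and a smoothed IDF of ln((1+N)/(1+df)) + 1 (the same smoothing scikit-learn applies by default). These choices, the function names `tfidf_vectors` and `cosine`, and the toy case snippets are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: naive lowercase whitespace tokenizer. A real legal-text
    # pipeline would also strip punctuation and stop words.
    return text.lower().split()


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Return one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Raw term count times smoothed inverse document frequency.
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy "case reports", used only to exercise the functions.
cases = [
    "the appellant challenged the land acquisition notice under the act",
    "land acquisition notice quashed as the act was not followed",
    "bail granted to the accused in a murder case",
]
vecs = tfidf_vectors(cases)
print(round(cosine(vecs[0], vecs[1]), 3))
print(round(cosine(vecs[0], vecs[2]), 3))
```

In this toy run the first pair scores higher because it shares the content words "land", "acquisition", "notice" and "act"; the non-zero score for the unrelated pair comes only from the shared word "the", which is one reason stop-word removal usually matters for legal text.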

Bibliographic record

Source
Artificial Intelligence and Law | 2021, No. 3 | pp. 417-451 | 35 pages
Authors
Arpan Mandal; Kripabandhu Ghosh; Saptarshi Ghosh; Sekhar Mandal
Affiliations

Department of Computer Science and Technology, Indian Institute of Engineering Science and Technology, Howrah, Shibpur, India;

Department of Computational and Data Sciences (CDS), Indian Institute of Science Education and Research Kolkata, West Bengal, India;

Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, India;

Department of Computer Science and Technology, Indian Institute of Engineering Science and Technology, Howrah, Shibpur, India;
Indexing information
Original format: PDF
Text language: English
Keywords
Legal information retrieval; Court case reports; Court case similarity; Topic modeling; Word2vec; Doc2vec; BERT; Law2vec

Similar documents

1. Measures for textual patent similarities: a guided way to select appropriate approaches [J]. Martin G. Moehrle. Scientometrics, 2010, No. 1.

2. Unsupervised classification: similarity measures, classical and metaheuristic approaches, and applications [J]. Nicola Di Mauro. Computing Reviews, 2013, No. 10.

3. Unsupervised Similarity Learning from Textual Data [J]. Andrzej Janusz, Dominik Slezak, Hung Son Nguyen. Fundamenta Informaticae, 2012, No. 3-4.

4. Regression Based Approaches for Detecting and Measuring Textual Similarity [C]. Sandip Sarkar, Partha Pakray, Dipankar Das. International Conference on Mining Intelligence and Knowledge Exploration, 2017.

5. Learning from Expert: A Textual Similarity and Topic Study of Expanded Auditor’s Report in the United Kingdom [D]. Yau, Ling Na Belinda. 2019.

6. The Case for Measuring Legal Actor Contributions in Court Proceedings [O]. Rhondda Waterworth. 2019.

7. Once Again on the Retroactive Effect of the Criminal Law in Time in the Aspect of Indirect Criminalization [O]. Oleksandr Marin. …nyk of the Lviv University. Series Law, No 67 (2018).