SYSTEM FOR PROVIDING ARTIFICIAL INTELLIGENCE BASED DIALOGUE TYPE CORPUS ANALYZE SERVICE AND BUILDING METHOD THEREFOR

Translated title: Artificial intelligence based dialogue-type corpus analysis service system and method for building it

Abstract

Provided is an interactive corpus analysis service providing system for constructing a large machine learning corpus based on AI voice recognition. The system comprises a collection device and an interactive corpus analysis service providing server. The collection device collects content containing colloquial sentences and dialogue sentences through at least one medium, and also collects scripts transferred from a shorthand terminal. The server includes:

  • a generating unit that extracts the colloquial and dialogue sentences from the content and scripts collected by the collection device and generates raw corpus data with at least one property;
  • a refining unit that normalizes, restores, and refines the conversation pairs in the raw corpus data;
  • a distinguishing unit that tags a profile by distinguishing the speakers of the sentences in the refined conversation pairs;
  • a restoring unit that restores the subject of a profile-tagged sentence when the sentence has no subject;
  • a protection unit that recognizes and replaces sensitive information among the personal information in the subject-restored sentence;
  • a tagging unit that constructs the analysis corpus by tagging utterances, based on at least one piece of utterance attribute information, in the sentence whose sensitive information has been replaced.
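The order of the server units in the abstract can be read as a processing pipeline. The following is a minimal Python sketch of that order only, not the patented implementation. Everything in it is an assumption made for illustration: the `speaker: sentence` script format, the names `Utterance`, `build_corpus`, and the helper functions, the toy verb list used for subject restoration, the regular expressions used to spot sensitive information, and the single `act` utterance attribute.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Utterance:
    speaker: str                 # profile tag for the speaker (hypothetical format)
    text: str                    # processed sentence
    tags: dict = field(default_factory=dict)  # utterance attribute tags


def generate_raw(script):
    """Generating unit (sketch): extract (speaker, sentence) pairs from a script."""
    pairs = []
    for line in script.splitlines():
        if ":" in line:
            speaker, text = line.split(":", 1)
            pairs.append((speaker.strip(), text.strip()))
    return pairs


def refine(text):
    """Refining unit (sketch): whitespace normalization stands in for refinement."""
    return re.sub(r"\s+", " ", text).strip()


# Toy assumption: a sentence that opens with one of these verbs lacks a subject.
SUBJECTLESS_OPENERS = {"want", "need", "have", "am"}


def restore_subject(text):
    """Restoring unit (sketch): prepend "I" when the subject appears to be missing."""
    words = text.split()
    if words and words[0].lower() in SUBJECTLESS_OPENERS:
        return "I " + text
    return text


PHONE = re.compile(r"\b\d{2,3}-\d{3,4}-\d{4}\b")
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def mask_sensitive(text):
    """Protection unit (sketch): replace phone numbers and e-mail addresses."""
    text = PHONE.sub("<PHONE>", text)
    return EMAIL.sub("<EMAIL>", text)


def tag_utterance(text):
    """Tagging unit (sketch): one hypothetical attribute, the utterance act."""
    return {"act": "question" if text.endswith("?") else "statement"}


def build_corpus(script):
    """Run the units in the order the abstract lists them."""
    corpus = []
    for speaker, raw in generate_raw(script):   # speaker label doubles as the profile tag
        text = refine(raw)
        text = restore_subject(text)
        text = mask_sensitive(text)
        corpus.append(Utterance(speaker, text, tag_utterance(text)))
    return corpus


if __name__ == "__main__":
    demo = "A:  want to   book a table\nB: What is your number?\nA: My number is 010-1234-5678"
    for utterance in build_corpus(demo):
        print(utterance)
```

Each stage takes the output of the previous one, which mirrors the abstract's wording: the subject is restored in the profile-tagged sentence, sensitive information is replaced in the subject-restored sentence, and utterance tagging runs last on the masked sentence.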

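Bibliographic data

  • Publication number: KR102041621B1
  • Publication date: 2019-11-06
  • Original format: PDF
  • Applicant/patentee: MEDIA CORPUS INC.
  • Application number: KR20190022012
  • Inventor: BAE SANG HEE
  • Filing date: 2019-02-25
  • Classification: G10L15/18; G10L15/04; G10L15/16; G10L17/22
  • Country: KR
  • Database entry time: 2022-08-21 11:47:28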
