Journal: International Journal of Computer Processing of Oriental Languages

Court Stenography-To-Text ('STT') in Hong Kong: A Jurilinguistic Engineering Effort



Abstract

Implementation of legal bilingualism in Hong Kong after 1997 has made it necessary to produce a large volume of court proceedings and judgments in both Chinese and English. For the Chinese records, the relevant spoken language is Cantonese: this dialect of Chinese is the home language of more than 90% of Hong Kong's population and is therefore used officially in the courts. Court proceedings conducted in Cantonese must be recorded, and a Cantonese Computer-Aided Transcription system has been developed for this purpose. The system converts stenographic codes into Chinese text, i.e. from a phonetic to an orthographic representation of the language. The main challenge lies in resolving the severe ambiguity caused by homocodes (one code corresponding to several characters) during conversion, a problem made serious by the pervasive homonymy of Cantonese. An N-gram statistical model is employed to estimate the most probable character string for a given sequence of input transcription codes, and domain-specific corpora have been compiled to support the statistical computation. To improve accuracy further, scalable techniques such as domain-specific transcription and special encoding are used. Combined, these techniques achieve 96% transcription accuracy.
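The abstract does not describe the decoder in detail, so the following is only a minimal sketch of how an N-gram model can resolve homocodes, assuming a character bigram model with add-one smoothing and a Viterbi search over the candidate characters for each code. The code table (written as Jyutping syllables for readability rather than real stenographic codes), the toy "court" corpus, and the names `HOMOCODES`, `train_bigram_model`, and `decode` are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

# Hypothetical homocode table: each code (shown as a Jyutping syllable, not a
# real stenographic code) maps to every character that shares it.
HOMOCODES = {
    "faat3": ["法"],
    "gun1": ["官", "觀", "關"],
    "jyun6": ["院", "願", "縣"],
    "on3": ["案", "按"],
    "gin6": ["件", "健"],
}

# Toy stand-in for a domain-specific (court) corpus.
CORPUS = ["法官", "法院", "案件", "法院案件", "法官審理案件"]


def train_bigram_model(corpus, homocodes):
    """Count character bigrams, with <s> marking the start of each sentence."""
    contexts = Counter()  # number of bigrams that start with each character
    bigrams = Counter()
    for sentence in corpus:
        chars = ["<s>"] + list(sentence)
        for prev, cur in zip(chars, chars[1:]):
            contexts[prev] += 1
            bigrams[(prev, cur)] += 1
    # Vocabulary = corpus characters plus every conversion candidate, so that
    # candidates never seen in the corpus still get smoothed probability mass.
    vocab = {c for s in corpus for c in s}
    vocab.update(c for cands in homocodes.values() for c in cands)
    return contexts, bigrams, len(vocab)


def log_prob(prev, cur, model):
    """Add-one (Laplace) smoothed log P(cur | prev)."""
    contexts, bigrams, vocab_size = model
    return math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))


def decode(codes, homocodes, model):
    """Viterbi search for the most probable character string for `codes`."""
    # best[char] = (log-probability, path) of the best path ending in char
    best = {"<s>": (0.0, [])}
    for code in codes:
        candidates = homocodes.get(code)
        if not candidates:
            raise KeyError(f"unknown stenographic code: {code}")
        best = {
            cur: max(
                (score + log_prob(prev, cur, model), path + [cur])
                for prev, (score, path) in best.items()
            )
            for cur in candidates
        }
    _, path = max(best.values())
    return "".join(path)


if __name__ == "__main__":
    model = train_bigram_model(CORPUS, HOMOCODES)
    print(decode(["faat3", "gun1"], HOMOCODES, model))  # 法官
    print(decode(["faat3", "jyun6", "on3", "gin6"], HOMOCODES, model))  # 法院案件
```

With this toy corpus, `gun1` after 法 resolves to 官 rather than 觀 or 關 because the bigram 法官 was observed, which illustrates why the domain-specific corpora matter: the disambiguation is only as good as the bigram statistics behind it. A production system would likely use higher-order N-grams and better smoothing than add-one.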
