Multi-Task Knowledge Distillation for Language Model
Abstract
Systems and methods are provided that employ knowledge distillation in a multi-task learning setting. In some embodiments, the systems and methods are implemented with a larger teacher model and a smaller student model, each of which comprises one or more shared layers and a plurality of task layers for performing multiple tasks. During training of the teacher model, its shared layers are initialized, and the teacher model is then refined via multi-task learning. The teacher model predicts teacher logits. During training of the student model, its shared layers are initialized. Knowledge distillation is employed to transfer knowledge from the teacher model to the student model: the student model updates its shared layers and task layers, for example, according to the teacher logits of the teacher model. Other features are also provided.
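The logit-based transfer described above can be sketched as follows. This is a minimal illustration, not the patented method: it assumes the student is trained against temperature-softened teacher logits using a KL-divergence loss, a standard formulation of knowledge distillation. The function names, temperature value, and toy logits are hypothetical.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; higher T softens the distribution.
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # KL divergence between softened teacher and student distributions,
    # scaled by T^2 as is conventional so gradients stay comparable across T.
    p = softmax(teacher_logits, T)  # soft targets from the teacher
    q = softmax(student_logits, T)  # student predictions
    return float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean() * T * T)

# Toy example: one task head, a batch of two examples, three classes.
teacher = np.array([[4.0, 1.0, 0.5], [0.2, 3.0, 0.1]])
student = np.array([[3.5, 1.2, 0.4], [0.1, 2.5, 0.3]])
loss = distillation_loss(student, teacher)
```

In the multi-task setting, a loss of this form would be computed per task layer and combined with any hard-label task losses when updating the student's shared and task-specific parameters.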