Multi-Task Knowledge Distillation for Language Model
Abstract
Systems and methods are provided that employ knowledge distillation in a multi-task learning setting. In some embodiments, the systems and methods are implemented with a larger teacher model and a smaller student model, each of which comprises one or more shared layers and a plurality of task layers for performing multiple tasks. During training of the teacher model, its shared layers are initialized, and the teacher model is then refined via multi-task learning. The teacher model predicts teacher logits. During training of the student model, its shared layers are initialized. Knowledge distillation is employed to transfer knowledge from the teacher model to the student model: the student model updates its shared layers and task layers, for example, according to the teacher logits of the teacher model. Other features are also provided.
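The logit-based transfer described above can be sketched as follows. This is a minimal illustration, not the patented method: it assumes the student is trained against temperature-softened teacher logits using a KL-divergence loss, a standard formulation of knowledge distillation. The function names, temperature value, and toy logits are hypothetical.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; higher T softens the distribution.
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # KL divergence between softened teacher and student distributions,
    # scaled by T^2 as is conventional so gradients stay comparable across T.
    p = softmax(teacher_logits, T)  # soft targets from the teacher
    q = softmax(student_logits, T)  # student predictions
    return float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean() * T * T)

# Toy example: one task head, a batch of two examples, three classes.
teacher = np.array([[4.0, 1.0, 0.5], [0.2, 3.0, 0.1]])
student = np.array([[3.5, 1.2, 0.4], [0.1, 2.5, 0.3]])
loss = distillation_loss(student, teacher)
```

In the multi-task setting, a loss of this form would be computed per task layer and combined with any hard-label task losses when updating the student's shared and task-specific parameters.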