PELMo: Prompt-based Ensemble Expert Language Models with Multi-label Routing

Authors: Jiansheng Wang, Jian Zhang, Yuzhi Mu, Wei Han, Xuefan Xu, Junyu Shen, Tianming Ma
Conference: ICIC 2024 Posters, Tianjin, China, August 5-8, 2024
Pages: 745-759
Keywords: large language models, expert models, prompt, ensemble

Abstract

Large language models (LLMs) utilize few-shot and zero-shot prompting to better support tasks across multiple domains. Despite the strong performance of LLMs on a wide range of natural language tasks, a single LLM often struggles to generalize to multiple domains that require different knowledge and abilities. To overcome this problem, we introduce PELMo, an ensemble framework designed to attain consistently superior performance by leveraging the diverse strengths of multiple expert language models. Our method combines multiple expert models by training an additional routing model. First, by optimizing instruction prompts for different tasks, we obtain expert models with different task capabilities based on the same backbone. Afterwards, a multi-label routing model is trained to strategically select the k top-ranked expert models for each question. Finally, the final-layer outputs of the selected expert models are combined through weighted averaging to generate the ultimate answer. Our results demonstrate that PELMo outperforms the individual expert models within the target domain and achieves robust capabilities across the full scope of tasks. Overall, these results demonstrate the benefits of ensembling the k top-ranked expert models during language modeling.
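As a rough illustration of the pipeline the abstract describes (a multi-label router that selects the k top-ranked experts per question, then a weighted average of their final-layer outputs), the following is a minimal sketch assuming PyTorch. The class name TopKExpertEnsemble, the linear router, and the softmax weighting over router scores are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of top-k expert routing with weighted output averaging.
# Assumptions: each expert is a callable mapping token ids to final-layer
# vocabulary logits of shape (batch, seq, vocab); the router is a simple
# linear head over a pooled input embedding. None of this is confirmed as
# the paper's exact design.
import torch
import torch.nn as nn


class TopKExpertEnsemble(nn.Module):
    def __init__(self, experts: nn.ModuleList, hidden_dim: int, k: int = 2):
        super().__init__()
        self.experts = experts  # prompt-specialized experts sharing one backbone
        self.k = k
        # Multi-label router: one relevance score per expert for each input.
        self.router = nn.Linear(hidden_dim, len(experts))

    def forward(self, input_embedding: torch.Tensor, input_ids: torch.Tensor):
        # Score all experts, then keep the k top-ranked ones per question.
        scores = self.router(input_embedding)              # (batch, num_experts)
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)
        weights = torch.softmax(topk_scores, dim=-1)       # normalize over selected experts

        # Weighted-average the selected experts' final-layer logits.
        outputs = []
        for b in range(input_ids.size(0)):
            expert_logits = torch.stack(
                [self.experts[i](input_ids[b : b + 1]) for i in topk_idx[b].tolist()]
            )                                              # (k, 1, seq, vocab)
            combined = (weights[b].view(-1, 1, 1, 1) * expert_logits).sum(dim=0)
            outputs.append(combined)
        return torch.cat(outputs, dim=0)                   # (batch, seq, vocab)
```

In this reading, only the k selected experts run at inference time, so the ensemble's cost scales with k rather than with the total number of experts.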