Last update: 11 June 2026 [CV]

Yige Xu (许一格)

Ph.D.

Biography

     I am currently a postdoctoral researcher at the Alibaba-NTU Global e-Sustainability CorpLab (ANGEL) at Nanyang Technological University (NTU), Singapore. I obtained my Ph.D. in 2026 from the College of Computing and Data Science (CCDS) at NTU, under the supervision of Prof. Chunyan Miao. Before that, I obtained my master's degree in 2021 from the Fudan NLP Group, School of Computer Science, Fudan University (FDU), where I worked with Prof. Xipeng Qiu and Prof. Xuanjing Huang. From 2014 to 2018, I completed my bachelor's degree at Taishan College, Shandong University (SDU), where I participated in the China Top-Notch Undergraduate Training Program and worked with Prof. Jun Ma.

     I was one of the main contributors to fastNLP [GitHub] [Gitee].

Research Interest

     My research interests centre on: (i) Machine Learning and Natural Language Processing (NLP), with a particular emphasis on Large Language Models (LLMs) and Large Vision-Language Models, and (ii) LLM-based Interdisciplinary Research. Specifically, I focus on developing fine-grained, efficient methods for large-scale multi-modal models and on applying these advances to interdisciplinary research.

     I am seeking highly self-motivated and promising students. If you are interested in my research topics, please email me with your research status and resume.

Teaching

At NTU

At FDU

Awards

  1. Outstanding Students of Master's Degrees at Fudan University, 2020
  2. CCL 2019 Best Paper Award for "How to Fine-Tune BERT for Text Classification?"

Keynotes & Talks

  1. LLM Advances: Evolution and Frontiers of Chain-of-Thought Reasoning in LLMs, NTU Singapore, 11 July 2025. [Slides]
  2. An Introduction to Prompting Methods, NTU Singapore, 04 May 2022. [Slides]
  3. Multi-perspective Optimization of Pre-trained Language Model, at NTU Student Lecture Series (SLS), Singapore, 24 March 2022. [Slides][Video]
  4. An Introduction of Transformer, NTU Singapore, 25 August 2021. [Slides]

Professional Services

Conference Reviewer / PC Members

Journal Reviewer

Selected Publications

    * denotes co-first authorship. (Full List)

  1. SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs, (ACL), 2025. [BibTeX] [PDF] [Slides] [Code] LLM Reasoning
    Yige Xu*, Xu Guo*, Zhiwei Zeng, Chunyan Miao. [Abstract]
    Abstract: Chain-of-Thought (CoT) reasoning enables Large Language Models (LLMs) to solve complex reasoning tasks by generating intermediate reasoning steps. However, most existing approaches focus on hard token decoding, which constrains reasoning within the discrete vocabulary space and may not always be optimal. While recent efforts explore continuous-space reasoning, they often require full-model fine-tuning and suffer from catastrophic forgetting, limiting their applicability to state-of-the-art LLMs that already perform well in zero-shot settings with a proper instruction. To address this challenge, we propose a novel approach for continuous-space reasoning that does not require modifying the LLM. Specifically, we employ a lightweight fixed assistant model to speculatively generate instance-specific soft thought tokens as the initial chain of thoughts, which are then mapped into the LLM’s representation space via a trainable projection module. Experimental results on five reasoning benchmarks demonstrate that our method enhances LLM reasoning performance through supervised, parameter-efficient fine-tuning.
    BibTeX:
    @inproceedings{xu2025softcot,
      title = "{SoftCoT}: Soft Chain-of-Thought for Efficient Reasoning with {LLM}s",
      author = "Xu, Yige  and
        Guo, Xu  and
        Zeng, Zhiwei  and
        Miao, Chunyan",
      booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
      month = jul,
      year = "2025",
      address = "Vienna, Austria",
      publisher = "Association for Computational Linguistics",
      url = "https://aclanthology.org/2025.acl-long.1137/",
      pages = "23336--23351",
    }
    			
  2. RevMUX: Data Multiplexing with Reversible Adapters for Efficient LLM Batch Inference, (EMNLP), 2024. [BibTeX] [PDF] [Slides] [Code] LLM Efficiency
    Yige Xu, Xu Guo, Zhiwei Zeng, Chunyan Miao. [Abstract]
    Abstract: Large language models (LLMs) have brought a great breakthrough to the natural language processing (NLP) community, while leading the challenge of handling concurrent customer queries due to their high throughput demands. Data multiplexing addresses this by merging multiple inputs into a single composite input, allowing more efficient inference through a shared forward pass. However, as distinguishing individuals from a composite input is challenging, conventional methods typically require training the entire backbone, yet still suffer from performance degradation. In this paper, we introduce RevMUX, a parameter-efficient data multiplexing framework that incorporates a reversible design in the multiplexer, which can be reused by the demultiplexer to perform reverse operations and restore individual samples for classification. Extensive experiments on four datasets and three types of LLM backbones demonstrate the effectiveness of RevMUX for enhancing LLM inference efficiency while retaining a satisfactory classification performance.
    BibTeX:
    @inproceedings{xu-etal-2024-revmux,
        title = "{R}ev{MUX}: Data Multiplexing with Reversible Adapters for Efficient {LLM} Batch Inference",
        author = "Xu, Yige  and
          Guo, Xu  and
          Zeng, Zhiwei  and
          Miao, Chunyan",
        booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
        month = nov,
        year = "2024",
        address = "Miami, Florida, USA",
        publisher = "Association for Computational Linguistics",
        url = "https://aclanthology.org/2024.emnlp-main.1232",
        pages = "22072--22087",
    }
    			
  3. Improving BERT Fine-Tuning via Self-Ensemble and Self-Distillation, Journal of Computer Science and Technology (JCST), Vol. 38(4), pp. 853-866, July 2023. [BibTeX] [DOI] [PDF] LM Fine-Tuning
    Yige Xu, Xipeng Qiu, Ligao Zhou, Xuanjing Huang. [Abstract]
    Abstract: Fine-tuning pre-trained language models like BERT has become an effective way in NLP and yields state-of-the-art results on many downstream tasks. Recent studies on adapting BERT to new tasks mainly focus on modifying the model structure, re-designing the pre-train tasks, and leveraging external data and knowledge. The fine-tuning strategy itself has yet to be fully explored. In this paper, we improve the fine-tuning of BERT with two effective mechanisms: self-ensemble and self-distillation. Experiments on GLUE benchmark and Text Classification benchmark show that our proposed methods can significantly improve the adaption of BERT without any external data or knowledge. We conduct exhaustive experiments to investigate the efficiency of self-ensemble and self-distillation mechanisms, and our proposed methods achieve a new state-of-the-art result on the SNLI dataset.
    BibTeX:
    @article{xu2023jcst-self-distillation,
    	title={Improving {BERT} Fine-Tuning via Self-Ensemble and Self-Distillation},
    	author={Xu, Yige and Qiu, Xipeng and Zhou, Ligao and Huang, Xuanjing},
    	journal={J. Comput. Sci. Technol.},
    	volume={38},
    	number={4},
    	pages={853--866},
    	year = {2023},
    	doi = {10.1007/s11390-021-1119-0}
    }
    			
  4. Pre-trained Models for Natural Language Processing: A Survey, SCIENCE CHINA Technological Sciences, (Most Influential Paper of SCTS in 2021), 2020. [BibTeX] [DOI] [PDF] Survey
    Xipeng Qiu, Tianxiang Sun, Yige Xu, Yunfan Shao, Ning Dai, Xuanjing Huang. [Abstract]
    Abstract: Recently, the emergence of pre-trained models (PTMs) has brought natural language processing (NLP) to a new era. In this survey, we provide a comprehensive review of PTMs for NLP. We first briefly introduce language representation learning and its research progress. Then we systematically categorize existing PTMs based on a taxonomy with four perspectives. Next, we describe how to adapt the knowledge of PTMs to the downstream tasks. Finally, we outline some potential directions of PTMs for future research. This survey is purposed to be a hands-on guide for understanding, using, and developing PTMs for various NLP tasks.
    BibTeX:
    @article{qiu2020:scts-ptms,
    	author = {Xipeng Qiu and Tianxiang Sun and Yige Xu and Yunfan Shao and Ning Dai and Xuanjing Huang},
    	title = {Pre-trained Models for Natural Language Processing: A Survey},
    	journal = {SCIENCE CHINA Technological Sciences},
    	publisher = {Science China Press},
    	year = {2020},
    	volume = {63},
    	number = {10},
    	pages = {1872--1897},
    	doi = {10.1007/s11431-020-1647-3}
    }
    			
  5. How to Fine-Tune BERT for Text Classification? China National Conference on Chinese Computational Linguistics, (CCL, Best Paper Award), 2019. [BibTeX] [DOI] [PDF] [Code] LM Fine-Tuning
    Chi Sun, Xipeng Qiu, Yige Xu, Xuanjing Huang. [Abstract]
    Abstract: Language model pre-training has proven to be useful in learning universal language representations. As a state-of-the-art language model pre-training model, BERT (Bidirectional Encoder Representations from Transformers) has achieved amazing results in many language understanding tasks. In this paper, we conduct exhaustive experiments to investigate different fine-tuning methods of BERT on text classification task and provide a general solution for BERT fine-tuning. Finally, the proposed solution obtains new state-of-the-art results on eight widely-studied text classification datasets.
    BibTeX:
    @inproceedings{sun2019fine,
      title={How to fine-tune {BERT} for text classification?},
      author={Sun, Chi and Qiu, Xipeng and Xu, Yige and Huang, Xuanjing},
      booktitle={China National Conference on Chinese Computational Linguistics},
      pages={194--206},
      year={2019},
      organization={Springer}
    }