Last update: 9 Aug 2022 [CV]

Yige Xu (许一格)

About Me

     Currently, I am a second-year Ph.D. student at the School of Computer Science and Engineering (SCSE), Nanyang Technological University (NTU), where I am working with Prof. Chunyan Miao.

     Before that, I obtained my master's degree from the School of Computer Science, Fudan University (FDU) in 2021, where I worked with Prof. Xipeng Qiu and Prof. Xuanjing Huang. At Fudan, I was a member of the Fudan NLP Group and the fastNLP development team, and one of the main contributors to fastNLP [GitHub] [Gitee].

     From 2014 to 2018, I completed my bachelor's at Taishan College, Shandong University (SDU), where I worked with Prof. Jun Ma.

Education Bio

  1. 2021.8 - present: Ph.D. student, School of Computer Science and Engineering (SCSE), Nanyang Technological University (NTU). Working with Prof. Chunyan Miao.
  2. 2018.9 - 2021.6: M.Sc. in Computer Science, Fudan University. Member of the Fudan NLP Group and the fastNLP development team; worked with Prof. Xipeng Qiu and Prof. Xuanjing Huang.
  3. 2014.9 - 2018.6: B.Eng. in Computer Science and Technology, Taishan College, Shandong University. Worked with Prof. Jun Ma. Taishan College is an honors college (a.k.a. elite class) of Shandong University; our major selects fewer than 20 students from more than 300 undergraduates each year.

Research Interests

Recently, I mainly focus on the following perspectives:


Teaching

At Nanyang Technological University

  1. [New!] CZ3007 Compiler Techniques (Semester 1, AY2022-2023). Teaching Assistant

At Fudan University

  1. MANA130376.01 Big Data driven Business Analytics and Application (Spring 2019). Teaching Assistant
  2. COMP130137.01 Pattern Recognition & Machine Learning (Spring 2020). Teaching Assistant
  3. DATA62004.01 Neural Network and Deep Learning (Spring 2020). Teaching Assistant


Awards

  1. CCL 2019 Best Paper Award, for "How to Fine-Tune BERT for Text Classification?"
  2. Outstanding Students of Master's Degrees at Fudan University, 2020

Keynotes & Talks

  1. An Introduction to Prompting Methods, NTU Singapore, 04/05/2022.
  2. Multi-perspective Optimization of Pre-trained Language Model, at NTU Student Lecture Series (SLS), Singapore, 24/03/2022.
  3. An Introduction of Transformer, NTU Singapore, 25/08/2021.

Professional Services

Conference Reviewer / PC Member

  1. 2021: NAACL, ACL, EMNLP, ICCSE
  2. 2022: ACL Rolling Review, SIGIR, EMNLP

Journal Referee

  1. Information Sciences


Publications

    (*: Equal contribution)

  1. [New!] MedChemLens: An Interactive Visual Tool to Support Direction Selection in Interdisciplinary Experimental Research of Medicinal Chemistry, IEEE VIS: Visualization & Visual Analytics (VIS), 2022. [PDF]
    Chuhan Shi, Fei Nie, Yicheng Hu, Yige Xu, Lei Chen, Xiaojuan Ma, Qiong Luo.
    Abstract: Interdisciplinary experimental science (e.g., medicinal chemistry) refers to the disciplines that integrate knowledge from different scientific backgrounds and involve experiments in the research process. Deciding "in what direction to proceed" is critical for the success of the research in such disciplines, since the time, money, and resource costs of the subsequent research steps depend largely on this decision. However, such a direction identification task is challenging in that researchers need to integrate information from large-scale, heterogeneous materials from all associated disciplines and summarize the related publications of which the core contributions are often showcased in diverse formats. The task also requires researchers to estimate the feasibility and potential in future experiments in the selected directions. In this work, we selected medicinal chemistry as a case and presented an interactive visual tool, MedChemLens, to assist medicinal chemists in choosing their intended directions of research. This task is also known as drug target (i.e., disease-linked proteins) selection. Given a candidate target name, MedChemLens automatically extracts the molecular features of drug compounds from chemical papers and clinical trial records, organizes them based on the drug structures, and interactively visualizes factors concerning subsequent experiments. We evaluated MedChemLens through a within-subjects study (N=16). Compared with the control condition (i.e., unrestricted online search without using our tool), participants who only used MedChemLens reported faster search, better-informed selections, higher confidence in their selections, and lower cognitive load.
    BibTeX:
        title = "{MedChemLens}: An Interactive Visual Tool to Support Direction Selection in Interdisciplinary Experimental Research of Medicinal Chemistry",
        author = "Shi, Chuhan  and
          Nie, Fei  and
          Hu, Yicheng  and
          Xu, Yige  and
          Chen, Lei  and
          Ma, Xiaojuan  and
          Luo, Qiong",
        booktitle = "IEEE VIS: Visualization \& Visual Analytics",
        year = "2022",
  2. Keyphrase Generation with Fine-Grained Evaluation-Guided Reinforcement Learning, Findings of EMNLP, 2021. [PDF] [Code]
    Yichao Luo*, Yige Xu*, Jiacheng Ye, Xipeng Qiu, Qi Zhang.
    Abstract: Aiming to generate a set of keyphrases, Keyphrase Generation (KG) is a classical task for capturing the central idea from a given document. Based on Seq2Seq models, the previous reinforcement learning framework on KG tasks utilizes the evaluation metrics to further improve the well-trained neural models. However, these KG evaluation metrics such as F1@5 and F1@M are only aware of the exact correctness of predictions on phrase-level and ignore the semantic similarities between similar predictions and targets, which inhibits the model from learning deep linguistic patterns. In response to this problem, we propose a new fine-grained evaluation metric to improve the RL framework, which considers different granularities: token-level F1 score, edit distance, duplication, and prediction quantities. On the whole, the new framework includes two reward functions: the fine-grained evaluation score and the vanilla F1 score. This framework helps the model identify some partial match phrases which can be further optimized as the exact match ones. Experiments on KG benchmarks show that our proposed training framework outperforms the previous RL training frameworks among all evaluation scores. In addition, our method can effectively ease the synonym problem and generate a higher quality prediction. The source code is available at this URL.
    BibTeX:
        title = "Keyphrase Generation with Fine-Grained Evaluation-Guided Reinforcement Learning",
        author = "Luo, Yichao  and
          Xu, Yige  and
          Ye, Jiacheng  and
          Qiu, Xipeng  and
          Zhang, Qi",
        booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021",
        month = nov,
        year = "2021",
        address = "Punta Cana, Dominican Republic",
        publisher = "Association for Computational Linguistics",
        url = "",
        pages = "497--507",
  3. Searching Effective Transformer for Seq2Seq Keyphrase Generation, CCF International Conference on Natural Language Processing and Chinese Computing (NLPCC), 2021. [DOI] [PDF]
    Yige Xu*, Yichao Luo*, Yicheng Zou, Zhengyan Li, Qi Zhang, Xipeng Qiu, Xuanjing Huang.
    Abstract: Keyphrase Generation (KG) aims to generate a set of keyphrases to represent the topic information of a given document, which is a worthy task of Natural Language Processing (NLP). Recently, the Transformer structure with fully-connected self-attention blocks has been widely used in many NLP tasks due to its advantage of parallelism and global context modeling. However, in KG tasks, Transformer-based models can hardly beat the recurrent-based models. Our observations also confirm this phenomenon. Based on our observations, we state a hypothesis to explain why Transformer-based models perform poorly in KG tasks. In this paper, we conducted exhaustive experiments to confirm our hypothesis, and search for an effective Transformer model for keyphrase generation. Comprehensive experiments on multiple KG benchmarks showed that: (1) In KG tasks, uninformative content abounds in documents while salient information is diluted globally. (2) The vanilla Transformer equipped with a fully-connected self-attention mechanism may overlook the local context, leading to performance degradation. (3) We add constraints to the self-attention mechanism and introduce direction information to improve the vanilla Transformer model, which achieves state-of-the-art performance on KG benchmarks.
    BibTeX:
        title={Searching Effective Transformer for Seq2Seq Keyphrase Generation},
        author={Xu, Yige and Luo, Yichao and Zou, Yicheng and Li, Zhengyan and Zhang, Qi and Qiu, Xipeng and Huang, Xuanjing},
        booktitle={Natural Language Processing and Chinese Computing - 10th {CCF} International Conference, {NLPCC} 2021, Qingdao, China, October 13-17, 2021, Proceedings, Part {II}},
        series={Lecture Notes in Computer Science},
  4. Improving BERT Fine-Tuning via Self-Ensemble and Self-Distillation, Journal of Computer Science and Technology (JCST, to appear). [DOI] [PDF]
    Yige Xu, Xipeng Qiu, Ligao Zhou, Xuanjing Huang.
    Abstract: Fine-tuning pre-trained language models like BERT has become an effective way in NLP and yields state-of-the-art results on many downstream tasks. Recent studies on adapting BERT to new tasks mainly focus on modifying the model structure, re-designing the pre-train tasks, and leveraging external data and knowledge. The fine-tuning strategy itself has yet to be fully explored. In this paper, we improve the fine-tuning of BERT with two effective mechanisms: self-ensemble and self-distillation. Experiments on GLUE benchmark and Text Classification benchmark show that our proposed methods can significantly improve the adaption of BERT without any external data or knowledge. We conduct exhaustive experiments to investigate the efficiency of self-ensemble and self-distillation mechanisms, and our proposed methods achieve a new state-of-the-art result on the SNLI dataset.
    BibTeX:
        title={Improving BERT Fine-Tuning via Self-Ensemble and Self-Distillation},
        author={Xu, Yige and Qiu, Xipeng and Zhou, Ligao and Huang, Xuanjing},
        year = {2021},
        doi = {}
  5. ONE2SET: Generating Diverse Keyphrases as a Set, ACL, 2021. [PDF] [Code]
    Jiacheng Ye, Tao Gui, Yichao Luo, Yige Xu, Qi Zhang.
    Abstract: Recently, the sequence-to-sequence models have made remarkable progress on the task of keyphrase generation (KG) by concatenating multiple keyphrases in a predefined order as a target sequence during training. However, the keyphrases are inherently an unordered set rather than an ordered sequence. Imposing a predefined order will give wrong bias during training, which can highly penalize shifts in the order between keyphrases. In this work, we introduce a new training paradigm ONE2SET without predefining an order to concatenate the keyphrases. To fit this paradigm, we propose a novel model that consists of a fixed set of learned control codes to generate keyphrases in parallel. To solve the problem that there is no correspondence between each prediction and target during training, we introduce a K-step target assignment mechanism via bipartite matching, which greatly increases the diversity and reduces the duplication ratio of generated keyphrases. The experimental results on multiple benchmarks demonstrate that our approach significantly outperforms the state-of-the-art methods.
    BibTeX:
        title = "{O}ne2{S}et: {G}enerating Diverse Keyphrases as a Set",
        author = "Ye, Jiacheng  and
          Gui, Tao  and
          Luo, Yichao  and
          Xu, Yige  and
          Zhang, Qi",
        booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
        month = aug,
        year = "2021",
        address = "Online",
        publisher = "Association for Computational Linguistics",
        url = "",
        doi = "10.18653/v1/2021.acl-long.354",
        pages = "4598--4608",
  6. Pre-trained Models for Natural Language Processing: A Survey, SCIENCE CHINA Technological Sciences (Invited Paper, Most Influential Paper of SCTS in 2020), 2020. [DOI] [PDF]
    Xipeng Qiu, Tianxiang Sun, Yige Xu, Yunfan Shao, Ning Dai, Xuanjing Huang.
    Abstract: Recently, the emergence of pre-trained models (PTMs) has brought natural language processing (NLP) to a new era. In this survey, we provide a comprehensive review of PTMs for NLP. We first briefly introduce language representation learning and its research progress. Then we systematically categorize existing PTMs based on a taxonomy with four perspectives. Next, we describe how to adapt the knowledge of PTMs to the downstream tasks. Finally, we outline some potential directions of PTMs for future research. This survey is purposed to be a hands-on guide for understanding, using, and developing PTMs for various NLP tasks.
    BibTeX:
        author = {Xipeng Qiu and Tianxiang Sun and Yige Xu and Yunfan Shao and Ning Dai and Xuanjing Huang},
        title = {Pre-trained Models for Natural Language Processing: A Survey},
        journal = {SCIENCE CHINA Technological Sciences},
        publisher = {Science China Press},
        year = {2020},
        volume = {63},
        number = {10},
        pages = {1872--1897},
        doi = {}
  7. How to Fine-Tune BERT for Text Classification? China National Conference on Chinese Computational Linguistics (CCL, Best Paper Award), 2019. [DOI] [PDF] [Code]
    Chi Sun, Xipeng Qiu, Yige Xu, Xuanjing Huang.
    Abstract: Language model pre-training has proven to be useful in learning universal language representations. As a state-of-the-art language model pre-training model, BERT (Bidirectional Encoder Representations from Transformers) has achieved amazing results in many language understanding tasks. In this paper, we conduct exhaustive experiments to investigate different fine-tuning methods of BERT on text classification task and provide a general solution for BERT fine-tuning. Finally, the proposed solution obtains new state-of-the-art results on eight widely-studied text classification datasets.
    BibTeX:
        title={How to fine-tune {BERT} for text classification?},
        author={Sun, Chi and Qiu, Xipeng and Xu, Yige and Huang, Xuanjing},
        booktitle={China National Conference on Chinese Computational Linguistics},