I'm a final-year Ph.D. student in the School of Informatics at Nagoya University, supervised by Prof. Tomoki Toda of the Information Technology Center at Nagoya University.

Prior to this, I was supervised by Prof. Jianwu Dang from the School of Information Science at the Japan Advanced Institute of Science and Technology. From October 2020 to April 2022, I was an algorithm researcher at the AI Research Institute, Hithink RoyalFlush, working on speech emotion recognition and synthesis under the supervision of Prof. Xinhui Hu.

My research interests include emotion recognition, emotion synthesis, emotion conversion, and dialogue systems.

📖 Education

💻 Career

  • 2020.10 - 2022.04, AI Research Institute, Hithink RoyalFlush, Hangzhou, China.

🔥 News

  • 2026.04: 🎉🎉 First-author paper accepted to IEEE Transactions ASLP.
  • 2026.03: 🎉🎉 Co-authored paper accepted to Computer Speech & Language.
  • 2026.03: 🎉🎉 Co-authored paper accepted to IEEE Transactions ASLP.
  • 2026.01: 🎉🎉 First-author paper accepted to IEEE Transactions ASLP.
  • 2025.12: 🎉🎉 Co-authored paper accepted to Ecological Informatics.
  • 2025.09: 🎉🎉 Co-authored paper accepted to IEEE Transactions ASLP.
  • 2025.05: 🎉🎉 Three first-author papers accepted at INTERSPEECH 2025 (Rotterdam, Netherlands). 🇳🇱
  • 2024.09: 🎉🎉 First-author poster accepted at the 2024 APSIPA China-Japan Joint Symposium (Tianjin, China). 🇨🇳
  • 2024.09: 🎉🎉 First-author paper and two co-authored papers accepted at APSIPA ASC 2024 (Macau, China). 🇲🇴
  • 2024.06: Recognized as a FY2024 high-contribution RESEARDENT in the MNS Next-Generation Research Program.
  • 2024.06: 🎉🎉 First-author paper accepted at INTERSPEECH 2024 (Kos, Greece). 🇬🇷
  • 2023.12: 🎉🎉 Co-first-author paper accepted at ICASSP 2024 (Seoul, South Korea). 🇰🇷
  • 2023.07: 🎉🎉 Co-authored paper accepted at MRAC 2023 (Ottawa, Canada). 🇨🇦
  • 2023.06: 🎉🎉 Co-authored paper accepted to IEEE Transactions ASLP.
  • 2023.05: 🎉🎉 First-author paper accepted at INTERSPEECH 2023 (Dublin, Ireland). 🇮🇪
  • 2022.09: Started my Ph.D. at Nagoya University (Aichi, Japan). 🇯🇵
  • 2020.10: Joined the AI Research Institute at Hithink RoyalFlush (Hangzhou, China). 🇨🇳
  • 2020.05: 🎉🎉 First-author paper accepted at INTERSPEECH 2020 (Shanghai, China). 🇨🇳
  • 2018.04: Started my master's degree at the Japan Advanced Institute of Science and Technology (Ishikawa, Japan). 🇯🇵

๐Ÿ“ Publications

Journal:

  • IEEE Transactions ASLP Xiaohan Shi, Jiajun He, Xingfeng Li, Tomoki Toda. “A Comprehensive Study on the Effectiveness of ASR Representations for Noise-Robust Speech Emotion Recognition.” IEEE Transactions on Audio, Speech and Language Processing, Vol. 34, Jan. 2026.

  • IEEE Transactions ASLP Xiaohan Shi, Xingfeng Li, Tomoki Toda. “Emotion Similarity and Shift: Modeling Temporal Dynamic Interactions for Emotion Prediction in Conversation.” IEEE Transactions on Audio, Speech and Language Processing, Vol. 34, Apr. 2026.

  • IEEE Transactions ASLP Xingfeng Li, Xiaohan Shi, Desheng Hu, Yongwei Li, Qingchen Zhang, Zhengxia Wang, Masashi Unoki, Masato Akagi. “Music Theory-inspired Acoustic Representation for Speech Emotion Recognition.” IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 31, pp. 2534–2547, Jun. 2023.

  • IEEE Transactions ASLP Jiajun He, Xiaohan Shi, Cheng-Hung Hu, Jinyi Mi, Xingfeng Li, Tomoki Toda. “M4SER: Multimodal Multi-Representation Multi-Task Multi-Strategy Learning for Speech Emotion Recognition.” IEEE Transactions on Audio, Speech and Language Processing, Vol. 33, pp. 4055–4070, Sep. 2025.

  • Computer Speech & Language Jinyi Mi, Xiaohan Shi, Ding Ma, Jiajun He, Takuya Fujimura, Tomoki Toda. “Robust Speech Emotion Recognition under Human Speech Noise.” Computer Speech & Language, Mar. 2026.

  • IEEE Transactions ASLP Xingfeng Li, Ningfeng Luo, Feifei Yu, Xiaohan Shi, Junjie Li, Yang Liu. “Multi-Task Deep Learning with Over-Sampling and Style Randomization for Improved Cross-Regional Bird Vocalization Recognition.” IEEE Transactions on Audio, Speech and Language Processing, Vol. 34, Jan. 2026.

  • Ecological Informatics Xingfeng Li, Ningfeng Luo, Feifei Yu, Junjie Li, Kai Li, Yongwei Li, Zhen Zhao, Yang Liu, Xiaohan Shi. “HARL: Human Auditory Representation Learning for Cross-Dialect Bird Species Recognition.” Ecological Informatics, Dec. 2025.

Conference:

  • INTERSPEECH Xiaohan Shi, Sixia Li, Jianwu Dang. “Dimensional Emotion Prediction Based on Interactive Context in Conversation.” In Proc. INTERSPEECH, pp. 4193–4197, 2020.

  • INTERSPEECH Xiaohan Shi, Xingfeng Li, Tomoki Toda. “Emotion Awareness in Multi-utterance Turn for Improving Emotion Prediction in Multi-Speaker Conversation.” In Proc. INTERSPEECH, pp. 765–769, 2023.

  • INTERSPEECH Xiaohan Shi, Xingfeng Li, Tomoki Toda. “Multimodal Fusion of Music Theory-Inspired and Self-Supervised Representations for Improved Emotion Recognition.” In Proc. INTERSPEECH, pp. 4193–4197, 2024.

  • INTERSPEECH Xiaohan Shi, Xingfeng Li, Tomoki Toda. “Who, When, and What: Leveraging the ‘Three Ws’ Concept for Emotion Recognition in Conversation.” In Proc. INTERSPEECH, pp. 1763–1767, 2025.

  • INTERSPEECH Xiaohan Shi, Xingfeng Li, Tomoki Toda. “Speaker-Aware Multi-Task Learning for Speech Emotion Recognition.” In Proc. INTERSPEECH, pp. 4333–4337, 2025.

  • INTERSPEECH Xiaohan Shi, Jinyi Mi, Xingfeng Li, Tomoki Toda. “Advancing Emotion Recognition via Ensemble Learning: Integrating Speech, Context, and Text Representations.” In Proc. INTERSPEECH, pp. 4693–4697, 2025.

  • APSIPA Xiaohan Shi, Yuan Gao, Jiajun He, Jinyi Mi, Xingfeng Li, Tomoki Toda. “A Study on Multimodal Fusion and Layer Adaptor in Emotion Recognition.” In Proc. APSIPA ASC, 2024.

  • ICASSP Jiajun He*, Xiaohan Shi*, Xingfeng Li, Tomoki Toda. “MF-AED-AEC: Speech Emotion Recognition by Leveraging Multimodal Fusion, ASR Error Detection, and ASR Error Correction.” In Proc. IEEE ICASSP, pp. 11066–11070, 2024.

  • APSIPA Jinyi Mi, Xiaohan Shi, Ding Ma, Jiajun He, Takuya Fujimura, Tomoki Toda. “Two-stage Framework for Robust Speech Emotion Recognition Using Target Speaker Extraction in Human Speech Noise Conditions.” In Proc. APSIPA ASC, 2024.

  • APSIPA Xingfeng Li, Xiaohan Shi, Yuke Si, Qian Chen, Yang Liu, Masashi Unoki, Masato Akagi. “BEES: A New Acoustic Task for Blended Emotion Estimation in Speech.” In Proc. APSIPA ASC, 2024.

  • MRAC Jingguang Tian*, Desheng Hu*, Xiaohan Shi, Jiajun He, Xingfeng Li, Yuan Gao, Tomoki Toda, Xinkang Xu, Xinhui Hu. “Semi-supervised Multimodal Emotion Recognition with Consensus Decision-making and Label Correction.” In Proc. MRAC, pp. 67–73, 2023.

Under Review:

  • Xingfeng Li, Xiaohan Shi, Junjie Li, Yongwei Li, Masashi Unoki, Tomoki Toda, Masato Akagi. “EM2LDL: A Multilingual Speech Corpus for Mixed Emotion Recognition through Label Distribution Learning.” Submitted to IEEE Transactions on Affective Computing. (Major Revision)

  • Xiaohan Shi, Xingfeng Li, Jinyi Mi, Tomoki Toda. “Advancing the ‘Three Ws’ Concept for Speaker-Aware Emotion Recognition in Conversation.” Submitted to APSIPA Transactions on Signal and Information Processing.

  • Yupei Guo, Jiajun He, Xiaohan Shi, Tomoki Toda, Zekun Yang, Bowen Wang, Yukinobu Taniguchi. “Leveraging LLM-Generated Explanations for Detecting Emotionally Rewritten Fake News.” Submitted to SMC 2026.

  • Xiaohan Shi, Xingfeng Li, Tomoki Toda. “Exploiting Modality-Specific Label Variations for Enhanced Multimodal Emotion Recognition.” Submitted to INTERSPEECH 2026.

  • Xiaohan Shi, Tomoki Toda. “W³EDM: 3W-based Emotional Modeling with Emotion Descriptions for Conversational Speech Synthesis.” Submitted to ACM MM 2026.

In Preparation:

  • Xingfeng Li, Xiaohan Shi, Masashi Unoki, Tomoki Toda, Masato Akagi. “The Contribution of Perceptual Semantic Primitives to Self-Supervised Learning Representations for Bilingual Speech Emotion Recognition.”

  • Yongwei Li, Xiaohan Shi, Xingfeng Li, Yang Liu, Zhen Zhao, Jianhua Tao, Aijun Li, Donna Erickson, Tomoki Toda, Masato Akagi, Feng Du. “Comparative Analysis of Handcrafted Acoustic Features and Pre-Trained Embeddings for Robust Speech Emotion Recognition Under Varying Noise Conditions.”

🎤 Invited Talk

  • Tomoki Toda, Xiaohan Shi. “Recent advances in speech information processing focusing on speech expression.” Acoustical Society of Japan, Mar. 2026.

๐ŸŽ Competitions

  • MER 2023: Multi-label Learning, Semi-Supervised Learning, and Modality Robustness.
    Ranked 7th of 23 (Track 1), 8th of 23 (Track 2), and 4th of 15 (Track 3).

  • Odyssey 2024: Speech Emotion Recognition Challenge.
    Ranked 11th of 17 (Track 1) and 5th of 15 (Track 2).

  • Interspeech 2025: Speech Emotion Recognition in Naturalistic Conditions Challenge.
    Ranked 6th of 43 (Track 1).

🎖 Honors and Awards

Fellowship:

  • 2019.10 - 2020.10, Ishikawa Prefecture Scholarship for Privately Funded International Students.
  • 2020.04 - 2020.10, Research Assistant (RA), Japan Advanced Institute of Science and Technology.
  • 2022.10 - 2025.10, Interdisciplinary Frontier Next-Generation Researcher, Tokai National Higher Education and Research System (THERS) Interdisciplinary Frontier Next-Generation Research Program.
  • 2024.06 - 2025.03, FY2024 high-contribution RESEARDENT, MNS Next-Generation Research Program.
  • 2025.10 - 2026.10, Teaching Assistant (TA), Nagoya University.

Reviewer:

Journal:

  • 2023.06 - (Now), Speech Communication.
  • 2025.05 - (Now), IEEE Transactions on Audio, Speech and Language Processing.
  • 2025.09 - (Now), APSIPA Transactions on Signal and Information Processing.
  • 2025.10 - (Now), International Journal of Human-Computer Interaction.
  • 2025.10 - (Now), Neurocomputing.
  • 2025.12 - (Now), IEEE Transactions on Affective Computing.

Conference:

  • 2023.07 - (Now), Asia-Pacific Signal and Information Processing Association (APSIPA ASC).
  • 2023.07 - (Now), IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
  • 2026.01 - (Now), INTERSPEECH.
  • 2026.01 - (Now), WCCI.