I'm a final-year Ph.D. student in the School of Informatics at Nagoya University, supervised by Prof. Tomoki Toda of the Information Technology Center at Nagoya University.

Prior to this, I was supervised by Prof. Jianwu Dang from the School of Information Science at the Japan Advanced Institute of Science and Technology. From October 2020 to April 2022, I was an algorithm researcher at the AI Research Institute, Hithink RoyalFlush, working on speech emotion recognition and synthesis under the supervision of Prof. Xinhui Hu.

My research interests include emotion recognition, emotion synthesis, emotion conversion, and dialogue systems.

📖 Education

💻 Career

  • 2020.10 - 2022.04, AI Research Institute, Hithink RoyalFlush, Hangzhou, China.

🔥 News

  • 2026.04: 🎉🎉 First-author paper accepted to IEEE Transactions ASLP.
  • 2026.03: 🎉🎉 Co-authored paper accepted to Computer Speech & Language.
  • 2026.03: 🎉🎉 Co-authored paper accepted to IEEE Transactions ASLP.
  • 2026.01: 🎉🎉 First-author paper accepted to IEEE Transactions ASLP.
  • 2025.12: 🎉🎉 Co-authored paper accepted to Ecological Informatics.
  • 2025.09: 🎉🎉 Co-authored paper accepted to IEEE Transactions ASLP.
  • 2025.05: 🎉🎉 Three first-author papers accepted at INTERSPEECH 2025 (Rotterdam, Netherlands). 🇳🇱
  • 2024.09: 🎉🎉 First-author poster accepted at the 2024 APSIPA China-Japan Joint Symposium (Tianjin, China). 🇨🇳
  • 2024.09: 🎉🎉 First-author paper and two co-authored papers accepted at APSIPA ASC 2024 (Macau, China). 🇲🇴
  • 2024.06: Recognized as a FY2024 high-contribution RESEARDENT in the MNS Next-Generation Research Program.
  • 2024.06: 🎉🎉 First-author paper accepted at INTERSPEECH 2024 (Kos, Greece). 🇬🇷
  • 2023.12: 🎉🎉 Co-first-author paper accepted at ICASSP 2024 (Seoul, South Korea). 🇰🇷
  • 2023.07: 🎉🎉 Co-authored paper accepted at MRAC 2023 (Ottawa, Canada). 🇨🇦
  • 2023.06: 🎉🎉 Co-authored paper accepted to IEEE Transactions ASLP.
  • 2023.05: 🎉🎉 First-author paper accepted at INTERSPEECH 2023 (Dublin, Ireland). 🇮🇪
  • 2022.09: Started my Ph.D. at Nagoya University (Aichi, Japan). 🇯🇵
  • 2020.10: Joined the AI Research Institute at Hithink RoyalFlush (Hangzhou, China). 🇨🇳
  • 2020.05: 🎉🎉 First-author paper accepted at INTERSPEECH 2020 (Shanghai, China). 🇨🇳
  • 2018.04: Started my master's degree at the Japan Advanced Institute of Science and Technology (Ishikawa, Japan). 🇯🇵

๐Ÿ“ Publications

Journal:

  • IEEE Transactions ASLP Xiaohan Shi, Jiajun He, Xingfeng Li, Tomoki Toda. “A Comprehensive Study on the Effectiveness of ASR Representations for Noise-Robust Speech Emotion Recognition.” IEEE Transactions on Audio, Speech and Language Processing, Vol. 34, Jan. 2026.

  • IEEE Transactions ASLP Xiaohan Shi, Xingfeng Li, Tomoki Toda. “Emotion Similarity and Shift: Modeling Temporal Dynamic Interactions for Emotion Prediction in Conversation.” IEEE Transactions on Audio, Speech and Language Processing, Vol. 34, Apr. 2026.

  • IEEE Transactions ASLP Xingfeng Li, Xiaohan Shi, Desheng Hu, Yongwei Li, Qingchen Zhang, Zhengxia Wang, Masashi Unoki, Masato Akagi. “Music Theory-inspired Acoustic Representation for Speech Emotion Recognition.” IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 31, pp. 2534–2547, Jun. 2023.

  • IEEE Transactions ASLP Jiajun He, Xiaohan Shi, Cheng-Hung Hu, Jinyi Mi, Xingfeng Li, Tomoki Toda. “M4SER: Multimodal Multi-Representation Multi-Task Multi-Strategy Learning for Speech Emotion Recognition.” IEEE Transactions on Audio, Speech and Language Processing, Vol. 33, pp. 4055–4070, Sep. 2025.

  • Computer Speech & Language Jinyi Mi, Xiaohan Shi, Ding Ma, Jiajun He, Takuya Fujimura, Tomoki Toda. “Robust Speech Emotion Recognition under Human Speech Noise.” Computer Speech & Language, Mar. 2026.

  • IEEE Transactions ASLP Xingfeng Li, Ningfeng Luo, Feifei Yu, Xiaohan Shi, Junjie Li, Yang Liu. “Multi-Task Deep Learning with Over-Sampling and Style Randomization for Improved Cross-Regional Bird Vocalization Recognition.” IEEE Transactions on Audio, Speech and Language Processing, Vol. 34, Jan. 2026.

  • Ecological Informatics Xingfeng Li, Ningfeng Luo, Feifei Yu, Junjie Li, Kai Li, Yongwei Li, Zhen Zhao, Yang Liu, Xiaohan Shi. “HARL: Human Auditory Representation Learning for Cross-Dialect Bird Species Recognition.” Ecological Informatics, Dec. 2025.

Conference:

  • INTERSPEECH Xiaohan Shi, Sixia Li, Jianwu Dang. “Dimensional Emotion Prediction Based on Interactive Context in Conversation.” In Proc. INTERSPEECH, pp. 4193–4197, 2020.

  • INTERSPEECH Xiaohan Shi, Xingfeng Li, Tomoki Toda. “Emotion Awareness in Multi-utterance Turn for Improving Emotion Prediction in Multi-Speaker Conversation.” In Proc. INTERSPEECH, pp. 765–769, 2023.

  • INTERSPEECH Xiaohan Shi, Xingfeng Li, Tomoki Toda. “Multimodal Fusion of Music Theory-Inspired and Self-Supervised Representations for Improved Emotion Recognition.” In Proc. INTERSPEECH, pp. 4193–4197, 2024.

  • INTERSPEECH Xiaohan Shi, Xingfeng Li, Tomoki Toda. “Who, When, and What: Leveraging the ‘Three Ws’ Concept for Emotion Recognition in Conversation.” In Proc. INTERSPEECH, pp. 1763–1767, 2025.

  • INTERSPEECH Xiaohan Shi, Xingfeng Li, Tomoki Toda. “Speaker-Aware Multi-Task Learning for Speech Emotion Recognition.” In Proc. INTERSPEECH, pp. 4333–4337, 2025.

  • INTERSPEECH Xiaohan Shi, Jinyi Mi, Xingfeng Li, Tomoki Toda. “Advancing Emotion Recognition via Ensemble Learning: Integrating Speech, Context, and Text Representations.” In Proc. INTERSPEECH, pp. 4693–4697, 2025.

  • APSIPA Xiaohan Shi, Yuan Gao, Jiajun He, Jinyi Mi, Xingfeng Li, Tomoki Toda. “A Study on Multimodal Fusion and Layer Adaptor in Emotion Recognition.” In Proc. APSIPA ASC, 2024.

  • ICASSP Jiajun He*, Xiaohan Shi*, Xingfeng Li, Tomoki Toda. “MF-AED-AEC: Speech Emotion Recognition by Leveraging Multimodal Fusion, ASR Error Detection, and ASR Error Correction.” In Proc. IEEE ICASSP, pp. 11066–11070, 2024.

  • APSIPA Jinyi Mi, Xiaohan Shi, Ding Ma, Jiajun He, Takuya Fujimura, Tomoki Toda. “Two-stage Framework for Robust Speech Emotion Recognition Using Target Speaker Extraction in Human Speech Noise Conditions.” In Proc. APSIPA ASC, 2024.

  • APSIPA Xingfeng Li, Xiaohan Shi, Yuke Si, Qian Chen, Yang Liu, Masashi Unoki, Masato Akagi. “BEES: A New Acoustic Task for Blended Emotion Estimation in Speech.” In Proc. APSIPA ASC, 2024.

  • MRAC Jingguang Tian*, Desheng Hu*, Xiaohan Shi, Jiajun He, Xingfeng Li, Yuan Gao, Tomoki Toda, Xinkang Xu, Xinhui Hu. “Semi-supervised Multimodal Emotion Recognition with Consensus Decision-making and Label Correction.” In Proc. MRAC, pp. 67–73, 2023.

Under Review:

  • Xingfeng Li, Xiaohan Shi, Junjie Li, Yongwei Li, Masashi Unoki, Tomoki Toda, Masato Akagi. “EM2LDL: A Multilingual Speech Corpus for Mixed Emotion Recognition through Label Distribution Learning.” Submitted to IEEE Transactions on Affective Computing. (Major Revision)

  • Xiaohan Shi, Xingfeng Li, Jinyi Mi, Tomoki Toda. “Advancing the ‘Three Ws’ Concept for Speaker-Aware Emotion Recognition in Conversation.” Submitted to APSIPA Transactions on Signal and Information Processing.

  • Yupei Guo, Jiajun He, Xiaohan Shi, Tomoki Toda, Zekun Yang, Bowen Wang, Yukinobu Taniguchi. “Leveraging LLM-Generated Explanations for Detecting Emotionally Rewritten Fake News.” Submitted to SMC 2026.

  • Xiaohan Shi, Xingfeng Li, Tomoki Toda. “Exploiting Modality-Specific Label Variations for Enhanced Multimodal Emotion Recognition.” Submitted to INTERSPEECH 2026.

  • Xiaohan Shi, Tomoki Toda. “W³EDM: 3W-based Emotional Modeling with Emotion Descriptions for Conversational Speech Synthesis.” Submitted to ACM MM 2026.

In Preparation:

  • Xingfeng Li, Xiaohan Shi, Masashi Unoki, Tomoki Toda, Masato Akagi. “The Contribution of Perceptual Semantic Primitives to Self-Supervised Learning Representations for Bilingual Speech Emotion Recognition.”

  • Yongwei Li, Xiaohan Shi, Xingfeng Li, Yang Liu, Zhen Zhao, Jianhua Tao, Aijun Li, Donna Erickson, Tomoki Toda, Masato Akagi, Feng Du. “Comparative Analysis of Handcrafted Acoustic Features and Pre-Trained Embeddings for Robust Speech Emotion Recognition Under Varying Noise Conditions.”

🎤 Invited Talk

  • Tomoki Toda, Xiaohan Shi. “Recent advances in speech information processing focusing on speech expression.” Acoustical Society of Japan, Mar. 2026.

๐ŸŽ Competitions

  • MER 2023: Multi-label Learning, Semi-Supervised Learning, and Modality Robustness.
    Ranked 7th of 23 (Track 1), 8th of 23 (Track 2), and 4th of 15 (Track 3).

  • Odyssey 2024: Speech Emotion Recognition Challenge.
    Ranked 11th of 17 (Track 1) and 5th of 15 (Track 2).

  • Interspeech 2025: Speech Emotion Recognition in Naturalistic Conditions Challenge.
    Ranked 6th of 43 (Track 1).

🎖 Honors and Awards

Fellowship:

  • 2019.10 - 2020.10, Ishikawa Prefecture Scholarship for Privately Funded International Students.
  • 2020.04 - 2020.10, Research Assistant (RA), Japan Advanced Institute of Science and Technology.
  • 2022.10 - 2025.10, Interdisciplinary Frontier Next-Generation Researcher, Tokai National Higher Education and Research System (THERS) Interdisciplinary Frontier Next-Generation Research Program.
  • 2024.06 - 2025.03, FY2024 high-contribution RESEARDENT, MNS Next-Generation Research Program.
  • 2025.10 - 2026.10, Teaching Assistant (TA), Nagoya University.

Reviewer:

Journal:

  • 2023.06 - (Now), Speech Communication.
  • 2025.05 - (Now), IEEE Transactions on Audio, Speech and Language Processing.
  • 2025.09 - (Now), APSIPA Transactions on Signal and Information Processing.
  • 2025.10 - (Now), International Journal of Human-Computer Interaction.
  • 2025.10 - (Now), Neurocomputing.
  • 2025.12 - (Now), IEEE Transactions on Affective Computing.

Conference:

  • 2023.07 - (Now), Asia-Pacific Signal and Information Processing Association (APSIPA ASC).
  • 2023.07 - (Now), IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
  • 2026.01 - (Now), INTERSPEECH.
  • 2026.01 - (Now), WCCI.