I'm a final-year Ph.D. student in the School of Informatics at Nagoya University, supervised by Prof. Tomoki Toda from the Information Technology Center at Nagoya University.
Prior to this, I was supervised by Prof. Jianwu Dang from the School of Information Science at the Japan Advanced Institute of Science and Technology. From October 2020 to April 2022, I was an algorithm researcher at the AI Research Institute, Hithink RoyalFlush, working on speech emotion recognition and synthesis under the supervision of Prof. Xinhui Hu.
My research interests include emotion recognition, emotion synthesis, emotion conversion, and dialogue systems.
📖 Education
- 2022.10 - 2026.10 (Expected), Nagoya University, Aichi, Japan. 🇯🇵
- 2019.04 - 2020.10, Japan Advanced Institute of Science and Technology, Ishikawa, Japan. 🇯🇵
- 2013.09 - 2017.06, Shaanxi University of Technology, Shaanxi, China. 🇨🇳
💻 Career
- 2020.10 - 2022.04, AI Research Institute, Hithink RoyalFlush, Hangzhou, China.
🔥 News
- 2026.04: 🎉🎉 First-author paper accepted to IEEE Transactions ASLP.
- 2026.03: 🎉🎉 Co-authored paper accepted to Computer Speech & Language.
- 2026.03: 🎉🎉 Co-authored paper accepted to IEEE Transactions ASLP.
- 2026.01: 🎉🎉 First-author paper accepted to IEEE Transactions ASLP.
- 2025.12: 🎉🎉 Co-authored paper accepted to Ecological Informatics.
- 2025.09: 🎉🎉 Co-authored paper accepted to IEEE Transactions ASLP.
- 2025.05: 🎉🎉 Three first-author papers accepted at INTERSPEECH 2025 (Rotterdam, Netherlands). 🇳🇱
- 2024.09: 🎉🎉 First-author poster accepted at the 2024 APSIPA China-Japan Joint Symposium (Tianjin, China). 🇨🇳
- 2024.09: 🎉🎉 First-author paper and two co-authored papers accepted at APSIPA ASC 2024 (Macau, China). 🇲🇴
- 2024.06: Received the MNS Next-Generation Research Program FY2024 High-Contribution RESEARDENT award.
- 2024.06: 🎉🎉 First-author paper accepted at INTERSPEECH 2024 (Kos, Greece). 🇬🇷
- 2023.12: 🎉🎉 Co-first-author paper accepted at ICASSP 2024 (Seoul, South Korea). 🇰🇷
- 2023.07: 🎉🎉 Co-authored paper accepted at MRAC 2023 (Ottawa, Canada). 🇨🇦
- 2023.06: 🎉🎉 Co-authored paper accepted to IEEE Transactions ASLP.
- 2023.05: 🎉🎉 First-author paper accepted at INTERSPEECH 2023 (Dublin, Ireland). 🇮🇪
- 2022.09: Started my Ph.D. at Nagoya University (Aichi, Japan). 🇯🇵
- 2020.10: Joined the AI Research Institute at Hithink RoyalFlush (Hangzhou, China). 🇨🇳
- 2020.05: 🎉🎉 First-author paper accepted at INTERSPEECH 2020 (Shanghai, China). 🇨🇳
- 2018.04: Started my master's degree at Japan Advanced Institute of Science and Technology (Ishikawa, Japan). 🇯🇵
📝 Publications
Journal:
- Xiaohan Shi, Jiajun He, Xingfeng Li, Tomoki Toda. "A Comprehensive Study on the Effectiveness of ASR Representations for Noise-Robust Speech Emotion Recognition." IEEE Transactions on Audio, Speech and Language Processing, Vol. 34, Jan. 2026.
- Xiaohan Shi, Xingfeng Li, Tomoki Toda. "Emotion Similarity and Shift: Modeling Temporal Dynamic Interactions for Emotion Prediction in Conversation." IEEE Transactions on Audio, Speech and Language Processing, Vol. 34, Apr. 2026.
- Xingfeng Li, Xiaohan Shi, Desheng Hu, Yongwei Li, Qingchen Zhang, Zhengxia Wang, Masashi Unoki, Masato Akagi. "Music Theory-inspired Acoustic Representation for Speech Emotion Recognition." IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 31, pp. 2534-2547, Jun. 2023.
- Jiajun He, Xiaohan Shi, Cheng-Hung Hu, Jinyi Mi, Xingfeng Li, Tomoki Toda. "M4SER: Multimodal Multi-Representation Multi-Task Multi-Strategy Learning for Speech Emotion Recognition." IEEE Transactions on Audio, Speech and Language Processing, Vol. 33, pp. 4055-4070, Sep. 2025.
- Jinyi Mi, Xiaohan Shi, Ding Ma, Jiajun He, Takuya Fujimura, Tomoki Toda. "Robust Speech Emotion Recognition under Human Speech Noise." Computer Speech & Language, Mar. 2026.
- Xingfeng Li, Ningfeng Luo, Feifei Yu, Xiaohan Shi, Junjie Li, Yang Liu. "Multi-Task Deep Learning with Over-Sampling and Style Randomization for Improved Cross-Regional Bird Vocalization Recognition." IEEE Transactions on Audio, Speech and Language Processing, Vol. 34, Jan. 2026.
- Xingfeng Li, Ningfeng Luo, Feifei Yu, Junjie Li, Kai Li, Yongwei Li, Zhen Zhao, Yang Liu, Xiaohan Shi. "HARL: Human Auditory Representation Learning for Cross-Dialect Bird Species Recognition." Ecological Informatics, Dec. 2025.
Conference:
- Xiaohan Shi, Sixia Li, Jianwu Dang. "Dimensional Emotion Prediction Based on Interactive Context in Conversation." In Proc. INTERSPEECH, pp. 4193-4197, 2020.
- Xiaohan Shi, Xingfeng Li, Tomoki Toda. "Emotion Awareness in Multi-utterance Turn for Improving Emotion Prediction in Multi-Speaker Conversation." In Proc. INTERSPEECH, pp. 765-769, 2023.
- Xiaohan Shi, Xingfeng Li, Tomoki Toda. "Multimodal Fusion of Music Theory-Inspired and Self-Supervised Representations for Improved Emotion Recognition." In Proc. INTERSPEECH, pp. 4193-4197, 2024.
- Xiaohan Shi, Xingfeng Li, Tomoki Toda. "Who, When, and What: Leveraging the 'Three Ws' Concept for Emotion Recognition in Conversation." In Proc. INTERSPEECH, pp. 1763-1767, 2025.
- Xiaohan Shi, Xingfeng Li, Tomoki Toda. "Speaker-Aware Multi-Task Learning for Speech Emotion Recognition." In Proc. INTERSPEECH, pp. 4333-4337, 2025.
- Xiaohan Shi, Jinyi Mi, Xingfeng Li, Tomoki Toda. "Advancing Emotion Recognition via Ensemble Learning: Integrating Speech, Context, and Text Representations." In Proc. INTERSPEECH, pp. 4693-4697, 2025.
- Xiaohan Shi, Yuan Gao, Jiajun He, Jinyi Mi, Xingfeng Li, Tomoki Toda. "A Study on Multimodal Fusion and Layer Adaptor in Emotion Recognition." In Proc. APSIPA ASC, 2024.
- Jiajun He*, Xiaohan Shi*, Xingfeng Li, Tomoki Toda. "MF-AED-AEC: Speech Emotion Recognition by Leveraging Multimodal Fusion, ASR Error Detection, and ASR Error Correction." In Proc. IEEE ICASSP, pp. 11066-11070, 2024.
- Jinyi Mi, Xiaohan Shi, Ding Ma, Jiajun He, Takuya Fujimura, Tomoki Toda. "Two-stage Framework for Robust Speech Emotion Recognition Using Target Speaker Extraction in Human Speech Noise Conditions." In Proc. APSIPA ASC, 2024.
- Xingfeng Li, Xiaohan Shi, Yuke Si, Qian Chen, Yang Liu, Masashi Unoki, Masato Akagi. "BEES: A New Acoustic Task for Blended Emotion Estimation in Speech." In Proc. APSIPA ASC, 2024.
- Jingguang Tian*, Desheng Hu*, Xiaohan Shi, Jiajun He, Xingfeng Li, Yuan Gao, Tomoki Toda, Xinkang Xu, Xinhui Hu. "Semi-supervised Multimodal Emotion Recognition with Consensus Decision-making and Label Correction." In Proc. MRAC, pp. 67-73, 2023.
Under Review:
- Xingfeng Li, Xiaohan Shi, Junjie Li, Yongwei Li, Masashi Unoki, Tomoki Toda, Masato Akagi. "EM2LDL: A Multilingual Speech Corpus for Mixed Emotion Recognition through Label Distribution Learning." Submitted to IEEE Transactions on Affective Computing. (Major Revision)
- Xiaohan Shi, Xingfeng Li, Jinyi Mi, Tomoki Toda. "Advancing the 'Three Ws' Concept for Speaker-Aware Emotion Recognition in Conversation." Submitted to APSIPA Transactions on Signal and Information Processing.
- Yupei Guo, Jiajun He, Xiaohan Shi, Tomoki Toda, Zekun Yang, Bowen Wang, Yukinobu Taniguchi. "Leveraging LLM-Generated Explanations for Detecting Emotionally Rewritten Fake News." Submitted to SMC 2026.
- Xiaohan Shi, Xingfeng Li, Tomoki Toda. "Exploiting Modality-Specific Label Variations for Enhanced Multimodal Emotion Recognition." Submitted to INTERSPEECH 2026.
- Xiaohan Shi, Tomoki Toda. "W³EDM: 3W-based Emotional Modeling with Emotion Descriptions for Conversational Speech Synthesis." Submitted to ACM MM 2026.
Waiting for Submission:
- Xingfeng Li, Xiaohan Shi, Masashi Unoki, Tomoki Toda, Masato Akagi. "The Contribution of Perceptual Semantic Primitives to Self-Supervised Learning Representations for Bilingual Speech Emotion Recognition."
- Yongwei Li, Xiaohan Shi, Xingfeng Li, Yang Liu, Zhen Zhao, Jianhua Tao, Aijun Li, Donna Erickson, Tomoki Toda, Masato Akagi, Feng Du. "Comparative Analysis of Handcrafted Acoustic Features and Pre-Trained Embeddings for Robust Speech Emotion Recognition Under Varying Noise Conditions."
🎤 Invited Talk
- Tomoki Toda, Xiaohan Shi. "Recent advances in speech information processing focusing on speech expression." Acoustical Society of Japan, Mar. 2026.
🏆 Competitions
- MER 2023: Multi-label Learning, Semi-Supervised Learning, and Modality Robustness. Track 1: 7/23; Track 2: 8/23; Track 3: 4/15.
- Odyssey 2024: Speech Emotion Recognition Challenge. Track 1: 11/17; Track 2: 5/15.
- INTERSPEECH 2025: Speech Emotion Recognition in Naturalistic Conditions Challenge. Track 1: 6/43.
🎖 Honors and Awards
Fellowship:
- 2019.10 - 2020.10, Ishikawa Prefecture Scholarship for Privately-Financed International Students.
- 2020.04 - 2020.10, Research Assistant (RA), Japan Advanced Institute of Science and Technology.
- 2022.10 - 2025.10, Tokai National Higher Education and Research System, Interdisciplinary Frontier Next-Generation Research Program (Next-Generation Researcher).
- 2024.06 - 2025.03, MNS Next-Generation Research Program, FY2024 High-Contribution RESEARDENT.
- 2025.10 - 2026.10, Teaching Assistant (TA), Nagoya University.
Reviewer:
Journal:
- 2023.06 - (Now), Speech Communication.
- 2025.05 - (Now), IEEE Transactions on Audio, Speech and Language Processing.
- 2025.09 - (Now), APSIPA Transactions on Signal and Information Processing.
- 2025.10 - (Now), International Journal of Human-Computer Interaction.
- 2025.10 - (Now), Neurocomputing.
- 2025.12 - (Now), IEEE Transactions on Affective Computing.
Conference:
- 2023.07 - (Now), Asia-Pacific Signal and Information Processing Association (APSIPA ASC).
- 2023.07 - (Now), IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
- 2026.01 - (Now), INTERSPEECH.
- 2026.01 - (Now), WCCI.
💬 Visitor