
Publications: Mr Yinghao Ma

Li C, Chen Y, Ji Y, Xu J, Cui Z, Li S, Zhang Y, Tang J et al. ( 2026 ) . OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs .
Jiang X, Wang Q, Wu J, He X, Xu Z, Ma Y, Piao M, Yang K et al. ( 2026 ) . AVMeme Exam: A Multimodal Multilingual Multicultural Benchmark for LLMs' Contextual and Cultural Knowledge and Thinking .
Ma Z, Yang G, Chen W, Gao Z, Du Y, Li X, Zheng Z, Zhu H et al. ( 2026 ) . SLAM-LLM: A Modular, Open-Source Multimodal Large Language Model Framework and Best Practice for Speech, Language, Audio and Music Processing . IEEE Journal of Selected Topics in Signal Processing vol. PP , ( 99 ) 1 - 14 .
Li Y, Ma Y, Zhang G, Yuan R, Zhu K, Guo H, Liang Y, Liu J et al. ( 2025 ) . OmniBench: Towards The Future of Universal Omni-Language Models .
Ma Y, Xia H, Chen W, Taheri T, Chang S, Gao H, Yuan R, Ding M et al. ( 2025 ) . A Comprehensive Music Interaction Platform for Evaluating Music Generation Models . Conference: DMRN+20 Digital Music Research Network One-day Workshop 2025 ( King’s College London (Bush House). London, UK ) from: 16/12/2025 to: 16/12/2025 .
Ma Y, Li Y, Benetos E, Lin C ( 2025 ) . Controlled Genre-Specific Music Generation: Fine-Tuning with Predictive Data Mixture Optimization . Conference: DMRN+20 Digital Music Research Network One-day Workshop 2025 ( King’s College London (Bush House). London, UK ) from: 16/12/2025 to: 16/12/2025 .
Taheri T, Ma Y, Benetos E ( 2025 ) . SAR-LM: Symbolic Audio Reasoning with Large Language Models . Conference: DMRN+20 Digital Music Research Network One-day Workshop 2025 ( King’s College London (Bush House). London, UK ) from: 16/12/2025 to: 16/12/2025 .
Tang X, Lei X, Zhu C, Chen S, Yuan R, Li Y, Oh C, Zhang G et al. ( 2025 ) . AutoMV: An Automatic Multi-Agent System for Music Video Generation .
Taheri T, Ma Y, Benetos E ( 2025 ) . SAR-LM: Symbolic Audio Reasoning with Large Language Models .
Ma Y, Li S, Yu J, Benetos E, Maezawa A ( 2025 ) . CMI-Bench: A Comprehensive Benchmark for Evaluating Music Instruction Following . Conference: 26th International Society for Music Information Retrieval Conference (ISMIR) ( Daejeon, Korea ) from: 21/09/2025 to: 25/09/2025 .
Yuan R, Lin H, Guo S, Zhang G, Pan J, Zang Y, Liu H, Liang Y et al. ( 2025 ) . YuE: Scaling Open Foundation Models for Long-Form Music Generation .
Ma Y, Li S, Yu J, Benetos E, Maezawa A ( 2025 ) . CMI-Bench: A Comprehensive Benchmark for Evaluating Music Instruction Following .
Ma Z, Ma Y, Zhu Y, Yang C, Chao Y-W, Xu R, Chen W, Chen Y et al. ( 2025 ) . MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix .
Xue L, Zhou Z, Pan J, Li Z, Fan S, Ma Y, Cheng S, Yang D et al. ( 2025 ) . Audio-FLAN: A Preliminary Release .
Qu X, Bai Y, Ma Y, Zhou Z, Lo KM, Liu J, Yuan R, Min L et al. ( 2024 ) . MuPT: A Generative Symbolic Music Pretrained Transformer .
Yuan R, Lin H, Wang Y, Tian Z, Wu S, Shen T, Zhang G, Wu Y et al. ( 2024 ) . ChatMusician: Understanding and Generating Music Intrinsically with LLM . Conference: 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024) ( Bangkok, Thailand ) from: 11/08/2024 to: 16/08/2024 .
Zhuo L, Yuan R, Pan J, Ma Y, Li Y, Zhang G, Liu S, Dannenberg R et al. ( 2024 ) . LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT .
Li Y, Yuan R, Zhang G, Ma Y, Chen X, Yin H, Xiao C, Lin C et al. ( 2024 ) . MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training . Conference: International Conference on Learning Representations (ICLR) ( Vienna, Austria ) from: 07/05/2024 to: 11/05/2024 .
Deng Q, Yang Q, Yuan R, Huang Y, Wang Y, Liu X, Tian Z, Pan J et al. ( 2024 ) . ComposerX: Multi-Agent Symbolic Music Composition with LLMs .
Li D, Ma Y, Wei W, Kong Q, Wu Y, Che M, Xia F, Benetos E et al. ( 2024 ) . MERTech: Instrument Playing Technique Detection Using Self-Supervised Pretrained Model with Multi-Task Finetuning . Conference: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) vol. 00 , 521 - 525 .
Deng Z, Ma Y, Liu Y, Guo R, Zhang G, Chen W, Huang W, Benetos E ( 2024 ) . MusiLingo: Bridging Music and Text with Pre-trained Language Models for Music Captioning and Query Response . Conference: Findings of the Association for Computational Linguistics: NAACL 2024 , 3643 - 3655 .
Yuan R, Ma Y, Li Y, Zhang G, Chen X, Yin H, Zhuo L, Liu Y et al. ( 2023 ) . MARBLE: Music Audio Representation Benchmark for Universal Evaluation .
Li D, Ma Y, Wei W, Kong Q, Wu Y, Che M, Xia F, Benetos E et al. ( 2023 ) . MERTech: Instrument Playing Technique Detection Using Self-Supervised Pretrained Model With Multi-Task Finetuning .
Deng Z, Ma Y, Liu Y, Guo R, Zhang G, Chen W, Huang W, Benetos E ( 2023 ) . MusiLingo: Bridging Music and Text with Pre-trained Language Models for Music Captioning and Query Response .
Ma Y, Yuan R, Li Y, Zhang G, Chen X, Yin H, Lin C, Benetos E et al. ( 2023 ) . On the Effectiveness of Speech Self-supervised Learning for Music .
Miller J, Lewis D, Guo Z, Li Y, Ma Y, Vahidi C, Boon H, Wolstanholme L et al. ( 2022 ) . DMRN+17: Digital Music Research Network One-day Workshop 2022 . Conference: DMRN+17: Digital Music Research Network One-day Workshop 2022 ( Queen Mary University of London ) from: 20/12/2022 to: 20/12/2022 .
Li Y, Yuan R, Zhang G, Ma Y, Lin C, Chen X, Ragni A, Yin H et al. ( 2022 ) . Large-Scale Pretrained Model for Self-Supervised Music Audio Representation Learning . Conference: DMRN+17: Digital Music Research Network One-day Workshop 2022 ( London, UK ) from: 20/12/2022 to: 20/12/2022 .
Li Y, Yuan R, Zhang G, Ma Y, Lin C, Chen X, Ragni A, Yin H et al. ( 2022 ) . MAP-Music2Vec: A Simple and Effective Baseline for Self-Supervised Music Audio Representation Learning .