
Lei Xu is Emeritus Professor of Computer Science and Engineering at the Chinese University of Hong Kong (CUHK); Zhiyuan Chair Professor of the Computer Science and Engineering Department, Chief Scientist of the AI Research Institute, and Chief Scientist of the SJTU-SenseTime Research Institute; Chief Scientist of the Brain Science & Technology Research Centre, Shanghai Jiao Tong University (SJTU); and Director of the Neural Computation Research Centre in the Brain and Intelligence Science-Technology Institute, ZhangJiang National Lab.
He was elected a Fellow of the IEEE in 2001, a Fellow of the International Association for Pattern Recognition (IAPR) in 2002, and a Fellow of the European Academy of Sciences (EURASC) in 2003. He has received several national and international academic awards, including the 1993 National Natural Science Award, the 1995 Leadership Award from the International Neural Networks Society (INNS), and the 2006 APNNA Outstanding Achievement Award. He has conducted research in several areas of Artificial Intelligence for over 40 years and has published about 400 papers, including 140+ journal papers; he also published four papers at NIPS during 1992-95, and the 1992 paper with Peking University marked the first time a Chinese academic institution appeared at this topmost AI conference. His influential contributions on the Randomized Hough Transform (RHT), RPCL learning, LMSER learning, classifier combination and mixture of experts, and BYY harmony learning are well known and widely followed. He has served as editor-in-chief and associate editor of several academic journals.

Professor Haizhou LI is a Fellow of the Singapore Academy of Engineering, a Fellow of the IEEE, and a Fellow of the International Speech Communication Association. He is currently the Dean of the School of Artificial Intelligence and Presidential Chair Professor at The Chinese University of Hong Kong, Shenzhen. He also serves as Adjunct Professor at the National University of Singapore and U Bremen Excellence Chair Professor at the University of Bremen, Germany.
Professor LI has made outstanding contributions to speech recognition and natural language processing. He has led the development of multiple major technology deployments, including the voiceprint recognition engine for Lenovo's A586 smartphone in 2012 and the music search engine for Baidu Music in 2013. He was the Editor-in-Chief of the IEEE/ACM Transactions on Audio, Speech, and Language Processing, the President of the International Speech Communication Association, and a Vice President of the IEEE Signal Processing Society.
Humans have a remarkable ability to focus their auditory attention on a single sound source of interest in a multi-talker environment, or cocktail party; we call this selective auditory attention. As discovered in neuroscience and psychoacoustics, auditory attention is achieved through the modulation of top-down and bottom-up attention. However, the signal processing approach to speech separation and/or speaker extraction from multi-talker speech remains a challenge for machines. In this talk, we study deep learning solutions to monaural speech separation and speaker extraction that enable selective auditory attention. We review findings from human audio-visual speech perception to motivate the design of speech perception algorithms. We will also discuss computational auditory models, technical challenges, and recent advances in the field.
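As a rough illustration of how a deep model can realize this kind of selective attention, the sketch below (a simplified example, not the speaker's system) conditions a time-frequency mask on a target-speaker enrollment embedding, so the network attends to one voice in the mixture; the architecture, layer sizes, and names are illustrative assumptions only.

```python
# Minimal sketch of mask-based speaker extraction (illustrative, not the speaker's system).
# A mixture magnitude spectrogram is modulated by a mask conditioned on a
# target-speaker embedding, mimicking "top-down" attention on the source of interest.
import torch
import torch.nn as nn

class SpeakerExtractor(nn.Module):
    def __init__(self, n_freq=257, spk_dim=128, hidden=256):
        super().__init__()
        self.rnn = nn.LSTM(n_freq + spk_dim, hidden, num_layers=2, batch_first=True)
        self.mask = nn.Sequential(nn.Linear(hidden, n_freq), nn.Sigmoid())

    def forward(self, mix_mag, spk_emb):
        # mix_mag: (batch, frames, n_freq) magnitude spectrogram of the mixture
        # spk_emb: (batch, spk_dim) enrollment embedding of the target speaker
        cond = spk_emb.unsqueeze(1).expand(-1, mix_mag.size(1), -1)
        h, _ = self.rnn(torch.cat([mix_mag, cond], dim=-1))
        m = self.mask(h)          # time-frequency mask in [0, 1]
        return m * mix_mag        # estimated target-speaker magnitude

# Toy usage with random stand-ins for a real STFT and a real speaker encoder.
model = SpeakerExtractor()
mix = torch.rand(1, 200, 257)    # (batch, frames, frequency bins)
emb = torch.randn(1, 128)        # hypothetical target-speaker embedding
est = model(mix, emb)
print(est.shape)                 # torch.Size([1, 200, 257])
```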

Massimo Tornatore (Fellow, IEEE) is currently a Full Professor in the Department of Electronics, Information, and Bioengineering, Politecnico di Milano. He has also held appointments as Adjunct Professor at the University of California, Davis, USA, and as Visiting Professor at the University of Waterloo, Canada.
His research interests include performance evaluation, optimization and design of communication networks (with an emphasis on the application of optical networking technologies), network virtualization, network reliability, and machine learning applications for network management. In these areas, he has co-authored more than 500 peer-reviewed conference and journal papers (with 23 best paper awards), 3 books, and 4 patents.
He is a member of the Editorial Board of, among others, IEEE Communications Surveys & Tutorials, IEEE/ACM Transactions on Networking, and IEEE Transactions on Network and Service Management.

Bin Hu is a (Full) Professor and the Dean of the School of Medical Technology at Beijing Institute of Technology, China. He is a National Distinguished Expert, a Chief Scientist of the 973 Program, and was named a National Advanced Worker in 2020. He is a Fellow of the IEEE, IET, and AAIA, and an IET Fellow Assessor & Fellowship Advisor. He serves as the Editor-in-Chief of the IEEE Transactions on Computational Social Systems and an Associate Editor of the IEEE Transactions on Affective Computing. He is a Clarivate Highly Cited Researcher, listed among the World's Top 2% Scientists, and a top 0.05% Highly Ranked Scholar according to ScholarGPS.
In recent years, mental health issues have become increasingly prominent around the world. According to a World Health Organization report, approximately 970 million people suffer from mental disorders, accounting for 13% of the global population. Currently, the diagnosis of mental illnesses relies primarily on physician interviews and the Brief Psychiatric Rating Scale (BPRS), and lacks objective, quantifiable diagnostic indicators. In addition, the most common treatment for mental disorders is pharmacotherapy, which is often associated with significant side effects. The rapid advancement of cutting-edge artificial intelligence and big data technologies offers new opportunities for the diagnosis and treatment of mental disorders. These technologies are shifting the field toward data-driven screening and treatment, offering more precise, personalized, and effective solutions. This talk will introduce the opportunities and challenges in medical electronics and computational methodologies for the diagnosis and treatment of mental disorders.

Li Jingjing is a Professor at the University of Electronic Science and Technology of China. His research focuses on multimodal learning and transfer learning. He has published over 80 papers in TPAMI and other CCF A-level venues, with more than 10,000 citations. He has won multiple national awards, including the Wu Wenjun AI Outstanding Youth Award and the ACM SIGAI China Rising Star Award.
Vision-language pretraining has enabled powerful vision-language models (VLMs) with strong zero-shot capabilities. Yet their performance drops on domain-specific tasks, motivating research on transferring and generalizing VLM knowledge to downstream applications. This talk briefly reviews generalization settings, methodologies, and benchmarks, categorizing approaches into prompt-based, parameter-based, and feature-based methods. We also discuss our recent research on generalizing VLMs to novel domains.
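To give a concrete flavor of the prompt-based category, the sketch below shows a CoOp-style prompt learner in simplified form (an illustrative assumption, not the speaker's method): learnable context vectors are prepended to class-name token embeddings while the VLM encoders stay frozen, and random tensors stand in for the frozen encoders' outputs.

```python
# Minimal sketch of prompt-based VLM adaptation (CoOp-style), illustrative only.
# Only the prompt context vectors are trained on the new domain; the VLM is frozen.
import torch
import torch.nn as nn

class PromptLearner(nn.Module):
    def __init__(self, n_classes, n_ctx=16, dim=512):
        super().__init__()
        self.ctx = nn.Parameter(torch.randn(n_ctx, dim) * 0.02)  # learnable context
        # Stand-in for frozen class-name token embeddings from the VLM tokenizer.
        self.cls = nn.Parameter(torch.randn(n_classes, 1, dim), requires_grad=False)

    def forward(self):
        n_classes = self.cls.size(0)
        ctx = self.ctx.unsqueeze(0).expand(n_classes, -1, -1)
        return torch.cat([ctx, self.cls], dim=1)   # (n_classes, n_ctx + 1, dim)

def classify(image_feats, text_feats, temperature=0.07):
    # Cosine-similarity logits between image features and prompted text features.
    image_feats = image_feats / image_feats.norm(dim=-1, keepdim=True)
    text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)
    return image_feats @ text_feats.t() / temperature

# Toy usage with random stand-ins for the frozen encoders' outputs.
learner = PromptLearner(n_classes=10)
prompts = learner()                      # prompted token sequences per class
text_feats = prompts.mean(dim=1)         # placeholder for the frozen text encoder
image_feats = torch.randn(4, 512)        # placeholder for frozen image features
logits = classify(image_feats, text_feats)
print(logits.shape)                      # torch.Size([4, 10])
```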