Research

2026

Breaking through data scarcity: A novel diffusion model approach for snoring sound augmentation and classification

Tianrui Jia, Haojie Zhang, Hanhan Wu, Qiyang Sun, Xin Jing, …, Ye Zhang, Bin Hu, Tanja Schultz, Björn W. Schuller, Yoshiharu Yamamoto

Biomedical Signal Processing and Control · 01 May 2026 · doi:10.1016/j.bspc.2026.109449

Snoring can stem from various regions of the upper airways; the excitation location is closely linked to the unique acoustic characteristics of snore sounds and plays a vital role in sleep monitoring. The Munich-Passau Snore Sound Corpus (MPSSC) is the largest database for snoring-based auxiliary diagnosis, offering valuable data for sleep disorder research and diagnosis. However, MPSSC suffers from a small total number of samples and an uneven sample distribution. Some rare diseases have only a few case samples, which fails to meet the need for sufficient learning data. To address these issues, we propose an end-to-end method for high-quality snoring audio generation for data augmentation. This method includes a Rectified-Flow-based 1D-signal diffusion model that enhances data across all classes, combined with an audio-based single diffusion model to enhance rare classes. Under our data augmentation framework, higher specificity, sensitivity, and accuracy are achieved in Automatic Snoring Sound Classification (ASSC). Also, we focus on the explicitness of classification strategies, aiming to prove the enhanced data’s high quality and applicability to downstream tasks. Our work provides comprehensive support for ASSC, enhancing sleep disorder diagnosis assistance and offering new ideas for database research under scarce medical data conditions.

FedVCPL-Diff: A federated convolutional prototype learning framework with a diffusion model for speech emotion recognition

Ruobing Li, Yifan Feng, Lin Shen, Liuxian Ma, Haojie Zhang, Kun Qian, Bin Hu, Yoshiharu Yamamoto, Björn W. Schuller

Information Fusion · 01 Mar 2026 · doi:10.1016/j.inffus.2025.103745

Speech Emotion Recognition (SER), a key emotion analysis technology, has shown significant value in various research areas. Previous SER models have achieved good emotion recognition accuracy, but typical centrally-based training requires centralised processing of speech data, which has a serious risk of privacy leakage. Federated learning (FL) can avoid centralised data processing through distributed learning, providing a solution for privacy protection in SER. However, FL faces several challenges in practical applications, including imbalanced data distribution and inconsistent labelling. Furthermore, typical FL frameworks focus on client-side enhancement and ignore server-side aggregation strategy optimisation, which can increase the computational load on the client side. To address the aforementioned problems, we propose a novel approach, FedVCPL-Diff. Firstly, regarding information fusion, we introduce a diffusion model on the server side to generate Valence-Arousal-Dominance (VAD) emotion space features, which replaces the typical aggregation framework and effectively promotes global information fusion. In addition, in terms of information exchange, we propose a lightweight and personalised FL transmission framework based on the exchange of VAD features. FedVCPL-Diff optimises the local model by updating the data distribution anchors, which not only avoids the privacy risk but also reduces the communication cost. Experimental results show that the framework significantly improves emotion recognition performance compared to four commonly used FL frameworks. The overall performance of our framework also shows a significant advantage compared to locally independent models.

2025

Exploring the Alleviating Effects of taVNS on Negative Emotions: An EEG Study

Xiaokun Jin, Chengcheng Zheng, Mingyue Jin, Qunxi Dong, Lixian Zhu, Fuze Tian

IEEE Transactions on Computational Social Systems · 01 Dec 2025 · doi:10.1109/TCSS.2025.3564035

Emotion inhibitory control is a key executive function of the human brain, which regulates behavior by suppressing inappropriate responses. It plays an integral part in alleviating negative emotions, improving mood, and preventing depression. Transcutaneous auricular vagus nerve stimulation (taVNS) has been shown to enhance behavioral control, potentially suppressing negative emotions or facilitating their reduction in healthy individuals. However, the neurocomputational mechanisms underlying taVNS-induced neuroenhancement remain unclear. In this work, a portable electroencephalography (EEG) acquisition and stimulation device is designed to collect eight-channel EEG signals and deliver taVNS to both ears. Then, we design a protocol that successfully induces negative emotions in healthy subjects. Next, we conduct a sham-controlled experiment, involving 28 healthy subjects, to explore the changes in EEG of negative emotions under taVNS. Finally, we primarily analyze the power spectral density (PSD) of EEG signals and the functional connectivity network of the brain, based on the phase locking value (PLV), to assess the effect of taVNS on neural activity induced by negative emotions. The results of the experiment reveal that taVNS is a promising method for enhancing emotional inhibitory control by reducing PSD in the alpha band and enhancing PLV within prefrontal inhibitory control networks. In addition, differences in graph theory parameters between the Sham and taVNS conditions indicate that taVNS helps regulate negative emotions. In conclusion, this study demonstrates that taVNS enhances inhibitory control and reveals its EEG-based neurocomputational mechanisms in healthy individuals during the development of negative emotions. The results also indicate that taVNS could serve as a promising neuromodulation therapy for psychiatric disorders and individuals with depression or emotional distress.
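
The phase locking value named in the abstract above is conventionally defined as the magnitude of the average unit phasor of the phase difference between two channels. The following is a minimal sketch of that standard definition, not the paper's implementation. It assumes instantaneous phases have already been extracted (for example with a band-pass filter followed by a Hilbert transform); the function name `phase_locking_value` and the sample phases are hypothetical.

```python
import cmath
import math


def phase_locking_value(phases_a, phases_b):
    """Return the PLV of two equal-length sequences of instantaneous phases (radians)."""
    if len(phases_a) != len(phases_b) or not phases_a:
        raise ValueError("phase sequences must be non-empty and of equal length")
    # Average the unit phasors of the phase difference, then take the magnitude.
    mean_phasor = sum(
        cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b)
    ) / len(phases_a)
    return abs(mean_phasor)


# Hypothetical phases: channel B lags channel A by a constant 0.5 rad (perfect locking).
n = 200
chan_a = [2 * math.pi * 10 * t / n for t in range(n)]
chan_b = [p - 0.5 for p in chan_a]
locked = phase_locking_value(chan_a, chan_b)

# Hypothetical phases: the difference sweeps uniformly over one full cycle (no locking).
chan_c = [p - 2 * math.pi * t / n for t, p in enumerate(chan_a)]
unlocked = phase_locking_value(chan_a, chan_c)
```

A PLV near 1 indicates a stable phase relationship between the two channels, and a value near 0 indicates none.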

Somatisation Disorder Recognition by Stream Fusion with WavLM and Enhanced ResNet

Zhijing Cao, Lin Shen, Liuxian Ma, Xiaoxi Liu, Haojie Zhang, …, Ruolan Huang, Toru Nakamura, Bin Hu, Björn W. Schuller, Yoshiharu Yamamoto

2025 IEEE 14th Global Conference on Consumer Electronics (GCCE) · 23 Sep 2025 · doi:10.1109/GCCE65946.2025.11275046

Somatisation Disorder (SD) is a serious mental health condition that is difficult to diagnose clinically owing to the limitations of subjective assessment methods and the lack of objective biomarkers. In this work, we introduce an effective multi-stream fusion strategy and an improved ResNet model for SD recognition. The model combines the WavLM and ResNet architectures. The results show that our model achieves 36.8% accuracy on the considered quaternary classification task. Based on the proposed model, we likewise realise effective and robust recognition of SD.

An AI-Assisted All-in-One Integrated Coronary Artery Disease Diagnosis System Using a Portable Heart Sound Sensor With an On-Board Executable Lightweight Model

Haojie Zhang, Fuze Tian, Yang Tan, Lin Shen, Jingyu Liu, …, Yalei Han, Gong Su, Bin Hu, Björn W. Schuller, Yoshiharu Yamamoto

IEEE Transactions on Mobile Computing · 01 Aug 2025 · doi:10.1109/TMC.2025.3547842

Heart sounds play a crucial role in assessing Coronary Artery Disease (CAD). The advancement of Artificial Intelligence (AI) technologies has given rise to Computer Audition (CA)-based methods for CAD detection. However, previous research has focused primarily on analyzing and modeling heart sound data, overlooking practical application scenarios. In this work, we design a pervasive heart sound collection device used for high-quality heart sound data acquisition. Moreover, we introduce an on-board executable lightweight network tailored for the designed portable device, referred to as TYKDModel. Further, heart sound data from 41 CAD patients and 22 non-CAD healthy controls are collected using the developed device. Experimental results show that the TYKDModel exhibits low computational complexity, with 52.16 K parameters and 5.03 M Floating-Point Operations (FLOPs). When deployed on the board, it requires only 1.10 MB of Random Access Memory (RAM) and 236.27 KB of Read-Only Memory (ROM), and takes around 1.72 seconds to perform a classification. Despite the low computational and spatial complexity, the TYKDModel achieves a notable classification accuracy of 85.2%, specificity of 88.6%, and sensitivity of 82.8% on the board. These results indicate the promising potential of the AI-assisted all-in-one integrated system for heart sound-assisted CAD diagnosis.
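
Accuracy, specificity, and sensitivity, as reported in the abstract above, follow the standard confusion-matrix definitions. The sketch below shows those definitions only. Treating CAD as the positive class is an assumption, the function name `classification_metrics` is hypothetical, and the counts are made up for illustration and are not the paper's data.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
    }


# Hypothetical counts for illustration only; they are not taken from the paper.
metrics = classification_metrics(tp=8, fn=2, tn=9, fp=1)
```

With imbalanced groups such as the 41 patients and 22 controls mentioned above, sensitivity and specificity are worth reading alongside accuracy, since accuracy alone can hide weak performance on the smaller group.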

MDH-NAS: Accelerating EEG Signal Classification With Mixed-Level Differentiable and Hardware-Aware Neural Architecture Search

Lixian Zhu, Su Wang, Xiaokun Jin, Kai Zheng, Jian Zhang, Shuting Sun, Fuze Tian, Ran Cai, Bin Hu

IEEE Internet of Things Journal · 01 Jul 2025 · doi:10.1109/JIOT.2025.3553450

In noninvasive brain-computer interfaces (BCIs), EEG analysis plays a critical role, with neural networks serving as a cornerstone for signal decoding. Existing neural network approaches for EEG signal recognition require extensive manual design and hyperparameter tuning, leading to inefficiencies and making them impractical for embedded devices due to their large model size. To address these limitations, we propose mixed-level differentiable and hardware-aware neural architecture search (MDH-NAS), a framework that automatically generates lightweight neural networks tailored for EEG classification. Unlike traditional DARTS methods, MDH-NAS employs a hybrid optimization strategy that balances global and local search spaces, thereby accelerating and refining architecture discovery. It introduces explicit size constraints during the search process to ensure deployability on embedded devices. MDH-NAS demonstrates autonomous generation of architectures for tasks such as motor imagery (MI) and depression recognition, achieving 87.80% accuracy on the BCI-IV dataset and 90.09% on the MODMA dataset. When deployed on the EAIDK-610 board across heterogeneous tasks, it attains 85.37% accuracy on the EEG Motor Movement/Imagery dataset. This method reduces architecture discovery time by 89% and enhances prediction accuracy by 8.70% compared to baseline methods, highlighting its potential for scalable EEG analysis and real-world embedded deployment.

Enhancing Emotion Regulation in Mental Disorder Treatment: An AIGC-Based Closed-Loop Music Intervention System

Lin Shen, Haojie Zhang, Cuiping Zhu, Ruobing Li, Kun Qian, Fuze Tian, Bin Hu, Björn W. Schuller, Yoshiharu Yamamoto

IEEE Transactions on Affective Computing · 01 Jul 2025 · doi:10.1109/TAFFC.2025.3557873

Mental disorders have increased rapidly and have emerged as a serious social health issue in the past decade. Undoubtedly, the timely treatment of mental disorders is crucial. Emotion regulation has been proven to be an effective method for treating mental disorders. Music therapy, as one of the methods that can achieve emotion regulation, has gained increasing attention in the field of mental disorder treatment. However, traditional music therapy methods still face some unresolved issues, such as the lack of real-time capability and the inability to form closed-loop systems. With the advancement of artificial intelligence (AI), especially AI-generated content (AIGC), AI-based music therapy holds promise in addressing these issues. In this paper, an AIGC-based closed-loop music intervention system demonstration is proposed to regulate emotions for mental disorder treatment. This system demonstration consists of an emotion recognition model and a music generation model. The emotion recognition model can assess mental states, while the music generation model generates the corresponding emotional music for regulation. The system continuously performs recognition and regulation, thus forming a closed-loop process. We first conduct experiments on both the emotion recognition model and the music generation model to validate the accuracy of the recognition model and the quality of the music produced by the generation model. Finally, we conduct comprehensive tests on the entire system to verify its feasibility and effectiveness.

Machine Learning Enabled Reusable Adhesion, Entangled Network-Based Hydrogel for Long-Term, High-Fidelity EEG Recording and Attention Assessment

Kai Zheng, Chengcheng Zheng, Lixian Zhu, Bihai Yang, Xiaokun Jin, …, Jingyu Liu, Yan Xiong, Fuze Tian, Ran Cai, Bin Hu

Nano-Micro Letters · 29 May 2025 · doi:10.1007/s40820-025-01780-7

Due to their high mechanical compliance and excellent biocompatibility, conductive hydrogels exhibit significant potential for applications in flexible electronics. However, as the demand for high sensitivity, superior mechanical properties, and strong adhesion performance continues to grow, many conventional fabrication methods remain complex and costly. Herein, we propose a simple and efficient strategy to construct an entangled network hydrogel through a liquid–metal-induced cross-linking reaction. The hydrogel demonstrates outstanding properties, including exceptional stretchability (1643%), high tensile strength (366.54 kPa), toughness (350.2 kJ m⁻³), and relatively low mechanical hysteresis. The hydrogel exhibits long-term stable reusable adhesion (104 kPa), enabling conformal and stable adhesion to human skin. This capability allows it to effectively capture high-quality epidermal electrophysiological signals with high signal-to-noise ratio (25.2 dB) and low impedance (310 ohms). Furthermore, by integrating advanced machine learning algorithms, we achieve an attention classification accuracy of 91.38%, which will significantly impact fields like education, healthcare, and artificial intelligence.

Explainable Depression Classification Based on EEG Feature Selection From Audio Stimuli

Lixian Zhu, Rui Wang, Xiaokun Jin, Yuwen Li, Fuze Tian, …, Kun Qian, Xiping Hu, Bin Hu, Yoshiharu Yamamoto, Björn W. Schuller

IEEE Transactions on Neural Systems and Rehabilitation Engineering · 01 Jan 2025 · doi:10.1109/TNSRE.2025.3557275

With the development of affective computing and Artificial Intelligence (AI) technologies, Electroencephalogram (EEG)-based depression detection methods have been widely proposed. However, existing studies have mostly focused on the accuracy of depression recognition, ignoring the association between features and models. Additionally, there is a lack of research on the contribution of different features to depression recognition. To this end, this study introduces an innovative approach to depression detection using EEG data, integrating Ant-Lion Optimization (ALO) and Multi-Agent Reinforcement Learning (MARL) for feature fusion analysis. The inclusion of Explainable Artificial Intelligence (XAI) methods enhances the explainability of the model’s features. The Time-Delay Embedded Hidden Markov Model (TDE-HMM) is employed to infer internal brain states during depression, triggered by audio stimulation. The ALO-MARL algorithm, combined with hyper-parameter optimization of the XGBoost classifier, achieves high accuracy (93.69%), sensitivity (88.60%), specificity (97.08%), and F1-score (91.82%) on an auditory stimulus-evoked three-channel EEG dataset. The results suggest that this approach outperforms state-of-the-art feature selection methods for depression recognition on this dataset, and XAI elucidates the critical impact of the minimum value of Power Spectral Density (PSD), Sample Entropy (SampEn), and Rényi Entropy (Ren) on depression recognition. The study also explores dynamic brain state transitions revealed by audio stimuli, providing insights for the clinical application of AI algorithms in depression recognition.

An On-Board Executable Multi-Feature Transfer-Enhanced Fusion Model for Three-Lead EEG Sensor-Assisted Depression Diagnosis

Fuze Tian, Haojie Zhang, Yang Tan, Lixian Zhu, Lin Shen, Kun Qian, Bin Hu, Björn W. Schuller, Yoshiharu Yamamoto

IEEE Journal of Biomedical and Health Informatics · 01 Jan 2025 · doi:10.1109/JBHI.2024.3487012

The development of affective computing and medical electronic technologies has led to the emergence of Artificial Intelligence (AI)-based methods for the early detection of depression. However, previous studies have often overlooked the necessity for the AI-assisted diagnosis system to be wearable and accessible in practical scenarios for depression recognition. In this work, we present an on-board executable multi-feature transfer-enhanced fusion model for our custom-designed wearable three-lead Electroencephalogram (EEG) sensor, based on EEG data collected from 73 depressed patients and 108 healthy controls. Experimental results show that the proposed model exhibits low computational complexity (65.0 K parameters), promising Floating-Point Operations (FLOPs) performance (25.6 M), real-time processing (1.5 s/execution), and low power consumption (320.8 mW). Furthermore, it requires only 202.0 KB of Random Access Memory (RAM) and 279.6 KB of Read-Only Memory (ROM) when deployed on the EEG sensor. Despite its low computational and spatial complexity, the model achieves a notable classification accuracy of 95.2%, specificity of 94.0%, and sensitivity of 96.9% under independent test conditions. These results underscore the potential of deploying the model on the wearable three-lead EEG sensor for assisting in the diagnosis of depression.

2024

Advancements in Affective Disorder Detection: Using Multimodal Physiological Signals and Neuromorphic Computing Based on SNNs

Fuze Tian, Lixin Zhang, Lixian Zhu, Mingqi Zhao, Jingyu Liu, Qunxi Dong, Qinglin Zhao

IEEE Transactions on Computational Social Systems · 01 Dec 2024 · doi:10.1109/TCSS.2024.3420445

Currently, the integration of artificial intelligence (AI) techniques with multimodal physiological signals represents a pivotal approach to detect affective disorders (ADs). With the increasing complexity and diversity of physiological signal modalities, researchers have introduced various AI methods using multimodal physiological signals to improve model classification performance and explainability to increase trust and facilitate clinical adoption. Among these methods, spiking neural networks (SNNs) stand out as a promising avenue due to their alignment with the operating principles of the human brain, robust biological explainability, and adeptness in processing spatial–temporal information in an efficient event-driven manner with low power consumption. Furthermore, the emergence of neuromorphic computing (NC) chips based on SNNs has greatly bolstered the field of NC, enabling effective support for objective, pervasive, and wearable AI-assisted medical diagnostic devices for ADs and other diseases. This article presents a review of recent achievements in multimodal AD detection and points out the associated challenges in utilizing multimodal physiological signals and NC based on SNNs for AD detection. Building upon this foundation, we give perspectives on future work. The intended readership for this review consists of researchers in the fields of cognitive computing, computational psychophysiology, affective computing, NC, and brain-inspired computing. We hope that this survey not only garners increased attention from the scientific community but also serves as a valuable guide for future studies in this field.

A First Look at Generative Artificial Intelligence-Based Music Therapy for Mental Disorders

Lin Shen, Haojie Zhang, Cuiping Zhu, Ruobing Li, Kun Qian, Wei Meng, Fuze Tian, Bin Hu, Björn W. Schuller, Yoshiharu Yamamoto

IEEE Transactions on Consumer Electronics · 01 Dec 2024 · doi:10.1109/TCE.2024.3514633

Mental disorders have increased rapidly in the past decade and cause considerable harm to individuals as well as society. Hence, mental disorders have become a serious public health challenge in today's society. Timely treatment of mental disorders plays a critical role in reducing the harm of mental illness to individuals and society. Music therapy is a non-pharmaceutical method for treating such mental disorders. However, conventional music therapy suffers from a number of issues that result in a lack of popularity. The rapid development of Artificial Intelligence (AI), especially AI Generated Content (AIGC), provides a chance to address these issues. Nevertheless, to the best of our knowledge, no work has investigated music therapy from an AIGC and closed-loop perspective. In this paper, we summarise some universal music therapy methods and discuss their shortcomings. Then, we indicate some AIGC techniques, especially music generation, for their application in music therapy. Moreover, we present a closed-loop music therapy system and introduce its implementation details. Finally, we discuss some challenges in AIGC-based music therapy, propose further research directions, and suggest the potential of this system to become a consumer-grade product for treating mental disorders.

Design and Implementation of Electroacupuncture: A Study of Prefrontal EEG Characteristics Under taVNS

Lixian Zhu, Yanan Zhao, Xiaokun Jin, Fuze Tian, Jingxin Liu, Ran Cai, Qunxi Dong, Peijing Rong, Bin Hu

IEEE Sensors Journal · 15 Oct 2024 · doi:10.1109/JSEN.2024.3441619

Transcutaneous auricular vagus nerve stimulation (taVNS), as a method for mimicking VNS, has been proven effective in the treatment of psychiatric disorders. However, the underlying mechanism through which taVNS mimics VNS remains elusive. Moreover, the parameters of taVNS are fixed and open loop in previous work, which makes them difficult to apply to all users, as individual differences are inevitable. Since electroencephalogram (EEG) is one of the important biomarkers of neural activity, this study aims to develop a closed-loop system for personalized interventions in emotion regulation by integrating taVNS with EEG feedback. We first design a taVNS system based on EEG signal feedback and verify the performance metrics of the system. Second, we design experimental paradigms to explore the changes in EEG features under taVNS. The experimental results show that the EEG characteristics differ between different taVNS frequencies (between 50 and 100 Hz). Moreover, we observe substantial distinctions between EEG characteristics during the taVNS state and the resting state, with pre-taVNS, taVNS, and post-taVNS exhibiting notable differences. Specifically, the power spectral density (PSD) in the taVNS state is lower than in the resting state (p < 0.05), except for the beta band, where the opposite trend is observed. Additionally, features such as Lempel-Ziv complexity (LZC) and Rényi entropy (REn) display a decreasing trend throughout the taVNS (p < 0.05). Furthermore, we employ hidden Markov models (HMMs) to reveal the heterogeneity of dynamic changes in the brain during taVNS, providing a mechanistic interpretation of taVNS.

E-ODN: An Emotion Open Deep Network for Generalised and Adaptive Speech Emotion Recognition

Liuxian Ma, Lin Shen, Ruobing Li, Haojie Zhang, Kun Qian, Bin Hu, Björn W. Schuller, Yoshiharu Yamamoto

Interspeech 2024 · 01 Sep 2024 · doi:10.21437/Interspeech.2024-685

Recognising the widest range of emotions possible is a major challenge in the task of Speech Emotion Recognition (SER), especially for complex and mixed emotions. However, due to the limited number of emotional types and uneven distribution of data within existing datasets, current SER models are typically trained and used in a narrow range of emotional types. In this paper, we propose the Emotion Open Deep Network (E-ODN) model to address this issue. Besides, we introduce a novel Open-Set Recognition method that maps sample emotional features into a three-dimensional emotional space. The method can infer unknown emotions and initialise new type weights, enabling the model to dynamically learn and infer emerging emotional types. The empirical results show that our recognition model outperforms the state-of-the-art (SOTA) models in dealing with multi-type unbalanced data, and it can also perform finer-grained emotion recognition.

An FFT-Based DC Offset Compensation and I/Q Imbalance Correction Algorithm for Bioradar Sensors

Fuze Tian, Lixian Zhu, Qiuxia Shi, Xiaokun Jin, Ran Cai, Qunxi Dong, Qinglin Zhao, Bin Hu

IEEE Transactions on Microwave Theory and Techniques · 01 Mar 2024 · doi:10.1109/TMTT.2023.3308190

The challenge of noncontact presentation of human cardiopulmonary activity using a bioradar sensor is to linearly demodulate the Doppler cardiopulmonary diagram (DCD) signal from baseband signals. Arctangent demodulation can perform linear phase demodulation to obtain the DCD signal. However, the high-order harmonics and intermodulation terms (ITs) caused by the time-varying direct current (dc) offset and in-phase and quadrature-phase (I/Q) imbalance in the baseband signals significantly degrade the signal-to-noise ratio (SNR) of the Doppler heartbeat diagram (DHD) signal. In this work, a fast Fourier transform (FFT)-based algorithm is proposed to simultaneously perform time-varying dc offset compensation and I/Q imbalance correction without the need for an auxiliary device to improve the accuracy of the arctangent demodulation. The obtained results show that the SNRs of the algorithm-processed DHD signals are increased from 30.08 ± 2.41 to 68.88 ± 10.57 dB. In addition, the root mean square errors (RMSEs) of the C-C intervals of the DHD signals for eight subjects with respect to the J-J intervals of the ballistocardiogram (BCG) signals are 17.79 ± 2.72 ms (2.80% ± 0.43%), suggesting a promising potential of the DHD signal for noncontact biomedical applications.
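
Arctangent demodulation, mentioned in the abstract above, recovers the Doppler phase as the unwrapped value of atan2(Q, I); for a target displacement x and wavelength λ, the phase is 4πx/λ. The sketch below illustrates only this textbook step on ideal I/Q samples, with no dc offset and no I/Q imbalance. It does not reproduce the paper's FFT-based compensation algorithm, and the function name, carrier frequency, and motion amplitude are hypothetical.

```python
import math


def arctangent_demodulate(i_samples, q_samples):
    """Return the unwrapped phase (radians) of ideal I/Q baseband samples."""
    phases = []
    offset = 0.0
    previous = None
    for i_val, q_val in zip(i_samples, q_samples):
        wrapped = math.atan2(q_val, i_val)
        if previous is not None:
            # Undo 2*pi jumps so the phase stays continuous across samples.
            delta = wrapped - previous
            if delta > math.pi:
                offset -= 2 * math.pi
            elif delta < -math.pi:
                offset += 2 * math.pi
        previous = wrapped
        phases.append(wrapped + offset)
    return phases


# Hypothetical setup: 24 GHz carrier (wavelength 12.5 mm), 5 mm sinusoidal motion.
wavelength = 0.0125
motion = [0.005 * math.sin(2 * math.pi * k / 100) for k in range(200)]
i_chan = [math.cos(4 * math.pi * x / wavelength) for x in motion]
q_chan = [math.sin(4 * math.pi * x / wavelength) for x in motion]

# Convert phase back to displacement: x = phase * wavelength / (4 * pi).
recovered = [
    p * wavelength / (4 * math.pi) for p in arctangent_demodulate(i_chan, q_chan)
]
max_error = max(abs(r - x) for r, x in zip(recovered, motion))
```

On real baseband data, a dc offset or gain and phase mismatch between the I and Q channels distorts the atan2 result, which is the degradation the abstract says the proposed algorithm is designed to remove.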

2023

Design and Verification of an Aromatherapy Feedback System for Mental Fatigue Based on Physiological Signals

Tao Sun, Fuze Tian, Hua Jiang, Qinglin Zhao, Bin Hu

2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) · 05 Dec 2023 · doi:10.1109/BIBM58861.2023.10385577

Mental fatigue is a prevalent issue in contemporary society and can negatively affect physical performance and concentration, increasing the likelihood of adverse consequences due to inattention during productive activities. Therefore, it becomes increasingly important to address and eliminate fatigue within a specific period of time. Aromatherapy, as a form of Complementary Alternative Medicine (CAM), is a non-invasive, cost-effective, and efficient method to combat fatigue. Previous studies have assessed the effects of specific aromatherapy oils using scales, but there is a lack of objective and reliable physiological indicators to prove the effectiveness of aromatherapy. Hence, this paper seeks to establish a model illustrating the effects of aromatic essential oil gases on the human body. A multimodal physiological fatigue signal acquisition system that integrates aromatherapy feedback was designed. In addition, an experimental paradigm was developed to explore the potential of aromatherapy in mitigating mental fatigue. Electroencephalogram (EEG) and Electrocardiogram (ECG) signals were collected, allowing for the analysis of time-frequency domain features in EEG and ECG signals, as well as Heart Rate Variability (HRV) features in ECG signals. Our findings indicate that specific aromatic gases demonstrate effectiveness in reducing mental fatigue. Furthermore, we employed the Support Vector Machine (SVM) algorithm to classify the state of human mental fatigue. Based on the classification results, the release of aromatic gas was controlled to provide targeted aromatic feedback. This innovative approach offers a promising avenue for objectively assessing and addressing mental fatigue through aromatherapy interventions.

The Three-Lead EEG Sensor: Introducing an EEG-Assisted Depression Diagnosis System Based on Ant Lion Optimization

Fuze Tian, Lixian Zhu, Qiuxia Shi, Rui Wang, Lixin Zhang, Qunxi Dong, Kun Qian, Qinglin Zhao, Bin Hu

IEEE Transactions on Biomedical Circuits and Systems · 01 Dec 2023 · doi:10.1109/TBCAS.2023.3292237

For depression diagnosis, traditional methods such as interviews and clinical scales have been widely leveraged in the past few decades, but they are subjective, time-consuming, and labor-intensive. With the development of affective computing and Artificial Intelligence (AI) technologies, Electroencephalogram (EEG)-based depression detection methods have emerged. However, previous research has virtually neglected practical application scenarios, as most studies have focused on analyzing and modeling EEG data. Furthermore, EEG data is typically obtained from specialized devices that are large, complex to operate, and poorly ubiquitous. To address these challenges, a wearable three-lead EEG sensor with flexible electrodes was developed to obtain prefrontal-lobe EEG data. Experimental measurements show that the EEG sensor achieves promising performance (background noise of no more than 0.91 μVpp, Signal-to-Noise Ratio (SNR) of 26–48 dB, and electrode-skin contact impedance of less than 1 kΩ). In addition, EEG data from 70 depressed patients and 108 healthy controls were collected using the EEG sensor, and the linear and nonlinear features were extracted. The features were then weighted and selected using the Ant Lion Optimization (ALO) algorithm to improve classification performance. The experimental results show that the k-NN classifier achieves a classification accuracy of 90.70%, specificity of 96.53%, and sensitivity of 81.79%, indicating the promising potential of the three-lead EEG sensor combined with the ALO algorithm and the k-NN classifier for EEG-assisted depression diagnosis.

Multi-Track Music Generation with WGAN-GP and Attention Mechanisms

Luyu Chen, Lin Shen, Dan Yu, Zhihua Wang, Kun Qian, Bin Hu, Björn W. Schuller, Yoshiharu Yamamoto

2023 IEEE 12th Global Conference on Consumer Electronics (GCCE) · 10 Oct 2023 · doi:10.1109/GCCE59613.2023.10315503

Music generation with artificial intelligence is a complex and captivating task. The utilisation of generative adversarial networks (GANs) has exhibited promising outcomes in producing realistic and diverse music compositions. In this paper, we propose a model based on Wasserstein GAN with gradient penalty (WGAN-GP) for multi-track music generation. This model incorporates self-attention and introduces a novel cross-attention mechanism in the generator to enhance its expressive capability. Additionally, we transpose all music to C major in training to ensure data consistency and quality. Experimental results demonstrate that our model can produce multi-track music with enhanced rhythm and sound characteristics, accelerate convergence, and improve generation quality.

Less is More: A Novel Feature Extraction Method for Heart Sound Classification via Fractal Transformation

Cuiping Zhu, Zhonghao Zhao, Yang Tan, Mengkai Sun, Kun Qian, Tao Jiang, Bin Hu, Björn W. Schuller, Yoshiharu Yamamoto

2023 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC) · 24 Jul 2023 · doi:10.1109/EMBC40787.2023.10340710

Cardiovascular diseases (CVDs) are the leading cause of death globally. Heart sound signal analysis plays an important role in clinical detection and physical examination of CVDs. In recent years, auxiliary diagnosis technology of CVDs based on the detection of heart sound signals has become a research hotspot. The detection of abnormal heart sounds can provide important clinical information to help doctors diagnose and treat heart disease. We propose a new set of fractal features, the fractal dimension (FD), as the representation for classification and a Support Vector Machine (SVM) as the classification model. The whole process of the method includes cutting heart sounds, feature extraction, and classification of abnormal heart sounds. We compare the classification results of the heart sound waveform (time domain) and the spectrum (frequency domain) based on fractal features. Finally, according to the better classification results, we choose the fractal features that are most conducive to classification to obtain better classification performance. The features we propose outperform the widely used features significantly (p < .05 by one-tailed z-test) with a much lower dimension. Clinical relevance: The heart sound classification model based on fractal features provides a new time-frequency analysis method for heart sound signals. A new effective mechanism is proposed to explore the relationship between the heart sound acoustic properties and the pathology of CVDs. As a non-invasive diagnostic method, this work could supply an idea for the preliminary screening of cardiac abnormalities through heart sounds.

Intelligent Music Intervention for Mental Disorders: Insights and Perspectives

Kun Qian, Björn W. Schuller, Xiaohong Guan, Bin Hu

IEEE Transactions on Computational Social Systems · 01 Feb 2023 · doi:10.1109/TCSS.2023.3235079

Welcome to the first issue of IEEE Transactions on Computational Social Systems (TCSS) of 2023. The past 2022 was again a very productive year, in which we have published 159 articles with about 1850 pages in six issues. We also received much great and exciting news.
