Depression detection using audio, text and LLMs

eejji/CIEDep-Net

"Hierarchical Modeling of Human Language Processing with Large Language Model for Multimodal Depression Detection (Coming soon)"

Author: Jihun Lee

Abstract

Depression severely impacts quality of life, making accurate early diagnosis crucial for mitigating long-term consequences. Previous studies have leveraged audio and text but typically treat them in isolation, overlooking the cognition-interpretation-expression processes underlying conversation. We present the Cognition-Interpretation-Expression Depression Network (CIEDep-Net), a multimodal framework that functionally models this three-stage structure. Each stage analyzes its inputs with features tailored to its role, and a strategy combining score-conditioned fusion and cross-attention captures inter-stage interactions. In the interpretation stage, a large language model with Chain-of-Thought prompting and a self-consistency strategy generates depression scores and an inner summary from the interview transcript. The generated outputs show statistically significant agreement with the ground-truth data (Pearson correlation coefficient r = 0.69, p < 0.01; BERTScore = 0.8), supporting the predictive validity of the proposed approach. CIEDep-Net achieved an MAE of 1.98 and a CCC of 0.884 on DAIC-WOZ, and an MAE of 2.72 and a CCC of 0.78 on E-DAIC, a 4.56% MAE reduction over the strongest prior multimodal baseline. Ablations confirm that removing any stage degrades performance, underscoring the complementary contributions of cognition, interpretation, and expression. By embedding human language processing mechanisms into multimodal learning, CIEDep-Net delivers reliable, consistent depression-severity prediction across datasets. The approach suggests a pathway toward clinically meaningful and scalable assessment through the integration of linguistic and paralinguistic cues.
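
The abstract reports results as MAE, CCC (concordance correlation coefficient), and Pearson's r. Since the repository code has not been released yet, the following is only a minimal sketch of how these three metrics are commonly computed with NumPy, not the author's evaluation script. The function names (`mae`, `pearson_r`, `ccc`) and the toy score arrays are assumptions made for illustration and do not come from the paper or its datasets.

```python
import numpy as np


def _as_float_arrays(y_true, y_pred):
    """Convert both inputs to 1-D float arrays of equal length."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same length")
    return y_true, y_pred


def mae(y_true, y_pred):
    """Mean absolute error between reference and predicted scores."""
    y_true, y_pred = _as_float_arrays(y_true, y_pred)
    return float(np.mean(np.abs(y_true - y_pred)))


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient (linear association only)."""
    y_true, y_pred = _as_float_arrays(y_true, y_pred)
    return float(np.corrcoef(y_true, y_pred)[0, 1])


def ccc(y_true, y_pred):
    """Concordance correlation coefficient.

    CCC = 2 * cov / (var_true + var_pred + (mean_true - mean_pred)^2),
    using population (biased) variance and covariance. Unlike Pearson's r,
    it penalizes systematic offset and scale differences.
    """
    y_true, y_pred = _as_float_arrays(y_true, y_pred)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    cov = np.mean((y_true - mean_t) * (y_pred - mean_p))
    denom = y_true.var() + y_pred.var() + (mean_t - mean_p) ** 2
    return float(2.0 * cov / denom)


# Hypothetical toy scores (NOT from DAIC-WOZ or E-DAIC): every prediction
# is exactly 2 points too high.
toy_true = [0, 5, 10, 15, 20]
toy_pred = [2, 7, 12, 17, 22]

print("MAE      :", mae(toy_true, toy_pred))
print("Pearson r:", pearson_r(toy_true, toy_pred))
print("CCC      :", ccc(toy_true, toy_pred))
```

On the toy data, a constant offset leaves Pearson's r at essentially 1 while CCC drops below 1, which is one reason severity-regression work tends to report CCC alongside MAE rather than correlation alone.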
