Multimodal Voice Modeling Using Acoustic and High-Speed Video Signals with Machine Learning

Machine Learning · Feature Extraction · Biomechanics · Signal Processing · Video Analysis · Statistical Modeling · Multimodal Data Integration

Project Overview

Built from the bench up, this project begins with physical vocal-fold models and controlled airflow, collecting direct intraglottal and subglottal pressures, acoustic recordings, and high-speed video under reproducible conditions. On top of these grounded measurements, we have developed physics-informed models—contact mechanics for collision pressure and finite-element inversion for tissue properties—that generate interpretable, pathology-relevant features such as spatiotemporal opening and closure patterns, symmetry and energy measures, and cumulative collision-pressure "dose." These features are then fed into machine-learning classifiers, enabling differentiation of common lesions (for example, nodules, polyps, and posterior glottal insufficiency). The result is a coherent chain—physical modeling → multimodal measurement → physics-informed features → machine-learning classification—that supports diagnosis, therapy planning, and surgical decision-making.
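As a minimal illustrative sketch (not the project's actual code), the feature stage above could look like the following. It assumes a sampled glottal-area waveform and a synchronized intraglottal contact-pressure signal are already available; the threshold, the symmetry proxy, and all numeric values are hypothetical choices for demonstration only.

```python
import numpy as np

def extract_features(area, pressure, fs):
    """Sketch of physics-informed features from a glottal-area waveform
    and an intraglottal contact-pressure signal sampled at fs Hz."""
    # Open quotient: fraction of each cycle the glottis is open
    # (here approximated as area above 5% of its peak -- an assumed threshold).
    open_quotient = float((area > 0.05 * area.max()).mean())
    # Symmetry proxy: correlation of the waveform with its time reverse.
    symmetry = float(np.corrcoef(area, area[::-1])[0, 1])
    # Energy measure: variance of the area waveform around its mean.
    energy = float(np.mean((area - area.mean()) ** 2))
    # Cumulative collision-pressure "dose": time integral of positive
    # contact pressure, approximated with a rectangle rule.
    dose = float(np.sum(np.clip(pressure, 0.0, None)) / fs)
    return {"open_quotient": open_quotient, "symmetry": symmetry,
            "energy": energy, "collision_dose": dose}

# Synthetic 150 Hz oscillation, 0.1 s at 40 kHz (illustrative values only):
# the glottis "opens" on the positive half-cycle and "collides" on the other.
fs, f0 = 40_000, 150.0
t = np.arange(0, 0.1, 1.0 / fs)
area = np.clip(np.sin(2 * np.pi * f0 * t), 0.0, None)
pressure = np.clip(-np.sin(2 * np.pi * f0 * t), 0.0, None)
features = extract_features(area, pressure, fs)
print(features)
```

In the full pipeline, feature vectors like this one, computed per recording, would be the inputs to the downstream lesion classifiers.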

Related Publications

Temporal intraglottal pressure variations at four positions in a vocal fold model oscillating at 150 Hz, shown with synchronized high-speed imaging.

Designed setup for extracting voice signals and high-speed vocal fold videos.

Ambulatory voice monitoring using neck surface acceleration.