SimoneCff/Hyb-DysNet

Hyb-DysNet — Hybrid Feature Fusion and Ensemble Learning for Dysarthria Severity Classification in ALS Patients

Authors: Simone Cioffi, Emanuel Di Nardo, Angelo Ciaramella

Affiliation: Department of Science and Technology, University of Naples Parthenope, Naples, Italy

Venue: AIPHEA2026 (AI in Predictive HEAlth: architectures for prevention), IJCNN, June 22, 2026, Maastricht, NL


Overview

This repository contains the implementation of Hyb-DysNet, a framework for automatic dysarthria severity classification in ALS patients, developed for Task 1 of the SAND (Speech-based Assessment of Neurological Disorders) Challenge.

Task 1 is a 5-class classification problem: given 8 audio recordings from a patient at an initial clinical visit, classify the severity of dysarthria according to the ALSFRS-R speech subscore.

Hyb-DysNet achieves Macro F1 = 0.69 on the official held-out test set.


Dataset

Voice recordings were acquired at the ALS Centre of the Federico II University Hospital of Naples using the Vox4Health mobile application, during routine outpatient clinical visits.

| Split | Subjects | Audio files (8 per subject) |
|---|---|---|
| Training | 219 | 1,752 |
| Validation | 53 | 424 |
| Test | 67 | 536 |

Severity classes (ALSFRS-R speech subscore):

| Label | Class | ALSFRS-R |
|---|---|---|
| 1 | Severe | ≤ 1 |
| 2 | Moderate | 2 |
| 3 | Mild | 3 |
| 4 | No Dysarthria | 4 |
| 5 | Healthy | 5 |

The dataset is severely imbalanced: Class 1 (Severe) contains only ~6 subjects in total.

Vocal tasks (8 per subject):

  • Sustained phonation of the 5 Italian vowels: /a/, /e/, /i/, /o/, /u/
  • Diadochokinetic (DDK) sequences: /pa/, /ta/, /ka/

Method

1. Audio Preprocessing

  • Resampled to 16 kHz
  • Padded or truncated to a fixed duration of 5 seconds
  • Missing recordings imputed with zeros
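
The length normalization and zero imputation above can be sketched as follows. This is a minimal illustration, not the repository's code: it assumes the waveform has already been loaded as mono and resampled to 16 kHz (for example with `librosa.load(path, sr=16000)`), that padding is appended at the end, and that a missing recording is represented by `None` and replaced by a silent clip. The README does not say whether imputation happens at the waveform or the feature level, so that choice is an assumption, as is the helper name `fix_length`.

```python
import numpy as np

SAMPLE_RATE = 16_000                    # target rate stated above
DURATION_S = 5                          # fixed clip length stated above
TARGET_LEN = SAMPLE_RATE * DURATION_S   # 80,000 samples per clip


def fix_length(waveform):
    """Pad with trailing zeros or truncate a mono waveform to TARGET_LEN samples.

    Passing None stands for a missing recording and yields an all-zero clip
    (assumption: imputation is done on the waveform).
    """
    if waveform is None:
        return np.zeros(TARGET_LEN, dtype=np.float32)
    waveform = np.asarray(waveform, dtype=np.float32)
    if len(waveform) >= TARGET_LEN:
        return waveform[:TARGET_LEN]                           # keep the first 5 s
    return np.pad(waveform, (0, TARGET_LEN - len(waveform)))   # zero-pad the tail


print(fix_length(np.ones(100_000)).shape)  # longer clip, truncated
print(fix_length(np.ones(10)).shape)       # shorter clip, padded
print(fix_length(None).shape)              # missing clip, all zeros
```

Every clip comes out with the same 80,000-sample shape, so the downstream feature extractors always see fixed-length input.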

2. Hybrid Feature Extraction (3 parallel streams)

| Stream | Tool / Model | Features |
|---|---|---|
| eGeMAPSv02 | OpenSMILE | 88 features |
| Deep embeddings | Wav2Vec2-XLS-R (mean pooling) | 1,024 features |
| Spectral descriptors | Librosa (MFCC, ZCR, SC, etc.) | ~90 features |
| Total | Concatenated hybrid vector | >1,200 features |

eGeMAPSv02 captures clinically interpretable parameters: pitch, jitter, shimmer, formant frequencies (F1/F2/F3), harmonics-to-noise ratio (HNR), loudness, Alpha Ratio, and Hammarberg Index.

Wav2Vec2-XLS-R is used as a frozen feature extractor: the hidden states of its last transformer layer are mean-pooled over the time dimension, giving one 1,024-dimensional vector per recording.
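
To make the shapes concrete, here is a small sketch of mean pooling and stream concatenation. It uses random placeholder arrays instead of real OpenSMILE, Wav2Vec2, or Librosa outputs. The frame count and the exact size of the spectral stream (taken here as 90, while the table says "~90") are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder inputs with the dimensions from the table above
# (random values, not real features).
egemaps = rng.normal(size=88)                   # eGeMAPSv02 stream
hidden_states = rng.normal(size=(249, 1024))    # (time frames, hidden size); frame count is illustrative
spectral = rng.normal(size=90)                  # spectral stream; "~90" in the table


def mean_pool(frames):
    """Average frame-level embeddings over the time axis: one vector per clip."""
    return frames.mean(axis=0)


embedding = mean_pool(hidden_states)                      # shape (1024,)
hybrid = np.concatenate([egemaps, embedding, spectral])   # shape (1202,)
print(embedding.shape, hybrid.shape)
```

Mean pooling removes the time axis, so clips of any length map to a fixed-size embedding that can be concatenated with the other two streams.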

3. Preprocessing Pipeline

  1. SMOTE (training set only): oversamples the minority classes until all 5 classes are equally represented, yielding 3,440 balanced training samples (688 per class)
  2. Z-score standardization: StandardScaler fitted on the training set, then applied to validation and test
  3. Feature selection: SelectKBest (ANOVA F-value, k=200)

No information from the validation or test partition is used at any stage of training, scaling, or feature selection.
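
The fit-on-training-only discipline for scaling and feature selection can be sketched with scikit-learn as below. The data are synthetic stand-ins, the array sizes are arbitrary, and the SMOTE step is left out: in the described pipeline `X_train` and `y_train` would already be the oversampled training set (imbalanced-learn's `SMOTE` is one common implementation, though the README does not name the library).

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-ins: 500 training rows, 100 validation rows,
# 1,202 features, labels in 1..5.
X_train = rng.normal(size=(500, 1202))
y_train = rng.integers(1, 6, size=500)
X_val = rng.normal(size=(100, 1202))

# Fit the scaler and the selector on training rows only.
scaler = StandardScaler().fit(X_train)
selector = SelectKBest(score_func=f_classif, k=200)   # ANOVA F-value, k=200
X_train_sel = selector.fit_transform(scaler.transform(X_train), y_train)

# Validation/test rows are only transformed, never used for fitting.
X_val_sel = selector.transform(scaler.transform(X_val))
print(X_train_sel.shape, X_val_sel.shape)
```

Because `fit` is only ever called with training data, the validation and test partitions cannot influence the scaling statistics or the selected feature subset.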

4. Soft-Voting Ensemble

Three heterogeneous classifiers combined via weighted soft voting (weights: XGBoost×2, LightGBM×2, LR×1):

| Classifier | Key Hyperparameters |
|---|---|
| XGBoost | n_estimators=2000, lr=0.01, max_depth=6, subsample=0.8, GPU |
| LightGBM | n_estimators=2000, lr=0.01, num_leaves=31, class_weight=balanced, GPU |
| Logistic Regression | C=0.1, class_weight=balanced |
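
Weighted soft voting averages the class probabilities of the individual models using the stated weights and picks the class with the highest average. The sketch below shows the arithmetic on made-up probabilities for a single recording. The helper name `soft_vote` is hypothetical; in scikit-learn the same rule is what `VotingClassifier(voting="soft", weights=[2, 2, 1])` applies.

```python
import numpy as np

WEIGHTS = np.array([2.0, 2.0, 1.0])   # XGBoost, LightGBM, Logistic Regression


def soft_vote(probas, weights=WEIGHTS):
    """Weighted average of per-model class probabilities.

    probas has shape (n_models, n_samples, n_classes).
    Returns (predicted labels in 1..n_classes, averaged probabilities).
    """
    avg = np.average(probas, axis=0, weights=weights)
    return avg.argmax(axis=1) + 1, avg


# Made-up probabilities for one recording over classes 1..5.
probas = np.array([
    [[0.05, 0.10, 0.50, 0.25, 0.10]],   # XGBoost
    [[0.05, 0.15, 0.30, 0.40, 0.10]],   # LightGBM
    [[0.10, 0.10, 0.30, 0.40, 0.10]],   # Logistic Regression
])
labels, avg = soft_vote(probas)
print(labels, avg.round(2))   # -> [3] [[0.06 0.12 0.38 0.34 0.1 ]]
```

In this example two of the three models rank class 4 first, yet the weighted average favours class 3 because XGBoost is confident and carries a weight of 2. This is the practical difference between soft voting and a hard majority vote over model decisions.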

5. Subject-Level Inference

Predictions are made independently on each of the 8 audio files per subject. The final subject-level severity class is obtained by majority voting over the 8 per-file predictions, aggregating evidence across vowel phonation and DDK tasks.
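
A minimal sketch of the subject-level aggregation follows. The README does not say how ties are resolved, so the rule used here (prefer the lower label, i.e. the more severe class) is an assumption, as is the helper name `subject_label`.

```python
from collections import Counter


def subject_label(file_predictions):
    """Majority vote over per-file class labels (1..5) for one subject.

    Ties are broken in favour of the lower label; this rule is an assumption.
    """
    counts = Counter(file_predictions)
    top = max(counts.values())
    return min(label for label, n in counts.items() if n == top)


# Hypothetical per-file predictions: 5 vowels followed by 3 DDK sequences.
print(subject_label([3, 3, 4, 3, 2, 4, 3, 4]))  # -> 3
```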


Results

Validation Set (file level)

| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| Severe | 0.33 | 0.06 | 0.11 | 16 |
| Moderate | 0.39 | 0.44 | 0.41 | 32 |
| Mild | 0.40 | 0.30 | 0.35 | 96 |
| No Dysarthria | 0.35 | 0.30 | 0.33 | 112 |
| Healthy | 0.57 | 0.73 | 0.64 | 168 |
| Macro avg | 0.41 | 0.37 | 0.37 | 424 |

Accuracy: 47.41%

Macro F1: 0.3656

Test Set (subject level, majority voting)

| Team | Macro F1 |
|---|---|
| Hyb-DysNet (Ours) | 0.6900 |
| TUKE (Technical University of Kosice) | 0.6079 |
| UTL (University of Texas at Austin) | 0.6005 |
| PRIME (Université de Moncton) | 0.5945 |

Repository Structure

.
├── notebook/
│   └── sand-challenge-task-1-submission.ipynb   # Full pipeline notebook
├── models/
│   ├── ensemble_opensmile.joblib                # Trained VotingClassifier (XGB + LGB + LR)
│   ├── scaler_opensmile.joblib                  # StandardScaler (OpenSMILE-only version)
│   ├── scaler_hybrid.joblib                     # StandardScaler (hybrid pipeline)
│   └── selector_hybrid.joblib                   # SelectKBest (k=200)
├── submissions/
│   └── submission_task1.csv                     # Test set predictions (67 subjects, classes 1–5)
└── README.md

Evaluation Metric

The primary metric is the macro-averaged F1 score: the unweighted mean of the per-class F1 scores. Every class contributes equally regardless of its support, which is critical given the severe class imbalance.
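
As an illustration, macro F1 can be computed from scratch as below. The helper `macro_f1` is a hypothetical pure-Python sketch, not the challenge's scoring script. Treating a class that appears in neither the labels nor the predictions as scoring 0 is an assumption.

```python
def macro_f1(y_true, y_pred, labels=(1, 2, 3, 4, 5)):
    """Unweighted mean of per-class F1 scores.

    F1 for class c is 2*TP / (2*TP + FP + FN); a class with no true or
    predicted instances scores 0 (assumption).
    """
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Per-class F1 values as reported in the validation table.
reported = [0.11, 0.41, 0.35, 0.33, 0.64]
print(round(sum(reported) / len(reported), 3))  # -> 0.368
```

Averaging the rounded per-class values gives 0.368, which agrees with the reported validation Macro F1 of 0.3656 up to rounding of the individual scores. The Severe class, with only 16 validation files, counts as much as the Healthy class with 168.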


Citation

@inproceedings{cioffi2026hybdysnet,
  title     = {Hybrid Feature Fusion and Ensemble Learning for Dysarthria Severity Classification in ALS Patients},
  author    = {Cioffi, Simone and Di Nardo, Emanuel and Ciaramella, Angelo},
  booktitle = {AIPHEA2026: AI in Predictive HEAlth: architectures for prevention, IJCNN},
  year      = {2026},
  address   = {Maastricht, NL}
}
