Hyb-DysNet — Hybrid Feature Fusion and Ensemble Learning for Dysarthria Severity Classification in ALS Patients

Authors: Simone Cioffi, Emanuel Di Nardo, Angelo Ciaramella
Affiliation: Department of Science and Technology, University of Naples Parthenope, Naples, Italy
Venue: AIPHEA2026: AI in Predictive HEAlth: architectures for prevention, IJCNN, June 22, 2026, Maastricht, NL

Overview

This repository contains the implementation of Hyb-DysNet, a framework for automatic dysarthria severity classification in ALS patients, developed for Task 1 of the SAND (Speech-based Assessment of Neurological Disorders) Challenge.

Task 1 is a 5-class classification problem: given 8 audio recordings from a patient at an initial clinical visit, classify the severity of dysarthria according to the ALSFRS-R speech subscore.

Hyb-DysNet achieves Macro F1 = 0.69 on the official held-out test set.

Dataset

Voice recordings were acquired at the ALS Centre of the Federico II University Hospital of Naples using the Vox4Health mobile application, during routine outpatient clinical visits.

Split	Subjects	Audio Files (8 per subject)
Training	219	1,752
Validation	53	424
Test	67	536

Severity classes (ALSFRS-R speech subscore):

Label	Class	ALSFRS-R
1	Severe	≤ 1
2	Moderate	2
3	Mild	3
4	No Dysarthria	4
5	Healthy	5

The dataset is severely imbalanced: Class 1 (Severe) contains only ~6 subjects in total.

Vocal tasks (8 per subject):

Sustained phonation of the 5 Italian vowels: /a/, /e/, /i/, /o/, /u/
Diadochokinetic (DDK) sequences: /pa/, /ta/, /ka/

Method

1. Audio Preprocessing

Resampled to 16 kHz
Padded or truncated to a fixed duration of 5 seconds
Missing recordings imputed with zeros
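
The length-fixing and zero-imputation steps above can be sketched as follows. This is a minimal illustration, not the notebook's code: it assumes the waveform has already been loaded as mono and resampled to 16 kHz (for example with `librosa.load(path, sr=16000)`), and it assumes padding is appended at the end of the clip, which the README does not specify. The name `fix_length` is hypothetical.

```python
import numpy as np

TARGET_SR = 16_000                        # sampling rate stated above
TARGET_SECONDS = 5                        # fixed clip duration stated above
TARGET_LEN = TARGET_SR * TARGET_SECONDS   # 80,000 samples per clip


def fix_length(waveform, target_len=TARGET_LEN):
    """Zero-pad at the end, or truncate, a mono waveform to target_len samples.

    Passing None stands for a missing recording and yields an all-zero clip.
    End-padding is an assumption; the README only says "padded or truncated".
    """
    if waveform is None:
        return np.zeros(target_len, dtype=np.float32)
    waveform = np.asarray(waveform, dtype=np.float32)
    if len(waveform) >= target_len:
        return waveform[:target_len]
    return np.pad(waveform, (0, target_len - len(waveform)))


short_clip = fix_length(np.ones(TARGET_SR))       # 1 s of signal, padded to 5 s
long_clip = fix_length(np.ones(2 * TARGET_LEN))   # 10 s of signal, truncated
missing = fix_length(None)                        # imputed with zeros
```

With these conventions every file, present or missing, maps to a vector of the same length, so the downstream feature extractors always see identically shaped input.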

2. Hybrid Feature Extraction (3 parallel streams)

Stream	Tool / Model	Features
eGeMAPSv02	OpenSMILE	88 features
Deep embeddings	Wav2Vec2-XLS-R (mean pooling)	1,024 features
Spectral descriptors	Librosa (MFCC, ZCR, SC, etc.)	~90 features
Total	Concatenated hybrid vector	>1,200

eGeMAPSv02 captures clinically interpretable parameters: pitch, jitter, shimmer, formant frequencies (F1/F2/F3), HNR, loudness, Alpha Ratio, Hammarberg Index. Wav2Vec2-XLS-R is used as a frozen feature extractor; the last transformer layer is mean-pooled over the time dimension.
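
To make the fusion step concrete, the sketch below mean-pools a matrix of frame-level embeddings over time and concatenates the result with the two handcrafted streams. The arrays are random placeholders rather than real openSMILE, Wav2Vec2, or Librosa outputs; the frame count and the exact spectral dimension of 90 are assumptions (the table above only says "~90"), and the helper names are hypothetical.

```python
import numpy as np


def mean_pool(hidden_states):
    """Average a (time, dim) matrix of frame embeddings over time -> (dim,)."""
    return np.asarray(hidden_states).mean(axis=0)


def fuse(egemaps, deep, spectral):
    """Concatenate the three per-file feature streams into one hybrid vector."""
    return np.concatenate([egemaps, deep, spectral])


rng = np.random.default_rng(0)
frames = rng.normal(size=(249, 1024))   # placeholder last-layer embeddings
egemaps = rng.normal(size=88)           # placeholder eGeMAPSv02 functionals
spectral = rng.normal(size=90)          # placeholder spectral descriptors

hybrid = fuse(egemaps, mean_pool(frames), spectral)
print(hybrid.shape)   # (1202,)
```

Under these assumed sizes the hybrid vector has 88 + 1,024 + 90 = 1,202 entries, consistent with the ">1,200" total reported in the table.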

3. Preprocessing Pipeline

SMOTE (training set only) — oversamples minority classes until all 5 are equally represented → 3,440 balanced training samples
Z-score standardization — StandardScaler fitted on training, applied to validation/test
Feature selection — SelectKBest (ANOVA F-value, k=200)

No information from the validation or test partition is used at any stage of training, scaling, or feature selection.
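
The no-leakage rule can be illustrated with scikit-learn: the scaler and the selector are fitted on training data only and then reused unchanged on held-out data. This is a sketch on synthetic matrices, not the SAND data. The SMOTE step is deliberately left out so the example depends only on scikit-learn and NumPy; in the described pipeline it would run on the training matrix before scaling.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-ins for the hybrid feature matrices (assumed 1,202 columns).
X_train = rng.normal(size=(300, 1202))
y_train = rng.integers(1, 6, size=300)   # class labels 1..5
X_val = rng.normal(size=(50, 1202))

# (SMOTE would be applied here, to X_train / y_train only.)

scaler = StandardScaler().fit(X_train)   # mean and std come from training only
X_train_s = scaler.transform(X_train)
X_val_s = scaler.transform(X_val)        # validation reuses training statistics

selector = SelectKBest(f_classif, k=200).fit(X_train_s, y_train)
X_train_k = selector.transform(X_train_s)
X_val_k = selector.transform(X_val_s)    # same 200 columns as for training

print(X_train_k.shape, X_val_k.shape)    # (300, 200) (50, 200)
```

Because `fit` is never called on `X_val`, neither the z-score statistics nor the ANOVA ranking can be influenced by held-out samples.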

4. Soft-Voting Ensemble

Three heterogeneous classifiers are combined via weighted soft voting (weights: XGBoost 2, LightGBM 2, Logistic Regression 1):

Classifier	Key Hyperparameters
XGBoost	n_estimators=2000, learning_rate=0.01, max_depth=6, subsample=0.8, GPU
LightGBM	n_estimators=2000, learning_rate=0.01, num_leaves=31, class_weight=balanced, GPU
Logistic Regression	C=0.1, class_weight=balanced
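
Weighted soft voting averages the class-probability vectors of the individual models, using the weights as multipliers, and picks the class with the highest averaged probability. The NumPy sketch below reproduces that arithmetic for a single file with made-up probabilities; it does not train any of the three classifiers, and `soft_vote` is a hypothetical helper. scikit-learn's `VotingClassifier` with `voting="soft"` and a `weights` list performs the same combination.

```python
import numpy as np


def soft_vote(probas, weights):
    """Weighted average of per-model class probabilities, then argmax.

    probas has shape (n_models, n_samples, n_classes).
    """
    probas = np.asarray(probas, dtype=float)
    avg = np.average(probas, axis=0, weights=weights)
    return avg, avg.argmax(axis=1)


# Made-up probabilities for one file over the 5 classes (illustration only).
p_xgb = [[0.10, 0.20, 0.40, 0.20, 0.10]]
p_lgb = [[0.10, 0.10, 0.30, 0.40, 0.10]]
p_lr = [[0.05, 0.05, 0.10, 0.70, 0.10]]

avg, winner = soft_vote([p_xgb, p_lgb, p_lr], weights=[2, 2, 1])
labels = winner + 1   # map column index 0..4 to class labels 1..5
print(avg.round(2), labels)   # [[0.09 0.13 0.3  0.38 0.1 ]] [4]
```

In this toy case XGBoost alone would choose class 3, but the weighted average favours class 4 because LightGBM and Logistic Regression both assign it high probability.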

5. Subject-Level Inference

Predictions are made independently on each of the 8 audio files per subject. The final subject-level severity class is obtained by majority voting over the 8 per-file predictions, aggregating evidence across vowel phonation and DDK tasks.
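
A minimal sketch of the subject-level aggregation, assuming per-file predictions are integer labels 1 to 5. The README does not say how ties are resolved, so the rule used here (prefer the lower, more severe label) is an assumption, and `subject_label` is a hypothetical name.

```python
from collections import Counter


def subject_label(file_predictions):
    """Return the most frequent class among a subject's per-file predictions.

    Ties are broken toward the lower (more severe) label; this tie rule is an
    assumption and is not specified in the README.
    """
    counts = Counter(file_predictions)
    top = max(counts.values())
    return min(label for label, n in counts.items() if n == top)


print(subject_label([4, 4, 3, 4, 5, 4, 3, 4]))   # 4
print(subject_label([3, 3, 3, 3, 4, 4, 4, 4]))   # 3 (tie broken toward 3)
```

With 8 files per subject, ties such as 4 versus 4 are possible, so whichever tie rule is chosen should be applied consistently when generating the submission file.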

Results

Validation Set (file level)

Class	Precision	Recall	F1-score	Support
Severe	0.33	0.06	0.11	16
Moderate	0.39	0.44	0.41	32
Mild	0.40	0.30	0.35	96
No Dysarthria	0.35	0.30	0.33	112
Healthy	0.57	0.73	0.64	168
Macro avg	0.41	0.37	0.37	424

Accuracy: 47.41% — Macro F1: 0.3656

Test Set (subject level, majority voting)

Team	Macro F1
Hyb-DysNet (Ours)	0.6900
TUKE (Technical University of Kosice)	0.6079
UTL (University of Texas at Austin)	0.6005
PRIME (Université de Moncton)	0.5945

Repository Structure

.
├── notebook/
│   └── sand-challenge-task-1-submission.ipynb   # Full pipeline notebook
├── models/
│   ├── ensemble_opensmile.joblib                # Trained VotingClassifier (XGB + LGB + LR)
│   ├── scaler_opensmile.joblib                  # StandardScaler (OpenSMILE-only version)
│   ├── scaler_hybrid.joblib                     # StandardScaler (hybrid pipeline)
│   └── selector_hybrid.joblib                   # SelectKBest (k=200)
├── submissions/
│   └── submission_task1.csv                     # Test set predictions (67 subjects, classes 1–5)
└── README.md

Evaluation Metric

The primary metric is Macro F1-Score, which computes the unweighted average of per-class F1-scores and is equally sensitive to minority and majority classes — critical given the severe class imbalance.
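
For reference, Macro F1 can be computed from scratch as below. This is a plain-Python sketch rather than the challenge's official scorer; scoring a class that appears in neither the labels nor the predictions as 0.0 is an assumed convention.

```python
def macro_f1(y_true, y_pred, labels=(1, 2, 3, 4, 5)):
    """Unweighted mean of per-class F1 scores over the given labels."""
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # Assumed convention: F1 is 0.0 when the class never occurs at all.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example: the single class-1 subject is misclassified as class 2.
y_true = [1, 2, 2, 3, 3, 3]
y_pred = [2, 2, 2, 3, 3, 3]
score = macro_f1(y_true, y_pred, labels=(1, 2, 3))
print(round(score, 2))   # 0.6
```

In the toy example accuracy is 5/6 (about 0.83), yet Macro F1 drops to 0.6 because the missed minority class contributes an F1 of 0 with the same weight as the other two classes.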

Citation

@inproceedings{cioffi2026hybdysnet,
  title     = {Hybrid Feature Fusion and Ensemble Learning for Dysarthria Severity Classification in ALS Patients},
  author    = {Cioffi, Simone and Di Nardo, Emanuel and Ciaramella, Angelo},
  booktitle = {AIPHEA2026: AI in Predictive HEAlth: architectures for prevention, IJCNN},
  year      = {2026},
  address   = {Maastricht, NL}
}
