
Defending My PhD: A Hybrid Multimodal Biometric System

Posted on July 9, 2025 (updated March 25, 2026) by Hatem Zehir

On 9 July 2025, I defended my PhD thesis in Electronics at Badji Mokhtar – Annaba University, titled: “Development of a Hybrid Multimodal Biometric System.” The work is the result of three years of research into a question that sits at the intersection of signal processing, deep learning, multimodal fusion, and security: can combining two fundamentally different biometric signals (one physiological, one behavioural) produce an identification system that is more accurate, more robust, and harder to spoof than either signal alone?

The answer, as the thesis demonstrates, is yes. This post outlines the research, the technical decisions behind it, and what the defense experience meant to me.

The official announcement poster of my PhD defense.

For those interested, you can find the full thesis here and the defense presentation here.

Why Multimodal Biometrics?

Traditional biometric systems rely on a single modality: a fingerprint, a face, a voice. Each modality has known weaknesses: fingerprints can be lifted, faces can be spoofed with photographs, and voice recordings can be replayed. Unimodal systems are also inherently limited in accuracy when dealing with noisy conditions or intra-subject variability.

Multimodal biometrics addresses this by fusing multiple sources of identity evidence. The central hypothesis of my thesis was that ECG and voice, despite being heterogeneous in nature, are complementary: ECG is a hidden physiological trait that is nearly impossible to forge covertly, while voice is an accessible behavioural trait that requires no specialised hardware to acquire. Combining them at the score level, rather than at the feature or sensor level, allows each modality to be processed independently through algorithms best suited to its characteristics, before their outputs are merged into a single decision.

The ECG-Based Identification System

ECG signals carry individual-specific information in their cardiac morphology, but they are also susceptible to noise, baseline wander, and recording artifacts. The preprocessing pipeline therefore began with a 4th-order Butterworth bandpass filter (1–40 Hz) to denoise the raw signals while preserving the frequency components relevant to biometric identification.
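
As a concrete illustration, the filtering step might look like the following SciPy sketch; the function shape and sampling-rate parameter are my assumptions, not details taken from the thesis.

```python
from scipy.signal import butter, filtfilt

def bandpass_ecg(ecg, fs, low=1.0, high=40.0, order=4):
    """Zero-phase 4th-order Butterworth bandpass (1-40 Hz) for ECG denoising."""
    nyq = 0.5 * fs  # Nyquist frequency
    b, a = butter(order, [low / nyq, high / nyq], btype="band")
    # filtfilt runs the filter forward and backward, so the R-peaks used
    # later for segmentation are not shifted by phase distortion.
    return filtfilt(b, a, ecg)
```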

R-peak detection was performed using the Pan-Tompkins algorithm, which provided reliable heartbeat localisation across subjects and recording conditions. The signals were then decomposed using Empirical Mode Decomposition (EMD), a data-driven technique that breaks a signal into Intrinsic Mode Functions (IMFs) ordered by frequency content. I retained only the first two IMFs, which capture the high-frequency morphological features most distinctive to individual cardiac patterns, and used the detected R-peaks as anchors to extract fixed-length segments — ensuring each input to the classifier was both standardised and physiologically meaningful.
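
For illustration, here is a condensed Python sketch of this stage. The peak detector below is a simplified stand-in for the full Pan-Tompkins algorithm (differentiation, squaring, moving-window integration, peak picking), and the window lengths and thresholds are assumptions rather than thesis values; the decomposition uses the PyEMD package.

```python
import numpy as np
from scipy.signal import find_peaks
from PyEMD import EMD  # pip install EMD-signal

def extract_beat_segments(filtered_ecg, fs, half_window_s=0.3):
    """R-peak-anchored, fixed-length segments from the first two IMFs."""
    # --- Simplified Pan-Tompkins-style R-peak detection ---
    # Differentiation emphasises the steep QRS slopes, squaring rectifies
    # the result, and a moving-window integral smooths it before peak picking.
    diff = np.diff(filtered_ecg)
    win = int(0.15 * fs)
    integrated = np.convolve(diff ** 2, np.ones(win) / win, mode="same")
    r_peaks, _ = find_peaks(integrated,
                            height=integrated.mean(),
                            distance=int(0.25 * fs))  # ~250 ms refractory period

    # --- EMD: keep only the first two (highest-frequency) IMFs ---
    imfs = EMD().emd(filtered_ecg)
    high_freq = imfs[0] + imfs[1]  # assumes the decomposition yields >= 2 IMFs

    # --- Fixed-length windows centred on each detected R-peak ---
    half = int(half_window_s * fs)
    segments = [high_freq[p - half:p + half]
                for p in r_peaks
                if p - half >= 0 and p + half <= len(high_freq)]
    return np.asarray(segments)
```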

For classification, I employed Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM) networks, both well-suited to learning temporal dependencies in sequential physiological data. The approach yielded strong identification performance, demonstrating that targeted feature selection can be more effective than processing the full signal.
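
A minimal Keras sketch of such a recurrent classifier follows; the layer sizes and training settings are illustrative assumptions, not the configuration used in the thesis.

```python
import tensorflow as tf

def build_ecg_rnn(segment_len, n_subjects, cell="gru"):
    """Stacked recurrent classifier over R-peak-centred ECG segments."""
    Recurrent = tf.keras.layers.GRU if cell == "gru" else tf.keras.layers.LSTM
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(segment_len, 1)),  # one amplitude per step
        Recurrent(64, return_sequences=True),           # per-step temporal features
        Recurrent(32),                                  # fixed-size summary of the beat
        tf.keras.layers.Dense(n_subjects, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```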

The Voice-Based Identification System

Voice introduces a different set of challenges. Unlike ECG, it is easily recorded at a distance and susceptible to replay attacks and environmental noise. The preprocessing stage therefore included a silence removal step to discard non-informative segments, followed by fixed-window segmentation to standardise inputs across speakers and recording conditions.
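
One possible realisation of this stage is sketched below, built on librosa's energy-based splitting; the `top_db` threshold and the one-second window are illustrative assumptions.

```python
import numpy as np
import librosa

def voiced_windows(path, sr=16000, win_s=1.0, top_db=30):
    """Silence removal followed by fixed-window segmentation."""
    y, sr = librosa.load(path, sr=sr)
    # Keep only intervals whose energy is within top_db dB of the signal peak.
    intervals = librosa.effects.split(y, top_db=top_db)
    voiced = np.concatenate([y[start:end] for start, end in intervals])
    # Slice the voiced signal into equal, non-overlapping windows.
    win = int(win_s * sr)
    n_windows = len(voiced) // win
    return voiced[:n_windows * win].reshape(n_windows, win)
```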

Feature extraction centred on Mel-Frequency Cepstral Coefficients (MFCCs), a compact representation of the spectral envelope of speech that closely reflects human auditory perception, augmented with their first and second derivatives (delta and delta-delta coefficients). This combination captures not only the static spectral shape of a speaker’s voice but also its short-term dynamics, providing the classifier with richer discriminative information.
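
With librosa, the feature stack described above could be computed roughly as follows; thirteen coefficients is a common default, not necessarily the thesis setting.

```python
import numpy as np
import librosa

def mfcc_stack(window, sr=16000, n_mfcc=13):
    """Static MFCCs stacked with their first and second derivatives."""
    mfcc = librosa.feature.mfcc(y=window, sr=sr, n_mfcc=n_mfcc)
    delta = librosa.feature.delta(mfcc)            # first derivative (velocity)
    delta2 = librosa.feature.delta(mfcc, order=2)  # second derivative (acceleration)
    # Shape: (3 * n_mfcc, n_frames) -- static shape plus short-term dynamics.
    return np.vstack([mfcc, delta, delta2])
```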

A Convolutional Neural Network (CNN) was used for classification, exploiting its ability to extract hierarchical spatial features from the MFCC representations.
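
A compact Keras sketch of such a CNN, treating each MFCC feature matrix as a single-channel image; the architecture shown is illustrative rather than the one evaluated in the thesis.

```python
import tensorflow as tf

def build_voice_cnn(n_coeffs, n_frames, n_speakers):
    """2-D CNN over MFCC feature matrices treated as single-channel images."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_coeffs, n_frames, 1)),
        tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(n_speakers, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```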

Combining the Two Systems

The fusion stage is where the thesis makes its primary contribution. Rather than attempting to combine raw ECG and voice features (a technically complex and often counterproductive approach, given the heterogeneity of the two signals), I adopted score-level fusion, in which each subsystem produces an independent identification score that is then combined according to a fusion rule.

I evaluated both Support Vector Machine (SVM) and Softmax classifiers for the fusion decision, and compared three combination rules: sum, max, and product. Across all experimental configurations, the multimodal system consistently outperformed both unimodal baselines in identification accuracy. Critically, it also offered a stronger security guarantee: an attacker would need to simultaneously spoof both an individual’s cardiac signal and their voice, a considerably higher bar than forging either in isolation.
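
The three fixed combination rules are straightforward to express in code. The sketch below fuses the per-class probability vectors produced by the two subsystems and returns the fused decision; the SVM and Softmax fusion classifiers evaluated in the thesis are not shown.

```python
import numpy as np

def fuse_scores(ecg_probs, voice_probs, rule="product"):
    """Fuse per-class score vectors from the ECG and voice subsystems."""
    if rule == "sum":
        fused = ecg_probs + voice_probs
    elif rule == "max":
        fused = np.maximum(ecg_probs, voice_probs)
    elif rule == "product":
        fused = ecg_probs * voice_probs
    else:
        raise ValueError(f"unknown fusion rule: {rule}")
    return int(np.argmax(fused))  # fused identification decision
```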

The Defense

I entered the defense with confidence in the work. Three years of iterative development, peer-reviewed publications, and experimental evaluation had given me a solid command of both the technical details and the broader narrative of the research. The jury’s questions were pointed and substantive, pushing me to articulate not just what the system does, but why specific design decisions were made and what their implications are.

The jury was composed of:

  • Prof. Sofiane GHERBI, Badji Mokhtar – Annaba University (President)
  • Dr. Toufik HAFS, Badji Mokhtar – Annaba University (Supervisor)
  • Dr. Sara DAAS, Badji Mokhtar – Annaba University (Co-supervisor)
  • Dr. Narima ZERMI, Badji Mokhtar – Annaba University (Examiner)
  • Prof. Salim OUCHTATI, 20 August 1955 University – Skikda (Examiner)
  • Prof. Hocine BOUROUBA, 08 May 1945 University – Guelma (Examiner)

At the conclusion of the defense, the committee awarded the highest possible distinction: “Very Honorable, with Committee Praise.” I am grateful to my supervisors, Dr. Hafs and Dr. Daas, for their guidance throughout, and to the jury for their constructive engagement with the work.

Category: Events, Graduation


