Automatic Speech Recognition from Throat Microphone Signals of the NATO Phonetic Alphabet Spoken by Turkish Speakers: Evaluation of Classical and Deep Techniques under Push-to-Talk Conditions

Keywords

Automatic Speech Recognition, Machine Learning, NATO phonetic alphabet, Push-to-Talk, Throat microphone

Abstract

This study evaluates several Automatic Speech Recognition (ASR) techniques on recordings captured exclusively with throat microphones. The goal is to assess their suitability for Push-to-Talk (PTT) operation, where conventional air microphones are degraded by environmental noise. A corpus was built from 10 native Turkish speakers pronouncing the NATO phonetic alphabet. The signals were segmented with Silero VAD (voice activity detection), resampled to 16 kHz, and augmented to make the models more robust to noise and other variations. Two feature extraction approaches were used: Mel-frequency cepstral coefficients (MFCCs) and Wav2Vec2 embeddings reduced in dimensionality with principal component analysis (PCA). Five supervised classifiers were then trained and compared: Support Vector Machine (SVM), Random Forest, k-nearest neighbors (KNN), multilayer perceptron (MLP), and LightGBM. Performance was measured with overall accuracy and Word Error Rate (WER). The results show that ASR on laryngeal signals is technically feasible: LightGBM with MFCCs was the most robust combination (86.38% accuracy, 0.000 WER), and Random Forest with MFCCs also performed well (84.62% accuracy, 0.000 WER). This work lays an experimental foundation for robust, low-cost ASR systems in noisy environments, where throat microphones are a crucial alternative.
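The abstract reports accuracy and Word Error Rate (WER) but does not describe how WER was scored. As a point of reference, the sketch below implements the standard word-level definition, WER = (S + D + I) / N (substitutions, deletions, and insertions divided by the number of reference words), computed as a Levenshtein distance over word tokens, together with plain classification accuracy. It assumes whitespace-tokenized transcripts; the function names `word_error_rate` and `accuracy` are hypothetical and do not come from the study's code.

```python
from typing import Sequence


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard word-level WER: (S + D + I) / N via Levenshtein distance.

    Assumption: transcripts are tokenized by whitespace; this is not
    necessarily the scoring procedure used in the study.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the first i-1 reference words
    # and the first j hypothesis words (one row of the DP table at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of isolated-word predictions that exactly match the label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # One substitution ("delta" -> "echo") out of four reference words.
    print(word_error_rate("alpha bravo charlie delta", "alpha bravo charlie echo"))
    # Three of four isolated NATO words classified correctly.
    print(accuracy(["alpha", "bravo", "x-ray", "yankee"],
                   ["alpha", "bravo", "x-ray", "zulu"]))
```

Running the script prints `0.25` for the WER example and `0.75` for the accuracy example. Using a single rolling row keeps memory linear in the hypothesis length; libraries such as `jiwer` provide equivalent, more complete scoring.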

Published

2025-11-13