Robust Speaker Personalisation Using Generalized Low-Rank Adaptation for Automatic Speech Recognition

[Conference] International Conference on Acoustics, Speech, and Signal Processing (ICASSP), April 2024

Authors:

Arun Baby, George Joseph, Shatrughan Singh

Abstract:

For voice assistant systems, personalizing automatic speech recognition (ASR) to a customer is the proverbial holy grail. Fine-tuning a large ASR model with little speaker data requires careful selection of hyper-parameters. Low-rank adaptation (LoRA) has been shown to be a useful method for fine-tuning large language models (LLMs). By utilizing low-rank adaptation, we adapt the ASR model to specific speakers while lowering computational complexity and memory requirements. In this work, generalized LoRA is used to fine-tune the state-of-the-art cascaded conformer transducer model. To obtain the speaker-specific model, a small number of weights are added to the existing model and fine-tuned. Experimental assessments show improved ASR accuracy across many speakers while efficiency is maintained. Using the proposed method, an average relative improvement of 20% in word error rate is obtained across speakers with limited data.
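The idea of adding a small number of weights to an existing model and fine-tuning only those can be illustrated with a minimal sketch of plain LoRA on a single dense layer. This is not the paper's generalized LoRA and not its conformer transducer setup. The class name `LoRALinear`, the rank, the scaling factor `alpha`, and the 256-dimensional layer are all hypothetical choices made for illustration.

```python
import numpy as np


class LoRALinear:
    """Frozen dense layer plus a trainable low-rank update (plain LoRA sketch)."""

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        # Base weights stay frozen and can be shared across all speakers.
        self.weight = weight
        # Speaker-specific factors: A is small random, B starts at zero,
        # so the adapted layer initially reproduces the base layer exactly.
        self.A = rng.normal(0.0, 0.01, size=(rank, d_in))
        self.B = np.zeros((d_out, rank))
        self.scale = alpha / rank

    def __call__(self, x):
        # y = x W^T + scale * (x A^T) B^T; only A and B would be trained.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def num_adapter_params(self):
        # Number of weights that would be stored per speaker.
        return self.A.size + self.B.size


# Hypothetical 256x256 layer standing in for one projection in an ASR model.
rng = np.random.default_rng(1)
base = rng.normal(size=(256, 256))
layer = LoRALinear(base, rank=4)
x = rng.normal(size=(2, 256))
y = layer(x)
```

In this sketch the per-speaker adapter holds 2 x 4 x 256 weights, while the frozen base layer holds 256 x 256, which is the kind of saving in memory that makes one adapter per speaker practical.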

Cite:

@INPROCEEDINGS{10446630,
  author={Baby, Arun and Joseph, George and Singh, Shatrughan},
  booktitle={ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, 
  title={Robust Speaker Personalisation Using Generalized Low-Rank Adaptation for Automatic Speech Recognition}, 
  year={2024},
  volume={},
  number={},
  pages={11381-11385},
  keywords={Training;Adaptation models;Transducers;Error analysis;Computational modeling;Memory management;Personal voice assistants;low-rank adaptation;automatic speech recognition;parameter efficient fine-tuning;speaker personalisation},
  doi={10.1109/ICASSP48485.2024.10446630}}

Code:

NA