In recent years, the i-vector approach has emerged as the state of the art for speaker recognition. Many popular speech recognition libraries, such as Kaldi, offer APIs for training i-vector extractors; see here for a comprehensive overview of available open-source tools. To avoid reinventing the wheel, I decided to use one of the existing libraries. Although I admire the quality of Kaldi, I'd prefer a Pythonic API, since most of the RedHen libraries are wrapped in Python. Luckily, I found bob.kaldi, a bob package that seamlessly integrates Kaldi functionality with Python-based workflows.
To install bob.kaldi, first install Miniconda from here, then follow the bob installation instructions here. Finally, bob.kaldi can be installed with:
conda install bob.kaldi
To activate the virtual environment for this package on my Case HPC account, run the following command:
source activate bob_py3
I have written a Python script, build_model.py, that trains an i-vector extractor using functions provided by bob.kaldi. A typical invocation looks like this:
python build_model.py -d ubm_data.json
Note that bob.kaldi.ivector_train() accepts features for multiple utterances as a 3D array. I found that odd, since two utterances may have different durations and therefore different numbers of frames, so they cannot be stacked directly. Padding them to the same length does not seem to be an elegant solution to me, so I simply changed if feats.ndim == 3: to if feats is list: in the source code; now one can pass the features in a list.
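One caveat about that check: in Python, feats is list is true only when feats is the list type object itself, not when it is an instance of a list, so isinstance(feats, list) is the usual way to write it. Below is a minimal sketch of the idea using NumPy only. The helper as_utterance_list, the utterance lengths, and the feature dimension of 20 are made up for illustration; this is not the actual bob.kaldi code.

```python
import numpy as np


def as_utterance_list(feats):
    """Return a list of 2D (frames x dims) arrays, one per utterance.

    Hypothetical helper for illustration; not part of bob.kaldi.
    """
    if isinstance(feats, list):
        # Utterances may have different numbers of frames.
        return [np.asarray(f) for f in feats]
    feats = np.asarray(feats)
    if feats.ndim == 3:
        # Equal-length utterances stacked along the first axis.
        return [f for f in feats]
    if feats.ndim == 2:
        # A single utterance.
        return [feats]
    raise ValueError("expected a list, a 2D array, or a 3D array")


rng = np.random.default_rng(0)
utt_a = rng.standard_normal((120, 20))  # 120 frames, 20 coefficients (assumed sizes)
utt_b = rng.standard_normal((95, 20))   # 95 frames, same feature dimension

feats = [utt_a, utt_b]
utterances = as_utterance_list(feats)
shapes = [u.shape for u in utterances]

# Identity comparison against the type object is False for a list instance.
identity_check = feats is list
instance_check = isinstance(feats, list)

# A 3D array of equal-length utterances still works.
stacked = as_utterance_list(np.zeros((2, 5, 20)))
```

With a list, each utterance keeps its own number of frames and no padding is needed, while the 3D path still works for utterances that happen to have equal length.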