Saturday, July 29, 2017

Week 8: Upgrading Recognition System with Universal Background Models

A Universal Background Model (UBM) is a model used in a biometric verification system to represent general, person-independent feature characteristics, to be compared against a model of person-specific feature characteristics when making an accept or reject decision. For example, in a speaker verification system, the UBM is a speaker-independent Gaussian Mixture Model (GMM) trained with speech samples from a large set of speakers to represent general speech characteristics. Using a speaker-specific GMM trained with speech samples from a particular enrolled speaker, a likelihood-ratio test for an unknown speech sample can be formed between the match score of the speaker-specific model and that of the UBM. The UBM may also be used when training the speaker-specific model, by acting as the prior model in MAP parameter estimation. More about UBMs can be found here.
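The likelihood-ratio idea can be sketched with scikit-learn's GaussianMixture on synthetic features. Note this is an illustrative toy, not our actual system: the data, the model sizes, and training the speaker model from scratch (rather than MAP-adapting it from the UBM) are all simplifying assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic MFCC-like features: pooled background speech from many
# speakers, plus enrollment data from one target speaker (toy data).
background = rng.normal(0.0, 1.0, size=(2000, 13))
target_spk = rng.normal(1.5, 0.8, size=(300, 13))

# Speaker-independent UBM trained on the pooled background data.
ubm = GaussianMixture(n_components=8, covariance_type="diag",
                      random_state=0).fit(background)

# Speaker-specific GMM (trained from scratch here for simplicity;
# in practice it is often MAP-adapted from the UBM).
spk = GaussianMixture(n_components=8, covariance_type="diag",
                      random_state=0).fit(target_spk)

def llr_score(features):
    """Average per-frame log-likelihood ratio: speaker model vs. UBM.
    Accept the claimed identity when this exceeds some threshold."""
    return spk.score(features) - ubm.score(features)

# A genuine trial (same speaker) should score above an impostor trial.
genuine = llr_score(rng.normal(1.5, 0.8, size=(300, 13)))
impostor = llr_score(rng.normal(0.0, 1.0, size=(300, 13)))
```

The accept/reject decision then reduces to comparing the ratio against a threshold tuned on held-out trials.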

State-of-the-art Gaussian mixture model (GMM)-based speaker recognition/verification systems utilize a UBM, and the currently very popular total variability (i-vector) approach requires a trained UBM as a prerequisite. Hence, to improve our current speaker recognition system, we will equip it with a UBM as well.

As we already have a GMM-based speaker recognition system, all we have to do is collect a set of speech samples from distinct speakers that covers a wide variety of speech characteristics. One could of course take the entire dataset to train the model, but since some speakers have many more speech samples than others, they could easily dominate the resulting UBM. Hence I limited the number of speech samples per speaker to include in the UBM training data. For this, I wrote a Python script called ubm_data.py, which is used as follows:


python ubm_data.py -s stats.json -o ubm_data.json -d ./output/audio 
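The core of such a selection step can be sketched as below. This is a hypothetical simplification of ubm_data.py: the cap value, the helper name, and the assumption that the stats map each speaker to a list of audio files are all mine, not the actual script.

```python
import random

def select_ubm_files(stats, max_per_speaker=20, seed=0):
    """Cap the number of files each speaker contributes, so that
    heavily represented speakers cannot dominate the UBM training set."""
    rng = random.Random(seed)
    selected = []
    for speaker, files in stats.items():
        if len(files) > max_per_speaker:
            # Random subsample instead of e.g. taking the first N files,
            # to avoid any ordering bias in the file list.
            files = rng.sample(files, max_per_speaker)
        selected.extend(files)
    return selected

# Toy stats: one over-represented speaker, one under-represented.
stats = {"spk_a": [f"a_{i}.wav" for i in range(50)],
         "spk_b": ["b_0.wav", "b_1.wav"]}
print(len(select_ubm_files(stats)))  # → 22 (20 capped + 2)
```

The selected list would then be written out as JSON, matching the ubm_data.json output described above.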

The output file ubm_data.json is a list of files in the ./output/audio directory that will be used to train the UBM. The training can be done with the existing speaker recognition module. However, since a UBM usually consists of many more components than the GMM of an individual speaker, we need an additional handle in the speaker recognition API to specify a larger number of Gaussian components. For this I added a parameter gmm_order to the train() function, which can be called as follows:


Recognizer = SR.GMMRec()

Recognizer.enroll_file(spk_name, audio_file)

Recognizer.train(gmm_order=256)




