It is usually observed that recognition accuracy decreases as the number of speakers enrolled in a system increases. Therefore, although we aim to build a large-scale speaker recognition system, we can further improve its performance if we know beforehand which speakers may appear in a video file and narrow the range of candidates the system has to decide among. This is of course not possible in general. However, thanks to the tpt files archived at RedHen, we can obtain a list of the speakers appearing in a CNN news video by looking at which speakers are tagged in the tpt file. Hence, this week I decided to take advantage of this additional information to optimize the current speaker recognition system. To do so, the system must be able to flexibly change the number of enrolled speakers. To this end, I made the following changes to the speaker recognition Python module:
- Instead of saving all enrolled speakers into one model, the module now saves each speaker's GMM model individually.
- At testing time, a list of relevant speakers is used to build a recognizer containing only those speakers.
- New functions for adding and deleting speakers are introduced.
- One can now enroll a speaker either from features or from a saved model.
Below are examples of how to use the updated API:
To instantiate a recognizer:
import AudioPipe.speaker.recognition as SR
Recognizer = SR.GMMRec()
To enroll a speaker:
Recognizer.enroll_file(spk_name, audio_file, model=model_fn)
Here model_fn is the name of the file in which the trained model will be saved.
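For instance, enrolling several speakers at once might look like the sketch below (the speaker names, file paths, and model naming convention are placeholders for illustration, not real data from the pipeline):
# A minimal sketch: enroll several speakers in a loop.
# The names and paths below are placeholders, not real data.
speakers = [("Anderson Cooper", "cooper.wav"), ("Wolf Blitzer", "blitzer.wav")]
for spk_name, audio_file in speakers:
    model_fn = spk_name.replace(" ", "_") + ".model"  # assumed naming convention
    Recognizer.enroll_file(spk_name, audio_file, model=model_fn)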
After enrolling enough speakers, run the following command to train and save all the models:
Recognizer.train()
Later during testing time, use the following code to load a recognizer with a list of speakers:
for spk_name in spk_ls:
    model_fn = get_model_file(spk_name)
    Recognizer.enroll_model(spk_name, model_fn)
Here get_model_file() is a pseudo-function that stands for whatever procedure returns the path to the saved model file of the corresponding speaker.
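One minimal way to implement such a function, assuming the per-speaker models all live in a single directory and are named after the speakers (both of which are assumptions for illustration, not part of the actual pipeline), would be:
import os

MODEL_DIR = "models"  # assumed directory holding the saved speaker models

def get_model_file(spk_name):
    # Assumes each model was saved as <speaker_name>.model at enrollment time;
    # adjust this to match the naming convention passed to enroll_file().
    return os.path.join(MODEL_DIR, spk_name.replace(" ", "_") + ".model")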