RedHenSpeakRecog: Week 1: Combining Speaker Information and Forced Alignment

The tpt files on Cartago contain, among much other metadata, the transcripts and the corresponding speakers for the news audio. This week's task is to combine the speaker info from the tpt files with the timing info produced by Gentle alignment.

Gentle expects the transcript to contain only the spoken words; anything else is noise to it and may degrade the quality of the alignment. Therefore, the first step before running Gentle is to strip the redundant information from the tpt file to obtain a clean transcript. The program for this, written by Prof. Steen, is called strip-tpt and is in my Cartago home directory.

To finally get "who speaks when", we then need to add the speaker info from the tpt file to the output returned by Gentle. The challenge is to find the right places in the aligned transcript to insert the corresponding speakers. The solution I came up with is to leave a special marker during the stripping process wherever a speaker occurs in the tpt file, and to use this marker later as a reference point for inserting the corresponding speaker info. So that this extra content in the transcript does NOT worsen Gentle's performance, the marker has to be ignored during alignment but kept in the result. After a set of test runs, I found that the double chevrons ">>", which tpt files already use to mark speakers, do the job.

Therefore, I modified strip-tpt so that, instead of removing the speaker info from the tpt file, it replaces it with ">>". In addition, I added some commands to strip the "voice over" info that was previously overlooked. The modified file is called chevron-tpt and is also in my Cartago home directory. It has been run on all the tpt files, and the processed results (with the extension .chevron.tpt) are saved in home/owen_he/netapp/chevron/.
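To make the replacement step concrete, here is a minimal Python sketch of the idea. It is not the actual chevron-tpt program. It assumes a simplified, hypothetical line format in which a speaker turn looks like ">> NAME: words"; real tpt files carry much more metadata. The names `SPEAKER_RE` and `chevronize` are made up for illustration.

```python
import re

# Assumption: a speaker label looks like ">> NAME:" with an upper-case name.
# This pattern is hypothetical and only serves to illustrate the idea.
SPEAKER_RE = re.compile(r">>\s*([A-Z][A-Z .'-]*?):\s*")


def chevronize(text):
    """Replace each '>> NAME:' label with a bare '>>' and collect the names in order."""
    speakers = []

    def _swap(match):
        # Remember the speaker, but leave only the marker in the transcript.
        speakers.append(match.group(1).strip())
        return ">> "

    stripped = SPEAKER_RE.sub(_swap, text)
    return stripped, speakers


example = ">> ANCHOR: Good evening. >> REPORTER: Thanks, here is the story."
stripped, speakers = chevronize(example)
print(stripped)
print(speakers)
```

The stripped text keeps one ">>" per speaker turn, and the ordered name list plays the role of a speaker list that can be matched back to the markers after alignment.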

To get the stripped results onto Case HPC, I wrote a script, get-chevron, which fetches the original tpt, chevron.tpt, and video files from Cartago and places them in the folder ~/data/. Furthermore, I added a command to ~/GSoC2017/gentle.sh that extracts the speaker info from the tpt file before Gentle alignment starts. Finally, I hacked the main Gentle alignment script so that it also writes the speaker info to the alignment output.
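The final merging step can be sketched as follows. This is an illustration under stated assumptions, not the modified Gentle script itself. It assumes the alignment is available as a dict shaped like Gentle's JSON output (a "transcript" string plus a "words" list whose entries have "case", "startOffset", "start", and "end"), and that the speaker names are given as an ordered list with one entry per ">>" marker. The function name `speaker_turns` is hypothetical.

```python
def speaker_turns(alignment, speakers):
    """Group aligned words into speaker turns using the '>>' markers as reference points."""
    transcript = alignment["transcript"]

    # Character positions of every '>>' marker in the transcript, in order.
    markers = []
    pos = transcript.find(">>")
    while pos != -1:
        markers.append(pos)
        pos = transcript.find(">>", pos + 2)

    turns = []
    current = None  # index of the marker that opened the current turn
    for word in alignment["words"]:
        if word.get("case") != "success":
            continue  # skip words that were not aligned to the audio
        # The number of markers at or before this word selects the speaker.
        idx = sum(1 for m in markers if m <= word["startOffset"]) - 1
        name = speakers[idx] if 0 <= idx < len(speakers) else None
        if turns and idx == current:
            turns[-1]["end"] = word["end"]  # extend the running turn
        else:
            turns.append({"speaker": name, "start": word["start"], "end": word["end"]})
            current = idx
    return turns
```

Each returned turn gives a speaker name with the start time of its first aligned word and the end time of its last, which is one way to express "who speaks when". Words that appear before the first marker get the speaker `None`.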

List of files:

Cartago:~/chevron-tpt
Replaces the speaker info with >> as reference points for inserting the speaker information back later. (voice over) was not stripped off in the previous version; now it’s replaced with ||.

CaseHPC:~/data/get-chevron
Fetches the chevron.tpt files from Cartago.

CaseHPC:~/GSoC2017/gentle.sh
Bash script that runs Gentle alignment to acquire speaker boundaries and extracts the speaker info to speaker.list. Usage example:

./gentle.sh [filename] [txt_ext] [out_dir] [out_ext] [audio_dir] [txt_dir]

CaseHPC:~/Gentle/gentle/gen_spk.py
Aligns the transcript with the audio signal and inserts the speaker info when writing the alignment result.

Wednesday, June 7, 2017
