October 19, 2013

Automatic speaker tracking in audio recordings



A new system dispenses with the human annotation of training data required by its predecessors but achieves comparable results.

A central topic in spoken-language-systems research is speaker diarization: computationally determining how many speakers feature in a recording and which of them speaks when. Speaker diarization would be an essential function of any program that automatically annotates audio or video recordings.