
utterance segmentation #1

Open
amirim opened this issue Apr 15, 2018 · 2 comments


amirim commented Apr 15, 2018

Many thanks for the contribution. Although utterance segmentation is not part of your work (the IEMOCAP emotion dataset is already segmented into utterances), do you have any idea of a tool that might be a good solution for this purpose?

amirim changed the title from "utterance extraction" to "utterance segmentation" on Apr 15, 2018
ksingla025 (Member) commented

I suspect that you are trying to pass in a new transcript, which is why you need utterance segmentation. If you are using an ASR system to transcribe, it can automatically give long pauses / speaker changes as utterance boundaries.
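
Not part of this project, but a minimal sketch of that idea in Python: run a crude energy-based silence check over the audio and close an utterance whenever a pause exceeds some duration. The function name, frame length, energy percentile, and pause threshold below are illustrative assumptions; a real VAD or ASR word timestamps would do this more robustly.

```python
import numpy as np
import librosa  # any audio loader would do here

def segment_by_pauses(wav_path, frame_ms=30, energy_percentile=20, min_pause_s=0.5):
    # Load mono audio; 16 kHz is an arbitrary choice for this sketch.
    y, sr = librosa.load(wav_path, sr=16000, mono=True)
    frame_len = int(sr * frame_ms / 1000)
    n_frames = len(y) // frame_len
    frames = y[: n_frames * frame_len].reshape(n_frames, frame_len)

    # Short-time energy per frame; frames below a low percentile count as silence.
    energy = (frames ** 2).mean(axis=1)
    threshold = np.percentile(energy, energy_percentile)
    voiced = energy > threshold

    # Walk the frames: a silent run longer than min_pause_s closes the current
    # utterance, and the next voiced frame opens a new one.
    min_pause_frames = int(min_pause_s * 1000 / frame_ms)
    utterances, start, silent_run = [], None, 0
    for i, v in enumerate(voiced):
        if v:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_pause_frames:
                end = i - silent_run + 1  # frame index just past the last voiced frame
                utterances.append((start * frame_len / sr, end * frame_len / sr))
                start, silent_run = None, 0
    if start is not None:
        utterances.append((start * frame_len / sr, n_frames * frame_len / sr))
    return utterances  # list of (start_seconds, end_seconds) boundaries
```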

sambhavnoobcoder commented

There are several tools and libraries commonly used in Automatic Speech Recognition (ASR) that can also help with utterance segmentation:

Kaldi: Kaldi is a popular open-source toolkit for speech recognition that provides various tools and utilities for ASR-related tasks, including segmentation. It offers scripts and modules for speech data processing, feature extraction, and speech modeling, which can be used for utterance segmentation.

HTK (Hidden Markov Model Toolkit): HTK is another toolkit commonly used for building ASR systems. It offers functionalities for acoustic modeling, which can be adapted or used for segmentation tasks based on HMMs.

Praat: While Praat is primarily used in phonetics, it also offers capabilities for annotating and segmenting speech. It allows manual segmentation of speech signals and can be used to mark boundaries between different utterances.

LibROSA and its librosa.segment functions: LibROSA is a Python library for audio and music analysis. While it is not solely an ASR tool, it provides functionality for audio processing, feature extraction, and segmentation. The librosa.segment module offers functions for segmentation based on different criteria, and librosa.effects.split can split audio on silence (see the sketch after this list).

Google Speech-to-Text API: Google's Speech-to-Text API offers automatic transcription capabilities and may include segmentation functionalities to separate different utterances based on pauses or speaker turns.

CMU Sphinx: CMU Sphinx is an open-source speech recognition system that provides tools and libraries for speech recognition tasks. It might offer utilities for segmentation purposes within its suite of tools.
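
To make the LibROSA option above concrete, here is a short sketch that splits a recording on silence with librosa.effects.split and treats each non-silent interval as a candidate utterance. The file name, sampling rate, and top_db value are placeholders, not anything from this repository.

```python
import librosa

# Load a mono recording (path and sample rate are illustrative).
y, sr = librosa.load("session1.wav", sr=16000, mono=True)

# Intervals quieter than `top_db` dB below the signal's peak are treated as silence.
intervals = librosa.effects.split(y, top_db=30)

for start, end in intervals:
    print(f"utterance: {start / sr:.2f}s -> {end / sr:.2f}s")
    segment = y[start:end]  # audio samples for this candidate utterance
```

In practice top_db usually needs tuning per corpus, and very short intervals may need to be merged with their neighbours before being treated as utterances.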
