
utterance segmentation #1

Open
amirim opened this issue Apr 15, 2018 · 2 comments


amirim commented Apr 15, 2018

Many thanks for the contribution. Although utterance segmentation is not part of your work (the IEMOCAP emotion dataset is already segmented into utterances), do you have any idea of a tool that might be a good solution for this purpose?

amirim changed the title from "utterance extraction" to "utterance segmentation" on Apr 15, 2018
ksingla025 (Member) commented

I suspect that you are trying to pass in a new transcript, which is why you need utterance segmentation. If you are using an ASR system to transcribe, it can automatically give long pauses / speaker changes as utterance boundaries.
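
Not part of this project, but a minimal sketch of that idea in Python: run a crude energy-based silence check over the audio and close an utterance whenever a pause exceeds some duration. The function name, frame length, energy percentile, and pause threshold below are illustrative assumptions; a real VAD or ASR word timestamps would do this more robustly.

```python
import numpy as np
import librosa  # any audio loader would do here

def segment_by_pauses(wav_path, frame_ms=30, energy_percentile=20, min_pause_s=0.5):
    # Load mono audio; 16 kHz is an arbitrary choice for this sketch.
    y, sr = librosa.load(wav_path, sr=16000, mono=True)
    frame_len = int(sr * frame_ms / 1000)
    n_frames = len(y) // frame_len
    frames = y[: n_frames * frame_len].reshape(n_frames, frame_len)

    # Short-time energy per frame; frames below a low percentile count as silence.
    energy = (frames ** 2).mean(axis=1)
    threshold = np.percentile(energy, energy_percentile)
    voiced = energy > threshold

    # Walk the frames: a silent run longer than min_pause_s closes the current
    # utterance, and the next voiced frame opens a new one.
    min_pause_frames = int(min_pause_s * 1000 / frame_ms)
    utterances, start, silent_run = [], None, 0
    for i, v in enumerate(voiced):
        if v:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_pause_frames:
                end = i - silent_run + 1  # frame index just past the last voiced frame
                utterances.append((start * frame_len / sr, end * frame_len / sr))
                start, silent_run = None, 0
    if start is not None:
        utterances.append((start * frame_len / sr, n_frames * frame_len / sr))
    return utterances  # list of (start_seconds, end_seconds) boundaries
```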

sambhavnoobcoder commented

There are several tools and libraries commonly used in Automatic Speech Recognition (ASR) that can also help with utterance segmentation:

Kaldi: Kaldi is a popular open-source toolkit for speech recognition that provides various tools and utilities for ASR-related tasks, including segmentation. It offers scripts and modules for speech data processing, feature extraction, and speech modeling, which can be used for utterance segmentation.

HTK (Hidden Markov Model Toolkit): HTK is another toolkit commonly used for building ASR systems. It offers functionalities for acoustic modeling, which can be adapted or used for segmentation tasks based on HMMs.

Praat: While Praat is primarily used in phonetics, it also offers capabilities for annotating and segmenting speech. It allows manual segmentation of speech signals and can be used to mark boundaries between different utterances.

LibROSA and its librosa.segment functions: LibROSA is a Python library for audio and music analysis. While it is not solely an ASR tool, it provides functionality for audio processing, feature extraction, and segmentation. The librosa.segment module offers functions for segmentation based on different criteria, and librosa.effects.split can split audio on silence (see the sketch after this list).

Google Speech-to-Text API: Google's Speech-to-Text API offers automatic transcription capabilities and may include segmentation functionalities to separate different utterances based on pauses or speaker turns.

CMU Sphinx: CMU Sphinx is an open-source speech recognition system that provides tools and libraries for speech recognition tasks. It might offer utilities for segmentation purposes within its suite of tools.
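
To make the LibROSA option above concrete, here is a short sketch that splits a recording on silence with librosa.effects.split and treats each non-silent interval as a candidate utterance. The file name, sampling rate, and top_db value are placeholders, not anything from this repository.

```python
import librosa

# Load a mono recording (path and sample rate are illustrative).
y, sr = librosa.load("session1.wav", sr=16000, mono=True)

# Intervals quieter than `top_db` dB below the signal's peak are treated as silence.
intervals = librosa.effects.split(y, top_db=30)

for start, end in intervals:
    print(f"utterance: {start / sr:.2f}s -> {end / sr:.2f}s")
    segment = y[start:end]  # audio samples for this candidate utterance
```

In practice top_db usually needs tuning per corpus, and very short intervals may need to be merged with their neighbours before being treated as utterances.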
