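Hello, and thank you for this great work! I have a few questions:
1. Can WavTokenizer extract frame-level phoneme embeddings?
2. WavTokenizer is described as producing 75 tokens per second. I used the code from part2 with the WavTokenizer-small-320-24k-4096 model and the config wavtokenizer_smalldata_frame75_3s_nq1_code4096_dim512_kmeans200_attn.yaml (the one provided in the repository). I fed in two recordings. For a 12-second clip, the tensor extracted in partB had shape (1,1,936). For an 8-second clip, the shape was (1,1,610). Why is that?
3. If I run WavTokenizer on a speech clip and want a semantic embedding for the whole clip, can I simply use the partB code? Can I use wavtokenizer_smalldata_frame75_3s_nq1_code4096_dim512_kmeans200_attn.yaml as the config and WavTokenizer-small-320-24k-4096 as the pretrained model?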
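For question 2, the model name suggests a hop length of 320 samples at 24 kHz, which matches the stated rate: 24000 / 320 = 75 tokens per second. The minimal sketch below assumes the encoder emits one token per 320-sample hop, with a partial final hop still producing a token (ceil(num_samples / 320)). The real encoder's padding behaviour is an assumption here. It computes the token count expected from a clip's length and, conversely, the duration implied by an observed token count. For question 1, the same arithmetic gives the time span each token covers (about 13.3 ms), which is what aligning frames with phoneme boundaries would require. It does not say whether the features are phonetically meaningful. All helper names are hypothetical and are not part of the WavTokenizer API.

```python
import math

# Assumed from the model name "WavTokenizer-small-320-24k-4096":
# 24 kHz audio, one token per 320-sample hop -> 24000 / 320 = 75 tokens/s.
SAMPLE_RATE = 24_000
HOP_LENGTH = 320
TOKENS_PER_SECOND = SAMPLE_RATE / HOP_LENGTH  # 75.0


def expected_tokens(num_samples: int, hop_length: int = HOP_LENGTH) -> int:
    """Hypothetical helper: approximate token count for a clip of num_samples
    samples (already resampled to 24 kHz), assuming a partial final hop
    still yields one token."""
    return math.ceil(num_samples / hop_length)


def implied_duration(num_tokens: int) -> float:
    """Hypothetical helper: clip duration in seconds implied by a token count."""
    return num_tokens / TOKENS_PER_SECOND


def frame_span(frame_index: int) -> tuple[float, float]:
    """Hypothetical helper: (start, end) time in seconds covered by one token,
    e.g. for lining frame-level features up with phoneme boundaries."""
    start = frame_index * HOP_LENGTH / SAMPLE_RATE
    return start, start + HOP_LENGTH / SAMPLE_RATE


if __name__ == "__main__":
    for seconds in (12, 8):
        print(seconds, "s ->", expected_tokens(seconds * SAMPLE_RATE), "tokens")
    for tokens in (936, 610):
        print(tokens, "tokens ->", round(implied_duration(tokens), 2), "s")
    print("frame 75 covers", frame_span(75))
```

Under these assumptions, a clip of exactly 12 s would give 900 tokens and one of exactly 8 s would give 600. The observed 936 and 610 tokens correspond to roughly 12.48 s and 8.13 s, so it may be worth checking the clips' exact sample counts after resampling. This is only an arithmetic check, not an explanation confirmed by the maintainers.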
> If I want to extract a semantic embedding for the whole clip, can I simply use the partB code? Can I use wavtokenizer_smalldata_frame75_3s_nq1_code4096_dim512_kmeans200_attn.yaml as the config and WavTokenizer-small-320-24k-4096 as the pretrained model?
Thank you very much for your interest.
OK, thank you. I'll look into it further.
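Regarding question 3 (a semantic embedding for a whole clip), the thread does not say how it should be obtained. The (1, 1, T) tensor from partB appears to hold discrete token indices (one codebook, T frames), so averaging those integers directly would not be meaningful. One common, generic option is to map each index to a continuous vector, such as its codebook entry, and then average over time. This yields one fixed-size vector regardless of clip length. That approach is an assumption here, not something the WavTokenizer authors confirm. The sketch below uses plain Python lists so it runs without PyTorch. `tokens_to_features` and `mean_pool` are hypothetical helpers, and the codebook values are toy numbers. With real tensors, the pooling step would be `features.mean(dim=-1)`.

```python
from statistics import fmean


def tokens_to_features(token_ids: list[int], codebook: list[list[float]]) -> list[list[float]]:
    """Hypothetical helper: replace each token id with its codebook vector
    and return a (dim, T) matrix (one row per feature dimension)."""
    dim = len(codebook[0])
    return [[codebook[t][d] for t in token_ids] for d in range(dim)]


def mean_pool(features: list[list[float]]) -> list[float]:
    """Hypothetical helper: average a (dim, T) matrix over the time axis,
    giving one dim-sized vector for the whole clip."""
    if not features or not features[0]:
        raise ValueError("expected a non-empty (dim, T) feature matrix")
    return [fmean(row) for row in features]


if __name__ == "__main__":
    codebook = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]  # toy: 3 entries, dim 2
    tokens = [0, 2, 2, 1]                            # toy token sequence, T = 4
    clip_vector = mean_pool(tokens_to_features(tokens, codebook))
    print(clip_vector)  # [2.5, 3.5]
```

Mean pooling discards temporal order, so it suits clip-level comparisons better than frame-level tasks such as phoneme analysis.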