What's the output length? #113
I think I remember reading bark generates 30s of audio at a time. Is that also true for bark.cpp?
I've tried letting it read some article and it crashed. Is that a length limitation or something else?
Also: is there example code to make it read back a whole news article, a dialogue, or anything useful?
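No ready-made example turns up in this thread, so here is a minimal sketch of one way to do it: shell out to the ./build/bin/main binary once per sentence, using the -t/-o/-p flags that appear in the log below, then stitch the per-sentence WAVs together. The helper name and chunk filenames are made up for illustration, and it assumes the generated files are plain PCM WAVs that Python's wave module can read.

```python
# Hypothetical helper (not part of bark.cpp): reads a long article by
# generating one short WAV per sentence and concatenating the results.
# Assumes the -t/-o/-p CLI flags shown in the log below, and that the
# output files are plain PCM WAVs that the wave module can parse.
import re
import subprocess
import wave

def read_article(text: str, out_path: str = "article.wav", threads: int = 2) -> None:
    # Naive sentence splitter; shorter prompts seem less likely to crash.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    chunks = []
    for i, sentence in enumerate(sentences):
        chunk = f"chunk_{i:03d}.wav"
        subprocess.run(
            ["./build/bin/main", "-t", str(threads), "-o", chunk, "-p", sentence],
            check=True,  # stop early if a chunk fails to generate
        )
        chunks.append(chunk)

    # Concatenate the chunks; all files must share rate, width, and channels.
    with wave.open(out_path, "wb") as out:
        for i, chunk in enumerate(chunks):
            with wave.open(chunk, "rb") as w:
                if i == 0:
                    out.setparams(w.getparams())
                out.writeframes(w.readframes(w.getnframes()))

if __name__ == "__main__":
    read_article("First sentence of the article. Second sentence. And so on.")
```

Whether this actually sidesteps the crash depends on what the length limit really is, which is exactly what this issue asks.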
Comments

Hi! Can you give me the traceback when it crashes?

user@linux:~/tmp/bark.cpp$ ./build/bin/main -t 2 -o "./output2.wav" -p "Matcha is finely ground powder of specially grown and processed green tea leaves traditionally consumed in East Asia, which is mostly produced in Japan today. The green tea plants used for matcha are shade-grown for three to four weeks before harvest; the stems and veins are removed during processing. During shaded growth, the plant Camellia sinensis produces more theanine and caffeine. The powdered form of matcha is consumed differently from tea leaves or tea bags, as it is suspended in a liquid, typically water or milk."
bark_load_model_from_file: loading model from './ggml_weights'
bark_load_model_from_file: reading bark text model
gpt_model_load: n_in_vocab = 129600
gpt_model_load: n_out_vocab = 10048
gpt_model_load: block_size = 1024
gpt_model_load: n_embd = 1024
gpt_model_load: n_head = 16
gpt_model_load: n_layer = 24
gpt_model_load: n_lm_heads = 1
gpt_model_load: n_wtes = 1
gpt_model_load: ftype = 0
gpt_model_load: qntvr = 0
gpt_model_load: ggml tensor size = 272 bytes
gpt_model_load: ggml ctx size = 1894.87 MB
gpt_model_load: memory size = 192.00 MB, n_mem = 24576
gpt_model_load: model size = 1701.69 MB
bark_load_model_from_file: reading bark vocab
bark_load_model_from_file: reading bark coarse model
gpt_model_load: n_in_vocab = 12096
gpt_model_load: n_out_vocab = 12096
gpt_model_load: block_size = 1024
gpt_model_load: n_embd = 1024
gpt_model_load: n_head = 16
gpt_model_load: n_layer = 24
gpt_model_load: n_lm_heads = 1
gpt_model_load: n_wtes = 1
gpt_model_load: ftype = 0
gpt_model_load: qntvr = 0
gpt_model_load: ggml tensor size = 272 bytes
gpt_model_load: ggml ctx size = 1443.87 MB
gpt_model_load: memory size = 192.00 MB, n_mem = 24576
gpt_model_load: model size = 1250.69 MB
bark_load_model_from_file: reading bark fine model
gpt_model_load: n_in_vocab = 1056
gpt_model_load: n_out_vocab = 1056
gpt_model_load: block_size = 1024
gpt_model_load: n_embd = 1024
gpt_model_load: n_head = 16
gpt_model_load: n_layer = 24
gpt_model_load: n_lm_heads = 7
gpt_model_load: n_wtes = 8
gpt_model_load: ftype = 0
gpt_model_load: qntvr = 0
gpt_model_load: ggml tensor size = 272 bytes
gpt_model_load: ggml ctx size = 1411.25 MB
gpt_model_load: memory size = 192.00 MB, n_mem = 24576
gpt_model_load: model size = 1218.26 MB
bark_load_model_from_file: reading bark codec model
encodec_model_load: model size = 44.32 MB
bark_load_model_from_file: total model size = 4170.64 MB
bark_tokenize_input: prompt: 'Matcha is finely ground powder of specially grown and processed green tea leaves traditionally consumed in East Asia, which is mostly produced in Japan today. The green tea plants used for matcha are shade-grown for three to four weeks before harvest; the stems and veins are removed during processing. During shaded growth, the plant Camellia sinensis produces more theanine and caffeine. The powdered form of matcha is consumed differently from tea leaves or tea bags, as it is suspended in a liquid, typically water or milk.'
bark_tokenize_input: number of tokens in prompt = 513, first 8 tokens: 36199 20161 20172 23483 20502 26960 20562 72276
bark_forward_text_encoder: .....................................
bark_print_statistics: mem per token = 4.80 MB
bark_print_statistics: sample time = 172.53 ms / 696 tokens
bark_print_statistics: predict time = 67473.14 ms / 96.81 ms per token
bark_print_statistics: total time = 67666.94 ms
bark_forward_coarse_encoder: .....................................
bark_print_statistics: mem per token = 134.99 MB
bark_print_statistics: sample time = 41.46 ms / 2088 tokens
bark_print_statistics: predict time = 974489.06 ms / 466.48 ms per token
bark_print_statistics: total time = 974548.31 ms
bark_forward_fine_encoder: ...........double free or corruption (!prev)
Aborted

Hmm, sorry, at the moment I'm unable to test this. It always says: … I guess I somehow broke my development environment and need to fix that first. Feel free to close this issue if appropriate.

The model files changed and got merged into one. You can grab up-to-date files from here: https://huggingface.co/Green-Sky/bark-ggml/tree/main
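For anyone who prefers fetching those files from a script: a minimal sketch using the huggingface_hub package. snapshot_download is its standard API; the repo id comes from the link above, and the local directory name is just an example.

```python
# Sketch: fetch the updated ggml weights with huggingface_hub
# (pip install huggingface_hub). repo_id comes from the link above;
# local_dir is an arbitrary example path.
from huggingface_hub import snapshot_download

path = snapshot_download(repo_id="Green-Sky/bark-ggml", local_dir="./bark-ggml")
print("weights downloaded to", path)
```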
Hmm. I strictly followed the "Prepare data & Run" process from the README.md; maybe I can debug that later... I downloaded the updated model files. I need to specify the exact file with both. But I'm happy to confirm: now it works!

It's excruciatingly slow on my machine: it took 10 minutes to convert that text into a 12-second audio file. And the first half of the text is missing; it starts pretty much in the middle.

Edit: And this one fails:

Edit2: A longer prompt also fails with q8, and a shorter prompt also fails with f16 (it works with q8, though).
@h3ndrik Thanks for the feedback!
I'd be super curious whether you are able to generate this sentence faster.