feat: s2s client #34
base: main
Conversation
@PeganovAnton could you please review this? We need this merged ASAP to enable QA to test S2S.
riva/client/nmt.py
Outdated
```
Generates speech recognition responses for fragments of speech audio in :param:`audio_chunks`.
The purpose of the method is to perform speech recognition "online" - as soon as
audio is acquired on small chunks of audio.

All available audio chunks will be sent to a server on first ``next()`` call.

Args:
    audio_chunks (:obj:`Iterable[bytes]`): an iterable object which contains raw audio fragments
        of speech. For example, such raw audio can be obtained with

        .. code-block:: python

            import wave
            with wave.open(file_name, 'rb') as wav_f:
                raw_audio = wav_f.readframes(n_frames)

    streaming_config (:obj:`riva.client.proto.riva_asr_pb2.StreamingRecognitionConfig`): a config for streaming.
        You may find description of config fields in message ``StreamingRecognitionConfig`` in
        `common repo
        <https://docs.nvidia.com/deeplearning/riva/user-guide/docs/reference/protos/protos.html#riva-proto-riva-asr-proto>`_.
        An example of creation of streaming config:

        .. code-style:: python

            from riva.client import RecognitionConfig, StreamingRecognitionConfig
            config = RecognitionConfig(enable_automatic_punctuation=True)
            streaming_config = StreamingRecognitionConfig(config, interim_results=True)

Yields:
    :obj:`riva.client.proto.riva_asr_pb2.StreamingRecognizeResponse`: responses for audio chunks in
        :param:`audio_chunks`. You may find description of response fields in declaration of
        ``StreamingRecognizeResponse`` message `here
        <https://docs.nvidia.com/deeplearning/riva/user-guide/docs/reference/protos/protos.html#riva-proto-riva-asr-proto>`_.
```
The docstring needs to be updated.
resolved in #43
```python
nchannels = 1
if args.list_input_devices:
    riva.client.audio_io.list_input_devices()
    return
```
Suggested change:
```python
    return
if args.list_output_devices:
    riva.client.audio_io.list_output_devices()
    return
```
```python
sound_stream = riva.client.audio_io.SoundCallBack(
    args.output_device, nchannels=nchannels, sampwidth=sampwidth, framerate=44100
)
print(sound_stream)
```
Why do we need this print?
```python
if args.output_device is not None or args.play_audio:
    print("playing audio")
    sound_stream = riva.client.audio_io.SoundCallBack(
        args.output_device, nchannels=nchannels, sampwidth=sampwidth, framerate=44100
```
Maybe we should make `framerate` a parameter of the script, like `--sample-rate-hz` in the script `tts/talk.py`?
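For instance, a rough sketch only (the argument name is borrowed from `tts/talk.py`, the 44100 default from the current code):

```python
# Sketch: expose the hard-coded framerate as a CLI parameter (names are illustrative).
parser.add_argument(
    "--sample-rate-hz",
    type=int,
    default=44100,
    help="Sample rate in Hz for the output audio device.",
)

# ...and later, when opening the sound stream:
sound_stream = riva.client.audio_io.SoundCallBack(
    args.output_device, nchannels=nchannels, sampwidth=sampwidth, framerate=args.sample_rate_hz
)
```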
```python
sampwidth = 2
nchannels = 1
```
`sampwidth` and `nchannels` are set in 2 places: here and in the `play_responses()` function. Could you make them global variables?
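For example, a minimal sketch (the module-level names are just illustrative, not part of this PR):

```python
# Sketch: define the audio format once at module level so main() and
# play_responses() share the same values instead of repeating literals.
SAMPWIDTH = 2   # bytes per sample (16-bit PCM)
NCHANNELS = 1   # mono

def play_responses(responses, sound_stream):
    # use SAMPWIDTH and NCHANNELS here instead of re-declaring them locally
    ...
```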
"then the default output audio device will be used.", | ||
) | ||
|
||
parser = add_asr_config_argparse_parameters(parser, profanity_filter=True) |
You'll probably need to set `max_alternatives=False` and `word_time_offsets=False` because these parameters are pointless for the script. Do you think we also need to add a `speaker_diarization=False` flag?
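i.e. roughly the following, assuming `add_asr_config_argparse_parameters()` accepts these keyword flags (the `speaker_diarization` one is only proposed here):

```python
# Sketch: drop ASR options that make no sense for an S2S script.
parser = add_asr_config_argparse_parameters(
    parser,
    profanity_filter=True,
    max_alternatives=False,     # assumed keyword flag
    word_time_offsets=False,    # assumed keyword flag
    speaker_diarization=False,  # flag proposed in this comment
)
```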
```python
parser.add_argument("--output-device", type=int, help="Output device to use.")
parser.add_argument("--target-language-code", default="en-US", help="Language code of the output language.")
parser.add_argument(
    "--play-audio",
```
If `--play-audio` is not set, then the script doesn't give any output. We should probably add an `--output` parameter, as in `tts/talk.py`, so that the script can produce some output when it runs on a server.
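Something along these lines could work (a sketch only; the response field carrying the synthesized audio is an assumption):

```python
# Sketch: optionally write the translated speech to a .wav file, as tts/talk.py does.
import wave
from pathlib import Path

parser.add_argument("--output", type=Path, help="Output .wav file for the translated speech.")

out_f = None
if args.output is not None:
    out_f = wave.open(str(args.output), 'wb')
    out_f.setnchannels(nchannels)
    out_f.setsampwidth(sampwidth)
    out_f.setframerate(44100)

for resp in responses:
    audio_bytes = resp.speech.audio  # assumption: field holding TTS audio in the S2S response
    if out_f is not None:
        out_f.writeframesraw(audio_bytes)

if out_f is not None:
    out_f.close()
```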
```python
play_responses(responses=nmt_service.streaming_s2s_response_generator(
    audio_chunks=audio_chunk_iterator,
    streaming_config=s2s_config), sound_stream=sound_stream)
```
Suggested change:
```python
play_responses(
    responses=nmt_service.streaming_s2s_response_generator(
        audio_chunks=audio_chunk_iterator,
        streaming_config=s2s_config,
    ),
    sound_stream=sound_stream
)
```
```python
        interim_results=True,
    ),
    translation_config = riva.client.TranslationConfig(
        target_language_code=args.target_language_code,
```
Here there should also be `source_language_code` and, probably, `model_name`, as in the config.
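i.e. roughly this for the `translation_config` keyword argument (the extra CLI arguments are assumptions for illustration, not part of this PR):

```python
translation_config=riva.client.TranslationConfig(
    source_language_code=args.source_language_code,  # assumed new CLI argument
    target_language_code=args.target_language_code,
    model_name=args.model_name,                      # assumed new CLI argument
),
```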
```python
first = True  # first tts output chunk received
auth = riva.client.Auth(args.ssl_cert, args.use_ssl, args.server)
nmt_service = riva.client.NeuralMachineTranslationClient(auth)
s2s_config = riva.client.StreamingTranslateSpeechToSpeechConfig(
```
Do we need a `tts_config` as in the proto? If so, we could add an `add_tts_config_argparse_parameters()` function to `argparse_utils.py` and refactor `tts/talk.py` to use it.
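A rough sketch of what such a helper could look like (the specific TTS options are assumptions modeled on `tts/talk.py`, not an existing API):

```python
# Sketch of the proposed helper for argparse_utils.py; option names are illustrative.
import argparse

def add_tts_config_argparse_parameters(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    group = parser.add_argument_group("TTS options")
    group.add_argument("--voice", help="Voice name to use for synthesis.")
    group.add_argument("--tts-language-code", default="en-US", help="Language of the synthesized speech.")
    group.add_argument("--tts-sample-rate-hz", type=int, default=44100, help="Sample rate of the synthesized audio.")
    return parser
```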
The branch was force-pushed from ba394ef to d2213b6, from d2213b6 to db64efc, and from db64efc to b665b2f.
Adding speech-to-speech basic CLI