Custom Speech to Text API

Sagar_Tyagi · April 26, 2023, 8:07am

Hey Team,
We're trying to capture the user's audio in real time and send it to our own speech-to-text service rather than relying on Genesys integrations. Is there a way to do this?
A sample flow would look like this:

1. The user calls.
2. We ask the user what they want to purchase.
3. The user lists the items they want, with quantities, etc.
4. We stream the audio of what the user said in step 3 to our service.
5. Our service responds with "POSSIBLE" or "NOT_POSSIBLE".
6. We use the response to control the Architect flow.
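On the service side, the last part of this flow (receive the utterance, answer POSSIBLE/NOT_POSSIBLE) reduces to "transcribe, interpret, decide". Below is a minimal Python sketch under stated assumptions: your own STT engine is passed in as a callable, and `INVENTORY`, `OrderLine`, `parse_order`, `decide` and `handle_utterance` are hypothetical names with a deliberately naive "<quantity> <item>" parser. None of this is provided by Genesys.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stock table standing in for whatever backend decides feasibility.
INVENTORY = {"apple": 10, "banana": 0, "milk": 5}


@dataclass
class OrderLine:
    item: str
    quantity: int


def _normalise(word: str) -> str:
    # Accept simple plurals ("apples" -> "apple") when the singular is known.
    if word.endswith("s") and word[:-1] in INVENTORY:
        return word[:-1]
    return word


def parse_order(transcript: str) -> list[OrderLine]:
    """Naive parser: '<quantity> <item>' chunks separated by commas or 'and'."""
    lines = []
    for chunk in transcript.lower().replace(" and ", ",").split(","):
        words = chunk.split()
        if len(words) >= 2 and words[0].isdigit():
            lines.append(OrderLine(_normalise(words[1]), int(words[0])))
    return lines


def decide(lines: list[OrderLine]) -> str:
    """Return the verdict string the Architect flow branches on."""
    if not lines:
        return "NOT_POSSIBLE"          # nothing recognisable was ordered
    for line in lines:
        if INVENTORY.get(line.item, 0) < line.quantity:
            return "NOT_POSSIBLE"
    return "POSSIBLE"


def handle_utterance(audio: bytes, stt: Callable[[bytes], str]) -> str:
    """Run your own STT on the audio, then decide (stt is your engine)."""
    return decide(parse_order(stt(audio)))
```

Injecting the STT function keeps the decision logic testable without audio: a transcript of "2 apples and 1 milk" yields `POSSIBLE` against the sample inventory, while "3 bananas" yields `NOT_POSSIBLE`.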

We tried using the GDF integration for this, but it seems the request sent to GDF already contains transcribed text rather than the audio. Is there a way to get real-time audio in GDF?

Something similar to this thread:

Jerome.Saint-Marc · April 26, 2023, 4:43pm

Hello,

No, there is no way to record the user's audio in an Architect flow, retrieve it, and send it to a third party for analysis.

Regards,

Sagar_Tyagi · April 27, 2023, 5:32am

Hey, we came across something called AudioHook. Can we use it to achieve real-time streaming and transcription? We don't necessarily need to record the user's audio.
And we don't need the audio for analysis; we actually want to run our own speech-to-text on it and return a response accordingly.
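Running your own speech-to-text on streamed audio usually means converting it first. My understanding of the AudioHook protocol is that audio arrives as binary WebSocket frames of 8 kHz G.711 μ-law (PCMU), with the two channels interleaved sample by sample when both are negotiated; treat that, and the channel order, as assumptions to verify against the spec. The sketch below expands μ-law into 16-bit linear PCM with the standard G.711 formula, which most STT engines accept. The function names are hypothetical.

```python
import struct


def ulaw_to_linear(byte: int) -> int:
    """Expand one G.711 mu-law byte into a signed 16-bit linear sample."""
    u = ~byte & 0xFF                 # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = ((mantissa << 3) + 0x84) << exponent
    magnitude -= 0x84                # remove the bias added during encoding
    return -magnitude if sign else magnitude


def split_channels(frame: bytes, channels: int = 2) -> list[bytes]:
    """De-interleave a PCMU frame (1 byte per sample per channel, assumed)."""
    return [frame[i::channels] for i in range(channels)]


def pcmu_to_pcm16(frame: bytes) -> bytes:
    """Convert mono PCMU bytes to little-endian signed 16-bit PCM."""
    samples = [ulaw_to_linear(b) for b in frame]
    return struct.pack(f"<{len(samples)}h", *samples)
```

Python's `audioop.ulaw2lin` did the same job but was deprecated in 3.11 and removed in 3.13, which is why the expansion is written out here.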

Jerome.Saint-Marc · April 27, 2023, 3:22pm

I must say I don't know. I haven't tried AudioHook recently.
In the past, audio streaming would only start when the call reached the ACD queue. That is too late for you if you are trying to implement a voice bot that uses your own speech-to-text service and is triggered from an Architect flow to gather the customer's input (i.e. self-service, like an IVR).
This may have changed. I don't know whether the Architect Transcription action works as a start/stop for audio streaming, or whether it only sets a "flag" telling the system whether to start transcription when the call reaches the ACD queue. Unfortunately, I don't have sample AudioHook server code I can run to try it.
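For reference, the control side of an AudioHook server is fairly small. The sketch below handles only the text (JSON) messages of one session and leaves the WebSocket transport out. The message types (`open`/`opened`, `ping`/`pong`, `close`/`closed`) and fields (`version`, `id`, `seq`, `clientseq`, `parameters`) reflect my reading of the AudioHook protocol and should be checked against the current spec; `AudioHookSession` and its stereo-preference rule are hypothetical. It does not settle the open question of when Genesys actually starts streaming.

```python
import json
from typing import Optional


class AudioHookSession:
    """Tracks control-message state for one AudioHook WebSocket session (sketch)."""

    def __init__(self) -> None:
        self.server_seq = 0                      # our outgoing sequence number
        self.conversation_id: Optional[str] = None
        self.media: Optional[dict] = None        # media format we accepted

    def _reply(self, msg: dict, reply_type: str, parameters: dict) -> str:
        """Build a server message that acknowledges the client's seq."""
        self.server_seq += 1
        return json.dumps({
            "version": msg.get("version", "2"),
            "id": msg["id"],                     # session id echoed back
            "type": reply_type,
            "seq": self.server_seq,
            "clientseq": msg["seq"],             # last client message seen
            "parameters": parameters,
        })

    def handle_text(self, raw: str) -> Optional[str]:
        """Return the JSON reply for a client text message, or None."""
        msg = json.loads(raw)
        kind = msg.get("type")
        if kind == "open":
            params = msg.get("parameters", {})
            self.conversation_id = params.get("conversationId")
            offered = params.get("media", [])
            # Prefer a two-channel offer so the customer's side is included.
            stereo = [m for m in offered if len(m.get("channels", [])) == 2]
            self.media = (stereo or offered or [None])[0]
            accepted = [self.media] if self.media else []
            return self._reply(msg, "opened", {"media": accepted})
        if kind == "ping":
            return self._reply(msg, "pong", {})
        if kind == "close":
            return self._reply(msg, "closed", {})
        # Other types (update, paused, error, ...) are ignored in this sketch.
        return None
```

In a real server, each incoming text frame would go through `handle_text`, any non-`None` reply would be sent back on the same socket, and binary frames would go to the audio path.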

Then you would need to find a way to make it work from an Architect flow (this depends on whether the Transcription action can work as a start/stop) and, I assume, implement a web service so that the result can be queried from the Architect flow.
So it seems a long shot, given the other prerequisites: enabling dual-channel recording on the trunk, enabling voice transcription, ...
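The web service part could be as simple as an HTTP endpoint that the Architect flow calls (for example through a Web Services Data Action) with the conversation ID, returning the latest verdict. This is a minimal standard-library sketch: the `/results/<conversationId>` path, the `PENDING` placeholder, and `store_result`, `lookup` and `serve` are all hypothetical, and nothing here handles authentication or persistence.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import urlsplit

# conversationId -> "POSSIBLE" / "NOT_POSSIBLE", written by the STT pipeline.
_results: dict[str, str] = {}
_lock = threading.Lock()


def store_result(conversation_id: str, result: str) -> None:
    """Record the verdict for a conversation (called by the STT side)."""
    with _lock:
        _results[conversation_id] = result


def lookup(path: str) -> tuple[int, dict]:
    """Map a request path like /results/<conversationId> to (status, JSON body)."""
    prefix = "/results/"
    path = urlsplit(path).path          # ignore any query string
    if not path.startswith(prefix) or path == prefix:
        return 404, {"error": "not found"}
    conversation_id = path[len(prefix):]
    with _lock:
        result = _results.get(conversation_id, "PENDING")
    return 200, {"conversationId": conversation_id, "result": result}


class ResultHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status, body = lookup(self.path)
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8080) -> None:
    """Run the endpoint; blocks forever. The port is an arbitrary example."""
    ThreadingHTTPServer(("0.0.0.0", port), ResultHandler).serve_forever()
```

`store_result` would be called by the STT pipeline once it has a verdict; returning `PENDING` instead of an error leaves the flow free to retry or fall back while transcription is still running.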

Regards,

system · May 28, 2023, 3:23pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.