What factors influence session start/end events?

Hi again,
Sorry to bug you yet again about my weird session handling problems... so, I've been doing a big rewrite in an attempt to fix a bug: apparently, people using my program are getting stuck in "Not Responding" a lot. Well... the rewrite is nearly done, but now I'm dealing with another bug, equally nonsensical:

  1. I call into my test queue, and Genesys sends my software the call. My software receives it, I handle it, test hold/mute/etc., and wrap it up correctly. That's all good. Everything works as expected.
  2. I call in again, and the call never gets routed to my software. Genesys thinks I failed to respond and puts me in Not Responding. If I set my routing status back to IDLE, I eventually get another call, but this seems sporadic/random. I'll be doing something else (like writing a forum post :laughing:), forgetting I'm still on queue in my little test queue, and a call will come in. My program responds correctly to those, so... what's the difference between that call and the one that never came to me?
  3. Sometimes, for no apparent reason, my wrap-up form will appear because my program received a sessionEnded event from Genesys.

Now... I don't think this is the cause of my bug. I say this because in the live version (pre-rewrite) I can take more than one call. But I'm not here asking about the specific issue(s) I just described - that could be any number of things (including ID-10-T errors on my end :laughing:). I haven't been able to find any obvious mistakes, but I've ruled out:

  • Routing Status: I'm on IDLE when the second call comes
  • Presence status: I'm "on queue" when the call comes in
  • Station: The user's "associated station" is set when the program starts, and I've confirmed it's set when the call comes in using the API
  • Code errors in session callbacks: No syntax errors, sometimes I'll just comment out everything and put an alert() there just cuz... I know it's nothing that silly.
  • Other clients open: I don't have Genesys Cloud open (desktop or web), and I only have one instance of my software running.

This is one of those infuriating bugs that seems to have no logical explanation. I basically rewrote my software trying to track down this not-responding bug, only to discover that the reason seems to be something far more mysterious and beyond the capabilities of my puny human brain to comprehend. I'm guessing - and this is a complete shot in the dark - that there must be some other external factor that plays a role in deciding when to send my program which event. Why else would I get sessionEnded when I never got sessionStarted? Why else would I get Not Responding when I never got pendingSession? I kind of feel like something somewhere somehow is sending my program these mixed signals, and no amount of rewriting, poring over logs, adding more logs, logging my logger writing logs :laughing: (this project is making quite the loggaholic outta me) will fix it.

How exactly does the SDK decide when to send me events? What settings, API endpoints, dark forces, etc. control that? How do I respond to sessionEnded when not on a call? So many questions, so little documentation...

Ok, there's a lot to unpack here. I'm going to ask some follow-up questions so I don't give you an essay that no one wants to read.

  1. Are your stations configured to use persistent connection?
  2. Are your edges genesys-hosted cloud edges, or are they on-premises?
  3. What version of the sdk are you using?

I'll take a second to elaborate on some of the internals of how our webrtc signaling works, and hopefully that will help a bit with debugging. To set up a webrtc session, whether it's an outgoing or incoming call, the sdk will receive these messages via the xmpp web socket (there's a rough event-wiring sketch after the list):

  1. The sdk will receive a stanza/xmpp envelope known as a propose. It will receive one every second until the server receives a proceed. This is what triggers the pendingSession event. It looks like this:
<message xmlns="jabber:client" from="..." id="..." to="...">
  <propose xmlns="..." id="1726399790" inin-cid="de6825a2-fbe6-4bf4-a222-bfb10f2f476e" inin-persistent-cid="..." inin-autoanswer="true" inin-sdp-over-xmpp="false">
    ...
  </propose>
</message>
  2. For outbound calls, unless otherwise directed in the sdk config, the sdk will send the proceed automatically. Otherwise it will get sent when sdk.acceptPendingSession() is called. The proceed looks something like this:
<message xmlns="jabber:client" to="..."><proceed xmlns="...." id="1726399790"/></message>
  3. The sdk will then receive a session-initiate, which basically contains all the sdp needed to turn that stanza into an RTCPeerConnection object. Only the client that sent the proceed will receive the session-initiate:
<iq xmlns="jabber:client" from="..." id="34848" to="..." type="set">
  <jingle xmlns="..." action="session-initiate" initiator="..." sid="1726399790" inin-cid="de6825a2-fbe6-4bf4-a222-bfb10f2f476e">
    ...
  </jingle>
</iq>
  4. The sdk will then start media and send the session-accept:
<iq xmlns="jabber:client" id="cfc62bad-90ce-4a81-8b73-781088904a48" to="" type="set">
  <jingle xmlns="urn:xmpp:jingle:1" sid="1726399790" action="session-accept">
    ...
  </jingle>
</iq>
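If it helps to see what that flow looks like from the consuming side, here's a rough sketch of wiring up the corresponding sdk events. The event names are the ones discussed in this thread, but the import name, constructor options, and acceptPendingSession signature are assumptions on my part and vary between sdk versions, so check them against the docs for the version you're on:

import GenesysCloudWebrtcSdk from 'genesys-cloud-webrtc-sdk'; // assumed import name

async function startSoftphone(accessToken: string) {
  // assumed constructor options; use whatever auth/region your app already uses
  const sdk = new GenesysCloudWebrtcSdk({ accessToken, environment: 'mypurecloud.com' });

  // propose received -> pendingSession
  sdk.on('pendingSession', (pending: any) => {
    console.log('pendingSession (propose)', pending.conversationId);
    // sends the proceed; for auto-answer and outbound calls the sdk normally does this itself
    sdk.acceptPendingSession({ conversationId: pending.conversationId });
  });

  // session-initiate received and media negotiated -> sessionStarted
  sdk.on('sessionStarted', (session: any) => {
    console.log('sessionStarted', session.conversationId);
  });

  // session torn down (or "faked" on a persistent connection, see below) -> sessionEnded
  sdk.on('sessionEnded', (session: any) => {
    console.log('sessionEnded', session.conversationId);
  });

  await sdk.initialize(); // assumed: this is what opens the xmpp web socket described above
  return sdk;
}

The logging is just so you can line the events up against the propose/proceed/session-initiate/session-accept stanzas above when a call doesn't ring.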

This applies any time the sdk starts a new webrtc session. If you have persistent connection configured for the station/phone, and you already have a connected, idle webrtc session (meaning there's no active call using it), then the sdk will instead emit these events based on conversation updates. This mostly applies to the sessionStarted and sessionEnded events.

Another thing to look out for is this: Heavy throttling of chained JS timers beginning in Chrome 88 - Chrome for Developers. Hopefully this information helps with your debugging. If you are having issues with calls getting routed to you, that's out of my expertise and you might have to open a support case to dig into that.
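One extra note on that throttling link: if your app runs any chained setTimeout/setInterval loop of its own (polling, a heartbeat, a UI timer) while the tab is hidden, Chrome 88+ can clamp it to roughly once per minute, which can look a lot like "my program randomly stopped responding". A common workaround, sketched here with a hypothetical file name, is to move the timer into a dedicated Web Worker, since worker timers aren't subject to that intensive throttling:

// heartbeat-worker.js (hypothetical): timers inside a dedicated worker keep firing
// on schedule even when the page that spawned it is sitting in a background tab
setInterval(() => postMessage({ type: 'tick', at: Date.now() }), 5000);

// main thread: let the worker drive whatever periodic work used to run directly on the page
const heartbeat = new Worker('heartbeat-worker.js');
heartbeat.onmessage = (event) => {
  console.log('tick from worker at', new Date(event.data.at).toISOString());
};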

Wow... this is hands down the coolest Genesys forum post I've ever seen. I don't understand a good chunk of it, but I'm loving this detailed explanation! :laughing: Rather than "an essay no one wants to read", it's more like "a treasure trove of previously unwritten Genesys knowledge." Thank you SO much for taking the time to do this.

So first off, I'm on version 8 of the SDK. Beyond that... I have absolutely no idea what you're talking about. :laughing: Questions 1 and 2 look like something our admins might know (?), but the only "edge" I know is a Chromium-based browser nobody uses. :laughing:

Anyway... I had another idea: what about using the Platform SDK's notifications API instead of WebRTC SDK events? Would that pick up something the other SDK might not? Beyond that, I'm going to have to pore over what you just sent me and do some more research. Thanks again for all the info (and also for the quick response).

To elaborate on persistent connections: phones can be configured to maintain and reuse webrtc connections. This is important to know because if your phone is using a persistent connection, the sdk will not receive webrtc signaling events and has to "fake" them based on conversation updates. This information is available by looking at the station object on the top level of the sdk after initialization.
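If you want to check this from code instead of asking an admin, a quick sketch (the exact property names can vary, so verify them against the station object you actually see logged):

// after sdk.initialize() resolves, the station is exposed on the top level of the sdk
const station = (sdk as any).station;
console.log('station:', station);

if (station?.webRtcPersistentEnabled) {
  // persistent connection is on: expect pendingSession/sessionEnded to be
  // "faked" from conversation updates instead of raw xmpp signaling
  console.log('persistent connection is enabled for this station');
}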

The reason I asked whether you have an on-prem edge is that the edge is what receives and sends phone calls with the carrier. It's also the other end of the webrtc sdk's peerConnection. When you receive/place a phone call, both sides are connected through the edge. I have seen instances where spotty internet or weird firewall rules with on-prem edges can affect the flow of traffic or even add several seconds of delay, which is pretty bad when you only have a few seconds to answer a phone call. This is why I sent you the web socket messages above, so you can make sure you are receiving those in a predictable and timely manner.

As far as using the notifications API, I don't think that will do you any good. Firstly, it shouldn't pick up anything the sdk does not, as far as calls and conversations are concerned. Secondly, even if it did, the webrtc sdk itself needs to know about them in order to properly answer webrtc sessions.

Hope this helps.

Okay, so I heard back on your other questions:

  1. Yes, it looks like we do use "persistent connection". I asked one of our admins and he sent me this:
    Genesys persistent connections

I also looked at the station object and it had webRtcPersistentEnabled: true, so the answer definitely seems to be "yes".

  2. Our edge is on AWS, a service he called "PureCloud Voice". Apparently this means Genesys-hosted.

I kinda had to laugh when you used the word "predictable" - working with this SDK has been many things, but predictable is not one of them. :laughing: So now knowing #1 above... why would we want that persistent connection thing to be turned on? Not receiving signaling events and having to jump through hoops to "fake it" sounds like exactly the kind of thing that has made this project so unpredictable. I've spent weeks researching, tinkering, guessing, and hoping against hope that something somehow would someday make sense. I know, that's debugging in general (:laughing:), but Genesys-related debugging is normal debugging times a thousand. But if the answer has to do with some external config setting that can be turned off, no amount of debugging myself stupid will fix it. So... I guess two questions here:

  1. What are the pros/cons of persistent connections? Why would we want them on vs. off?
  2. More importantly... how the heck did you learn all this? You seem to have a wealth of knowledge beyond anything I could ever manage to glean from the documentation... is there a training course somewhere on here I can take to bring my level of understanding up to snuff? I'm so tired of being in over my head lol...

Having persistent connection enabled allows you to bypass the time it takes to establish a webrtc session. It eliminates a lot of round trips from the sdk to the server. There are cases where agents are geolocated far from the edge and network round trips can take 200+ ms, which adds up fast when you only have a few seconds to answer a call. If you have an established persistent connection and you have an inbound call that is "auto answerable", the edge just sends the audio down the established connection automatically without any input from the sdk. It's super fast. The downsides to persistent connection are relatively minor. First, you're still processing/encoding media even over an idle connection. This takes cpu and a small bit of network bandwidth. Usually this is negligible, and if a device is that resource constrained, it's likely brutal trying to establish the connection in the first place anyway.

The other downside is a development one. It's obviously a pain to manage from a development standpoint. Unfortunately, because of how things are set up currently, there's a tiny bit of guesswork when it comes to persistent connection management. We have to make some assumptions that are almost always right. This gets harder when thinking of multiple clients for the same user across different systems. Throw in multiple simultaneous calls and you start regretting your life decisions. The hard part is that currently, when a station is configured for persistent connection, only one webrtc session is persisted; all others are single use. So if you have a user who is on an active call and then starts or receives another call, one of those sessions will be torn down after the call ends.

Going back to the happy path with persistent connection: if you have an idle persistent connection (meaning it's connected but there's no active call on it currently), the sdk will create a fake pendingSession event based on a conversation update. It will also emit a fake "sessionEnded" event when the call ends, even though the webrtc session is still active. Again, this is based on the conversation updates.
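One practical consequence for your app (this is just a defensive pattern on the consuming side, not something the sdk requires): since a sessionEnded can be synthesized from conversation updates, it's worth only reacting to it for conversations you actually saw start, rather than popping the wrap-up form for every sessionEnded that arrives. A rough sketch, where showWrapUpForm is a hypothetical app function:

// track the conversations we actually saw start
const activeConversations = new Set<string>();

sdk.on('sessionStarted', (session: any) => {
  activeConversations.add(session.conversationId);
});

sdk.on('sessionEnded', (session: any) => {
  if (!activeConversations.delete(session.conversationId)) {
    // no matching sessionStarted: log it rather than showing wrap-up for a call we never handled
    console.warn('sessionEnded for a conversation we never started:', session.conversationId);
    return;
  }
  showWrapUpForm(session.conversationId); // hypothetical
});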

Sidenote: there are things in the works to make multiple call handling easier in the sdk, but frankly it's a pretty big lift behind the scenes and it's going to take a while to complete.

I know all this because I wrote most of it :wink: .

Good morning :smiley:

Since you sent this, I got another project in that needed to be done yesterday (you know how that is, I'm sure). I apologize for the delayed response.

And wow... I am impressed by the fact that you wrote a lot of the SDK; I was sure it had to be a big team or something. And I'm humbled that you would consider logging into the forum to answer noobish questions like this. Thank you VERY much for your time and attention.

But once this other project is finally taken care of, I will have to circle back to our Genesys-related project, and I still don't know what to do when Genesys sends my program unexpected events. So let me ask you this: how do YOU debug issues like this? I mean from where I'm standing, Genesys is... unpredictable. I don't say this as if to blame anyone - I suppose that's just the event-driven nature of WebRTC, WebSockets, JS in general. :laughing: But when it comes to how to fix stuff, it's a little like the cartoons where Donald Duck or whoever is trying to patch a sinking ship by sticking his hands and feet in each new hole, until he finally gets slammed against the wall by a huge burst of water. :laughing: I'm sure you guys have some kind of process in place, or some things you try when "stuff happens". Anything else you can offer that might help would be greatly appreciated.

One of these days, I should write a flow chart for debugging webrtc stuff, though it would be a big flow chart...

In a nutshell, I just look for where the process broke down (there's a small tracing sketch after the list below).

  • Did the client get a propose (pendingSession)? If not, the call will never ring. Make sure you have an active websocket connection.
  • Did the client send a proceed (acceptPendingSession)? Is the call supposed to be auto-answered? If not, then the consuming app never "answered" the call. If it is supposed to be auto-answered (and this also applies to outbound calls) and the sdk did not auto-answer it, is disableAutoAnswer enabled in the sdk config?
  • Did the client get a session-initiate? If not, did multiple clients send a proceed, or are there "media helper" instances active?
  • Did the client send a session-accept? If not, was there an error while trying to spin up media, or is media start really slow?
  • Did the call connect? If not, this generally requires a deeper dive into network stuff.
  • Did the call terminate early? Did the client end the call for some reason, or was it ended by the server? Was there a reason associated with the session-terminate stanza?
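To make that checklist easier to walk through in a consuming app, one low-tech option is to timestamp every sdk event and line the log up against the steps above. The first three event names are the ones from this thread; the error event name is an assumption to check against your sdk version:

// log each signaling milestone with a timestamp so you can see which step never happened
function trace(eventName: string) {
  (sdk as any).on(eventName, (...args: unknown[]) => {
    console.log(new Date().toISOString(), eventName, args);
  });
}

trace('pendingSession');  // did the client get a propose?
trace('sessionStarted');  // did the session-initiate arrive and get handled?
trace('sessionEnded');    // did the call terminate (and was it early)?
trace('sdkError');        // assumed name for the sdk's error event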

The flow above is relevant to sending or receiving a call on a new session. If you have an established persistent connection, it should look like this:

  • Did the client get a pendingSession? If not, check the websocket connection. It should have received a conversation update which gets converted into a "pendingSession". Off the top of my head, I can't remember if we trigger this event for auto-answer calls.
  • Did the client send a PATCH rest request to the conversation to connect the call (see the sketch below)? If not, the client never answered the call. NOTE: if this was an auto-answer call, it would be connected by the backend automatically.
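For reference, the "PATCH rest request" in that second bullet is the regular platform API call that connects a participant on the conversation. The sketch below is mostly useful for reproducing it by hand while debugging; the region host and the way you obtain the conversationId, participantId, and token are assumptions, so double-check the endpoint against the Conversations API docs for your org:

// answer/connect the agent's own participant on a call conversation
async function connectCallParticipant(conversationId: string, participantId: string, token: string) {
  const res = await fetch(
    `https://api.mypurecloud.com/api/v2/conversations/calls/${conversationId}/participants/${participantId}`,
    {
      method: 'PATCH',
      headers: { Authorization: `Bearer ${token}`, 'Content-Type': 'application/json' },
      body: JSON.stringify({ state: 'connected' })
    }
  );
  if (!res.ok) throw new Error(`PATCH failed with status ${res.status}`);
}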

Errors after this point are not common as we should already have media and an established webrtc connection.
