Error 104: Connection Reset

Adrian_Santamaria · January 26, 2023, 10:50am

Hello

We have a customer who has built an app that makes several requests to the Genesys Cloud API (we do not have access to their source code). That app is running 24/7 and makes requests at all hours.

They report that it sometimes crashes, and they see this error in their logs: ConnectionResetError(104, 'Connection reset by peer').

After doing some research, I have discovered that the Genesys API server IPs change over time. For example yesterday:

> nslookup api.mypurecloud.de
Server:  localhost
Address:  127.0.0.1

Non-authoritative answer:
Name:  api.mypurecloud.de
Addresses:  18.154.22.82
          18.154.22.22
          18.154.22.113
          18.154.22.81

And today:

> nslookup api.mypurecloud.de
Server:  localhost
Address:  127.0.0.1

Non-authoritative answer:
Name:  api.mypurecloud.de
Addresses:  18.154.48.18
          18.154.48.52
          18.154.48.88
          18.154.48.15

So, my theory is that sometimes the request simply happens to coincide with one of those changes, and therefore it gets that RST. And that in those cases they should just try the same request again. Am I right?

Thank you in advance!

Eos_Rios · January 26, 2023, 1:31pm

No idea on the actual cause of the issue, but as for retries: if they're using one of the SDKs, there should be an option to configure automatic retries so they don't have to write their own.

Example for .NET:
https://developer.genesys.cloud/devapps/sdk/dotnet

So they could already be telling it, on failure, to wait x seconds and then try again, up to y times.
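If they are not using an SDK, the "wait x seconds, then try again, up to y times" idea could be sketched by hand roughly like this. This is only an illustration: `call_with_retries` and its parameters are hypothetical names, not part of any Genesys SDK, and it assumes the error surfaces as Python's built-in ConnectionResetError. An HTTP library that wraps the error in its own exception type would need a different `except` clause.

```python
import time


def call_with_retries(func, max_attempts=3, wait_seconds=2.0, sleep=time.sleep):
    """Call func(); on ConnectionResetError, wait and try again.

    Hypothetical helper, not an SDK function. `sleep` is injectable
    only so the waiting can be replaced in tests.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except ConnectionResetError:
            # Out of attempts: let the caller see the original error.
            if attempt == max_attempts:
                raise
            sleep(wait_seconds)
```

Here `func` would be whatever zero-argument callable performs the API request. A fixed wait keeps the sketch short; a growing (backoff) wait is a common variation.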

tim.smith · January 26, 2023, 2:56pm

Not only do they change over time, but there are many instances running at once and you may be routed to any one of them at any time. You must always use the hostname to use DNS resolution; you must never make requests directly to an IP address.
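As a rough illustration of why the hostname matters, the sketch below looks up the current addresses at call time instead of storing them. `resolve_ipv4` is a hypothetical helper name; in practice an HTTP client does this lookup itself whenever it is given the hostname, so the app should not need code like this at all.

```python
import socket


def resolve_ipv4(hostname, port=443):
    """Return the IPv4 addresses the resolver currently reports for hostname.

    Hypothetical helper for illustration only. The result can differ
    from one call to the next, so it should not be cached or hard-coded.
    """
    infos = socket.getaddrinfo(
        hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is (address, port).
    return sorted({info[4][0] for info in infos})
```

Calling `resolve_ipv4("api.mypurecloud.de")` on different days would be expected to return different sets, much like the nslookup output earlier in the thread.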

There's no way I can tell if that's the cause. It seems suspect though as I've never heard of this happening to anyone else before. I would suggest launching an investigation with your networking team and opening a case with Genesys Cloud Care for further investigation.

Per the official documentation, only 429, 502, 503, and 504 responses are meant to be retryable in Genesys Cloud. Your case sounds like retrying is a good workaround, but what you're seeing isn't expected behavior AFAIK. https://developer.genesys.cloud/platform/api/rate-limits#retryable-requests
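A minimal sketch of that rule, for anyone writing the retry check by hand. `retry_delay` and `RETRYABLE_STATUS_CODES` are hypothetical names; the sketch assumes `headers` is a plain dict and that a Retry-After header, when present, holds a number of seconds. Check the linked page for what the API actually sends.

```python
# Status codes listed as retryable in the post above.
RETRYABLE_STATUS_CODES = {429, 502, 503, 504}


def retry_delay(status, headers, default_delay=1.0):
    """Return how many seconds to wait before retrying, or None.

    None means the response is not in the retryable set and the
    request should not be repeated automatically.
    """
    if status not in RETRYABLE_STATUS_CODES:
        return None
    value = headers.get("Retry-After")
    if value is not None:
        try:
            return float(value)
        except ValueError:
            # Not a plain number of seconds; fall back to the default.
            pass
    return default_delay
```

A connection reset never reaches this check because no HTTP response arrives at all, which is why it would have to be handled separately as an exception.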

Jason_Mathison · February 7, 2023, 3:25am

I agree with Tim that this is likely due to the configuration of the customer network. Some sort of proxy/firewall is dropping the connection quicker than the service and Genesys Cloud are expecting. There are a couple of headers they could use to try to resolve this.

The first option is to try to tune their way out of this by using these two headers:

Connection: keep-alive
Keep-Alive: timeout=5, max=1000

They can tweak the Keep-Alive parameters to see if they can find stability.

The second option is to set this header:

Connection: close

This will cause the connection to be closed at the end of every request, so there is just about no way to get a connection reset error. The downside of this is that they have to establish a new TCP connection for every request, which will add latency.
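For illustration, the two options could be set from Python's standard http.client roughly as follows. This is a sketch, not the customer's code: `connection_headers` and `send_get` are hypothetical helpers, and any path or authorization header passed in is up to the caller. Whether a server or an intermediate proxy honors the Keep-Alive hints is not guaranteed.

```python
import http.client


def connection_headers(reuse=True, timeout=5, max_requests=1000):
    """Build the headers for option one (reuse=True) or option two."""
    if reuse:
        return {
            "Connection": "keep-alive",
            "Keep-Alive": f"timeout={timeout}, max={max_requests}",
        }
    return {"Connection": "close"}


def send_get(host, path, headers):
    """Send one GET over a fresh HTTPS connection; return (status, body)."""
    conn = http.client.HTTPSConnection(host, timeout=30)
    try:
        conn.request("GET", path, headers=headers)
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()
```

With `connection_headers(reuse=False)` every request pays for a new TCP and TLS handshake, which is the latency cost mentioned above.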

--Jason

Adrian_Santamaria · February 7, 2023, 7:49am

Hello Jason. Thank you for the tips. In the end, it looks like it was a misconfigured firewall. I'll keep your suggestions about the headers in mind in case something similar happens in the future, though.
