Suggestions on grabbing conversation details spanning more than 7 days

To analyze agent activity, performance, and so on, my team relies heavily on the PureCloud Python SDK to pull data, which is loaded regularly into our warehouse for visualization and analysis. We are running into issues with conversation data because some of our agents' conversations last upwards of 10 days. Currently, we pass a 2-day time range, dump the data to S3, and then UPSERT the last two days' worth of files to our warehouse.
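For context, the query-and-dump step looks roughly like the sketch below. The client credentials, bucket, key, and interval literals are placeholders, and the auth call and model names follow the SDK's usual query shape rather than our exact code:

```python
import json
import boto3
import PureCloudPlatformClientV2

# Authenticate with client credentials (CLIENT_ID/CLIENT_SECRET are placeholders).
api_client = PureCloudPlatformClientV2.api_client.ApiClient() \
    .get_client_credentials_token("CLIENT_ID", "CLIENT_SECRET")
analytics_api = PureCloudPlatformClientV2.AnalyticsApi(api_client)

# Build a conversation details query over a 2-day interval.
body = PureCloudPlatformClientV2.ConversationQuery()
body.interval = "2019-06-01T00:00:00.000Z/2019-06-03T00:00:00.000Z"
body.paging = PureCloudPlatformClientV2.PagingSpec()
body.paging.page_size = 100
body.paging.page_number = 1

response = analytics_api.post_analytics_conversations_details_query(body)

# Dump the raw response to S3 for the downstream UPSERT (bucket/key are placeholders).
boto3.client("s3").put_object(
    Bucket="my-analytics-bucket",
    Key="conversations/2019-06-01_2019-06-03.json",
    Body=json.dumps(response.to_dict(), default=str),
)
```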

The post_analytics_conversations_details_query method within the Analytics API permits a time range interval of up to 7 days, but querying the full window would produce massive files, which could cause memory issues and/or take a very long time to unnest the JSON and load into our warehouse.

I'm curious whether there are any suggestions on how to access conversation data so we can ensure our warehouse data matches the UI.

PureCloud Analytics is a Data as a Service (DaaS) API; the intent is that applications load data via the API in real time as it's needed. When you warehouse a copy of live data, you will run into data integrity problems at some point, and you have to repeatedly re-check old data to see whether it has changed in order to keep your copy in sync. This is an unfortunate side effect of attempting to synchronize a warehouse with a DaaS API.

You must page through all data and compare/upsert in your warehouse to identify changes; there's no other way for an application to synchronize its data. Keep in mind that you shouldn't be paging deeper than around 10 pages for a given query, as query performance may begin to degrade beyond that point and requests will eventually start to time out. Use shorter intervals and filters to keep the result set to a reasonable size.
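A rough sketch of that paging pattern (assuming the SDK's ConversationQuery/PagingSpec models and a placeholder MAX_PAGES cap, not an official helper) might look like this:

```python
import PureCloudPlatformClientV2

MAX_PAGES = 10  # rough guideline: don't page much deeper than this per query

def fetch_interval(analytics_api, interval, page_size=100):
    """Fetch all conversation details for one interval, stopping at the page cap."""
    body = PureCloudPlatformClientV2.ConversationQuery()
    body.interval = interval
    body.paging = PureCloudPlatformClientV2.PagingSpec()
    body.paging.page_size = page_size

    conversations = []
    for page in range(1, MAX_PAGES + 1):
        body.paging.page_number = page
        response = analytics_api.post_analytics_conversations_details_query(body)
        if not response.conversations:
            return conversations  # ran out of data before hitting the page cap
        conversations.extend(response.conversations)

    raise RuntimeError(
        f"Interval {interval} still returned data at page {MAX_PAGES}; "
        "use a shorter interval or add filters."
    )
```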

There are some analytics bulk APIs that are about to be released. Keep an eye out for those in the next week or two; they may be useful for you in this situation.


Thanks for the response, Tim. Much appreciated.

Presently, we pass a 3-day date range of [3 days ago, today], and it sounds like expanding to the maximum may cause us more harm than good. I suppose we could modify our pipeline to iterate through a list of date-range intervals to limit the number of response pages, and then upsert/load each interval's data to our warehouse.
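Something like the following is what I have in mind: split the window into fixed-size sub-intervals, fetch each one (reusing the fetch_interval sketch above), and upsert the results. The chunk size and upsert_to_warehouse are placeholders for our actual pipeline code:

```python
from datetime import datetime, timedelta

def daterange_intervals(start, end, days_per_chunk=1):
    """Yield ISO-8601 'start/end' interval strings covering [start, end)."""
    chunk = timedelta(days=days_per_chunk)
    current = start
    while current < end:
        chunk_end = min(current + chunk, end)
        yield (f"{current.strftime('%Y-%m-%dT%H:%M:%S.000Z')}/"
               f"{chunk_end.strftime('%Y-%m-%dT%H:%M:%S.000Z')}")
        current = chunk_end

def upsert_to_warehouse(conversations):
    # Placeholder: replace with the real S3 dump / warehouse UPSERT step.
    pass

# Cover the last 3 days in 1-day chunks so each query stays shallow.
end = datetime.utcnow()
start = end - timedelta(days=3)
for interval in daterange_intervals(start, end):
    conversations = fetch_interval(analytics_api, interval)
    upsert_to_warehouse(conversations)
```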
