Hi there,
I've been doing research into a design to integrate and import all of our Genesys data into our data warehouse, and the ideal path seems to be utilizing the Conversations Jobs endpoint and lifecycle. However, looking at the Data Integration guide, there seems to be some ambiguity around what the statement below means:
2. This can cause confusions with customers because the cutoff time for the job can cause some calls not to show up for at least 48 hours.
Do you all have further insight into what the above means? Does it mean that the report job can take up to 48 hours to complete? Another interpretation could be that some calls may not appear via the jobs endpoint until up to 48 hours after they occur. Here's the link: https://developer.mypurecloud.com/api/rest/v2/analytics/data_integration_guide.html. The option I'm thinking through is at the bottom of the page.
We were forwarded here for assistance by our Sr. Customer Success Manager, Chris Smith. Please let me know if there's another forum I should be posting to instead. Thanks!
I wrote the developer integration guide so I will attempt to clarify what I mean here. The data that is queried by the jobs endpoint relies on a data collection process that runs once per day. It is not set on a guaranteed schedule (same time to the minute every day), but instead runs using AWS spot fleet availability during a certain time range in the evening.
For the most part, you will get all of your calls that ran within a 24-hour period. A single conversation can have multiple call segments and multiple participants. What customers often run into, and what causes confusion, is that if a conversation is in progress (with all its different segments) when the job runs, that conversation's data (and any calls that came after the job's cutoff) will not show up until the next time the job runs (hence up to 48 hours later).
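To make that timing concrete, here is a small Python sketch. The helper name and the nightly run times are made up for illustration; the only point is that a conversation still in progress at one night's cutoff does not surface until the following night's run.

```python
from datetime import datetime

def first_available_run(conversation_end, job_runs):
    """Return the first daily collection run that will include the
    full conversation, i.e. the first run at or after the time the
    conversation ended. (Hypothetical helper; real run times vary
    because the process uses AWS spot fleet availability.)"""
    for run in sorted(job_runs):
        if run >= conversation_end:
            return run
    return None  # not yet collected by any known run

# Two illustrative nightly runs, on Jan 11 and Jan 12.
runs = [datetime(2021, 1, 11, 22, 30), datetime(2021, 1, 12, 23, 5)]

# A call that ended before the Jan 11 run shows up that night:
print(first_available_run(datetime(2021, 1, 11, 14, 0), runs))
# 2021-01-11 22:30:00

# A conversation still in progress at the Jan 11 cutoff only
# appears after the Jan 12 run, up to ~48 hours after it started:
print(first_available_run(datetime(2021, 1, 11, 23, 0), runs))
# 2021-01-12 23:05:00
```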
This is a particular point of confusion for customers using the jobs endpoint to populate their data warehouses: when they run the data and reconcile it, they may find a few conversations that do not show up because, at the time the aggregation job ran, not all of the participants in the conversation had been collected yet. They then assume data is missing or lost.
I hope this answers your question.
Thanks,
John Carnell
Manager, Developer Engagement
Hi John,
This is great context. Thank you. To avoid running into the job on the Genesys data lake end, would it be best to push out any downstream ingestion integrations to 3 days out? Theoretically, that way the data for the previous day plus one has been ingested. That should also stabilize conversations that took place across the day threshold, right?
It really comes down to your use case and business requirements. Waiting 3 days would get you all of the previous 2 days, but I am going to forward your post to our team lead on the analytics team and get his input too. The analytics APIs are constantly being tuned, so he might have additional information beyond what I have provided.
I'm working with Rainu on this project, and I'm hoping you can tell me if what we're doing will work. Each day we would like to pull conversation data into our data warehouse. We would like to use the conversation details job endpoint since it is the recommended approach when pulling large sets of data. We will run our integration once per day, but the time is not constant. To avoid needing to update records that were already pulled in, we will only be pulling in conversations that have ended - we're assuming we will not be pulling in survey or evaluation data at this time.
With this additional information, can you tell us how many days' delay will make sure we always get the data we need? We are trying to understand whether we need to filter for conversations that ended the previous day, or something else.
As an example, if our integration were to run today, are we safe to filter for conversations that ended at or before 1/12/2021 11:59:59 PM Pacific Time or should that be 1/11/2021 11:59:59 PM Pacific Time?
> The data that is queried by the jobs endpoint relies on a data collection process that runs once per day. It is not set on a guaranteed schedule (same time to the minute every day), but instead runs using AWS spot fleet availability during a certain time range in the evening. For the most part, you will get all of your calls that ran within a 24-hour period. A single conversation can have multiple call segments and multiple participants. What customers often run into, and what causes confusion, is that if a conversation is in progress (with all its different segments) when the job runs, that conversation's data (and any calls that came after the job's cutoff) will not show up until the next time the job runs (hence up to 48 hours later).
If you do not want to have to update records or check for missing data, you need to look at a 48-hour window. If you feel this is something you would like improved, I would recommend you open a ticket on our Aha ideas portal.
I would recommend that you filter out any conversations that have not ended prior to the time period for which you are running your job.
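As a rough sketch of that filtering, here is one way to compute a query interval that lags two full UTC days behind the run date. The function name and the fixed two-day delay are illustrative assumptions, and the start/end interval string follows the ISO-8601 style the analytics query APIs use; widen the delay if your reconciliation finds gaps.

```python
from datetime import datetime, timedelta, timezone

def safe_interval(now, delay_days=2):
    """Build an ISO-8601 interval string covering one full UTC day
    that ends `delay_days` before `now`, so only conversations old
    enough to have been fully collected fall inside it.
    (Hypothetical helper; the 2-day default follows the 48-hour
    guidance above.)"""
    end = datetime(now.year, now.month, now.day, tzinfo=timezone.utc) \
          - timedelta(days=delay_days)
    start = end - timedelta(days=1)
    fmt = "%Y-%m-%dT%H:%M:%S.000Z"
    return f"{start.strftime(fmt)}/{end.strftime(fmt)}"

# An integration running on 2021-01-13 queries the full day of
# 2021-01-10 UTC, which is safely past the collection cutoff.
print(safe_interval(datetime(2021, 1, 13, 8, 0, tzinfo=timezone.utc)))
# 2021-01-10T00:00:00.000Z/2021-01-11T00:00:00.000Z
```

The interval string can then be used in the body of the details job request; conversations that end after the interval simply fall into a later day's pull, so nothing needs to be updated in place.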
I am going to ping our analytics team and see if they have any further comments.
Thanks,
John Carnell
Manager, Developer Engagement
Analytics dev here. The 48-hour window is a general guideline for how far back you can expect data to be available, but it is not a strict limit on availability. The reason for this guidance is that the daily data collection process John mentioned has the potential to fail if there are issues with the underlying AWS infrastructure, and in those cases we are generally able to re-run that process and have it update within the next day. So you can generally expect the availability delay to range anywhere from 12 to 48 hours.
However, you can programmatically determine exactly how recent the data from the Conversations Jobs endpoint is, rather than just assuming some particular delay like 48 hours. The details job endpoints return a dataAvailabilityDate field along with your query result. This tells you that all data up to that timestamp is currently available from the jobs endpoint. There is also a separate endpoint (/api/v2/analytics/conversations/details/jobs/availability) you can use to retrieve that availability timestamp without creating a details job.
So you will generally be safe assuming that all conversations ending more than 48 hours ago will be returned, but dataAvailabilityDate gives you a stronger and more specific guarantee. I would consider fetching that timestamp and using it to choose your query interval or for constructing the time-based filter Jayme mentioned.
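For example, here is a small sketch of clamping a desired query window to that timestamp. The helper name is made up, the HTTP call to the availability endpoint is omitted, and the exact timestamp format is an assumption based on the API's usual ISO-8601 style.

```python
from datetime import datetime, timezone

def clamp_to_availability(desired_end, availability_iso):
    """Clamp a query window's end to the dataAvailabilityDate string
    returned by the availability endpoint
    (/api/v2/analytics/conversations/details/jobs/availability).
    Fetching that value over HTTP is omitted here; the "%f" format
    code handles the fractional-seconds part of the timestamp."""
    available = datetime.strptime(
        availability_iso, "%Y-%m-%dT%H:%M:%S.%fZ"
    ).replace(tzinfo=timezone.utc)
    return min(desired_end, available)

# If data is only available through Jan 12 05:00 UTC, a request
# for everything through Jan 13 gets pulled back to that point.
end = clamp_to_availability(
    datetime(2021, 1, 13, 0, 0, tzinfo=timezone.utc),
    "2021-01-12T05:00:00.000Z",
)
print(end)  # 2021-01-12 05:00:00+00:00
```

Anything not yet covered by the clamped window is simply picked up by the next run, once dataAvailabilityDate has advanced past it.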