Good day
When using the AWS S3 recording bulk actions integration, I am able to successfully export all the recordings and metadata that the client requires.
The first problem I encountered is that after the Glue crawler finishes and creates the table, the table is automatically partitioned and the partition schema does not match the table schema, so querying the table generates a Hive error: "HIVE_PARTITION_SCHEMA_MISMATCH: There is a mismatch between the table and partition schemas. The types are incompatible and cannot be coerced". I was able to resolve this by enabling the Glue crawler option "Update all new and existing partitions with metadata from the table" (under Updates and scheduling > Advanced options), then dropping the table and running the crawler again to rebuild it. I highly recommend adding this step to your Athena + Glue example document.
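For anyone hitting the same error who manages crawlers through the API rather than the console, my understanding is that the checkbox corresponds to the crawler's Configuration JSON. A minimal sketch (the crawler name is a placeholder):

```python
import json

import boto3

glue = boto3.client("glue")

# Equivalent of the console checkbox "Update all new and existing
# partitions with metadata from the table": partitions inherit their
# schema from the table instead of keeping their own.
configuration = {
    "Version": 1.0,
    "CrawlerOutput": {
        "Partitions": {"AddOrUpdateBehavior": "InheritFromTable"}
    },
}

# "recordings-crawler" is a placeholder for the actual crawler name.
glue.update_crawler(
    Name="recordings-crawler",
    Configuration=json.dumps(configuration),
)
```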
The next issue, which I cannot resolve, is that even though I have set the "ignore.malformed.json" key to true for the database, queries occasionally fail with the following error: "HIVE_CURSOR_ERROR: Row is not a valid JSON Object - JSONException: Expected a ':' after a key at 4 [character 5 line 1]". Why would there be corrupted data in the export from Genesys Cloud?
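In case the placement of the flag matters, here is a sketch of setting it directly on the table's SerDe via the Glue API, assuming the crawler created the table with the OpenX JSON SerDe (org.openx.data.jsonserde.JsonSerDe), which as far as I know is the SerDe that honors ignore.malformed.json; database and table names are placeholders:

```python
import boto3

glue = boto3.client("glue")

DATABASE = "recordings_db"  # placeholder database name
TABLE = "recordings"        # placeholder table name

# Fetch the current table definition.
table = glue.get_table(DatabaseName=DATABASE, Name=TABLE)["Table"]

# Set ignore.malformed.json on the table's SerDe parameters.
serde = table["StorageDescriptor"]["SerDeInfo"]
serde.setdefault("Parameters", {})["ignore.malformed.json"] = "TRUE"

# update_table takes a TableInput, which accepts only a subset of the
# fields returned by get_table, so copy over the writable fields.
writable = {
    "Name", "Description", "Owner", "Retention", "StorageDescriptor",
    "PartitionKeys", "TableType", "Parameters",
}
table_input = {k: v for k, v in table.items() if k in writable}

glue.update_table(DatabaseName=DATABASE, TableInput=table_input)
```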
When a query does not specify a partition, it scans all of the data, and a single corrupt JSON object stops the query from completing. The only way I can get queries to complete is to filter on partitions to narrow the result set and, hopefully, skip over the corrupt JSON.
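For example, a query like the following completes, while the same query without the WHERE clause fails on the corrupt row (sketch via boto3; the table name, partition column, partition value, and output location are all placeholders, since in my case the crawler inferred the partition columns):

```python
import boto3

athena = boto3.client("athena")

# Narrowing the scan to specific partitions lets the query complete,
# provided the corrupt JSON object is not in the scanned partitions.
# "partition_0" stands in for whatever partition column the crawler
# inferred; the table and value are likewise placeholders.
query = """
    SELECT *
    FROM recordings_db.recordings
    WHERE partition_0 = '2023'
"""

athena.start_query_execution(
    QueryString=query,
    ResultConfiguration={
        "OutputLocation": "s3://my-athena-results/"  # placeholder bucket
    },
)
```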
This leaves the client unable to use the export to find the recordings they need. They are understandably upset: we advised them that this was the best way to export, and they are now carrying the costs of an unworkable solution with only a few days before their Genesys Cloud org is cancelled.
Any assistance with this would be greatly appreciated.