Implementing Glue ETL job with Job Bookmarks

AWS Glue is a fully managed ETL service to load large amounts of datasets from various sources for analytics and data processing with Apache Spark ETL jobs. In this post I will discuss the use of AWS Glue Job Bookmarks feature in the following architecture. AWS Glue Job Bookmarks help Glue maintain state information of … Continue reading Implementing Glue ETL job with Job Bookmarks

AWS Glue – Querying Nested JSON with Relationalize Transform

AWS Glue has transform Relationalize that can convert nested JSON into columns that you can then write to S3 or import into relational databases. As an example - Initial Schema: >>> df.printSchema() root |-- Id: string (nullable = true) |-- LastUpdated: long (nullable = true) |-- LastUpdatedBy: string (nullable = true) |-- Properties: struct (nullable … Continue reading AWS Glue – Querying Nested JSON with Relationalize Transform