Add new partitions in AWS Glue Data Catalog from AWS Glue Job

Given that you have a partitioned table in the AWS Glue Data Catalog, there are a few ways to update the catalog with newly created partitions. You can run MSCK REPAIR TABLE <database>.<table_name> in Amazon Athena, or you can rerun the AWS Glue crawler. Recently, the AWS Glue service team has added a new feature (or …
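As a rough illustration of the first option only, the sketch below submits the MSCK REPAIR TABLE statement to Athena through a boto3-style client. The helper names (build_repair_statement, repair_partitions), the conservative identifier check, and the S3 output location are assumptions made for this example and are not taken from the post; the newer Glue feature mentioned in the excerpt is not covered here.

```python
import re

# Assumption for this sketch: only accept simple names (letters, digits,
# underscores) so nothing unexpected is interpolated into the statement.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_repair_statement(database: str, table_name: str) -> str:
    """Return the MSCK REPAIR TABLE statement for the given table."""
    for name in (database, table_name):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsupported identifier: {name!r}")
    return f"MSCK REPAIR TABLE {database}.{table_name}"


def repair_partitions(athena_client, database: str, table_name: str,
                      output_location: str) -> str:
    """Submit the repair statement to Athena and return the query execution ID.

    athena_client is expected to behave like boto3.client("athena").
    output_location is an S3 URI where Athena writes query results.
    """
    response = athena_client.start_query_execution(
        QueryString=build_repair_statement(database, table_name),
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]
```

In practice this might be called as repair_partitions(boto3.client("athena"), "my_database", "my_table", "s3://my-bucket/athena-results/"), where the database, table, and bucket names are placeholders. The call only submits the query; checking that it finished successfully would need a separate status poll.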

Using AWS Data Wrangler with AWS Glue Job 2.0 and Amazon Redshift connection

I will admit, AWS Data Wrangler has become my go-to package for developing extract, transform, and load (ETL) data pipelines and other day-to-day scripts. AWS Data Wrangler's integration with multiple big data AWS services like S3, Glue Catalog, Athena, Databases, EMR, and others makes life simple for engineers. It also provides the ability to …

Implementing Glue ETL job with Job Bookmarks

AWS Glue is a fully managed ETL service for loading large amounts of data from various sources for analytics and data processing with Apache Spark ETL jobs. In this post I will discuss the use of the AWS Glue Job Bookmarks feature in the following architecture. AWS Glue Job Bookmarks help Glue maintain state information of …