Rename Glue Tables using AWS Data Wrangler

I had a use case of renaming over 50 tables, adding “prod_” prefix to the existing Glue tables.

AWS Athena does not support native Hive DDL “ALTER TABLE table_name RENAME TO” command. So one of the option was to –

  1. “Generate Create Table DDL” in AWS Athena.
  2. Modify the table name.
  3. Execute the DDL.
  4. Preview the new table.
  5. Drop the old table

Following the above steps for over 50 tables would have been cumbersome. This is where AWS Data Wrangler comes to rescue 🙂

Lets take a look on how I did it. All the commands were executed from AWS Sagemaker Notebook –

import awswrangler as wr

#get all the tables for database from glue catalog
tables = wr.catalog.get_tables(database='mydb')

#create list of tables
table_details = []
for table in tables:
    table_details.append(table)

#create the new table with "prod_" prefix and drop original table
for table_detail in table_details:
    db = table_detail['DatabaseName']
    orig_tab_name = table_detail['Name']
    new_table_name = 'prod_'+ orig_tab_name
    table_loc = table_detail['StorageDescriptor']['Location']
    table_columns = wr.catalog.get_table_types(database=db, table=orig_tab_name)
    parameters = table_detail['StorageDescriptor']['Parameters']
    print('\n')
    print(f'Create table: {new_table_name}, location: {table_loc}')
    wr.catalog.create_parquet_table(
        database=db,
        table=new_table_name,
        path=table_loc,
        columns_types=table_columns,
        parameters=parameters
    )
    status = wr.catalog.delete_table_if_exists(database=db, table=orig_tab_name) 
    print(f'Drop table: {orig_tab_name} from database: {db}, status: {status}')

Quick walkthrough of the above code –

  1. Import AWS Data Wrangler.
  2. Get all the table details using wr.catalog.get_tables
  3. Add the tables to the list.
  4. For each table in the table list –
    • Get the database name.
    • Get the original table name.
    • Get the table location.
    • Get the table columns as dictionary using wr.catalog.get_table_types.
    • Get the table properties.
    • Create the table using wr.catalog.create_parquet_table with “prod_” as prefix.
    • Drop the original table.

One thing to note is the original table has to be Parquet table in AWS Glue catalog.

Reference –

https://aws-data-wrangler.readthedocs.io/en/stable/stubs/awswrangler.catalog.create_parquet_table.html#awswrangler.catalog.create_parquet_table

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s