AWS Glue Job Fails with CSV data source does not support map data type error

AWS Glue is a serverless ETL service for processing large datasets from various sources for analytics. Recently I came across the "CSV data source does not support map data type" error in a newly created Glue job. In a nutshell, the job was performing the steps below: Read the data from S3 …

Pandas Scratchpad – I

This post is a scratchpad for day-to-day Pandas commands. pandas is an open-source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

1. A few quick ways to create a Pandas DataFrame
DataFrame from dict of lists -
DataFrame from list of lists -
DataFrame from list of dicts -
DataFrame …
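The first three construction styles named above can be sketched as follows. The column names and values are made up for illustration and are not taken from the original post.

```python
import pandas as pd

# From a dict of lists: each key becomes a column, each list holds that column's values.
df_from_dict = pd.DataFrame({"name": ["a", "b"], "score": [1, 2]})

# From a list of lists: each inner list is one row; column names are passed separately.
df_from_rows = pd.DataFrame([["a", 1], ["b", 2]], columns=["name", "score"])

# From a list of dicts: each dict is one row; keys become the column names.
df_from_records = pd.DataFrame(
    [{"name": "a", "score": 1}, {"name": "b", "score": 2}]
)

print(df_from_dict)
print(df_from_rows)
print(df_from_records)
```

All three calls describe the same two-row table; which one is most convenient depends on whether the data arrives column-wise or row-wise.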

Merge json files using Pandas

Quick demo for merging multiple JSON files using Pandas -

import pandas as pd
import glob
import json

file_list = glob.glob("*.json")
>>> file_list
['b.json', 'c.json', 'a.json']

Use enumerate to assign a counter to the files.

allFilesDict = {i: name for i, name in enumerate(file_list, 1)}
>>> allFilesDict
{1: 'b.json', 2: 'c.json', 3: 'a.json'}

Append the data into a list …
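The excerpt is cut off before the merge itself, so what follows is only a minimal sketch of one way to finish the job, not necessarily the author's approach. It assumes each file holds a JSON array of flat objects; the helper name `merge_json_files`, the `file_id` column, and the sample data are hypothetical.

```python
import glob
import json
import os
import tempfile

import pandas as pd


def merge_json_files(pattern):
    """Read every JSON file matching `pattern` and stack the rows into one DataFrame."""
    frames = []
    # sorted() makes the file order deterministic; glob.glob alone does not guarantee it.
    for file_id, path in enumerate(sorted(glob.glob(pattern)), 1):
        with open(path) as fh:
            records = json.load(fh)
        # Assumption: each file is a JSON array of flat objects (one object per row).
        frame = pd.DataFrame(records)
        frame["file_id"] = file_id  # remember which file each row came from
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)


# Hypothetical sample data written to a temporary directory for the demo.
demo_dir = tempfile.mkdtemp()
samples = {
    "a.json": [{"name": "x", "value": 1}],
    "b.json": [{"name": "y", "value": 2}, {"name": "z", "value": 3}],
}
for filename, rows in samples.items():
    with open(os.path.join(demo_dir, filename), "w") as fh:
        json.dump(rows, fh)

merged = merge_json_files(os.path.join(demo_dir, "*.json"))
print(merged)
```

Sorting the file list is a deliberate choice here: the unsorted `glob.glob` result shown above (`b.json`, `c.json`, `a.json`) depends on the file system, so the counter assigned to each file would otherwise vary between runs.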