Reading Parquet files with AWS Lambda

I had a use case to read data (few columns) from parquet file stored in S3, and write to DynamoDB table, every time a file was uploaded. Thinking to use AWS Lambda, I was looking at options of how to read parquet files within lambda until I stumbled upon AWS Data Wrangler. From the document … Continue reading Reading Parquet files with AWS Lambda

Pandas Scratchpad – I

This blog is scratchpad for day-to-day Pandas commands. pandas is an open-source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. 1. Few quick ways to create Pandas DataFrame DataFrame from Dict of List - DataFrame from List of List - DataFrame from List of Dict - DataFrame … Continue reading Pandas Scratchpad – I

Merge json files using Pandas

Quick demo for merging multiple json files using Pandas - import pandas as pd import glob import json file_list = glob.glob("*.json") >>> file_list ['b.json', 'c.json', 'a.json'] Use enumerate to assign counter to files. allFilesDict = {v:k for v, k in enumerate(file_list, 1)} >>> allFilesDict {1: 'b.json', 2: 'c.json', 3: 'a.json'} Append the data into list … Continue reading Merge json files using Pandas

Pandas – ValueError: If using all scalar values, you must pass an index

Reading json file using Pandas read_json can fail with "ValueError: If using all scalar values, you must pass an index". Let see with an example - cat a.json { "creator": "CaptainAmerica", "last_modifier": "NickFury", "title": "Captain America: The First Avenger", "view_count": 12000 } >>> import pandas as pd >>> import glob >>> for f in glob.glob('*.json'): … Continue reading Pandas – ValueError: If using all scalar values, you must pass an index