Reading Parquet files with AWS Lambda

I had a use case to read data (a few columns) from a Parquet file stored in S3 and write it to a DynamoDB table every time a file was uploaded. Planning to use AWS Lambda, I was looking at options for reading Parquet files within Lambda until I stumbled upon AWS Data Wrangler. From the document … Continue reading Reading Parquet files with AWS Lambda

Pandas Scratchpad – I

This post is a scratchpad for day-to-day Pandas commands. pandas is an open-source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. 1. A few quick ways to create a Pandas DataFrame: DataFrame from Dict of List - DataFrame from List of List - DataFrame from List of Dict - DataFrame … Continue reading Pandas Scratchpad – I
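
The three constructions named above can be sketched as follows. This is a minimal illustration, not the code from the full post; the column names `name` and `score` and the sample values are made up for the example.

```python
import pandas as pd

# DataFrame from a dict of lists: keys become column names.
df_from_dict = pd.DataFrame({"name": ["a", "b"], "score": [1, 2]})

# DataFrame from a list of lists: each inner list is one row,
# so the column names have to be passed explicitly.
df_from_rows = pd.DataFrame([["a", 1], ["b", 2]], columns=["name", "score"])

# DataFrame from a list of dicts: each dict is one row,
# and the dict keys become column names.
df_from_records = pd.DataFrame(
    [{"name": "a", "score": 1}, {"name": "b", "score": 2}]
)
```

All three calls should produce the same two-row frame, which makes the list-of-dicts form handy when rows arrive one at a time.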

Merge json files using Pandas

Quick demo of merging multiple JSON files using Pandas - import pandas as pd import glob import json file_list = glob.glob("*.json") >>> file_list ['b.json', 'c.json', 'a.json'] Use enumerate to assign a counter to the files. allFilesDict = {v:k for v, k in enumerate(file_list, 1)} >>> allFilesDict {1: 'b.json', 2: 'c.json', 3: 'a.json'} Append the data into a list … Continue reading Merge json files using Pandas
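
Since the excerpt cuts off before the merge itself, here is one way the remaining steps might look. It is a sketch under assumptions: each file holds a single flat JSON object, the helper name `merge_json_files` and the `file_id` column are hypothetical, and the sample files are written to a temporary directory just so the example runs on its own.

```python
import glob
import json
import os
import tempfile

import pandas as pd


def merge_json_files(pattern):
    """Load every JSON file matching `pattern` into one DataFrame.

    Hypothetical helper: assumes each file contains one flat JSON object.
    """
    frames = []
    # enumerate(..., 1) assigns a counter to each file, as in the excerpt;
    # sorted() makes the numbering deterministic.
    for file_id, path in enumerate(sorted(glob.glob(pattern)), 1):
        with open(path) as fh:
            record = json.load(fh)
        frame = pd.DataFrame([record])  # wrap in a list: one row per file
        frame["file_id"] = file_id
        frames.append(frame)
    # pd.concat raises ValueError if no files matched.
    return pd.concat(frames, ignore_index=True)


# Illustrative data only: two small files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name, title in [("a.json", "A"), ("b.json", "B")]:
        with open(os.path.join(tmp, name), "w") as fh:
            json.dump({"title": title, "view_count": 1}, fh)
    merged = merge_json_files(os.path.join(tmp, "*.json"))
```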

Pandas – ValueError: If using all scalar values, you must pass an index

Reading a JSON file with Pandas read_json can fail with "ValueError: If using all scalar values, you must pass an index". Let's see with an example - cat a.json { "creator": "CaptainAmerica", "last_modifier": "NickFury", "title": "Captain America: The First Avenger", "view_count": 12000 } >>> import pandas as pd >>> import glob >>> for f in glob.glob('*.json'): … Continue reading Pandas – ValueError: If using all scalar values, you must pass an index
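
The same error can be reproduced without a file by passing the flat record straight to `pd.DataFrame`. The sketch below shows that, plus two common workarounds; it is not necessarily the fix the full post settles on.

```python
import pandas as pd

# The record from a.json: every value is a scalar, so pandas cannot
# infer how many rows the frame should have.
record = {
    "creator": "CaptainAmerica",
    "last_modifier": "NickFury",
    "title": "Captain America: The First Avenger",
    "view_count": 12000,
}

try:
    pd.DataFrame(record)
    raised = False
except ValueError:
    # "If using all scalar values, you must pass an index"
    raised = True

# Workaround 1: pass an explicit index so pandas knows there is one row.
df_with_index = pd.DataFrame(record, index=[0])

# Workaround 2: wrap the record in a list, making it a list of row dicts.
df_from_list = pd.DataFrame([record])
```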

Python – sort() vs sorted(list)

You can compare lists using sort() or sorted(list), but be careful with sort() - >>> c = [('d',4), ('c',3), ('a',1), ('b', 2)] >>> a = [('a',1), ('b', 2), ('c',3), ('d',4)] >>> a.sort() == c.sort() True >>> >>> a = [('a',1), ('b', 2), ('c',3), ('d',4)] >>> b = [('b',2), ('c', 3), ('a',1)] >>> >>> a.sort() == … Continue reading Python – sort() vs sorted(list)
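
The catch is that `list.sort()` sorts in place and returns `None`, so comparing two `sort()` calls always compares `None` with `None`. A short sketch using the lists from the excerpt:

```python
a = [("a", 1), ("b", 2), ("c", 3), ("d", 4)]
b = [("b", 2), ("c", 3), ("a", 1)]  # one element fewer than a

# list.sort() returns None, so this is None == None: True for ANY two lists.
misleading = a.sort() == b.sort()

# sorted() returns a new sorted list, so this compares the actual contents.
correct = sorted(a) == sorted(b)

print(misleading)  # True, even though the lists differ
print(correct)     # False
```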

Python – str.maketrans()

While working on some Python code, I had a requirement to remove the single/double quotes and open/close brackets from a string of the format below -- >>> text = """with summary as (select ' ... 'p.col1,p.col2,p.col3, ROW_NUMBER() ' ... 'OVER(PARTITION BY p.col1,p.col3 ORDER BY ' ... 'p.col2) AS rk from (select * from (select ' ... 'col2, … Continue reading Python – str.maketrans()
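
A minimal sketch of the idea: the third argument of `str.maketrans()` lists characters to delete, and `str.translate()` applies the resulting table. The sample string here is a short made-up stand-in, not the full query from the post.

```python
# Illustrative input only; the original query is longer.
text = "select 'p.col1', \"p.col2\" from (table_a)"

# Third argument: every character in this string is mapped to None,
# i.e. removed by translate(). The first two arguments (1:1 replacements)
# are left empty.
delete_table = str.maketrans("", "", "'\"()")

cleaned = text.translate(delete_table)
print(cleaned)  # select p.col1, p.col2 from table_a
```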

namedtuple to JSON – Python

In pgdb (the PostgreSQL DB-API), the cursor, which is used to manage the context of a fetch operation, returns a list of named tuples. These named tuples have field names that match the column names of the database query. An example of a row from the list of named tuples - Row(log_time=datetime.datetime(2019, 3, 20, 5, … Continue reading namedtuple to JSON – Python
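
One way to do the conversion, sketched without a database: `_asdict()` turns each named tuple into a dict, and `default=str` lets `json.dumps` handle the `datetime` value. The `Row` type, the `user_name` field and the values below are hypothetical stand-ins for what the cursor returns; only `log_time` comes from the excerpt.

```python
import json
from collections import namedtuple
from datetime import datetime

# Hypothetical stand-in for the rows a pgdb cursor would return.
Row = namedtuple("Row", ["log_time", "user_name"])
rows = [Row(log_time=datetime(2019, 3, 20, 5, 0, 0), user_name="alice")]

# json.dumps(row) alone would emit a JSON array and lose the field names,
# so convert each row to a dict first. default=str is called for values
# json cannot serialize natively, such as datetime.
payload = json.dumps([row._asdict() for row in rows], default=str)
print(payload)
```

Using `default=str` is the quick option; a custom function calling `isoformat()` would be a reasonable alternative if the consumer expects ISO 8601 timestamps.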

Python – Flatten List of Lists

Itertools is one of the most powerful modules in Python. Today I had a requirement to flatten a list of lists, and itertools made it so easy. My list -- >>> val = [['a','b'],'c',['d','e','f']] Required Result ['a', 'b', 'c', 'd', 'e', 'f'] How do you do it? Itertools to the rescue -- >>> from itertools import chain … Continue reading Python – Flatten List of Lists
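
A sketch of how `chain` can produce that result; the full post may use a slightly different call. Note that it works here because `'c'` is a one-character string, which iterates to itself; a longer string such as `'cd'` would be split into separate characters.

```python
from itertools import chain

val = [["a", "b"], "c", ["d", "e", "f"]]

# chain.from_iterable walks each element of val in turn and yields
# its items, flattening exactly one level of nesting.
flat = list(chain.from_iterable(val))
print(flat)  # ['a', 'b', 'c', 'd', 'e', 'f']
```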