Reading Parquet files with AWS Lambda
I had a use case to read data (a few columns) from a Parquet file stored in S3 and write it to a DynamoDB table every time a file was uploaded. Planning to use AWS Lambda, I was looking at options for reading Parquet files within Lambda until I stumbled upon AWS Data Wrangler. From the document …
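A minimal sketch of that flow, assuming the Lambda is triggered by an S3 put event and has both `awswrangler` and `boto3` available (for example through a Lambda layer). `TABLE_NAME`, `COLUMNS` and the helper `s3_paths_from_event` are hypothetical placeholders, not taken from the original post. It reads only the needed columns with `wr.s3.read_parquet(path=..., columns=...)` and writes rows with boto3's `Table.batch_writer()`.

```python
import json
import os
import urllib.parse
from decimal import Decimal

# Hypothetical configuration: table name and column list are placeholders
TABLE_NAME = os.environ.get("TABLE_NAME", "my-table")
COLUMNS = ["id", "name", "amount"]


def s3_paths_from_event(event):
    """Return an s3:// URI for every object referenced in an S3 event notification."""
    paths = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 events (a space becomes '+')
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        paths.append(f"s3://{bucket}/{key}")
    return paths


def lambda_handler(event, context):
    # Imported here so the helper above can be used without the AWS packages installed
    import awswrangler as wr
    import boto3

    table = boto3.resource("dynamodb").Table(TABLE_NAME)
    written = 0
    for path in s3_paths_from_event(event):
        # Read only the columns we need from the Parquet object
        df = wr.s3.read_parquet(path=path, columns=COLUMNS)
        # boto3 rejects Python floats for DynamoDB, so round-trip through JSON with Decimal
        items = json.loads(df.to_json(orient="records"), parse_float=Decimal)
        with table.batch_writer() as batch:
            for item in items:
                batch.put_item(Item=item)
                written += 1
    return {"written": written}
```

The JSON round-trip is one convenient way to turn NumPy/pandas scalars into types boto3 accepts; converting column by column works too.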
Pandas Scratchpad – I
This blog post is a scratchpad for day-to-day pandas commands. pandas is an open-source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. 1. A few quick ways to create a pandas DataFrame: DataFrame from a dict of lists, DataFrame from a list of lists, DataFrame from a list of dicts, DataFrame …
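A short sketch of the three constructors named above; the column names and values are made up for illustration. All three build the same table.

```python
import pandas as pd

# Dict of lists: each key becomes a column, each list holds that column's values
df_from_dict = pd.DataFrame({"name": ["a", "b", "c"], "value": [1, 2, 3]})

# List of lists: each inner list is a row, so column names are passed explicitly
df_from_rows = pd.DataFrame([["a", 1], ["b", 2], ["c", 3]], columns=["name", "value"])

# List of dicts: each dict is a row and its keys become the columns
df_from_records = pd.DataFrame(
    [{"name": "a", "value": 1}, {"name": "b", "value": 2}, {"name": "c", "value": 3}]
)
```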
Python – Counter – Compare Lists
A few days back I wrote about using sorted(list) to compare two lists. Recently I learned that we can also use Counter (from the collections module) to compare lists without taking their order into account. Happy Learning!!
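A minimal sketch of the Counter approach, reusing the tuple lists from the sort() vs sorted(list) post. `Counter` counts how often each element occurs, so two lists compare equal when they hold the same elements the same number of times, regardless of order.

```python
from collections import Counter

a = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
c = [('d', 4), ('c', 3), ('a', 1), ('b', 2)]
b = [('b', 2), ('c', 3), ('a', 1)]

same_a_c = Counter(a) == Counter(c)  # same elements, different order
same_a_b = Counter(a) == Counter(b)  # b is missing ('d', 4)

# Duplicates matter: the element counts must match, not just the set of elements
same_dupes = Counter([1, 1, 2]) == Counter([1, 2, 2])
```

One design difference worth knowing: `Counter` needs hashable elements, while `sorted()` needs elements that can be ordered against each other.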
Merge json files using Pandas
Quick demo of merging multiple JSON files using pandas: import pandas as pd; import glob; import json; file_list = glob.glob("*.json") >>> file_list ['b.json', 'c.json', 'a.json']. Use enumerate to assign a counter to the files: allFilesDict = {v:k for v, k in enumerate(file_list, 1)} >>> allFilesDict {1: 'b.json', 2: 'c.json', 3: 'a.json'}. Append the data into a list …
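One way to finish the merge, sketched under the assumption that each file holds a single JSON object (like a.json in the read_json post). The function name `merge_json_files` and the `file_id`/`file_name` columns are hypothetical, not the post's own code. Each object is wrapped in a list to form a one-row DataFrame, the rows are collected in a list, and `pd.concat` joins them.

```python
import glob
import json

import pandas as pd


def merge_json_files(pattern="*.json"):
    """Merge one-object-per-file JSON documents into a single DataFrame."""
    # glob returns files in arbitrary order, so sort for a reproducible counter
    paths = sorted(glob.glob(pattern))
    frames = []
    # enumerate(..., 1) assigns a 1-based counter to each file
    for counter, path in enumerate(paths, 1):
        with open(path) as fh:
            record = json.load(fh)  # assumes a single JSON object per file
        frame = pd.DataFrame([record])  # list wrapper: one row, avoids the all-scalar ValueError
        frame["file_id"] = counter
        frame["file_name"] = path
        frames.append(frame)
    if not frames:
        return pd.DataFrame()  # pd.concat raises on an empty list
    return pd.concat(frames, ignore_index=True)


# Usage: merged = merge_json_files("*.json")
```

Collecting frames in a list and concatenating once is usually cheaper than growing a DataFrame inside the loop.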
Pandas – ValueError: If using all scalar values, you must pass an index
Reading a JSON file with pandas read_json can fail with "ValueError: If using all scalar values, you must pass an index". Let's see an example: cat a.json { "creator": "CaptainAmerica", "last_modifier": "NickFury", "title": "Captain America: The First Avenger", "view_count": 12000 } >>> import pandas as pd >>> import glob >>> for f in glob.glob('*.json'): …
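A sketch of why this happens and two common workarounds, using the same content as a.json (held in a string via `io.StringIO`, since recent pandas versions prefer a file-like object over a literal JSON string). By default read_json builds a DataFrame from the top-level object, and a dict whose values are all scalars gives pandas no index to use. These are general pandas options; the post's own fix is not shown in the excerpt.

```python
import io
import json

import pandas as pd

# Same shape as a.json: one top-level object, every value a scalar
raw = ('{"creator": "CaptainAmerica", "last_modifier": "NickFury", '
       '"title": "Captain America: The First Avenger", "view_count": 12000}')

# The default (a DataFrame from a dict of scalars) is what raises the ValueError
try:
    pd.read_json(io.StringIO(raw))
except ValueError as err:
    print("read_json failed:", err)

# Workaround 1: read it as a Series; the keys become the index
as_series = pd.read_json(io.StringIO(raw), typ="series")

# Workaround 2: wrap the object in a list so pandas sees one row
as_frame = pd.DataFrame([json.loads(raw)])
```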
Python – sort() vs sorted(list)
You can compare lists using sort() or sorted(list), but be careful with sort(): >>> c = [('d',4), ('c',3), ('a',1), ('b', 2)] >>> a = [('a',1), ('b', 2), ('c',3), ('d',4)] >>> a.sort() == c.sort() True >>> >>> a = [('a',1), ('b', 2), ('c',3), ('d',4)] >>> b = [('b',2), ('c', 3), ('a',1)] >>> >>> a.sort() == …
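The trap, sketched: `list.sort()` sorts in place and returns `None`, so `a.sort() == b.sort()` compares `None == None` and is always `True`, even for lists with different contents. `sorted()` returns a new sorted list, so it compares the actual elements.

```python
a = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
b = [('b', 2), ('c', 3), ('a', 1)]

# sorted() builds new lists, so the contents are really compared
same_by_sorted = sorted(a) == sorted(b)

# sort() returns None (and mutates a and b), so this is None == None
same_by_sort = a.sort() == b.sort()
```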
Python – str.maketrans()
While working on some Python code, I needed to remove the single/double quotes and open/close brackets from a string in the format below: >>> text = """with summary as (select ' ... 'p.col1,p.col2,p.col3, ROW_NUMBER() ' ... 'OVER(PARTITION BY p.col1,p.col3 ORDER BY ' ... 'p.col2) AS rk from (select * from (select ' ... 'col2, …
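A minimal sketch of deleting characters with `str.maketrans`: its optional third argument lists characters that `str.translate` should remove. The exact character set below (quotes and parentheses) is an assumption based on the sample SQL; extend it if your input also contains `[]` or `{}`. The sample string is a shortened stand-in for the post's text.

```python
# Characters to delete: single quote, double quote, and round brackets (assumed set)
REMOVE = "'\"()"

# The first two arguments map nothing to nothing; the third lists characters to drop
table = str.maketrans("", "", REMOVE)

text = "with summary as (select 'p.col1,p.col2' from t)"
cleaned = text.translate(table)
```

Building the table once and reusing it with `translate` avoids chaining several `replace()` calls.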
namedtuple to JSON – Python
In pgdb, the PostgreSQL DB-API module, the cursor used to manage the context of a fetch operation returns a list of named tuples. These named tuples have field names matching the column names of the database query. An example of a row from the list of named tuples: Row(log_time=datetime.datetime(2019, 3, 20, 5, …
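A sketch of one way to get JSON with field names, using a hand-built `namedtuple` as a stand-in for pgdb rows (the field names `user_name` and `status` are hypothetical). Because namedtuples are tuples, `json.dumps` on them emits plain arrays and drops the field names, so each row is converted with `_asdict()` first; datetimes need a `default` hook since the json module cannot serialize them.

```python
import datetime
import json
from collections import namedtuple

# Stand-in for rows returned by a pgdb cursor (field names are hypothetical)
Row = namedtuple("Row", ["log_time", "user_name", "status"])
rows = [Row(datetime.datetime(2019, 3, 20, 5, 0), "alice", "ok")]


def to_json_default(value):
    """Serialize dates and datetimes as ISO 8601 strings; reject anything else."""
    if isinstance(value, (datetime.datetime, datetime.date)):
        return value.isoformat()
    raise TypeError(f"{type(value).__name__} is not JSON serializable")


# _asdict() keeps the column names as JSON keys
payload = json.dumps([row._asdict() for row in rows], default=to_json_default)

# Dumping the namedtuples directly yields arrays without field names
payload_as_arrays = json.dumps(rows, default=to_json_default)
```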
Python List
This blog post is about appending elements to a list in Python. Suppose we have a simple list x; we will look at different ways to append elements to it. x = [1, 2, 3] The append method adds only a single element: >>> x [1, 2, 3] >>> x.append(4) >>> x [1, 2, …
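A quick sketch of the standard list methods for adding elements (not necessarily the exact set the full post covers). The key contrast is that `append` always adds one element, even when that element is itself a list, while `extend` and `+=` add each item of an iterable.

```python
x = [1, 2, 3]
x.append(4)        # one element: [1, 2, 3, 4]
x.append([5, 6])   # a list is still ONE element: [1, 2, 3, 4, [5, 6]]

y = [1, 2, 3]
y.extend([4, 5])   # each item of the iterable: [1, 2, 3, 4, 5]
y += [6]           # in-place concatenation, same effect as extend: [1, 2, 3, 4, 5, 6]
y.insert(0, 0)     # insert before a given index: [0, 1, 2, 3, 4, 5, 6]
```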
Python – Flatten List of Lists
Itertools is one of the most powerful modules in Python. Today I had a requirement to flatten a list of lists, and itertools made it easy. My list: >>> val = [['a','b'],'c',['d','e','f']] Required result: ['a', 'b', 'c', 'd', 'e', 'f'] How do you do it? Itertools to the rescue: >>> from itertools import chain …
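One way to complete the `chain` approach, assuming `chain.from_iterable` (the excerpt stops at the import). It iterates over each item in `val` and yields that item's elements. This works for `'c'` only because iterating a one-character string yields that character; a longer string such as `'cd'` would be split into `'c'` and `'d'`. The hypothetical helper `flatten_one_level` treats strings as single items instead.

```python
from itertools import chain

val = [['a', 'b'], 'c', ['d', 'e', 'f']]

# chain.from_iterable yields every element of every item in val
flat = list(chain.from_iterable(val))


def flatten_one_level(items):
    """Flatten one level, expanding only lists and tuples (strings stay whole)."""
    for item in items:
        if isinstance(item, (list, tuple)):
            yield from item
        else:
            yield item
```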