Pandas Scratchpad – I

This post is a scratchpad for day-to-day pandas commands.

pandas is an open-source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

1. A few quick ways to create a Pandas DataFrame

DataFrame from Dict of List –

data = {'Name':['Iron Man', 'Deadpool', 'Captian America', 'Thor', 'Hulk', 'Spider Man'], 'Age':[48, 30, 100, 150, 50, 22]}

DataFrame from List of List –

data = [['Iron Man', 48], ['Deadpool', 30], ['Captian America', 100], ['Thor', 150], ['Hulk', 50], ['Spider Man', 22]]

DataFrame from List of Dict –

data = [{'Name':'Iron Man', 'Age': 48}, {'Name':'Deadpool', 'Age': 30}, {'Name':'Captian America', 'Age': 100},
{'Name':'Thor', 'Age': 150}, {'Name':'Hulk', 'Age': 50}, {'Name':'Spider Man', 'Age': 22}]

DataFrame using zip function –

Name = ['Iron Man', 'Deadpool', 'Captian America', 'Thor', 'Hulk', 'Spider Man']
Age = [48, 30, 100, 150, 50, 22]
data = list(zip(Name, Age))

In each case, build the DataFrame from data with –

import pandas as pd
df = pd.DataFrame(data, columns = ['Name', 'Age'])
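As a quick check, the four approaches can be put side by side in one runnable sketch. The shortened sample data and the variable names below are only illustrative; the point is that all four inputs should yield the same DataFrame.

```python
import pandas as pd

# Shortened, illustrative sample data (not the full list from the post).
names = ["Iron Man", "Deadpool", "Thor"]
ages = [48, 30, 150]
columns = ["Name", "Age"]

# 1. Dict of lists: the keys become the column names.
df_dict_of_list = pd.DataFrame({"Name": names, "Age": ages})

# 2. List of lists: each inner list is one row, so column names are passed explicitly.
df_list_of_list = pd.DataFrame([[n, a] for n, a in zip(names, ages)], columns=columns)

# 3. List of dicts: each dict is one row, and its keys become the column names.
df_list_of_dict = pd.DataFrame([{"Name": n, "Age": a} for n, a in zip(names, ages)])

# 4. zip: pairs the two lists into row tuples.
df_using_zip = pd.DataFrame(list(zip(names, ages)), columns=columns)

print(df_dict_of_list)
```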

2. Reading Data from CSV

[Screenshot: df_read_csv.png]

While reading a CSV file with pandas, you might hit an error like:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 3: invalid continuation byte

In such cases, you need to pass the encoding parameter. For example –

df = pd.read_csv('avengers.csv', encoding = "ISO-8859-1")
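To see the error and the fix together, here is a small self-contained sketch. It writes a hypothetical ISO-8859-1 encoded file to a temporary directory (the file name and contents are made up for illustration), shows that the default UTF-8 read fails, and then reads it with the encoding parameter.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample file: in ISO-8859-1, "é" is stored as the single byte 0xe9.
path = os.path.join(tempfile.mkdtemp(), "sample_latin1.csv")
with open(path, "wb") as f:
    f.write("Name,Age\nAndré,30\n".encode("ISO-8859-1"))

# The default UTF-8 decoding cannot handle the 0xe9 byte.
try:
    pd.read_csv(path)
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Passing the matching encoding reads the file correctly.
df = pd.read_csv(path, encoding="ISO-8859-1")
print(df)
```

If the real encoding of a file is unknown, ISO-8859-1 will decode any byte sequence without raising, but the resulting characters are only right if the file was actually written in that encoding.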

3. Converting CSV to JSON

Default way

[Screenshot: read_csv_to_json.png]

cat /tmp/avengers.json
{"Name":{"0":"Iron Man","1":"Deadpool","2":"Captian America","3":"Thor","4":"Hulk","5":"Spider Man","6":"Batman"},"Age":{"0":48.0,"1":30.0,"2":100.0,"3":150.0,"4":50.0,"5":22.0,"6":null},"Avenger":{"0":"Y","1":"Y","2":"Y","3":"Y","4":"Y","5":"Y","6":"N"}}

With “orient” parameter

[Screenshot: to_json_orient_records.png]

cat /tmp/avengers_orient_record.json
[{"Name":"Iron Man","Age":48.0,"Avenger":"Y"},{"Name":"Deadpool","Age":30.0,"Avenger":"Y"},{"Name":"Captian America","Age":100.0,"Avenger":"Y"},{"Name":"Thor","Age":150.0,"Avenger":"Y"},{"Name":"Hulk","Age":50.0,"Avenger":"Y"},{"Name":"Spider Man","Age":22.0,"Avenger":"Y"},{"Name":"Batman","Age":null,"Avenger":"N"}]

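The two JSON files are laid out differently. By default (orient='columns' for a DataFrame), the output is a dictionary keyed by column name, where each column maps the row index to its value. With orient='records', the output is a list of dictionaries, one per row.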
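The difference between the two layouts can be reproduced with a short sketch. The two-row DataFrame below is an illustrative stand-in for the CSV used in the post; calling to_json() without a path returns the JSON as a string instead of writing a file.

```python
import json

import pandas as pd

# Illustrative stand-in for the CSV data; the missing age becomes null in JSON.
df = pd.DataFrame({"Name": ["Thor", "Batman"], "Age": [150, None]})

# Default layout: {column -> {row index -> value}}.
default_json = df.to_json()

# orient="records": one dictionary per row.
records_json = df.to_json(orient="records")

print(default_json)
print(records_json)

# Parse both strings back into Python objects to compare their shapes.
default_parsed = json.loads(default_json)
records_parsed = json.loads(records_json)
```

To write to a file instead, as in the post, pass a path as the first argument, for example df.to_json('/tmp/avengers.json').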

4. Reading JSON

[Screenshot: read_json.png]
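The original screenshot is not available, so the following is only a minimal sketch of reading JSON with pd.read_json. It assumes records-oriented JSON and uses an in-memory string in place of a file; a file path such as '/tmp/avengers_orient_record.json' could be passed instead.

```python
import io

import pandas as pd

# Records-oriented JSON, held in memory for the sake of a self-contained example.
json_text = '[{"Name": "Thor", "Age": 150}, {"Name": "Hulk", "Age": 50}]'

# read_json accepts a path or a file-like object; orient should match how the JSON was written.
df = pd.read_json(io.StringIO(json_text), orient="records")
print(df)
```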

5. Change DataType

[Screenshot: Change_dtype]
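The exact conversion shown in the original screenshot is unknown. One common way to change a column's data type is astype, sketched here on made-up data: a string column is converted to integers and a flag column to the category type.

```python
import pandas as pd

# Made-up data: ages arrive as strings, as they often do after loading raw files.
df = pd.DataFrame({"Age": ["48", "30", "150"], "Avenger": ["Y", "Y", "N"]})

# Convert the string column to integers.
df["Age"] = df["Age"].astype("int64")

# Convert the flag column to the memory-efficient category type.
df["Avenger"] = df["Avenger"].astype("category")

print(df.dtypes)
```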

6. Describe DataFrame

By default, describe() summarizes only the numeric columns of a DataFrame.

[Screenshot: describe_default]

To describe all the columns –

[Screenshot: describe_all]

To describe columns with category datatype –

[Screenshot: df_category]
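The three describe variants above can be sketched as follows. The small DataFrame is illustrative, and the include argument is what selects which columns are summarized.

```python
import pandas as pd

# Illustrative data with one text, one numeric, and one category column.
df = pd.DataFrame({
    "Name": ["Iron Man", "Thor", "Batman"],
    "Age": [48.0, 150.0, None],
    "Avenger": pd.Categorical(["Y", "Y", "N"]),
})

# Default: numeric columns only (count, mean, std, min, quartiles, max).
numeric_summary = df.describe()

# All columns: statistics that do not apply to a column show up as NaN.
full_summary = df.describe(include="all")

# Only category columns (count, unique, top, freq).
category_summary = df.describe(include=["category"])

print(numeric_summary)
print(full_summary)
print(category_summary)
```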

7. Count distinct observations

[Screenshot]
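The screenshot is missing, so this is an assumption about what it showed: nunique is the usual way to count distinct observations, either per column or for a single column. Missing values are not counted by default.

```python
import pandas as pd

# Illustrative data; the missing age is ignored by nunique by default.
df = pd.DataFrame({
    "Name": ["Iron Man", "Thor", "Batman"],
    "Age": [48.0, 150.0, None],
    "Avenger": ["Y", "Y", "N"],
})

# Number of distinct values in every column.
distinct_per_column = df.nunique()

# Number of distinct values in one column.
distinct_avenger = df["Avenger"].nunique()

print(distinct_per_column)
print(distinct_avenger)
```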

8. Count of Unique values

[Screenshot]
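Again assuming what the missing screenshot showed, value_counts is a common way to get how often each unique value occurs in a column.

```python
import pandas as pd

# Illustrative data.
df = pd.DataFrame({"Name": ["Iron Man", "Thor", "Batman"], "Avenger": ["Y", "Y", "N"]})

# Frequency of each unique value, most frequent first.
counts = df["Avenger"].value_counts()
print(counts)
```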

9. Null values

Total null values for a column –

[Screenshot]
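A minimal sketch of counting nulls, assuming the usual isnull().sum() idiom (the original screenshot is not available): isnull marks each missing value as True, and sum counts the True values.

```python
import pandas as pd

# Illustrative data with one missing age.
df = pd.DataFrame({"Name": ["Iron Man", "Thor", "Batman"], "Age": [48.0, 150.0, None]})

# Total null values in a single column.
age_nulls = df["Age"].isnull().sum()

# Null counts for every column at once.
nulls_per_column = df.isnull().sum()

print(age_nulls)
print(nulls_per_column)
```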

10. Pandas Profiling

To generate a profile report of a DataFrame, use pandas-profiling. The report contains an overview, per-variable details, and Pearson and Spearman correlations, which helps with quick analysis of the data.

[Screenshot]
