This post is a scratchpad for day-to-day pandas commands.
pandas is an open-source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
1. A few quick ways to create a pandas DataFrame
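DataFrame from Dict of List –
    import pandas as pd
    # Each key becomes a column; each list holds that column's values.
    data = {'Name': ['Iron Man', 'Deadpool', 'Captain America', 'Thor', 'Hulk', 'Spider Man'],
            'Age': [48, 30, 100, 150, 50, 22]}
    df = pd.DataFrame(data)
DataFrame from List of List –
    # Each inner list is one row, so the column names are passed separately.
    data = [['Iron Man', 48], ['Deadpool', 30], ['Captain America', 100],
            ['Thor', 150], ['Hulk', 50], ['Spider Man', 22]]
    df = pd.DataFrame(data, columns=['Name', 'Age'])
DataFrame from List of Dict –
    # Each dict is one row; the keys become the column names.
    data = [{'Name': 'Iron Man', 'Age': 48}, {'Name': 'Deadpool', 'Age': 30},
            {'Name': 'Captain America', 'Age': 100}, {'Name': 'Thor', 'Age': 150},
            {'Name': 'Hulk', 'Age': 50}, {'Name': 'Spider Man', 'Age': 22}]
    df = pd.DataFrame(data)
DataFrame using zip function –
    # zip pairs the two lists element by element into (Name, Age) rows.
    Name = ['Iron Man', 'Deadpool', 'Captain America', 'Thor', 'Hulk', 'Spider Man']
    Age = [48, 30, 100, 150, 50, 22]
    data = list(zip(Name, Age))
    df = pd.DataFrame(data, columns=['Name', 'Age'])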
2. Reading Data from CSV
While reading a CSV file with pandas, you might hit an error like
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 3: invalid continuation byte
In such cases the file is not UTF-8 encoded, so you need to pass the encoding parameter that matches it. For example –
df = pd.read_csv('avengers.csv', encoding = "ISO-8859-1")
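To make the failure and the fix easy to reproduce, here is a minimal sketch. It assumes a throwaway Latin-1 encoded file written to a temporary path (not the avengers.csv from this post), and the sample row is made up for illustration. The byte 0xe9 is "é" in ISO-8859-1, which is why that encoding reads it cleanly.

```python
import os
import tempfile

import pandas as pd

# Write a small CSV encoded as Latin-1 (ISO-8859-1); "é" is stored as byte 0xe9.
with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as f:
    f.write("Name,Age\nAndré,30\n".encode("latin-1"))
    path = f.name

try:
    # Default decoding is UTF-8, which cannot decode the lone 0xe9 byte.
    df = pd.read_csv(path)
except UnicodeDecodeError:
    # Retry with the encoding the file was actually written in.
    df = pd.read_csv(path, encoding="ISO-8859-1")
finally:
    os.remove(path)

print(df)
```

Falling back blindly to ISO-8859-1 never raises, because every byte is valid in that encoding, so it is worth checking that the decoded text looks right.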
3. Converting CSV to JSON
Default way
cat /tmp/avengers.json
{"Name":{"0":"Iron Man","1":"Deadpool","2":"Captain America","3":"Thor","4":"Hulk","5":"Spider Man","6":"Batman"},"Age":{"0":48.0,"1":30.0,"2":100.0,"3":150.0,"4":50.0,"5":22.0,"6":null},"Avenger":{"0":"Y","1":"Y","2":"Y","3":"Y","4":"Y","5":"Y","6":"N"}}
With “orient” parameter
cat /tmp/avengers_orient_record.json
[{"Name":"Iron Man","Age":48.0,"Avenger":"Y"},{"Name":"Deadpool","Age":30.0,"Avenger":"Y"},{"Name":"Captain America","Age":100.0,"Avenger":"Y"},{"Name":"Thor","Age":150.0,"Avenger":"Y"},{"Name":"Hulk","Age":50.0,"Avenger":"Y"},{"Name":"Spider Man","Age":22.0,"Avenger":"Y"},{"Name":"Batman","Age":null,"Avenger":"N"}]
The two files hold the same data in different layouts. By default, the JSON maps each column name to an object of index-to-value pairs. With orient='records', it is a list of dictionaries, one per row.
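The two layouts above can be produced with DataFrame.to_json. This is a minimal sketch, assuming a three-row inline CSV in place of avengers.csv and returning JSON strings rather than writing to /tmp. Passing a file path as the first argument writes to that file instead.

```python
import io

import pandas as pd

# Inline stand-in for the CSV file; Batman has no Age, so it is read as NaN.
csv_text = """Name,Age,Avenger
Iron Man,48,Y
Thor,150,Y
Batman,,N
"""
df = pd.read_csv(io.StringIO(csv_text))

# Default layout for a DataFrame: {column -> {index -> value}}.
default_json = df.to_json()

# One dictionary per row: [{column -> value}, ...].
records_json = df.to_json(orient="records")

print(default_json)
print(records_json)
```

The missing Age comes out as null in both layouts, and the other ages become floats because a column holding NaN is stored as float64.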
4. Reading JSON
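A minimal sketch of reading both layouts back with pd.read_json. It assumes short inline JSON strings wrapped in StringIO rather than files on disk. A file path can be passed in the same position.

```python
import io

import pandas as pd

# The default (column-oriented) layout and the records layout of the same data.
columns_json = '{"Name":{"0":"Iron Man","1":"Thor"},"Age":{"0":48.0,"1":150.0}}'
records_json = '[{"Name":"Iron Man","Age":48.0},{"Name":"Thor","Age":150.0}]'

# Column-oriented JSON is the default layout read_json expects for a DataFrame.
df_default = pd.read_json(io.StringIO(columns_json))

# For a list of row dictionaries, state the orientation explicitly.
df_records = pd.read_json(io.StringIO(records_json), orient="records")

print(df_default)
print(df_records)
```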
5. Change DataType
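One common way to change a column's type is astype. The sketch below assumes a small hand-built frame with the same Name, Age and Avenger columns used earlier. The target types are illustrative choices, not a requirement.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Iron Man", "Deadpool", "Thor"],
    "Age": [48.0, 30.0, 150.0],
    "Avenger": ["Y", "Y", "N"],
})

# Convert a single column. Note that astype("int64") raises an error
# if the column contains NaN.
df["Age"] = df["Age"].astype("int64")

# Convert one or more columns at once by passing a {column: dtype} mapping.
df = df.astype({"Avenger": "category"})

print(df.dtypes)
```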
6. Describe DataFrame
By default, describe() on a DataFrame summarizes only the numeric columns.
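A minimal sketch of the default behavior, assuming a small made-up frame with one text, one numeric and one categorical column:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Iron Man", "Deadpool", "Thor", "Batman"],
    "Age": [48.0, 30.0, 150.0, None],
    "Avenger": pd.Categorical(["Y", "Y", "Y", "N"]),
})

# Only the numeric column (Age) appears; NaN is left out of the count.
numeric_summary = df.describe()
print(numeric_summary)
```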
To describe all the columns –
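This can be done with the include parameter. The sketch assumes the same kind of small made-up frame:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Iron Man", "Deadpool", "Thor", "Batman"],
    "Age": [48.0, 30.0, 150.0, None],
    "Avenger": pd.Categorical(["Y", "Y", "Y", "N"]),
})

# include="all" adds unique/top/freq rows for non-numeric columns;
# statistics that do not apply to a column show up as NaN.
full_summary = df.describe(include="all")
print(full_summary)
```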
To describe columns with category datatype –
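The include parameter also accepts a list of dtypes. A minimal sketch, again on an assumed sample frame where Avenger is stored as a category:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Iron Man", "Deadpool", "Thor", "Batman"],
    "Age": [48.0, 30.0, 150.0, None],
    "Avenger": pd.Categorical(["Y", "Y", "Y", "N"]),
})

# Restrict the summary to categorical columns only.
category_summary = df.describe(include=["category"])
print(category_summary)
```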
7. Count distinct observations
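Distinct values can be counted with nunique. A minimal sketch on an assumed sample frame:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Iron Man", "Deadpool", "Thor", "Batman"],
    "Age": [48.0, 30.0, 150.0, None],
    "Avenger": ["Y", "Y", "Y", "N"],
})

# Number of distinct values per column; NaN is ignored by default.
distinct_per_column = df.nunique()

# Number of distinct values in one column, this time counting NaN as a value.
distinct_ages_with_nan = df["Age"].nunique(dropna=False)

print(distinct_per_column)
print(distinct_ages_with_nan)
```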
8. Count of Unique values
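To see how often each value occurs, value_counts is the usual tool. A minimal sketch on an assumed sample frame:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Iron Man", "Deadpool", "Thor", "Batman"],
    "Avenger": ["Y", "Y", "Y", "N"],
})

# Occurrences of each value, most frequent first.
counts = df["Avenger"].value_counts()

# The same counts expressed as fractions of the total.
shares = df["Avenger"].value_counts(normalize=True)

print(counts)
print(shares)
```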
9. Null values
Total null values for a column
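A minimal sketch of counting nulls, assuming a small sample frame where one Age is missing:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Iron Man", "Deadpool", "Thor", "Batman"],
    "Age": [48.0, 30.0, 150.0, None],
})

# isnull() marks missing entries as True; summing counts them.
nulls_in_age = df["Age"].isnull().sum()

# The same idea applied to every column at once.
nulls_per_column = df.isnull().sum()

print(nulls_in_age)
print(nulls_per_column)
```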
10. Pandas Profiling
To generate a profile report of a DataFrame, use pandas-profiling (published as ydata-profiling in newer releases). The report contains an overview, per-variable details, and Pearson and Spearman correlations, which helps with quick analysis of the data.