Python List

This blog post is about appending elements to a list in Python.

Suppose we have a simple list “x”. We will look at different ways to append elements to it.

x = [1, 2, 3]

The “append” method appends only a single element:

>>> x
[1, 2, 3]
>>> x.append(4)
>>> x
[1, 2, 3, 4]
>>>

>>> x.append(5, 6, 7)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: append() takes exactly one argument (3 given)
>>>

How do you append multiple elements to a list? One way is slice assignment:

>>> x[len(x):] = [5, 6, 7]
>>> x
[1, 2, 3, 4, 5, 6, 7]
>>>

Another way to append multiple elements is to create a new list and use the “+” operator (note that this builds a new list instead of modifying “x” in place):

>>> y = [8, 9, 10]
>>> x = x + y
>>> x
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>>

The other way, which I learned today, is the “extend” method, which appends every element of the given iterable in place:

>>> z = [ 11, 12, 13]
>>> x.extend(z)
>>> x
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
>>>

In case you want to add a new element between two existing elements, use the “insert” method, which takes the target index and the value:

>>> x.insert(5, "Hi")
>>> x
[1, 2, 3, 4, 5, 'Hi', 6, 7, 8, 9, 10, 11, 12, 13]
>>>

PostgreSQL – CPU Utilization and Index

One of the production Aurora PostgreSQL instances, running on a db.r4.16xlarge instance class (64 vCPUs and 488 GB of memory), was reporting high CPU utilization, spiking up to 100%.

[Screenshot: CloudWatch graph showing CPU utilization spiking to 100%]

With such issues, one of the first things to do is look for the SQLs with a high shared-buffers hit count. I have built a small tool called pgsnap, which is something similar to the AWR repository in Oracle, maintaining SQL stat history. So, with pg_stat_statements and hist_pg_stat_statements (that’s what I call it), I was able to identify the SQL.
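For reference, a minimal sketch of the kind of query that surfaces such statements from pg_stat_statements (column names as of PostgreSQL 10; on 13+ the total_time column is called total_exec_time):

-- top 10 statements by shared buffer hits
select queryid,
       calls,
       total_time,
       shared_blks_hit,
       left(query, 60) as query_snippet
from pg_stat_statements
order by shared_blks_hit desc
limit 10;

The identified SQL: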

select col1, col2, col3, col4, col5 from cltn_errs redis0_ where redis0_.sbmn_id=123456;

Let’s look at the execution plan:

Gather  (cost=1000.00..2070031.26 rows=2696 width=262) (actual time=17475.126..18771.216 rows=1 loops=1)                                                       
   Output: col1, col2, col3, col4, col5                                   
   Workers Planned: 7                                                                                                                                           
   Workers Launched: 0                                                                                                                                          
   Buffers: shared hit=3945515                                                                                                                            
   ->  Parallel Seq Scan on demo.cltn_errs redis0_  (cost=0.00..2068761.66 rows=385 width=262) (actual time=17474.807..18770.895 rows=1 loops=1) 
         Output: col1, col2, col3, col4, col5                              
         Filter: (redis0_.sbmn_id = '123456'::numeric)                                                                                        
         Rows Removed by Filter: 52390761                                                                                                                       
         Buffers: shared hit=3945515                                                                                                                            
 Planning time: 0.652 ms  
 Execution time: 18771.384 ms      

The problem was obvious!! A full table scan of cltn_errs (a ~15 GB table). So, I restored the latest snapshot and created an index on the “sbmn_id” column.
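A sketch of the DDL, with the index name taken from the plan below (on the live production instance, create index concurrently is the safer variant, since it avoids blocking writes):

create index idx_sbmn_id on demo.cltn_errs (sbmn_id);

With the index in place, the execution plan changed to: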

Index Scan using idx_sbmn_id on demo.cltn_errs redis0_  (cost=0.57..182.80 rows=3086 width=272) (actual time=0.031..0.032 rows=1 loops=1) 
   Output: col1, col2, col3, col4, col5                                           
   Index Cond: (redis0_.sbmn_id = '123456'::numeric)                                                                                                 
   Buffers: shared hit=5                                                                                                                                 
 Planning time: 0.573 ms                                                                                                                                               
 Execution time: 0.085 ms        

Wow!! After the index, the shared buffer hits dropped from 3,945,515 to 5, and the total execution time went from ~18.8 seconds to under a millisecond, an improvement of far more than 100x. So, with this testing done, I created the index on Prod, and after this little change the CPU utilization graph said it all.

[Screenshot: CloudWatch graph showing CPU utilization back to normal after the index creation]

Python – Flatten List of Lists

itertools is one of the most powerful modules in Python. Today I had a requirement to flatten a list of lists, and itertools made it so easy.

My list —

>>> val = [['a','b'],'c',['d','e','f']]

Required Result

['a', 'b', 'c', 'd', 'e', 'f']

How do you do it? itertools to the rescue —

>>> from itertools import chain
>>> list(chain.from_iterable(val))
['a', 'b', 'c', 'd', 'e', 'f']

So simple!! One caveat: chain.from_iterable iterates over every element, so the bare string 'c' works here only because it is a single character; a longer string like 'cd' would be flattened into 'c', 'd'.

Python – Enumerate

Suppose you have a dataset (data) and want to find every 5th item. How would you do it?

data = [ 1, 3, 5, 7, 9, 11, 13, 15, 17, 19]

The first thing that might come to mind is using a slice, but a plain data[::5] won’t work, since slicing is based on the index, and indexes start from 0:

>>> data[::5]
[1, 11]

The answer should be [9, 19].

This is where enumerate comes into play. It allows us to loop over an iterable and keep an automatic counter, with its second argument setting the counter’s start value. (An offset slice like data[n-1::n] would also work here, but enumerate reads more clearly and generalizes to any iterable.)

>>> def get_nth_item(n=5):
...     new_data = []
...     for i, d in enumerate(data, 1):
...         if i % n == 0:
...             new_data.append(d)
...     print (new_data)
...
>>>
>>> get_nth_item(n=5)
[9, 19]
>>> get_nth_item(n=4)
[7, 15]
>>>


AWS DMS – Target TableName Differs

AWS DMS is a service that supports both homogeneous and heterogeneous migrations, helping you migrate databases to the AWS cloud.

During most migrations, the source and target table names remain the same, in which case the Mappings.json file is pretty simple. As an example (Oracle to PostgreSQL):

 {
    "rules":
    [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "1",
            "object-locator":
            {
                "schema-name": "DEVO",
                "table-name": "TEST_DEMO"
            },
            "rule-action": "include"
        },
        {
            "rule-type": "transformation",
            "rule-id": "2",
            "rule-name": "2",
            "rule-action": "convert-lowercase",
            "rule-target": "schema",
            "object-locator":
            {
                "schema-name": "%"
            }
        },
        {
            "rule-type": "transformation",
            "rule-id": "3",
            "rule-name": "3",
            "rule-action": "convert-lowercase",
            "rule-target": "table",
            "object-locator":
            {
                "schema-name": "%",
                "table-name": "%"
            }
        },
        {
            "rule-type": "transformation",
            "rule-id": "4",
            "rule-name": "4",
            "rule-action": "convert-lowercase",
            "rule-target": "column",
            "object-locator":
            {
                "schema-name": "%",
                "table-name": "%",
                "column-name": "%"
            }
        }
    ]
}

The above Mappings.json includes the table DEVO.TEST_DEMO to be migrated from Oracle to PostgreSQL, with convert-lowercase transformations for the schema, table, and column names.

But what if the target table name is different, say migrating DEVO.TEST_DEMO to devo.test_demo_new? In such a scenario, the below Mappings.json can be used, with a rename transformation added as rule 3 —

    {
      "rules": [
        {
          "rule-type": "selection",
          "rule-id": "1",
          "rule-name": "1",
          "object-locator": {
            "schema-name": "DEVO",
            "table-name": "TEST_DEMO"
          },
          "rule-action": "include"
        },
        {
          "rule-type": "transformation",
          "rule-id": "2",
          "rule-name": "2",
          "rule-action": "convert-lowercase",
          "rule-target": "schema",
          "object-locator": {
            "schema-name": "%"
          }
        },
        {
          "rule-type": "transformation",
          "rule-id": "3",
          "rule-name": "3",
          "rule-action": "rename",
          "rule-target": "table",
          "object-locator": {
            "schema-name": "devo",
            "table-name": "TEST_DEMO"
          },
          "value": "test_demo_new"
        },
        {
          "rule-type": "transformation",
          "rule-id": "4",
          "rule-name": "4",
          "rule-action": "convert-lowercase",
          "rule-target": "table",
          "object-locator": {
            "schema-name": "%",
            "table-name": "%"
          }
        },
        {
          "rule-type": "transformation",
          "rule-id": "5",
          "rule-name": "5",
          "rule-action": "convert-lowercase",
          "rule-target": "column",
          "object-locator": {
            "schema-name": "devo",
            "table-name": "test_demo",
            "column-name": "%"
          }
        }
      ]
    }

PostgreSQL – Unique constraint and null value

An important behavior to know about in PostgreSQL is that duplicate null values do not violate unique constraints.

Oracle

SQL> create table test (id number (2,0), 
                        country varchar(20) not null, 
                        state varchar(20)
                       );

Table created.

SQL> alter table test add constraint pk_test_id primary key (id);

Table altered.

SQL> alter table test add constraint uniq_test_cs unique (country, state);

Table altered.

PostgreSQL

admin@test # create table test (id numeric(2,0), 
                                country character varying(20) not null, 
                                state character varying(20)
                                );
CREATE TABLE
Time: 78.866 ms

admin@test # alter table test add constraint pk_test_id primary key (id);
ALTER TABLE
Time: 81.062 ms

admin@test # alter table test add constraint uniq_test_cs unique (country, state);
ALTER TABLE
Time: 82.800 ms

Let’s look at how null is handled with the unique constraint in place.

Behaviour in Oracle

SQL> insert into test values (1, 'USA','SEATTLE');

1 row created.

SQL>  insert into test values (2, 'USA', 'OREGON');

1 row created.

SQL> insert into test (id, country) values (3, 'USA');

1 row created.

SQL> select * from test;

        ID COUNTRY              STATE
---------- -------------------- --------------------
         1 USA                  SEATTLE
         2 USA                  OREGON
         3 USA

SQL> insert into test (id, country) values (4, 'USA');
insert into test (id, country) values (4, 'USA')
*
ERROR at line 1:
ORA-00001: unique constraint (DEV.UNIQ_TEST_CS) violated


Behaviour in PostgreSQL

admin@test # insert into test values (1, 'USA','SEATTLE');
INSERT 0 1
Time: 79.928 ms

admin@test # insert into test values (2, 'USA', 'OREGON');
INSERT 0 1
Time: 72.490 ms

admin@test # insert into test (id, country) values (3, 'USA');
INSERT 0 1
Time: 75.906 ms
admin@test # select * from test;
+----+---------+---------+
| id | country |  state  |
+----+---------+---------+
|  1 | USA     | SEATTLE |
|  2 | USA     | OREGON  |
|  3 | USA     | NULL    |
+----+---------+---------+
(3 rows)

Time: 74.928 ms
admin@test # insert into test (id, country) values (4, 'USA');
INSERT 0 1
Time: 76.040 ms
admin@test # select * from test;
+----+---------+---------+
| id | country |  state  |
+----+---------+---------+
|  1 | USA     | SEATTLE |
|  2 | USA     | OREGON  |
|  3 | USA     | NULL    |
|  4 | USA     | NULL    |
+----+---------+---------+
(4 rows)

Time: 83.693 ms
admin@test #

Two ways to handle such a situation —

1. Set a default value for the column, so the column is never null and a duplicate pair causes the usual “duplicate key value violates unique constraint” error (see the sketch after this list).

2. Create a partial index, as demonstrated below.
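A minimal sketch of option 1, assuming a hypothetical “N/A” sentinel value is acceptable for the data (and starting from a table without the existing null rows):

-- hypothetical sentinel default, so state is never null
alter table test alter column state set default 'N/A';
-- both inserts omit state, so both rows become ('USA', 'N/A');
-- the second one fails with:
--   ERROR: duplicate key value violates unique constraint "uniq_test_cs"
insert into test (id, country) values (5, 'USA');
insert into test (id, country) values (6, 'USA');

For option 2, a partial unique index enforces uniqueness on country for exactly the rows where state is null: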

admin@test # create unique index uniq_test_c on test(country) where state is null;
ERROR:  could not create unique index "uniq_test_c"
DETAIL:  Key (country)=(USA) is duplicated.
Time: 254.043 ms

admin@test # delete from test where id in (3,4);
DELETE 2
Time: 77.395 ms
admin@test # select * from test;
+----+---------+---------+
| id | country |  state  |
+----+---------+---------+
|  1 | USA     | SEATTLE |
|  2 | USA     | OREGON  |
+----+---------+---------+
(2 rows)

Time: 77.451 ms

admin@test # create unique index uniq_test_c on test(country) where state is null;
CREATE INDEX
Time: 78.431 ms

admin@test # insert into test (id, country) values (3, 'USA');
INSERT 0 1
Time: 75.413 ms

admin@test # insert into test (id, country) values (4, 'USA');
ERROR:  duplicate key value violates unique constraint "uniq_test_c"
DETAIL:  Key (country)=(USA) already exists.
Time: 95.083 ms
admin@test #

As per the PostgreSQL documentation:

In general, a unique constraint is violated if there is more than one row in the table where the values of all of the columns included in the constraint are equal. However, two null values are never considered equal in this comparison. That means even in the presence of a unique constraint it is possible to store duplicate rows that contain a null value in at least one of the constrained columns. This behavior conforms to the SQL standard, but we have heard that other SQL databases might not follow this rule. So be careful when developing applications that are intended to be portable.