You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@phoenix.apache.org by William <yh...@163.com> on 2016/08/01 09:13:10 UTC

Useless Delete operations in global index update

Hi all,
   I found an interesting problem in global secondary index update. I am using Phoenix 4.6.1. See the following SQLs:


   create table tt (pk1 integer not null, pk2 varchar not null, cf.a integer, cf.b varchar constraint pk primary key (pk1, pk2));
   create index tta on tt(cf.a);
   create index ttb on tt(cf.b);


   Now we have a data table TT and its index tables TTA and TTB. Do some UPSERT into TT.
   
   upsert into tt values (1, '#1', 100, '111');


   For this UPSERT statement, I expect one Scan and one Put operation executed on the data table TT, and one Put for each index table. But the operations actually took place were different.
   For TT, two Scans were executed, one for CF.A and another for CF.B. I think this should be reduced to one. 
   And for each index table, a useless Delete was executed.  Because when doing the UPSERT, the data table was empty. So there was nothing to delete in the index tables. 
   I added some log in the code and dumped the Delete objects. 


  {"ts":1470037773335,"totalColumns":0,"families":{},"row":"\\x00\\x80\\x00\\x00\\x01#1"}


 '\\x80\\x00\\x00\\x01' is PK1 and '#1' is PK2, then '\\x00' in the beginning must be CF.A. As we cannot get a value for CF.A, then just use 0 instead? 
  Why are we doing this? Why not ignoring the Delete if we cannot get the old state?


 A direct result of this behavior is that the first region of the index table handle more than 10 times requests, as it handles all the delete requests itself if the scenario is  'append more and update less'.
This wastes a lot of RPC calls and extends the RT of the data table UPSERT operation.
Might anyone explain this for me? Thanks.


William.