You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@phoenix.apache.org by Pradheep Shanmugam <Pr...@infor.com> on 2019/07/17 15:40:48 UTC

Using Secondary index hint for many to many relationship model

Hi,

We have a table with with parented and childid combination as unique. We have to query it sometimes with parent id and sometimes with child id. So we have secondary index(global index) on child id. As we don’t want to store all the fields again in the secondary index , we wanted to use index hinting. Also we do pre-splitting on the first field of table and as well as secondary index to distribute the write. We pre-calculate Secondary_Dist_salt and write to a column in primary table which is used to populate the index.

Table(TEST)

Primary_Dist_salt
Parentid
Childid

Other columns

Secondary Index: (TESTINDEX)

Secondary_Dist_salt
Childid
Parentid

If we do a query like below, I get below plan I see that the index is used, I am worried if the full scan on primary table will affect performance in future as the table may grow to several hundreds of millions.
Also I see that Primary_Dist_salt is also added implicitly when I create the secondary index without it. So I thought we have all the keys required to do a point lookup on primary table to get rest of the columns.
I presume that the range scan runs parallel and hence full scan occurs?
Can you please throw some light on this? Is there some way we can avoid that full scan.

explain select /*+ INDEX(TEST TESTINDEX) */ * from TEST where Childid
='10' and Secondary_dist_salt='3k';

CLIENT 10-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER TEST
SKIP-SCAN-JOIN TABLE 0
CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN RANGE SCAN OVER TESTINDEX ['3k','10']
SERVER FILTER BY FIRST KEY ONLY
DYNAMIC SERVER FILTER BY ("TEST.PRIMARY_DIST_SALT", "TEST.PARENTID", "TEST.CHILDID") IN (($2.$4, $2.$5, $2.$6)

Thanks,
Pradheep