You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@trafodion.apache.org by "David Wayne Birdsall (JIRA)" <ji...@apache.org> on 2015/10/06 23:40:27 UTC

[jira] [Comment Edited] (TRAFODION-882) LP Bug: 1409937 - Following update statistics, stats do not take effect immediately.

    [ https://issues.apache.org/jira/browse/TRAFODION-882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14945840#comment-14945840 ] 

David Wayne Birdsall edited comment on TRAFODION-882 at 10/6/15 9:40 PM:
-------------------------------------------------------------------------

It appears to have something to do with the histogram cache (class HistogramCache in optimizer/NATable.cpp). There are histograms for the tables in question in the cache, and they won't be refreshed until HIST_NO_STATS_REFRESH_INTERVAL seconds have passed. (The default for this CQD is 3600, or 1 hour.) 

An interesting question is whether UPDATE STATISTICS itself tries to invalidate the Histogram cache. If it does, perhaps there is a bug in that logic. If it doesn't, perhaps we can change it so that it does. Even so, had we started Session2 in this JIRA and done the prepare just before doing the Update Statistics in Session1, we would have seen out of date stats there for an hour. Unless, perhaps, the RMS subsystem propogates an UPDATE STATISTICS notification somehow. These are the next avenues of research.


was (Author: davebirdsall):
It appears to have something to do with the histogram cache (class HistogramCache in optimizer/NATable.cpp). There are histograms for the tables in question in the cache, and they won't be refreshed until HIST_NO_STATS_REFRESH_INTERVAL seconds have passed. (The default for this CQD is 3600, or 1 hour.) Evidently, not a complete histogram set is in cache. If we prepare a statement with "where t1_c3 > 7000", for example, the cache gets updated and we have correct stats after that.

An interesting question is whether UPDATE STATISTICS itself tries to invalidate the Histogram cache. If it does, perhaps there is a bug in that logic. If it doesn't, perhaps we can change it so that it does. Even so, had we started Session2 in this JIRA and done the prepare just before doing the Update Statistics in Session1, we would have seen out of date stats there for an hour. Unless, perhaps, the RMS subsystem propogates an UPDATE STATISTICS notification somehow. These are the next avenues of research.

> LP Bug: 1409937 - Following update statistics, stats do not take effect immediately.
> ------------------------------------------------------------------------------------
>
>                 Key: TRAFODION-882
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-882
>             Project: Apache Trafodion
>          Issue Type: Bug
>          Components: sql-cmp
>            Reporter: Julie Thai
>            Assignee: David Wayne Birdsall
>            Priority: Critical
>             Fix For: 2.0-incubating
>
>
> On 20150106 & 20150110 builds, immediately following an updatestats, query plan generated does not seem to reflect the existence of stats. In a session, create, & populate table, run updatestats, prepare query and exit. A serial plan is generated and est cardinality is 100 for both tables. In a new session, prepare the same query and a parallel plan is generated, where est cardinality reflects stats.
> FIRST SESSION:
> >>create table tbl1( 
> +>   t1_c1 integer unsigned
> +>   ,t1_c2 char(10) character set iso88591
> +>   ,t1_c3 largeint
> +>   ,t1_c4 int
> +>   ,t1_c5 int
> +>   ,t1_id1 int not null
> +>   ,t1_id2 int not null) 
> +>hash partition 
> +>store by (t1_id1)
> +>salt using 8 partitions
> +>;
> --- SQL operation complete.
> >>load into tbl1
> +>select c1+c2*10+c3*100+c4*1000+c5*10000+c6*100000
> +>    ,cast(c1+c2*10+c3*100+c4*1000+c5*10000+c6*100000 as char(10) character set iso88591)
> +>    ,c1+c2*10+c3*100+c4*1000+c5*10000+c6*100000
> +>    ,c1+c2*10+c3*100+c4*1000+c5*10000+c6*100000
> +>    ,c1
> +>    ,c1+c2*10+c3*100+c4*1000+c5*10000+c6*100000
> +>    ,c1+c2*10+c3*100+c4*1000+c5*10000+c6*100000
> +>from (values(1)) t
> +>transpose 0,1,2,3,4,5,6,7,8,9 as c1
> +>transpose 0,1,2,3,4,5,6,7,8,9 as c2
> +>transpose 0,1,2,3,4,5,6,7,8,9 as c3
> +>transpose 0,1,2,3,4,5,6,7,8,9 as c4
> +>transpose 0,1,2,3,4,5,6,7,8,9 as c5
> +>transpose 0,1,2,3,4,5,6,7,8,9 as c6
> +>;
> Task: LOAD             Status: Started    Object: TRAFODION.ASCH.TBL1
> Task:  CLEANUP         Status: Started    Object: TRAFODION.ASCH.TBL1
> Task:  CLEANUP         Status: Ended      Object: TRAFODION.ASCH.TBL1
> Task:  DISABLE INDEXE  Status: Started    Object: TRAFODION.ASCH.TBL1
> Task:  DISABLE INDEXE  Status: Ended      Object: TRAFODION.ASCH.TBL1
> Task:  PREPARATION     Status: Started    Object: TRAFODION.ASCH.TBL1
>        Rows Processed: 1000000 
> Task:  PREPARATION     Status: Ended      ET: 00:00:23.512
> Task:  COMPLETION      Status: Started    Object: TRAFODION.ASCH.TBL1
> Task:  COMPLETION      Status: Ended      ET: 00:00:01.564
> Task:  POPULATE INDEX  Status: Started    Object: TRAFODION.ASCH.TBL1
> Task:  POPULATE INDEX  Status: Ended      ET: 00:00:03.479
> --- 1000000 row(s) loaded.
> >>
> >>update tbl1 set t1_c1=7777 where t1_id1 between 3000 and 6999;
> --- 4000 row(s) updated.
> >>update tbl1 set t1_c1=7777 where t1_id1 between 205000 and 206999;
> --- 2000 row(s) updated.
> >>update tbl1 set t1_c1=7777 where t1_id1 between 410000 and 411999;
> --- 2000 row(s) updated.
> >>update tbl1 set t1_c1=7777 where t1_id1 between 615000 and 615999;
> --- 1000 row(s) updated.
> >>update tbl1 set t1_c1=7777 where t1_id1 between 820000 and 820999;
> --- 1000 row(s) updated.
> >>update tbl1 set t1_c2='7777' where t1_id1 between 2500 and 6499;
> --- 4000 row(s) updated.
> >>update tbl1 set t1_c2='7777' where t1_id1 between 204000 and 205999;
> --- 2000 row(s) updated.
> >>update tbl1 set t1_c2='7777' where t1_id1 between 411000 and 412999;
> --- 2000 row(s) updated.
> >>update tbl1 set t1_c2='7777' where t1_id1 between 614500 and 615499;
> --- 1000 row(s) updated.
> >>update tbl1 set t1_c2='7777' where t1_id1 between 820500 and 821499;
> --- 1000 row(s) updated.
> >>update tbl1 set t1_c3=888 where t1_id1 between 2000 and 7999;
> --- 6000 row(s) updated.
> >>update tbl1 set t1_c3=888 where t1_id1 between 205000 and 205999;
> --- 1000 row(s) updated.
> >>update tbl1 set t1_c3=888 where t1_id1 between 410500 and 411499;
> --- 1000 row(s) updated.
> >>update tbl1 set t1_c3=888 where t1_id1 between 615500 and 616499;
> --- 1000 row(s) updated.
> >>update tbl1 set t1_c3=888 where t1_id1 between 820500 and 821499;
> --- 1000 row(s) updated.
> >>update tbl1 set t1_c4=999 where t1_id1 between 3000 and 6999;
> --- 4000 row(s) updated.
> >>update tbl1 set t1_c4=999 where t1_id1 between 205000 and 206999;
> --- 2000 row(s) updated.
> >>update tbl1 set t1_c4=999 where t1_id1 between 410000 and 411999;
> --- 2000 row(s) updated.
> >>update tbl1 set t1_c4=999 where t1_id1 between 615000 and 615999;
> --- 1000 row(s) updated.
> >>update tbl1 set t1_c4=999 where t1_id1 between 820000 and 820999;
> --- 1000 row(s) updated.
> >>
> >>create table tbl2(
> +>   t2_c1 integer unsigned
> +>   ,t2_c2 char(10) character set iso88591 
> +>   ,t2_c3 largeint 
> +>   ,t2_c4 int
> +>   ,t2_c5 int
> +>   ,t2_id1 int not null
> +>   ,t2_id2 int not null)
> +>store by (t2_id1)
> +>salt using 8 partitions
> +>;
> --- SQL operation complete.
> >>load into tbl2
> +>select c1+c2*10+c3*100+c4*1000+c5*10000
> +>    ,cast(c1+c2*10+c3*100+c4*1000+c5*10000 as char(10) character set iso88591)
> +>    ,c1+c2*10+c3*100+c4*1000+c5*10000
> +>    ,c1+c2*10+c3*100+c4*1000+c5*10000
> +>    ,c1
> +>    ,c1+c2*10+c3*100+c4*1000+c5*10000
> +>    ,c1+c2*10+c3*100+c4*1000+c5*10000
> +>from (values(1)) t
> +>transpose 0,1,2,3,4,5,6,7,8,9 as c1
> +>transpose 0,1,2,3,4,5,6,7,8,9 as c2
> +>transpose 0,1,2,3,4,5,6,7,8,9 as c3
> +>transpose 0,1,2,3,4,5,6,7,8,9 as c4
> +>transpose 0,1,2,3,4,5,6,7,8,9 as c5
> +>;
> Task: LOAD             Status: Started    Object: TRAFODION.ASCH.TBL2
> Task:  CLEANUP         Status: Started    Object: TRAFODION.ASCH.TBL2
> Task:  CLEANUP         Status: Ended      Object: TRAFODION.ASCH.TBL2
> Task:  DISABLE INDEXE  Status: Started    Object: TRAFODION.ASCH.TBL2
> Task:  DISABLE INDEXE  Status: Ended      Object: TRAFODION.ASCH.TBL2
> Task:  PREPARATION     Status: Started    Object: TRAFODION.ASCH.TBL2
>        Rows Processed: 100000 
> Task:  PREPARATION     Status: Ended      ET: 00:00:11.232
> Task:  COMPLETION      Status: Started    Object: TRAFODION.ASCH.TBL2
> Task:  COMPLETION      Status: Ended      ET: 00:00:01.051
> Task:  POPULATE INDEX  Status: Started    Object: TRAFODION.ASCH.TBL2
> Task:  POPULATE INDEX  Status: Ended      ET: 00:00:02.736
> --- 100000 row(s) loaded.
> >>update statistics for table tbl2 on every column;
> --- SQL operation complete.
> >>
> >>update statistics for table tbl1 on every column,
> +>(t1_c1, t1_c2), (t1_c1, t1_c4), (t1_c2, t1_c3),
> +>(t1_c3, t1_c4), (t1_c2, t1_c4)
> +>;
> --- SQL operation complete.
> >>
> >>prepare XX from select count(*) from tbl1, tbl2 where t1_c1 = t2_c1 and t1_c2 = t2_c2;
> --- SQL command prepared.
> >>
> >>explain options 'f' XX;
> LC   RC   OP   OPERATOR              OPT       DESCRIPTION           CARD
> ---- ---- ---- --------------------  --------  --------------------  ---------
> 4    .    5    root                                                  1.00E+000
> 3    .    4    sort_scalar_aggr                                      1.00E+000
> 2    1    3    hybrid_hash_join                                      2.50E+003
> .    .    2    trafodion_scan                  TBL1                  1.00E+002
> .    .    1    trafodion_scan                  TBL2                  1.00E+002
> --- SQL operation complete.
> >>exit;
> SECOND SESSION:
> >>set schema asch;
> --- SQL operation complete.
> >>prepare XX from select count(*) from tbl1, tbl2 where t1_c1 = t2_c1 and t1_c2 = t2_c2;
> --- SQL command prepared.
> >>
> >>explain options 'f' XX;
> LC   RC   OP   OPERATOR              OPT       DESCRIPTION           CARD
> ---- ---- ---- --------------------  --------  --------------------  ---------
> 8    .    9    root                                                  1.00E+000
> 7    .    8    sort_partial_aggr_ro                                  1.00E+000
> 6    .    7    esp_exchange                    1:12(hash2)           1.00E+000
> 5    .    6    sort_partial_aggr_le                                  1.00E+000
> 4    2    5    hybrid_hash_join                                      1.00E+005
> 3    .    4    esp_exchange                    12(hash2):8(hash2)    1.00E+006
> .    .    3    trafodion_scan                  TBL1                  1.00E+006
> 1    .    2    esp_exchange                    12(hash2):8(hash2)    1.00E+005
> .    .    1    trafodion_scan                  TBL2                  1.00E+005
> --- SQL operation complete.
> >>
> >>exit;
> To reproduce:
> step 1:   initiate a sqlci session
> drop schema asch cascade;
> create schema asch;
> set schema asch;
> drop table tbl1;
> drop table tbl2;
> -- [a1] fact table; c1,c2,c3,c4 skewed
> create table tbl1( 
>    t1_c1 integer unsigned
>    ,t1_c2 char(10) character set iso88591
>    ,t1_c3 largeint
>    ,t1_c4 int
>    ,t1_c5 int
>    ,t1_id1 int not null
>    ,t1_id2 int not null) 
> hash partition 
> store by (t1_id1)
> salt using 8 partitions
> ;
> load into tbl1
> select c1+c2*10+c3*100+c4*1000+c5*10000+c6*100000
>     ,cast(c1+c2*10+c3*100+c4*1000+c5*10000+c6*100000 as char(10) character set iso88591)
>     ,c1+c2*10+c3*100+c4*1000+c5*10000+c6*100000
>     ,c1+c2*10+c3*100+c4*1000+c5*10000+c6*100000
>     ,c1
>     ,c1+c2*10+c3*100+c4*1000+c5*10000+c6*100000
>     ,c1+c2*10+c3*100+c4*1000+c5*10000+c6*100000
> from (values(1)) t
> transpose 0,1,2,3,4,5,6,7,8,9 as c1
> transpose 0,1,2,3,4,5,6,7,8,9 as c2
> transpose 0,1,2,3,4,5,6,7,8,9 as c3
> transpose 0,1,2,3,4,5,6,7,8,9 as c4
> transpose 0,1,2,3,4,5,6,7,8,9 as c5
> transpose 0,1,2,3,4,5,6,7,8,9 as c6
> ;
> update tbl1 set t1_c1=7777 where t1_id1 between 3000 and 6999;
> update tbl1 set t1_c1=7777 where t1_id1 between 205000 and 206999;
> update tbl1 set t1_c1=7777 where t1_id1 between 410000 and 411999;
> update tbl1 set t1_c1=7777 where t1_id1 between 615000 and 615999;
> update tbl1 set t1_c1=7777 where t1_id1 between 820000 and 820999;
> update tbl1 set t1_c2='7777' where t1_id1 between 2500 and 6499;
> update tbl1 set t1_c2='7777' where t1_id1 between 204000 and 205999;
> update tbl1 set t1_c2='7777' where t1_id1 between 411000 and 412999;
> update tbl1 set t1_c2='7777' where t1_id1 between 614500 and 615499;
> update tbl1 set t1_c2='7777' where t1_id1 between 820500 and 821499;
> update tbl1 set t1_c3=888 where t1_id1 between 2000 and 7999;
> update tbl1 set t1_c3=888 where t1_id1 between 205000 and 205999;
> update tbl1 set t1_c3=888 where t1_id1 between 410500 and 411499;
> update tbl1 set t1_c3=888 where t1_id1 between 615500 and 616499;
> update tbl1 set t1_c3=888 where t1_id1 between 820500 and 821499;
> update tbl1 set t1_c4=999 where t1_id1 between 3000 and 6999;
> update tbl1 set t1_c4=999 where t1_id1 between 205000 and 206999;
> update tbl1 set t1_c4=999 where t1_id1 between 410000 and 411999;
> update tbl1 set t1_c4=999 where t1_id1 between 615000 and 615999;
> update tbl1 set t1_c4=999 where t1_id1 between 820000 and 820999;
> create table tbl2(
>    t2_c1 integer unsigned
>    ,t2_c2 char(10) character set iso88591 
>    ,t2_c3 largeint 
>    ,t2_c4 int
>    ,t2_c5 int
>    ,t2_id1 int not null
>    ,t2_id2 int not null)
> store by (t2_id1)
> salt using 8 partitions
> ;
> load into tbl2
> select c1+c2*10+c3*100+c4*1000+c5*10000
>     ,cast(c1+c2*10+c3*100+c4*1000+c5*10000 as char(10) character set iso88591)
>     ,c1+c2*10+c3*100+c4*1000+c5*10000
>     ,c1+c2*10+c3*100+c4*1000+c5*10000
>     ,c1
>     ,c1+c2*10+c3*100+c4*1000+c5*10000
>     ,c1+c2*10+c3*100+c4*1000+c5*10000
> from (values(1)) t
> transpose 0,1,2,3,4,5,6,7,8,9 as c1
> transpose 0,1,2,3,4,5,6,7,8,9 as c2
> transpose 0,1,2,3,4,5,6,7,8,9 as c3
> transpose 0,1,2,3,4,5,6,7,8,9 as c4
> transpose 0,1,2,3,4,5,6,7,8,9 as c5
> ;
> update statistics for table tbl2 on every column;
> update statistics for table tbl1 on every column,
> (t1_c1, t1_c2), (t1_c1, t1_c4), (t1_c2, t1_c3),
> (t1_c3, t1_c4), (t1_c2, t1_c4)
> ;
> prepare XX from select count(*) from tbl1, tbl2 where t1_c1 = t2_c1 and t1_c2 = t2_c2;
> explain options 'f' XX;
> exit;
> step 2:   Initiate a new sqlci session and issue:
> set schema asch;
> prepare XX from select count(*) from tbl1, tbl2 where t1_c1 = t2_c1 and t1_c2 = t2_c2;
> explain options 'f' XX;
> step 3: compare query plans generated in step 1 and 2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)