You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@trafodion.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/03/23 23:18:41 UTC

[jira] [Commented] (TRAFODION-2552) Skew buster plan not chosen when join predicate involves SUBSTRs

    [ https://issues.apache.org/jira/browse/TRAFODION-2552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15939417#comment-15939417 ] 

ASF GitHub Bot commented on TRAFODION-2552:
-------------------------------------------

GitHub user DaveBirdsall opened a pull request:

    https://github.com/apache/incubator-trafodion/pull/1023

    [TRAFODION-2552] Use column stats when SUBSTR takes a column prefix

    Scenario: A join on two tables, on a column that is badly skewed in one of the tables. The join predicate is of the form SUBSTR(skewedcolumn,1,n) = SUBSTR(othercolumn,1,n). We use the default distribution for this predicate, so we miss an opportunity to use skew buster.
    
    The fix simply allows the base table column statistics to be used in this case.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/DaveBirdsall/incubator-trafodion SubstringStats

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-trafodion/pull/1023.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1023
    
----

----


> Skew buster plan not chosen when join predicate involves SUBSTRs
> ----------------------------------------------------------------
>
>                 Key: TRAFODION-2552
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-2552
>             Project: Apache Trafodion
>          Issue Type: Bug
>          Components: sql-cmp
>    Affects Versions: 2.1-incubating, 2.2-incubating
>         Environment: All
>            Reporter: David Wayne Birdsall
>            Assignee: David Wayne Birdsall
>
> Skew-buster is a feature of Trafodion that modifies the partitioning function on joins when skewed values are present. When we notice that there are a few very frequent values among many low frequency values in a join column, we use a different strategy for joining the frequent values than the non-frequent ones.
> The following script reproduces the problem:
> ?section setup
> drop table if exists skewSubstr;
> create table skewSubstr (a int not null, b char(16), primary key (a))
>   salt using 4 partitions;
> upsert using load into skewSubstr
> select 
>   x1 + 10*x2 + 100*x3 + 1000*x4 + 10000*x5 + 100000*x6,
>   case when mod(x1 + 10*x2 + 100*x3 + 1000*x4 + 10000*x5 + 100000*x6,97) = 0 then 'askewvalue'
>        else char(x5+97) || char(mod(x4 + 3*x2 + 7*x6,26)+97) || char(x4+97) || char(x1+97) || 
>             char(mod(2*x1 + 5*x5 + x6,26)+97) || char(mod(x1+x2+x3,26)+97) || char(x6+97) ||
>             char(x2+97) || char(x3+97) || char(mod(x1-x2+x4+2*x6,26)+97) 
>        end
> -- the from clause below creates 1,000,000 rows, the cross product of
> -- 6 copies of { 0, ... 9 }
>   from (values(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) T(x1)
> transpose 0,1,2,3,4,5,6,7,8,9 as x2
> transpose 0,1,2,3,4,5,6,7,8,9 as x3
> transpose 0,1,2,3,4,5,6,7,8,9 as x4
> transpose 0,1,2,3,4,5,6,7,8,9 as x5
> transpose 0,1,2,3,4,5,6,7,8,9 as x6;
> update statistics for table skewSubtr on every column;
> create table otherTable(c int not null, d char(16), primary key (c))
>   salt using 4 partitions;
> upsert using load into otherTable
>   select a, case when b = 'askewvalue' then substr(b,1,10) || char(mod(a,26)+97) else b end
>     from skewSubstr;
> update statistics for table otherTable on every column;
> ?section doit
> cqd SKEW_SENSITIVITY_THRESHOLD '0.0';
> cqd SKEW_ROWCOUNT_THRESHOLD '10';
> prepare goodQuery from select count(*) from skewSubstr join otherTable
>   on b = d;
> -- notice in the explain, that the descriptions on the esp_exchanges are
> -- (h2-ud) and (h2-br), which indicates that skew-buster is operative
> explain options 'f' goodQuery;
> prepare badQuery from select count(*) from skewSubstr join otherTable
>   on substr(b,1,9) = substr(d,1,9);
> -- notice in the explain, that the descriptions on the esp_exchanges are
> -- (hash2), which indicates that skew-buster is NOT operative
> explain options 'f' badQuery;
> To see the problem, look at the output of the EXPLAINs:
> >>
> >>cqd SKEW_SENSITIVITY_THRESHOLD '0.0';
> --- SQL operation complete.
> >>cqd SKEW_ROWCOUNT_THRESHOLD '10';
> --- SQL operation complete.
> >>
> >>prepare goodQuery from select count(*) from skewSubstr join otherTable
> +>  on b = d;
> --- SQL command prepared.
> >>
> >>-- notice in the explain, that the descriptions on the esp_exchanges are
> >>-- (h2-ud) and (h2-br), which indicates that skew-buster is operative
> >>explain options 'f' goodQuery;
> LC   RC   OP   OPERATOR              OPT       DESCRIPTION           CARD
> ---- ---- ---- --------------------  --------  --------------------  ---------
> 8    .    9    root                                                  1.00E+000
> 7    .    8    sort_partial_aggr_ro                                  1.00E+000
> 6    .    7    esp_exchange                    1:4(h2-ud)            1.00E+000
> 5    .    6    sort_partial_aggr_le                                  1.00E+000
> 4    2    5    hybrid_hash_join                                      1.01E+006
> 3    .    4    esp_exchange                    4(h2-ud):4(hash2)     1.00E+006
> .    .    3    trafodion_scan                  SKEWSUBSTR            1.00E+006
> 1    .    2    esp_exchange                    4(h2-br):4(hash2)     1.00E+006
> .    .    1    trafodion_scan                  OTHERTABLE            1.00E+006
> --- SQL operation complete.
> >>
> >>prepare badQuery from select count(*) from skewSubstr join otherTable
> +>  on substr(b,1,9) = substr(d,1,9);
> --- SQL command prepared.
> >>
> >>-- notice in the explain, that the descriptions on the esp_exchanges are
> >>-- (hash2), which indicates that skew-buster is NOT operative
> >>explain options 'f' badQuery;
> LC   RC   OP   OPERATOR              OPT       DESCRIPTION           CARD
> ---- ---- ---- --------------------  --------  --------------------  ---------
> 8    .    9    root                                                  1.00E+000
> 7    .    8    sort_partial_aggr_ro                                  1.00E+000
> 6    .    7    esp_exchange                    1:4(hash2)            1.00E+000
> 5    .    6    sort_partial_aggr_le                                  1.00E+000
> 4    2    5    hybrid_hash_join                                      1.79E+006
> 3    .    4    esp_exchange                    4(hash2):4(hash2)     1.00E+006
> .    .    3    trafodion_scan                  SKEWSUBSTR            1.00E+006
> 1    .    2    esp_exchange                    4(hash2):4(hash2)     1.00E+006
> .    .    1    trafodion_scan                  OTHERTABLE            1.00E+006
> --- SQL operation complete.
> Notice that when the join predicate is of the form SUBSTR(b,1,9) = SUBSTR(d,1,9), vanilla hash repartitioning is chosen even though the values of column b show extreme skew. The problem is that with the SUBSTRs present, we use default distribution properties for the join predicate instead of leveraging our knowledge about the skew in column B.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)