You are viewing a plain text version of this content. The canonical link for it is here.

Posted to codereview@trafodion.apache.org by DaveBirdsall <gi...@git.apache.org> on 2018/04/19 23:06:13 UTC

[GitHub] trafodion pull request #1530: [TRAFODION-3031] Fix two issues with nested su...

GitHub user DaveBirdsall opened a pull request:

    https://github.com/apache/trafodion/pull/1530

    [TRAFODION-3031] Fix two issues with nested subquery plans

    See the JIRA for a discussion of the problem.
    
    This set of changes contains two fixes:
    
    1. The heuristics that decide whether to do the semi-join to join + group-by have been modified. First, if the group by would reduce the number of rows in the inner table by a factor of 5.0 or more, the transformation will be enabled. The rationale is that this may reduce the size of the inner hash table if hash join is chosen. Also it allows the join to be commuted, which may lead to a better plan. Second, if the inner table has only a small number of rows (less than 100.0), the transformation will be enabled. The rationale here is that it allows the join to be commuted which may lead to an efficient nested join plan. Both of the constants here are controlled by new CQDs. (Note: Thanks go to @sureshsubbiah for the first of these ideas and the code to do it.)
    
    2. There was a bug in Scan::addIndexInfo, where we would sometimes overlook a useful index. We would mistakenly think that another index provided the same or better information when in fact one index would satisfy a join predicate that the other would not. This bug has been fixed.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/DaveBirdsall/trafodion NestedSubqueryProblem

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/trafodion/pull/1530.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1530
    
----
commit ee91b337ae1d23ad2b15034602e06c4d3bf1103d
Author: Dave Birdsall <db...@...>
Date:   2018-04-19T22:59:16Z

    [TRAFODION-3031] Fix two issues with nested subquery plans

----


---

[GitHub] trafodion pull request #1530: [TRAFODION-3031] Fix two issues with nested su...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/trafodion/pull/1530


---

[GitHub] trafodion pull request #1530: [TRAFODION-3031] Fix two issues with nested su...

Posted by zellerh <gi...@git.apache.org>.

Github user zellerh commented on a diff in the pull request:

    https://github.com/apache/trafodion/pull/1530#discussion_r182912167
  
    --- Diff: core/sql/optimizer/NormRelExpr.cpp ---
    @@ -2826,18 +2827,39 @@ RelExpr* Join::transformSemiJoin(NormWA& normWARef)
     
      /* Apply the transformation described in item b) above.
        The transformation below is done if there are no non-equijoin preds either 
    -  and the inner side has no base tables (i.e. is an IN LIST) or if we have
    -  used a CQD to turn this transformation on for a specific user. For the general
    -  case we are not certain if this transformation is always beneficial, so it is 
    -  not on by default */
    +  and the inner side has no base tables (i.e. is an IN LIST) OR if the groupby
    +  is expected to provide a reduction > SEMIJOIN_TO_INNERJOIN_REDUCTION_RATIO
    +  (default is 5.0) OR the inner row count is small OR if we have used a CQD to 
    +  turn this transformation on. Some rationale: A data reduction might reduce
    +  the amount of data for the inner table of a hash join (or it might not!
    +  hash-semi-join sometimes does duplicate elimination itself, but not always).
    +  Converting to a join allows the join to be commuted; if the number of rows
    +  is small, nested join might be profitably chosen in that case. */
     
           ValueIdSet preds ;
           preds += joinPred();
           preds += selectionPred();
           preds -= getEquiJoinPredicates() ;
     
    +      EstLogPropSharedPtr innerEstLogProp = child(1)->getGroupAttr()->outputLogProp((*GLOBAL_EMPTY_INPUT_LOGPROP));
    +      CostScalar innerRowCount = innerEstLogProp->getResultCardinality(); 
    +      CostScalar innerUec = innerEstLogProp->getAggregateUec(equiJoinCols1);
    --- End diff --
    
    One question about this: This UEC can be very unreliable, if we don't have multi-column statistics. So, could it happen that we have a semi-join with a very large inner table, where our UEC is underestimated? The extra groupby could be a bad choice in that situation.
    
    On the other hand, I think we usually overestimate multi-column UECs, so maybe this is not so much of a concern.


---

[GitHub] trafodion pull request #1530: [TRAFODION-3031] Fix two issues with nested su...

Posted by zellerh <gi...@git.apache.org>.

Github user zellerh commented on a diff in the pull request:

    https://github.com/apache/trafodion/pull/1530#discussion_r182911028
  
    --- Diff: core/sql/optimizer/NormRelExpr.cpp ---
    @@ -2767,25 +2767,26 @@ Here t2.a is a unique key of table t2.
     The following transformation is made
              Semi Join {pred : t1.b = t2.a}          Join {pred : t1.b = t2.a} 
             /         \                   ------->  /    \
    -      /             \                         /        \
    -Scan t1     Scan t2                 Scan t1     Scan t2
    +       /           \                           /      \
    + Scan t1        Scan t2                   Scan t1     Scan t2
                                                     
     
     						
     b) If the right child is not unique in the joining column then 
     we transform the semijoin into an inner join followed by a groupby
     as the join's right child. This transformation is enabled by default
    -only if the right side is an IN list, otherwise a CQD has to be used.
    +only if the right side is an IN list or if the groupby's reduction 
    +ratio is greater than 5.0, otherwise a CQD has to be used.
     
     select t1.a
     from t1
     where t1.b in (1,2,3,4,...,101) ;
     
     
    -  Semi Join {pred : t1.b = t2.a}          Join {pred : t1.b = InList.col} 
    +  Semi Join {pred : t1.b = InList.col}  Join {pred : t1.b = InList.col}
      /         \                   ------->  /    \
     /           \                           /      \
    -Scan t1     Scan t2                 Scan t1     GroupBy {group cols: InList.col}
    +Scan t1   TupleList                 Scan t1   GroupBy {group cols: InList.col}
                                                       |
    --- End diff --
    
    Nice to make the picture consistent, but from the code it looks like we do this for things other than TupleList, so maybe "Scan t2" or "Q2" would be a better name for the child?


---