Posted to issues@calcite.apache.org by "Lai Zhou (JIRA)" <ji...@apache.org> on 2019/04/02 09:23:00 UTC

[jira] [Comment Edited] (CALCITE-2973) Make EnumerableMergeJoinRule to support a theta join

    [ https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807393#comment-16807393 ] 

Lai Zhou edited comment on CALCITE-2973 at 4/2/19 9:22 AM:
-----------------------------------------------------------

[~julianhyde], consider another query whose join condition contains both an equi condition and a non-equi condition:

 
{code:java}
SELECT t1.i_item_desc
FROM item t1
LEFT OUTER JOIN item_1 t2
  ON t1.i_item_sk = t2.i_item_sk AND t2.i_item_sk < 10000
{code}
A merge join is also a good fit for this query, but currently it is converted to a nested-loop join.

I tried replacing the default ENUMERABLE_JOIN_RULE with a customized rule:
{code:java}
final JoinInfo info = JoinInfo.of(left, right, join.getCondition());
if (!info.isEqui() && join.getJoinType() != JoinRelType.INNER) {
  // EnumerableJoinRel only supports equi-join. We can put a filter on top
  // if it is an inner join.
  try {
    boolean hasEquiKeys = !info.leftKeys.isEmpty()
        && !info.rightKeys.isEmpty();
    if (hasEquiKeys) {
      return convertToThetaMergeJoin(rel);
    } else {
      return new EnumerableThetaJoin(cluster, traitSet, left, right,
          join.getCondition(), join.getVariablesSet(), join.getJoinType());
    }
  } catch (Exception e) {
    EnumerableRules.LOGGER.debug(e.toString());
    return null;
  }
}
{code}
If the join has equi keys, the rule converts the join rel to an EnumerableThetaMergeJoin:
{code:java}
new EnumerableThetaMergeJoin(cluster, traits, left, right,
    info.getEquiCondition(left, right, cluster.getRexBuilder()),
    info.getRemaining(cluster.getRexBuilder()),
    info.leftKeys, info.rightKeys,
    join.getVariablesSet(), join.getJoinType());
{code}
I implemented EnumerableThetaMergeJoin to handle a theta join that has equi keys.
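
As a rough illustration of the split that the constructor call above relies on (equi keys on one side, the remaining condition on the other), here is a toy, self-contained sketch. It is not Calcite's JoinInfo: the ConditionSplitSketch and Conjunct classes, and the convention that columns are prefixed with "t1." or "t2.", are assumptions made only for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of splitting an ON clause; this is NOT Calcite's JoinInfo. */
final class ConditionSplitSketch {

  /** One AND-ed comparison, e.g. "t1.i_item_sk = t2.i_item_sk". */
  static final class Conjunct {
    final String left;
    final String op;
    final String right;

    Conjunct(String left, String op, String right) {
      this.left = left;
      this.op = op;
      this.right = right;
    }

    /** True for "t1.col = t2.col": an equality with one column from each input. */
    boolean isEquiKey() {
      return op.equals("=") && left.startsWith("t1.") && right.startsWith("t2.");
    }
  }

  /** Conjuncts usable as merge keys. */
  final List<Conjunct> equi = new ArrayList<>();
  /** Everything else; to be evaluated as a residual predicate. */
  final List<Conjunct> remaining = new ArrayList<>();

  ConditionSplitSketch(List<Conjunct> conjuncts) {
    for (Conjunct c : conjuncts) {
      (c.isEquiKey() ? equi : remaining).add(c);
    }
  }
}
```

For the query above, the conjunct t1.i_item_sk = t2.i_item_sk lands in the equi list and t2.i_item_sk < 10000 lands in the remaining list.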

The key difference between EnumerableThetaMergeJoin and EnumerableMergeJoin is that EnumerableThetaMergeJoin uses a predicate generated from the remaining (non-equi) part of the JoinInfo, and that predicate is applied to the cartesian results of the merge join.

see [https://github.com/apache/calcite/blob/27d883983e76691f9294e5edd9e264b978dfa7e9/linq4j/src/main/java/org/apache/calcite/linq4j/EnumerableDefaults.java#L3298|https://github.com/apache/calcite/blob/27d883983e76691f9294e5edd9e264b978dfa7e9/linq4j/src/main/java/org/apache/calcite/linq4j/EnumerableDefaults.java#L3298]

 
{code:java}
public TResult current() {
  final List<Object> list = cartesians.current();
  @SuppressWarnings("unchecked") final TSource left =
      (TSource) list.get(0);
  @SuppressWarnings("unchecked") final TInner right =
      (TInner) list.get(1);
  // Apply the non-equi predicate to the current pair from cartesians.
  boolean isNonEquiPredicateSatisfied = predicate.apply(left, right);
  if (!isNonEquiPredicateSatisfied) {
    if (generateNullsOnLeft) {
      return resultSelector.apply(null, right);
    }
    if (generateNullsOnRight) {
      return resultSelector.apply(left, null);
    }
  }
  return resultSelector.apply(left, right);
}
{code}
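
To illustrate the idea outside Calcite, here is a minimal, self-contained sketch of a left outer merge join that applies a residual predicate to each key-matched pair. Everything in it (the ThetaMergeJoinSketch class, the leftOuterJoin method, the int-array inputs) is hypothetical and is not Calcite's API. Unlike the current() snippet above, the sketch emits a single null-padded row for a left row only when no right row with the same key passes the predicate; that is an assumption about the intended LEFT OUTER JOIN semantics, not something the patch is confirmed to do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

/** Hypothetical sketch of a merge join with a residual predicate; not Calcite code. */
final class ThetaMergeJoinSketch {

  /**
   * Left outer merge join of two key arrays, both sorted ascending.
   * Rows are paired on key equality, then filtered by the residual predicate.
   * Each result is {left, right}; right is null for an unmatched left row.
   */
  public static List<Integer[]> leftOuterJoin(int[] left, int[] right,
      BiPredicate<Integer, Integer> residual) {
    List<Integer[]> out = new ArrayList<>();
    int j = 0; // start of the right-side group for the current left key
    for (int i = 0; i < left.length; i++) {
      // Skip right rows whose key is smaller than the current left key.
      while (j < right.length && right[j] < left[i]) {
        j++;
      }
      boolean matched = false;
      // Scan the group of right rows that share the left key.
      for (int k = j; k < right.length && right[k] == left[i]; k++) {
        if (residual.test(left[i], right[k])) {
          out.add(new Integer[] {left[i], right[k]});
          matched = true;
        }
      }
      // No right row passed both the equi and the residual condition.
      if (!matched) {
        out.add(new Integer[] {left[i], null});
      }
    }
    return out;
  }
}
```

For example, with left keys {1, 2, 2, 20000}, right keys {2, 2, 3, 20000} and the residual predicate r < 10000, key 2 produces four joined rows, while keys 1 and 20000 each produce one row with a null right side.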
 

 


was (Author: hhlai1990):
[~julianhyde], consider another query whose join condition contains both an equi condition and a non-equi condition:

 
{code:java}
SELECT t1.i_item_desc FROM item t1 LEFT OUTER JOIN item_1 t2 ON t1.i_item_sk=t2.i_item_sk and t2.i_item_sk <10000{code}
A merge join is also a good fit for this query, but currently it is converted to a nested-loop join.

 

 

> Make EnumerableMergeJoinRule to support a theta join
> ----------------------------------------------------
>
>                 Key: CALCITE-2973
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2973
>             Project: Calcite
>          Issue Type: New Feature
>          Components: core
>    Affects Versions: 1.19.0
>            Reporter: Lai Zhou
>            Priority: Minor
>
> Currently, EnumerableMergeJoinRule supports only inner equi-joins.
> If users run a theta-join query on a large dataset (such as 10000*10000), the nested-loop join takes dozens of times longer than a sort-merge join.
> So if we can apply the merge-join or hash-join rule to a theta join, performance will improve greatly.


