Posted to issues@calcite.apache.org by "Lai Zhou (JIRA)" <ji...@apache.org> on 2019/04/02 09:23:00 UTC
[jira] [Comment Edited] (CALCITE-2973) Make EnumerableMergeJoinRule to support a theta join
[ https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807393#comment-16807393 ]
Lai Zhou edited comment on CALCITE-2973 at 4/2/19 9:22 AM:
-----------------------------------------------------------
[~julianhyde], consider another query whose join condition contains both an equi condition and a non-equi condition:
{code:sql}
SELECT t1.i_item_desc
FROM item t1
LEFT OUTER JOIN item_1 t2
  ON t1.i_item_sk = t2.i_item_sk AND t2.i_item_sk < 10000
{code}
A merge join would also work well for this query, but currently it is converted to a nested-loop join.
I tried replacing the default ENUMERABLE_JOIN_RULE with a customized rule:
{code:java}
final JoinInfo info = JoinInfo.of(left, right, join.getCondition());
if (!info.isEqui() && join.getJoinType() != JoinRelType.INNER) {
  // EnumerableJoinRel only supports equi-join. We can put a filter on top
  // if it is an inner join.
  try {
    // The condition has equi keys if at least one equality pair was extracted.
    boolean hasEquiKeys = !info.leftKeys.isEmpty()
        && !info.rightKeys.isEmpty();
    if (hasEquiKeys) {
      // Equi keys plus a non-equi remainder: merge join on the keys,
      // then apply the remainder to each matched pair.
      return convertToThetaMergeJoin(rel);
    } else {
      // No equi keys at all: fall back to a nested-loop theta join.
      return new EnumerableThetaJoin(cluster, traitSet, left, right,
          join.getCondition(), join.getVariablesSet(), join.getJoinType());
    }
  } catch (Exception e) {
    EnumerableRules.LOGGER.debug(e.toString());
    return null;
  }
}
{code}
If the join has equi keys, the rule converts the join rel to an EnumerableThetaMergeJoin:
{code:java}
new EnumerableThetaMergeJoin(cluster, traits, left, right,
    info.getEquiCondition(left, right, cluster.getRexBuilder()),
    info.getRemaining(cluster.getRexBuilder()),
    info.leftKeys, info.rightKeys,
    join.getVariablesSet(), join.getJoinType());
{code}
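For the query above, JoinInfo splits the condition roughly as follows: t1.i_item_sk = t2.i_item_sk yields one pair of equi keys, and t2.i_item_sk < 10000 becomes the remaining condition. The toy sketch below models that split and the branch the customized rule takes for a non-inner join. It does not use Calcite's RexNode or JoinInfo; the class, the Conjunct types, and the strategy labels are all hypothetical.

{code:java}
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy model of a JoinInfo-style split; not Calcite's API. */
final class JoinConditionSplitSketch {

  /** One AND-ed conjunct of a join condition (toy representation). */
  interface Conjunct {}

  /** leftField = rightField: an equi conjunct usable as a join key. */
  record Eq(int leftField, int rightField) implements Conjunct {}

  /** Any other predicate, e.g. "t2.i_item_sk < 10000"; kept opaque here. */
  record Other(String text) implements Conjunct {}

  final List<Integer> leftKeys = new ArrayList<>();
  final List<Integer> rightKeys = new ArrayList<>();
  final List<Other> remaining = new ArrayList<>();

  static JoinConditionSplitSketch of(List<Conjunct> conjuncts) {
    JoinConditionSplitSketch s = new JoinConditionSplitSketch();
    for (Conjunct c : conjuncts) {
      if (c instanceof Eq eq) {
        // Equality between one left and one right field becomes a key pair.
        s.leftKeys.add(eq.leftField());
        s.rightKeys.add(eq.rightField());
      } else {
        // Everything else goes to the remaining (non-equi) condition.
        s.remaining.add((Other) c);
      }
    }
    return s;
  }

  boolean isEqui() {
    return remaining.isEmpty();
  }

  boolean hasEquiKeys() {
    return !leftKeys.isEmpty() && !rightKeys.isEmpty();
  }

  /** Mirrors the customized rule's branch for a non-inner join. */
  String strategyForOuterJoin() {
    if (isEqui()) {
      return "existing equi-join path";
    }
    return hasEquiKeys() ? "theta merge join" : "nested-loop theta join";
  }
}
{code}

Assuming i_item_sk is field 0 of both inputs, the query's condition corresponds to one Eq(0, 0) plus one Other conjunct, which selects the theta merge join.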
I implemented EnumerableThetaMergeJoin to handle a theta join that has equi keys.
The key difference between EnumerableThetaMergeJoin and EnumerableMergeJoin is that
EnumerableThetaMergeJoin uses a predicate generated from the remaining (non-equi) part of the JoinInfo,
and applies that predicate to the cartesian product that the merge join produces for each group of equal keys.
See [https://github.com/apache/calcite/blob/27d883983e76691f9294e5edd9e264b978dfa7e9/linq4j/src/main/java/org/apache/calcite/linq4j/EnumerableDefaults.java#L3298|https://github.com/apache/calcite/blob/27d883983e76691f9294e5edd9e264b978dfa7e9/linq4j/src/main/java/org/apache/calcite/linq4j/EnumerableDefaults.java#L3298]
{code:java}
public TResult current() {
  // Each element of cartesians is a (left, right) pair with equal join keys.
  final List<Object> list = cartesians.current();
  @SuppressWarnings("unchecked") final TSource left =
      (TSource) list.get(0);
  @SuppressWarnings("unchecked") final TInner right =
      (TInner) list.get(1);
  // Apply the remaining (non-equi) predicate to the current pair.
  boolean isNonEquiPredicateSatisfied = predicate.apply(left, right);
  if (!isNonEquiPredicateSatisfied) {
    // The pair fails the predicate: pad the non-preserved side with nulls.
    if (generateNullsOnLeft) {
      return resultSelector.apply(null, right);
    }
    if (generateNullsOnRight) {
      return resultSelector.apply(left, null);
    }
  }
  return resultSelector.apply(left, right);
}
{code}
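Under SQL LEFT OUTER JOIN semantics, a left row is null-padded only when none of its equal-key right rows satisfy the remaining predicate, so checking the predicate one pair at a time needs extra bookkeeping to avoid emitting a null-padded row alongside matched rows for the same left row. The following is a minimal, self-contained sketch of a left outer merge join that tracks this per left row. It is not Calcite's EnumerableDefaults.mergeJoin; the class and method names are hypothetical, and it assumes both inputs are already sorted ascending by a non-null join key.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;
import java.util.function.Function;

/** Hypothetical sketch of a left outer merge join with a remaining predicate. */
final class ThetaMergeJoinSketch {

  /**
   * Both inputs must be sorted ascending by a non-null join key.
   * Each output row is {leftRow, rightRowOrNull}.
   */
  static <L, R, K extends Comparable<K>> List<Object[]> leftOuterMergeJoin(
      List<L> left, List<R> right,
      Function<L, K> leftKey, Function<R, K> rightKey,
      BiPredicate<L, R> remaining) {
    List<Object[]> out = new ArrayList<>();
    int groupStart = 0; // first right row whose key may equal the current left key
    for (L l : left) {
      K k = leftKey.apply(l);
      // Advance past right rows whose key is smaller than the left key.
      while (groupStart < right.size()
          && rightKey.apply(right.get(groupStart)).compareTo(k) < 0) {
        groupStart++;
      }
      boolean matched = false;
      // Scan the equal-key group; groupStart is not moved past it, so the
      // next left row with the same key can reuse the group.
      for (int i = groupStart;
          i < right.size() && rightKey.apply(right.get(i)).compareTo(k) == 0;
          i++) {
        R r = right.get(i);
        if (remaining.test(l, r)) {
          out.add(new Object[] {l, r});
          matched = true;
        }
      }
      // Null-pad only if no equal-key right row satisfied the predicate.
      if (!matched) {
        out.add(new Object[] {l, null});
      }
    }
    return out;
  }
}
{code}

With left rows (1, 10), (2, 20), right rows (1, 5), (1, 15), (2, 50), key = first field, and the predicate right[1] < left[1], the sketch returns two rows: (1, 10) paired with (1, 5), and (2, 20) paired with null. Row (1, 10) is not also null-padded, even though it fails the predicate against (1, 15).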
> Make EnumerableMergeJoinRule to support a theta join
> ----------------------------------------------------
>
> Key: CALCITE-2973
> URL: https://issues.apache.org/jira/browse/CALCITE-2973
> Project: Calcite
> Issue Type: New Feature
> Components: core
> Affects Versions: 1.19.0
> Reporter: Lai Zhou
> Priority: Minor
>
> Currently, EnumerableMergeJoinRule only supports inner equi-joins.
> If users run a theta-join query over a large dataset (such as 10000 * 10000 rows), the nested-loop join takes dozens of times longer than the sort-merge join.
> So applying a merge-join or hash-join rule to a theta join would improve performance greatly.
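The issue summary also mentions hash join. The same idea carries over: build a hash table on the equi keys of one input, probe it with the other, and apply the remaining predicate to each equal-key pair. This is a minimal sketch under the same assumptions as a plain left outer join (hypothetical names, not a Calcite API); rows with a null key never match, following SQL equality semantics.

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;
import java.util.function.Function;

/** Hypothetical sketch of a left outer hash join with a remaining predicate. */
final class ThetaHashJoinSketch {

  /** Each output row is {leftRow, rightRowOrNull}; inputs need not be sorted. */
  static <L, R, K> List<Object[]> leftOuterHashJoin(
      List<L> left, List<R> right,
      Function<L, K> leftKey, Function<R, K> rightKey,
      BiPredicate<L, R> remaining) {
    // Build phase: index right rows by key, skipping null keys.
    Map<K, List<R>> index = new HashMap<>();
    for (R r : right) {
      K k = rightKey.apply(r);
      if (k != null) {
        index.computeIfAbsent(k, unused -> new ArrayList<>()).add(r);
      }
    }
    List<Object[]> out = new ArrayList<>();
    // Probe phase: test the remaining predicate only on equal-key pairs.
    for (L l : left) {
      K k = leftKey.apply(l);
      boolean matched = false;
      if (k != null) {
        for (R r : index.getOrDefault(k, List.of())) {
          if (remaining.test(l, r)) {
            out.add(new Object[] {l, r});
            matched = true;
          }
        }
      }
      // Null-pad only if no equal-key right row satisfied the predicate.
      if (!matched) {
        out.add(new Object[] {l, null});
      }
    }
    return out;
  }
}
{code}

Compared with the merge-join sketch, this avoids sorting both inputs but keeps the whole build side in memory.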
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)