Posted to issues@calcite.apache.org by "Stamatis Zampetakis (JIRA)" <ji...@apache.org> on 2019/04/04 13:59:00 UTC

[jira] [Commented] (CALCITE-2979) Add a block-based nested loop join algorithm

    [ https://issues.apache.org/jira/browse/CALCITE-2979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16809884#comment-16809884 ] 

Stamatis Zampetakis commented on CALCITE-2979:
----------------------------------------------

Implementation-wise, the first idea that comes to mind is very similar to how the current single-value implementation works. Instead of creating a filter with one correlated variable on the right side, we should create a filter with either:
 * BLK_SIZE correlated variables; or
 * a single correlated variable of ARRAY type with BLK_SIZE elements;

whose comparisons are combined into an OR predicate with BLK_SIZE disjuncts.

For example, starting with the following pseudo-plan:
{noformat}
Join(A.id > B.id)
  Scan(A)
  Scan(B)
{noformat}
the rules should generate something like the plans below:
{noformat}
NestedLoop(blockSize=3)
  Scan(A)
  Filter(OR(>(cor0_0,B.id), >(cor0_1,B.id), >(cor0_2,B.id)))
    Scan(B)
{noformat}
or
{noformat}
NestedLoop(blockSize=3)
  Scan(A)
  Filter(OR(>(cor0[0],B.id), >(cor0[1],B.id), >(cor0[2],B.id)))
    Scan(B)
{noformat}
which the implementation of Correlate(blockSize=3) should take into account.
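
To make the idea concrete, here is a minimal, self-contained Java sketch of the second (ARRAY-style) variant for the A.id > B.id example. It is not Calcite code and uses no Calcite APIs: the class name BlockNestedLoopSketch, the methods filterInner and join, and the innerScans counter are all hypothetical names made up for illustration, and both relations are reduced to plain lists of ids.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: hypothetical names, not part of Calcite. */
public class BlockNestedLoopSketch {
  /** Number of scans of the inner relation, i.e. one per block. */
  static int innerScans = 0;

  /**
   * Plays the role of Filter(OR(>(cor0[0],B.id), ...)) over Scan(B):
   * keeps a B row if at least one value of the current block matches it.
   */
  static List<Integer> filterInner(List<Integer> cor, List<Integer> b) {
    innerScans++;
    List<Integer> matching = new ArrayList<>();
    for (int bId : b) {
      for (int aId : cor) {
        if (aId > bId) {      // one disjunct of the OR predicate
          matching.add(bId);
          break;              // OR is satisfied, no need to test the rest
        }
      }
    }
    return matching;
  }

  /** Joins A and B on A.id > B.id, probing B once per block of A. */
  static List<int[]> join(List<Integer> a, List<Integer> b, int blockSize) {
    if (blockSize <= 0) {
      throw new IllegalArgumentException("blockSize must be positive");
    }
    List<int[]> result = new ArrayList<>();
    for (int start = 0; start < a.size(); start += blockSize) {
      // Gather the next block of outer tuples (the last one may be shorter).
      List<Integer> block =
          a.subList(start, Math.min(start + blockSize, a.size()));
      // Single probe of the inner relation for the whole block.
      List<Integer> candidates = filterInner(block, b);
      // The OR filter only says "some outer tuple matches", so the join
      // condition is re-checked per outer tuple to build the actual pairs.
      for (int aId : block) {
        for (int bId : candidates) {
          if (aId > bId) {
            result.add(new int[] {aId, bId});
          }
        }
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<int[]> rows = join(Arrays.asList(5, 1, 3, 7), Arrays.asList(2, 4, 6), 3);
    for (int[] row : rows) {
      System.out.println(row[0] + " > " + row[1]);
    }
    System.out.println("inner scans: " + innerScans);
  }
}
```

With four outer tuples and blockSize=3 the inner relation is scanned twice instead of four times. Note that in this sketch the join condition has to be evaluated again on the filtered rows, because the OR filter alone does not tell which outer tuple of the block matched; a real implementation would need some equivalent step.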

> Add a block-based nested loop join algorithm
> --------------------------------------------
>
>                 Key: CALCITE-2979
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2979
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.19.0
>            Reporter: Stamatis Zampetakis
>            Priority: Major
>              Labels: performance
>
> Currently Calcite provides a tuple-based nested loop join algorithm implemented through EnumerableCorrelate and EnumerableDefaults.correlateJoin. This means that for each tuple of the outer relation we set the correlation variables and probe the inner relation.
> The goal of this issue is to add a new algorithm (or extend the correlateJoin method) that first gathers blocks (batches) of tuples from the outer relation and then probes the inner relation once per block.
> There are cases (e.g., indexes) where the inner relation can be accessed by more than one value at a time, which can greatly improve performance, in particular when the outer relation is big.


