You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ignite.apache.org by "Stanislav Lukyanov (Jira)" <ji...@apache.org> on 2021/01/08 13:55:00 UTC
[jira] [Commented] (IGNITE-8732) SQL: REPLICATED cache cannot be left-joined to PARTITIONED

    [ https://issues.apache.org/jira/browse/IGNITE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17261318#comment-17261318 ] 

Stanislav Lukyanov commented on IGNITE-8732:
--------------------------------------------

[~Vodrážka] I have implemented a partial solution based on approach #2 (treat replicated cache as partitioned). Will be proceeding with PR and merge now.

> SQL: REPLICATED cache cannot be left-joined to PARTITIONED
> ----------------------------------------------------------
>
>                 Key: IGNITE-8732
>                 URL: https://issues.apache.org/jira/browse/IGNITE-8732
>             Project: Ignite
>          Issue Type: Improvement
>          Components: sql
>    Affects Versions: 2.5
>            Reporter: Vladimir Ozerov
>            Assignee: Stanislav Lukyanov
>            Priority: Major
>              Labels: sql-engine
>
> *Steps to reproduce*
> # Run {{org.apache.ignite.sqltests.ReplicatedSqlTest#testLeftJoinReplicatedPartitioned}}
> # Observe that we have 2x results on 2-node cluster
> *Root Cause*
> {{left LEFT JOIN right ON cond}} operation assumes full scan of of a left expression. Currently we perform this scan on every node and then simply merge results on reducer. Two nodes, two scans of {{REPLICATED}} cache, 2x results.
> *Potential Solutions*
> We may consider several solutions. Deeper analysis is required to understand which is the right one.
> # Perform deduplication on reducer - this most prospective and general technique, described in more details below
> # Treat {{REPLICATED}} cache as {{PARTITIONED}}. Essentially, we just need to pass proper backup filter. But what if {{REPLICATED}} cache spans more nodes than {{PARTITIONED}}? We cannot rely on primary/backup in this case
> # Implement additional execution phase as follows: 
> {code}
> SELECT left.cols, right.cols FROM left INNER JOIN right ON cond;                          // Get "inner join" part
> UNION
> UNICAST SELECT left.cols, [NULL].cols FROM left WHERE left.id NOT IN ([ids from the first phase]) // Get "outer join" part
> {code}
> *Reducer Deduplication*
> The idea is to get all data locally and then perform final deduplication. This may incur high network overhead, because of lot of duplicated left parts would be transferred. However, this could be optimized greatly with the following techniques applied one after another
> # Semi-jions: {{left}} is {{joined}} on mapper node, but instead of sending {{(left, right)}} relation, we send {{(left) + (right)}}
> # In case {{left}} part is known to be idempotent (i.e. it produces the same result set on all nodes), only one node will send {{(left) + (right)}}, other nodes will send {{(right)}} only
> # Merge {{left}} results with if needed (i.e. if idempotence-related opto was not applicable)
> # Join {{left}} and {{right}} parts on reducer



--
This message was sent by Atlassian Jira
(v8.3.4#803005)