Posted to issues@ignite.apache.org by "Konstantin Orlov (Jira)" <ji...@apache.org> on 2023/03/01 07:26:00 UTC
[jira] [Updated] (IGNITE-18225) Sql. Pushdown MODIFY to data node

     [ https://issues.apache.org/jira/browse/IGNITE-18225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Orlov updated IGNITE-18225:
--------------------------------------
    Description: 
Currently, ModifyNode can only have the distribution "single". This means that the node is executed on a single node and its input must be gathered in one place. Consider the following query: UPDATE t SET a = a + 1. Such a query is executed in two steps: first we select the rows to update, and then we perform the update. With ModifyNode being "single", all rows of table T are sent to the reducer, and the updated versions of the rows are then sent back to the data nodes.
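
To make the cost of the current plan concrete, here is a toy Python model of it. This is not Ignite code: the partition layout, the node names and the run_single_modify helper are made up for illustration, and the reducer is assumed to be a separate node so that every row crosses the network.

```python
# Toy model only: partitions are plain dicts and the "network" is a counter.
def run_single_modify(partitions):
    """UPDATE t SET a = a + 1 with the modify step on a single reducer."""
    rows_sent = 0

    # Step 1: every data node ships its rows to the reducer.
    gathered = []
    for node, rows in partitions.items():
        for key, a in rows.items():
            gathered.append((node, key, a))
            rows_sent += 1

    # Step 2: the reducer computes the new values and ships them back.
    for node, key, a in gathered:
        partitions[node][key] = a + 1
        rows_sent += 1

    return len(gathered), rows_sent


single_partitions = {
    "node1": {1: 10, 2: 20},
    "node2": {3: 30},
}
single_updated, single_rows_sent = run_single_modify(single_partitions)
print(single_updated, single_rows_sent)
```

In this model each row is transferred twice, once in each direction, so the traffic grows with the size of the table.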

We could eliminate this round trip by pushing down the ModifyNode (i.e. allowing this node to have a distribution matching that of the table being modified).

Two approaches come to mind:
 * as with aggregates, we can introduce two physical versions of the logical modify: SingleModify (NB: not colocated!) and Map- + ReduceModify (I hope the rest of the necessary changes are clear)
 * make the ModifyNode have the same distribution as the table being modified. In that case we need to put a SUM aggregate on top of the ModifyNode to reduce the result.

Personally, I would prefer the second option, because in that case we can get rid of {{FragmentMapping#updatingTableAssignments()}}, which was introduced as more of a hack.
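
A rough Python sketch of the second option, again as a toy model and not Ignite code. It assumes that each distributed modify returns the number of rows it affected and that the SUM on top adds those partial counts. The helper names and the partition layout are hypothetical.

```python
# Toy model only: each dict stands for the partition stored on one data node.
def modify_on_data_node(rows):
    """Apply UPDATE t SET a = a + 1 locally; return the affected-row count."""
    for key in rows:
        rows[key] += 1
    return len(rows)


def run_pushed_down_modify(partitions):
    # Each data node updates its own partition; only a count leaves the node.
    partial_counts = [modify_on_data_node(rows) for rows in partitions.values()]
    # The SUM on top of the distributed modify reduces the partial counts.
    return sum(partial_counts), len(partial_counts)


pushed_partitions = {
    "node1": {1: 10, 2: 20},
    "node2": {3: 30},
}
pushed_updated, pushed_messages = run_pushed_down_modify(pushed_partitions)
print(pushed_updated, pushed_messages)
```

In this model only one count per data node reaches the reducer, regardless of how many rows were updated.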

  was:
Given the plan tree, we can easily check whether the final modification can be executed directly on the data nodes. We should implement this kind of optimization.

The proposed solution is to push MODIFY down under the exchange and add a single SUM aggregate on top to reduce the result.


> Sql. Pushdown MODIFY to data node
> ---------------------------------
>
>                 Key: IGNITE-18225
>                 URL: https://issues.apache.org/jira/browse/IGNITE-18225
>             Project: Ignite
>          Issue Type: Improvement
>          Components: sql
>            Reporter: Konstantin Orlov
>            Priority: Major
>              Labels: ignite-3



--
This message was sent by Atlassian Jira
(v8.20.10#820010)