You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2023/11/10 19:43:11 UTC

[I] Change display of `RepartitionExec` from `SortPreservingRepartitionExec to `RepartitionExec preserve_order=true` [arrow-datafusion]

alamb opened a new issue, #8129:
URL: https://github.com/apache/arrow-datafusion/issues/8129

   ### Is your feature request related to a problem or challenge?
   
   Most (all?) nodes in DataFusion explain plans have the property that they are displayed with the same name as the `struct` that implements them, which makes matching explain plans to code easy
   
   However, a notable exception is `RepartitionExec` which is displayed sometimes as  `RepartitionExec` and sometimes as  `SortPreservingRepartitionExec` to signify it preserves the input sort order, which makes finding the relevant code harder (as there is no such code as `SortPreservingRepartitionExec`)
   
   
   For example in the following plan `RepartitionExec and `SortPreservingRepartitionExec` are the same ExecutionPlan node:
   
   ```
   +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | plan                                                                                                                                                                                                                                                                                                                                                  |
   +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | SortPreservingMergeExec: [cpu@2 ASC NULLS LAST,time@1 ASC NULLS LAST]                                                                                                                                                                                                                                                                                 |
   |   SortExec: expr=[cpu@2 ASC NULLS LAST,time@1 ASC NULLS LAST]                                                                                                                                                                                                                                                                                         |
   |     ProjectionExec: expr=[cpu as iox::measurement, time@0 as time, cpu@1 as cpu, coalesce(P@2 / 99.99, 0) as P]                                                                                                                                                                                                                                       |
   |       RepartitionExec: partitioning=RoundRobinBatch(16), input_partitions=16                                                                                                                                                                                                                                                                          |
   |         ProjectionExec: expr=[time@1 as time, cpu@2 as cpu, (selector_max(cpu.usage_system,cpu.time)@3).[value] as P]                                                                                                                                                                                                                                 |
   |           CoalesceBatchesExec: target_batch_size=8192                                                                                                                                                                                                                                                                                                 |
   |             FilterExec: AVG(cpu.usage_idle)@4 > 0.5 AND AVG(cpu.usage_idle)@4 < 1.5 AND (selector_max(cpu.usage_system,cpu.time))[value]selector_max(cpu.usage_system,cpu.time)@0 > 0 AND (selector_max(cpu.usage_system,cpu.time))[value]selector_max(cpu.usage_system,cpu.time)@0 < 1                                                               |
   |               ProjectionExec: expr=[(selector_max(cpu.usage_system,cpu.time)@2).[value] as (selector_max(cpu.usage_system,cpu.time))[value]selector_max(cpu.usage_system,cpu.time), time@0 as time, cpu@1 as cpu, selector_max(cpu.usage_system,cpu.time)@2 as selector_max(cpu.usage_system,cpu.time), AVG(cpu.usage_idle)@3 as AVG(cpu.usage_idle)] |
   |                 AggregateExec: mode=FinalPartitioned, gby=[time@0 as time, cpu@1 as cpu], aggr=[selector_max(cpu.usage_system,cpu.time), AVG(cpu.usage_idle)], ordering_mode=PartiallyOrdered                                                                                                                                                         |
   |                   CoalesceBatchesExec: target_batch_size=8192                                                                                                                                                                                                                                                                                         |
   |                     SortPreservingRepartitionExec: partitioning=Hash([time@0, cpu@1], 16), input_partitions=16, sort_exprs=cpu@1 ASC                                                                                                                                                                                                                  |
   |                       AggregateExec: mode=Partial, gby=[date_bin(10000000000, time@1, 0) as time, cpu@0 as cpu], aggr=[selector_max(cpu.usage_system,cpu.time), AVG(cpu.usage_idle)], ordering_mode=PartiallyOrdered                                                                                                                                  |
   |                         RepartitionExec: partitioning=RoundRobinBatch(16), input_partitions=1                                                                                                                                                                                                                                                         |
   |                           ProjectionExec: expr=[cpu@1 as cpu, time@3 as time, usage_idle@4 as usage_idle, usage_system@5 as usage_system]                                                                                                                                                                                                             |
   |                             DeduplicateExec: [cpu@1 ASC,host@2 ASC,time@3 ASC]                                                                                                                                                                                                                                                                        |
   |                               SortExec: expr=[cpu@1 ASC,host@2 ASC,time@3 ASC,__chunk_order@0 ASC]                                                                                                                                                                                                                                                    |
   |                                 CoalesceBatchesExec: target_batch_size=8192                                                                                                                                                                                                                                                                           |
   |                                   FilterExec: time@3 <= 1699620021099501000                                                                                                                                                                                                                                                                           |
   |                                     RecordBatchesExec: chunks=1, projection=[__chunk_order, cpu, host, time, usage_idle, usage_system]                                                                                                                                                                                                                |
   |                                                                                                                                             
   ```
   
   
   
   ### Describe the solution you'd like
   
   I would like to change the display of `RepartitionExec` so it always looks like `RepartitionExec` and has `preserve_order=true` to follow the same pattern as the other nodes where the output of the explain plan matches the name of the code that implements it.
   
   So for example, name should always be `RepartitionExec`: https://github.com/apache/arrow-datafusion/blob/91c9d6f847eda0b5b1d01257b5c24459651d3926/datafusion/physical-plan/src/repartition/mod.rs#L373-L379
   
   And instead of 
   ```
    SortPreservingRepartitionExec: partitioning=Hash([time@0, cpu@1], 16), input_partitions=16, sort_exprs=cpu@1 ASC
   ```
   
   It should display like
   
   ```
    RepartitionExec: partitioning=Hash([time@0, cpu@1], 16), input_partitions=16, preserve_order=true, sort_exprs=cpu@1 ASC
   ```
   
   ### Describe alternatives you've considered
   
    I recommend *ONLY* displaying preserve_order when it is true
   
   so when false, display like this:
   
   ```
   RepartitionExec: partitioning=RoundRobinBatch(16), input_partitions=16                       
   ```
   
   ### Additional context
   
   I think this is a good first issue as it is well specified and a mechanical change and would teach the person about datafusions tests. Beware, however, that this is likely to require a non trivial number of changes to expected test output.
   
   Thanks in advance!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] Change display of `RepartitionExec` from `SortPreservingRepartitionExec to `RepartitionExec preserve_order=true` [arrow-datafusion]

Posted by "jeffwidman (via GitHub)" <gi...@apache.org>.
jeffwidman commented on issue #8129:
URL: https://github.com/apache/arrow-datafusion/issues/8129#issuecomment-1858560969

   Should this be closed now that https://github.com/apache/arrow-datafusion/pull/8521 is merged?
   
   Looks like the PR description simply mis-formatted the `Close #123` statement so it didn't auto-close.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] Change display of `RepartitionExec` from `SortPreservingRepartitionExec to `RepartitionExec preserve_order=true` [arrow-datafusion]

Posted by "Dandandan (via GitHub)" <gi...@apache.org>.
Dandandan commented on issue #8129:
URL: https://github.com/apache/arrow-datafusion/issues/8129#issuecomment-1858690585

   Closed by #8521 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] Change display of `RepartitionExec` from `SortPreservingRepartitionExec to `RepartitionExec preserve_order=true` [arrow-datafusion]

Posted by "ozankabak (via GitHub)" <gi...@apache.org>.
ozankabak commented on issue #8129:
URL: https://github.com/apache/arrow-datafusion/issues/8129#issuecomment-1806392675

   We are fine with this change


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] Change display of `RepartitionExec` from `SortPreservingRepartitionExec to `RepartitionExec preserve_order=true` [arrow-datafusion]

Posted by "JacobOgle (via GitHub)" <gi...@apache.org>.
JacobOgle commented on issue #8129:
URL: https://github.com/apache/arrow-datafusion/issues/8129#issuecomment-1806388138

   I wouldn't mind taking a look into this


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] Change display of `RepartitionExec` from `SortPreservingRepartitionExec to `RepartitionExec preserve_order=true` [arrow-datafusion]

Posted by "Dandandan (via GitHub)" <gi...@apache.org>.
Dandandan closed issue #8129: Change display of `RepartitionExec` from `SortPreservingRepartitionExec to `RepartitionExec preserve_order=true`
URL: https://github.com/apache/arrow-datafusion/issues/8129


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] Change display of `RepartitionExec` from `SortPreservingRepartitionExec to `RepartitionExec preserve_order=true` [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #8129:
URL: https://github.com/apache/arrow-datafusion/issues/8129#issuecomment-1806337588

   FYI @ozankabak  and @mustafasrepo  and @metesynnada  as this may impact you


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] Change display of `RepartitionExec` from `SortPreservingRepartitionExec to `RepartitionExec preserve_order=true` [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #8129:
URL: https://github.com/apache/arrow-datafusion/issues/8129#issuecomment-1806401554

   Thanks @JacobOgle !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org