Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/07/13 14:44:36 UTC

[GitHub] [arrow] jorgecarleitao opened a new pull request #7729: ARROW-9420 [Rust][DataFusion] Added repartion physical plan

jorgecarleitao opened a new pull request #7729:
URL: https://github.com/apache/arrow/pull/7729


   This is written on top of #7687, so that PR should be merged first.
   
   This does not include any optimization that actually uses this operation. We need to work that out in a future PR.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao closed pull request #7729: ARROW-9420: [Rust][DataFusion] Added repartion physical plan

Posted by GitBox <gi...@apache.org>.
jorgecarleitao closed pull request #7729:
URL: https://github.com/apache/arrow/pull/7729


   





[GitHub] [arrow] jorgecarleitao commented on pull request #7729: ARROW-9420: [Rust][DataFusion] Added repartion physical plan

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #7729:
URL: https://github.com/apache/arrow/pull/7729#issuecomment-666555314


   I don't think we are ready to take repartitioning at this point.





[GitHub] [arrow] github-actions[bot] commented on pull request #7729: ARROW-9420 [Rust][DataFusion] Added repartion physical plan

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #7729:
URL: https://github.com/apache/arrow/pull/7729#issuecomment-657604297


   https://issues.apache.org/jira/browse/ARROW-9420





[GitHub] [arrow] jorgecarleitao commented on pull request #7729: ARROW-9420 [Rust][DataFusion] Added repartion physical plan

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #7729:
URL: https://github.com/apache/arrow/pull/7729#issuecomment-657614516


   One thing that is not clear to me yet is the idiom for handling RecordBatches and partitions. My understanding is that partitions can be executed in parallel (one thread each), but the RecordBatches within a partition are generally processed on the same thread, i.e. we normally loop through each RecordBatch sequentially.
   
   Is the goal of RecordBatch to split a partition into smaller chunks of data to avoid too much memory usage?
   
   In this PR, I have not merged all the RecordBatches within a given partition into a single batch; I kept them separate instead. I am not sure if this is the correct approach here.
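
   The execution model described in this comment can be illustrated with a minimal sketch. It is not DataFusion code: `Batch`, `Partition`, and `sum_partitions` are hypothetical stand-ins that use only the Rust standard library, and it assumes one thread per partition with batches visited sequentially on that thread.

```rust
use std::thread;

// Hypothetical stand-ins, not Arrow types: a "batch" is a chunk of values
// and a "partition" is an ordered list of batches.
type Batch = Vec<i64>;
type Partition = Vec<Batch>;

/// Processes each partition on its own thread. Batches inside a partition
/// are kept separate and visited one after another on that thread.
fn sum_partitions(partitions: Vec<Partition>) -> Vec<i64> {
    // Spawn every worker first so the partitions really run in parallel.
    let handles: Vec<thread::JoinHandle<i64>> = partitions
        .into_iter()
        .map(|partition| {
            thread::spawn(move || {
                let mut total = 0;
                for batch in &partition {
                    // Only one batch is being worked on at a time.
                    total += batch.iter().sum::<i64>();
                }
                total
            })
        })
        .collect();

    // Collect one result per partition, in the original partition order.
    handles
        .into_iter()
        .map(|handle| handle.join().expect("worker thread panicked"))
        .collect()
}
```

   Under these assumptions, keeping the batches separate means each worker only touches one chunk at a time, which is one plausible reading of the memory-usage question above.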





[GitHub] [arrow] jorgecarleitao commented on pull request #7729: ARROW-9420 [Rust][DataFusion] Added repartion physical plan

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #7729:
URL: https://github.com/apache/arrow/pull/7729#issuecomment-657615416


   Another open point here is that I have not tested what happens to rows in which one of the keys is null.
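
   To make the null-key question concrete, here is a minimal sketch of hash repartitioning over a nullable key. It is not the PR's implementation: `repartition_by_key` is a hypothetical helper, `None` stands in for a null key, and sending all null keys to the same output partition is an assumption made only for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: assigns each row to one of `num_partitions` outputs
/// by hashing its key. `None` models a null key. Because `None` always hashes
/// to the same value, all null-key rows land in one partition (an assumption
/// of this sketch, not necessarily the behavior of the PR).
fn repartition_by_key(
    rows: Vec<(Option<i64>, String)>,
    num_partitions: usize,
) -> Vec<Vec<(Option<i64>, String)>> {
    assert!(num_partitions > 0, "need at least one output partition");
    let mut out = vec![Vec::new(); num_partitions];
    for row in rows {
        let mut hasher = DefaultHasher::new();
        row.0.hash(&mut hasher);
        // Map the 64-bit hash onto a partition index.
        let index = (hasher.finish() % num_partitions as u64) as usize;
        out[index].push(row);
    }
    out
}
```

   A test for the real operator would presumably check the same two properties: no row is dropped, and rows sharing a key (including null) end up in the same partition.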

