You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/18 13:27:52 UTC

[GitHub] [arrow] jonkeane commented on pull request #12316: ARROW-15517: [R] Use WriteNode in write_dataset()

jonkeane commented on PR #12316:
URL: https://github.com/apache/arrow/pull/12316#issuecomment-1101409026

   I took a read through and this looks good this far.
   
   > I pushed a fix that will grab the KVM from the source data object in the query and use that, and that fixes the failing tests, but I'm not sure that's totally correct. If you have a join or aggregation, it won't make sense; we have some logic in do_exec_plan() to handle some of this, should we factor that out and apply it in write_dataset()? Also more than happy to defer handling this since (afaict) this PR is consistent with the status quo, and we don't have any assertions about the behavior (that I can find) of general metadata on dataset write, nor any handling for amalgamating metadata when joining, etc. Your call @jonkeane.
   
   If it's not too much, I would say we should factor out what's in `do_exec_plan` and at least do that much — we can (and should) defer on what amalgamating when you join (and AFAICT the code in `do_exec_plan` handles the aggregation case). We have had a few issues and twitter threads about metadata + datasets (at least in the cases where it failed with row-level metadata!) but folks expected that would just work. And of course, we should also have tests for that to assert that's what we intend — though we can do that as a follow on as well.
   
   The only other question I have is: https://github.com/apache/arrow/pull/12316/files#r850699081 did you add tests for these? I didn't see them, but maybe I'm missing something!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org