You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/15 12:20:13 UTC

[GitHub] [arrow] nealrichardson commented on pull request #12316: ARROW-15517: [R] Use WriteNode in write_dataset()

nealrichardson commented on PR #12316:
URL: https://github.com/apache/arrow/pull/12316#issuecomment-1100072852

   The last failing test was that row-level metadata isn't included when you collect() a dataset, with a warning: https://github.com/apache/arrow/blob/master/r/tests/testthat/test-metadata.R#L274-L277
   
   What was actually happening is that no KeyValueMetadata was being preserved on write in this PR, we just don't have any tests currently around metadata in write_dataset, only this one about row-level metadata (in this PR, neither row-level nor any other kind of of metadata is present). 
   
   I pushed a fix that will grab the KVM from the source data object in the query and use that, and that fixes the failing tests, but I'm not sure that's totally correct. If you have a join or aggregation, it won't make sense; we have some logic in do_exec_plan() to handle some of this, should we factor that out and apply it in write_dataset()? Also more than happy to defer handling this since (afaict) this PR is consistent with the status quo, and we don't have any assertions about the behavior (that I can find) of general metadata on dataset write, nor any handling for amalgamating metadata when joining, etc. Your call @jonkeane.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org