Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/05/21 14:21:59 UTC

[GitHub] [arrow] nealrichardson opened a new pull request, #13210: ARROW-16607: [R] Improve KeyValueMetadata handling

nealrichardson opened a new pull request, #13210:
URL: https://github.com/apache/arrow/pull/13210

   * Pushes KeyValueMetadata (KVM) handling into ExecPlan so that Run() preserves the R metadata we want.
   * Also pushes the special handling for a kind of collapsed query from collect() into Build().
   * Better encapsulates KVM for `$metadata` and `$r_metadata` so that, as a user or developer, you never have to touch the serialize/deserialize functions; you just have a list to work with. This is a slight API change, most noticeable if you were to `print(tab$metadata)`; the better way to inspect it now is `str(tab$metadata)`.
   * Factors out a common utility in r/src for taking a cpp11::strings (a named character vector) and producing an arrow::KeyValueMetadata.
   
   The upshot of all of this is that we can push the ExecPlan evaluation into `as_record_batch_reader()`, and all that `collect()` does on top is read the RBR into a Table/data.frame. This means that we can plug dplyr queries into anything else that expects a RecordBatchReader, and it will be (to the maximum extent possible, given the limitations of ExecPlan) streaming, not requiring you to `compute()` and materialize things first.
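The streaming design described above can be modeled in a few lines. This is a language-neutral Python sketch, not code from the arrow R package: a "batch" is a plain dict of column lists, and `make_reader()` and `collect()` are hypothetical stand-ins for a query exposed through `as_record_batch_reader()` and for `collect()`. What it shows is that the reader is lazy, so `collect()` is just one consumer: it drains the reader into a single table, while any other consumer can process batches one at a time.

```python
from typing import Callable, Dict, Iterable, Iterator, List

# Toy stand-in for an Arrow RecordBatch: column name -> list of values.
Batch = Dict[str, List[int]]


def make_reader(source: Iterable[Batch], column: str,
                predicate: Callable[[int], bool]) -> Iterator[Batch]:
    """Hypothetical lazy plan: each batch is pulled from `source` and
    filtered only when the consumer asks for the next one."""
    for batch in source:
        keep = [i for i, value in enumerate(batch[column]) if predicate(value)]
        yield {name: [values[i] for i in keep] for name, values in batch.items()}


def collect(reader: Iterator[Batch]) -> Batch:
    """All collect() does here is drain the reader into one table."""
    table: Batch = {}
    for batch in reader:
        for name, values in batch.items():
            table.setdefault(name, []).extend(values)
    return table


batches = [{"cyl": [4, 6, 8]}, {"cyl": [6, 6, 4]}]

# Consumer 1: materialize the whole result, as collect() would.
print(collect(make_reader(iter(batches), "cyl", lambda v: v == 6)))  # {'cyl': [6, 6, 6]}

# Consumer 2: stream batch by batch without building the full result.
print(sum(len(b["cyl"]) for b in make_reader(iter(batches), "cyl", lambda v: v == 6)))  # 3
```

Because the filtering happens inside a generator, the second consumer never needs the full filtered table in memory, which is the property the PR is after when a dplyr query is handed to something that expects a RecordBatchReader.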


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #13210: ARROW-16607: [R] Improve KeyValueMetadata handling

github-actions[bot] commented on PR #13210:
URL: https://github.com/apache/arrow/pull/13210#issuecomment-1133642701

   https://issues.apache.org/jira/browse/ARROW-16607


[GitHub] [arrow] paleolimbot commented on pull request #13210: ARROW-16607: [R] Improve KeyValueMetadata handling

paleolimbot commented on PR #13210:
URL: https://github.com/apache/arrow/pull/13210#issuecomment-1138775576

   Done! (ARROW-16670).


[GitHub] [arrow] ursabot commented on pull request #13210: ARROW-16607: [R] Improve KeyValueMetadata handling

ursabot commented on PR #13210:
URL: https://github.com/apache/arrow/pull/13210#issuecomment-1139124284

   Benchmark runs are scheduled for baseline = 156dc72c320dbbdec5424f24857e3335fc8c7dee and contender = a6025f15712aa0829aab748a8d3e776f335265cc. a6025f15712aa0829aab748a8d3e776f335265cc is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/c7cb9de9f669474182efd53f4e169f16...77088d8d74c147318c71bcc803907add/)
   [Finished :arrow_down:0.5% :arrow_up:0.04%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/cf45d9d68d8a44a4859910568edd9e6f...a385114df0ca4b929c84db752d641599/)
   [Finished :arrow_down:0.36% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/d53ee0ae65a74c46a974aa583ef044df...14e1d34875174ec387d2eee783e7c719/)
   [Finished :arrow_down:0.2% :arrow_up:0.04%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/b8f0abb979b54a00a5c52f3a29e50f02...97ee05086bd34ea38477a1b99ef092f0/)
   Buildkite builds:
   [Finished] [`a6025f15` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/832)
   [Finished] [`a6025f15` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/831)
   [Finished] [`a6025f15` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/821)
   [Finished] [`a6025f15` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/835)
   [Finished] [`156dc72c` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/831)
   [Finished] [`156dc72c` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/830)
   [Finished] [`156dc72c` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/820)
   [Finished] [`156dc72c` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/834)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


[GitHub] [arrow] nealrichardson closed pull request #13210: ARROW-16607: [R] Improve KeyValueMetadata handling

nealrichardson closed pull request #13210: ARROW-16607: [R] Improve KeyValueMetadata handling
URL: https://github.com/apache/arrow/pull/13210


[GitHub] [arrow] github-actions[bot] commented on pull request #13210: ARROW-16607: [R] Improve KeyValueMetadata handling

github-actions[bot] commented on PR #13210:
URL: https://github.com/apache/arrow/pull/13210#issuecomment-1135816283

   Revision: bace9494a2454e417d935babfb162fdd0ed7c0cb
   
   Submitted crossbow builds: [ursacomputing/crossbow @ actions-f099564b37](https://github.com/ursacomputing/crossbow/branches/all?query=actions-f099564b37)
   
   |Task|Status|
   |----|------|
   |test-r-arrow-backwards-compatibility|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-f099564b37-github-test-r-arrow-backwards-compatibility)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-f099564b37-github-test-r-arrow-backwards-compatibility)|


[GitHub] [arrow] nealrichardson commented on pull request #13210: ARROW-16607: [R] Improve KeyValueMetadata handling

nealrichardson commented on PR #13210:
URL: https://github.com/apache/arrow/pull/13210#issuecomment-1135813804

   @github-actions crossbow submit test-r-arrow-backwards-compatibility


[GitHub] [arrow] nealrichardson commented on pull request #13210: ARROW-16607: [R] Improve KeyValueMetadata handling

nealrichardson commented on PR #13210:
URL: https://github.com/apache/arrow/pull/13210#issuecomment-1138531323

   > are we sure that preserving them is worth it?
   
   The short answer for why we do this is that people expect to be able to save a data.frame to a Parquet file and get the same thing back when they load it. Pandas took a similar approach (attaching metadata to the schema).
   
   The longer answer is probably that I'm not familiar with `vctrs_extension_type` and what it does or doesn't do. Maybe some of this can be dropped (though, for backwards compatibility, not entirely dropped yet). That said, I suspect it doesn't cover everything, because I get failing tests without this code.
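The round-trip concern in this comment, together with the encapsulation described in the PR description, can be sketched as a toy Python model. It rests on stated assumptions and is not the package's implementation: schema metadata is treated as a string-to-string key/value store, R-specific attributes are serialized into one string under a key assumed here to be `"r"`, and `json` stands in for R's own serialization. `to_key_value_metadata()` and `SchemaMetadata` are hypothetical names. The point is that callers only ever read and assign plain dicts, while (de)serialization stays internal.

```python
import json
from typing import Any, Dict, List, Optional, Tuple

# Assumption for illustration: the key holding the serialized R attributes.
R_KEY = "r"


def to_key_value_metadata(named: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: split a named mapping into parallel key and value
    lists, the shape a KeyValueMetadata-style container is built from."""
    if any(not key for key in named):
        raise ValueError("every metadata entry must have a non-empty name")
    keys = list(named)
    values = [named[key] for key in keys]
    return keys, values


class SchemaMetadata:
    """Toy wrapper: callers read and assign dicts; (de)serialization is internal."""

    def __init__(self) -> None:
        self._keys: List[str] = []
        self._values: List[str] = []

    @property
    def metadata(self) -> Dict[str, str]:
        # Return a fresh dict so callers cannot mutate the stored lists.
        return dict(zip(self._keys, self._values))

    @metadata.setter
    def metadata(self, named: Dict[str, str]) -> None:
        self._keys, self._values = to_key_value_metadata(named)

    @property
    def r_metadata(self) -> Optional[Dict[str, Any]]:
        # Deserialize on read; None when no R payload is present.
        raw = self.metadata.get(R_KEY)
        return None if raw is None else json.loads(raw)

    @r_metadata.setter
    def r_metadata(self, attrs: Dict[str, Any]) -> None:
        # Serialize on write (json stands in for R's serialize()).
        updated = self.metadata
        updated[R_KEY] = json.dumps(attrs)
        self.metadata = updated


# Simulated round trip: write, then rebuild from the stored key/value lists.
written = SchemaMetadata()
written.metadata = {"source": "example"}
written.r_metadata = {"class": ["tbl_df", "tbl", "data.frame"]}

keys, values = to_key_value_metadata(written.metadata)
read_back = SchemaMetadata()
read_back.metadata = dict(zip(keys, values))
print(read_back.r_metadata["class"])  # ['tbl_df', 'tbl', 'data.frame']
```

Keeping the payload as one opaque string under a single key means other readers of the file (for example, non-R consumers of the Parquet schema) see ordinary string metadata, while the R side can restore the extra attributes on load.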


[GitHub] [arrow] github-actions[bot] commented on pull request #13210: ARROW-16607: [R] Improve KeyValueMetadata handling

github-actions[bot] commented on PR #13210:
URL: https://github.com/apache/arrow/pull/13210#issuecomment-1133642708

   :warning: Ticket **has not been started in JIRA**, please click 'Start Progress'.

