You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/07/13 03:11:40 UTC
[GitHub] [arrow-datafusion] tustvold opened a new pull request, #2892: Adds optional serde support to datafusion-proto
tustvold opened a new pull request, #2892:
URL: https://github.com/apache/arrow-datafusion/pull/2892
# Which issue does this PR close?
Closes #2889
# Rationale for this change
This provides a way to serialize the datafusion-proto messages to any serde serializer, e.g. JSON, based on the protobuf [JSON specification](https://developers.google.com/protocol-buffers/docs/proto3#json).
# What changes are included in this PR?
Uses pbjson to auto-generate the serde implementations for the protobuf messages. Full disclosure: I am the primary author and maintainer of this crate.
# Are there any user-facing changes?
No
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] tustvold merged pull request #2892: Adds optional serde support to datafusion-proto
Posted by GitBox <gi...@apache.org>.
tustvold merged PR #2892:
URL: https://github.com/apache/arrow-datafusion/pull/2892
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] avantgardnerio commented on a diff in pull request #2892: Adds optional serde support to datafusion-proto
Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on code in PR #2892:
URL: https://github.com/apache/arrow-datafusion/pull/2892#discussion_r922394203
##########
datafusion/proto/Cargo.toml:
##########
@@ -33,18 +33,24 @@ name = "datafusion_proto"
path = "src/lib.rs"
[features]
+default = []
+json = ["pbjson", "pbjson-build", "serde", "serde_json"]
[dependencies]
arrow = { version = "18.0.0" }
datafusion = { path = "../core", version = "10.0.0" }
datafusion-common = { path = "../common", version = "10.0.0" }
datafusion-expr = { path = "../expr", version = "10.0.0" }
prost = "0.10"
-
+serde = { version = "1.0", optional = true }
+serde_json = { version = "1.0", optional = true }
+pbjson = { version = "0.3", optional = true }
+pbjson-types = { version = "0.3", optional = true }
[dev-dependencies]
doc-comment = "0.3"
tokio = "1.18"
[build-dependencies]
tonic-build = { version = "0.7" }
+pbjson-build = { version = "0.3", optional = true }
Review Comment:
```suggestion
tonic-build = { version = "0.7" }
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] avantgardnerio commented on a diff in pull request #2892: Adds optional serde support to datafusion-proto
Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on code in PR #2892:
URL: https://github.com/apache/arrow-datafusion/pull/2892#discussion_r920312561
##########
datafusion/proto/src/lib.rs:
##########
@@ -75,19 +78,32 @@ mod roundtrip_tests {
use std::fmt::Formatter;
use std::sync::Arc;
+ #[cfg(feature = "serde")]
+ fn roundtrip_serde_test(proto: &protobuf::LogicalExprNode) {
+ let string = serde_json::to_string(proto).unwrap();
+ let back: protobuf::LogicalExprNode = serde_json::from_str(&string).unwrap();
+ assert_eq!(proto, &back);
+ }
+
+ #[cfg(not(feature = "serde"))]
+ fn roundtrip_serde_test(_proto: &protobuf::LogicalExprNode) {}
+
// Given a DataFusion logical Expr, convert it to protobuf and back, using debug formatting to test
// equality.
- macro_rules! roundtrip_expr_test {
- ($initial_struct:ident, $ctx:ident) => {
- let proto: protobuf::LogicalExprNode = (&$initial_struct).try_into().unwrap();
+ fn roundtrip_expr_test<T, E>(initial_struct: T, ctx: SessionContext)
+ where
+ for<'a> &'a T: TryInto<protobuf::LogicalExprNode, Error = E> + Debug,
+ E: Debug,
+ {
+ let proto: protobuf::LogicalExprNode = (&initial_struct).try_into().unwrap();
+ let round_trip: Expr = parse_expr(&proto, &ctx).unwrap();
Review Comment:
Thanks for making this PR! I'm sure it's a step in the correct direction, but after cloning it, I'm having trouble figuring out how to extend it to include `LogicalPlan`s as well. There appears to be no equivalent `parse_plan()`?
If I'm reading this correctly, it looks like the protobuf generator generated an entirely separate copy of all the Expr & LogicalPlan enums, and we created some fairly large functions to convert back and forth between them like `parse_expr`? If so the next step would be to write `parse_plan`?
It does look from other tests like we are able to roundtrip plans to protobuf, so maybe I am missing something. When I tried:
```
let protobuf =
protobuf::LogicalPlanNode::try_from_logical_plan(&topk_plan, &extension_codec)?;
let string = serde_json::to_string(&protobuf).unwrap();
```
I get
```
139 | let string = serde_json::to_string(&protobuf).unwrap();
| --------------------- ^^^^^^^^^ the trait `serde::ser::Serialize` is not implemented for `LogicalPlanNode`
| |
| required by a bound introduced by this call
```
Sorry, I'm a bit lost. Any help is appreciated :)
##########
datafusion/proto/src/lib.rs:
##########
@@ -75,19 +78,32 @@ mod roundtrip_tests {
use std::fmt::Formatter;
use std::sync::Arc;
+ #[cfg(feature = "serde")]
+ fn roundtrip_serde_test(proto: &protobuf::LogicalExprNode) {
+ let string = serde_json::to_string(proto).unwrap();
+ let back: protobuf::LogicalExprNode = serde_json::from_str(&string).unwrap();
+ assert_eq!(proto, &back);
+ }
+
+ #[cfg(not(feature = "serde"))]
+ fn roundtrip_serde_test(_proto: &protobuf::LogicalExprNode) {}
+
// Given a DataFusion logical Expr, convert it to protobuf and back, using debug formatting to test
// equality.
- macro_rules! roundtrip_expr_test {
- ($initial_struct:ident, $ctx:ident) => {
- let proto: protobuf::LogicalExprNode = (&$initial_struct).try_into().unwrap();
+ fn roundtrip_expr_test<T, E>(initial_struct: T, ctx: SessionContext)
+ where
+ for<'a> &'a T: TryInto<protobuf::LogicalExprNode, Error = E> + Debug,
+ E: Debug,
+ {
+ let proto: protobuf::LogicalExprNode = (&initial_struct).try_into().unwrap();
+ let round_trip: Expr = parse_expr(&proto, &ctx).unwrap();
Review Comment:
Thanks for making this PR! I'm sure it's a step in the correct direction, but after cloning it, I'm having trouble figuring out how to extend it to include `LogicalPlan`s as well. There appears to be no equivalent `parse_plan()`?
If I'm reading this correctly, it looks like the protobuf generator generated an entirely separate copy of all the Expr & LogicalPlan enums, and we created some fairly large functions to convert back and forth between them like `parse_expr`? If so the next step would be to write `parse_plan`?
It does look from other tests like we are able to roundtrip plans to protobuf, so maybe I am missing something. When I tried:
```
let protobuf =
protobuf::LogicalPlanNode::try_from_logical_plan(&topk_plan, &extension_codec)?;
let string = serde_json::to_string(&protobuf).unwrap();
```
I get
```
139 | let string = serde_json::to_string(&protobuf).unwrap();
| --------------------- ^^^^^^^^^ the trait `serde::ser::Serialize` is not implemented for `LogicalPlanNode`
| |
| required by a bound introduced by this call
```
Sorry, I'm a bit lost. Any help is appreciated :)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] avantgardnerio commented on a diff in pull request #2892: Adds optional serde support to datafusion-proto
Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on code in PR #2892:
URL: https://github.com/apache/arrow-datafusion/pull/2892#discussion_r922848101
##########
datafusion/proto/Cargo.toml:
##########
@@ -33,18 +33,24 @@ name = "datafusion_proto"
path = "src/lib.rs"
[features]
+default = []
+json = ["pbjson", "pbjson-build", "serde", "serde_json"]
[dependencies]
arrow = { version = "18.0.0" }
datafusion = { path = "../core", version = "10.0.0" }
datafusion-common = { path = "../common", version = "10.0.0" }
datafusion-expr = { path = "../expr", version = "10.0.0" }
Review Comment:
```suggestion
pbjson = { version = "0.3", optional = true }
pbjson-types = { version = "0.3", optional = true }
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] ursabot commented on pull request #2892: Adds optional serde support to datafusion-proto
Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #2892:
URL: https://github.com/apache/arrow-datafusion/pull/2892#issuecomment-1186585975
Benchmark runs are scheduled for baseline = c528986d8c9df1340431644cd8b3f2f9e725abc8 and contender = c67161b97731d71ca545231890bc159fab6439be. c67161b97731d71ca545231890bc159fab6439be is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/9f68f0613f0f4caeb82e5ecc626a25f9...74d466055fba4e81a0f2f69d15bfebe8/)
[Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] [test-mac-arm](https://conbench.ursa.dev/compare/runs/729d0293f5cb4d0192a59732adc9fe28...514f15cd3f8147cf99413b9ef11be15e/)
[Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/e9ada9518ce34188af2ab1bbe9555010...049999d979634f12aa022c8f789f18bc/)
[Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/a69c2aa02cd04e5cb70a07e05417b79d...bfea40b07f234c2ea0526efd0d5e64e1/)
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] avantgardnerio commented on a diff in pull request #2892: Adds optional serde support to datafusion-proto
Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on code in PR #2892:
URL: https://github.com/apache/arrow-datafusion/pull/2892#discussion_r922848170
##########
datafusion/proto/Cargo.toml:
##########
@@ -33,18 +33,24 @@ name = "datafusion_proto"
path = "src/lib.rs"
[features]
+default = []
+json = ["pbjson", "pbjson-build", "serde", "serde_json"]
[dependencies]
arrow = { version = "18.0.0" }
datafusion = { path = "../core", version = "10.0.0" }
datafusion-common = { path = "../common", version = "10.0.0" }
datafusion-expr = { path = "../expr", version = "10.0.0" }
Review Comment:
```suggestion
datafusion-expr = { path = "../expr", version = "10.0.0" }
pbjson = { version = "0.3", optional = true }
pbjson-types = { version = "0.3", optional = true }
```
##########
datafusion/proto/Cargo.toml:
##########
@@ -33,18 +33,24 @@ name = "datafusion_proto"
path = "src/lib.rs"
[features]
+default = []
+json = ["pbjson", "pbjson-build", "serde", "serde_json"]
[dependencies]
arrow = { version = "18.0.0" }
datafusion = { path = "../core", version = "10.0.0" }
datafusion-common = { path = "../common", version = "10.0.0" }
datafusion-expr = { path = "../expr", version = "10.0.0" }
prost = "0.10"
-
+serde = { version = "1.0", optional = true }
+serde_json = { version = "1.0", optional = true }
+pbjson = { version = "0.3", optional = true }
Review Comment:
```suggestion
```
##########
datafusion/proto/Cargo.toml:
##########
@@ -33,18 +33,24 @@ name = "datafusion_proto"
path = "src/lib.rs"
[features]
+default = []
+json = ["pbjson", "pbjson-build", "serde", "serde_json"]
[dependencies]
arrow = { version = "18.0.0" }
datafusion = { path = "../core", version = "10.0.0" }
datafusion-common = { path = "../common", version = "10.0.0" }
datafusion-expr = { path = "../expr", version = "10.0.0" }
prost = "0.10"
-
+serde = { version = "1.0", optional = true }
+serde_json = { version = "1.0", optional = true }
+pbjson = { version = "0.3", optional = true }
+pbjson-types = { version = "0.3", optional = true }
Review Comment:
```suggestion
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #2892: Adds optional serde support to datafusion-proto
Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #2892:
URL: https://github.com/apache/arrow-datafusion/pull/2892#discussion_r919610004
##########
datafusion/proto/src/lib.rs:
##########
@@ -75,19 +78,32 @@ mod roundtrip_tests {
use std::fmt::Formatter;
use std::sync::Arc;
+ #[cfg(feature = "serde")]
+ fn roundtrip_serde_test(proto: &protobuf::LogicalExprNode) {
+ let string = serde_json::to_string(proto).unwrap();
+ let back: protobuf::LogicalExprNode = serde_json::from_str(&string).unwrap();
+ assert_eq!(proto, &back);
+ }
+
+ #[cfg(not(feature = "serde"))]
+ fn roundtrip_serde_test(_proto: &protobuf::LogicalExprNode) {}
+
// Given a DataFusion logical Expr, convert it to protobuf and back, using debug formatting to test
// equality.
- macro_rules! roundtrip_expr_test {
- ($initial_struct:ident, $ctx:ident) => {
- let proto: protobuf::LogicalExprNode = (&$initial_struct).try_into().unwrap();
+ fn roundtrip_expr_test<T, E>(initial_struct: T, ctx: SessionContext)
Review Comment:
Drive by refactor to eliminate an unnecessary macro
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] codecov-commenter commented on pull request #2892: Adds optional serde support to datafusion-proto
Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #2892:
URL: https://github.com/apache/arrow-datafusion/pull/2892#issuecomment-1182733897
# [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/2892?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#2892](https://codecov.io/gh/apache/arrow-datafusion/pull/2892?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (bce7edb) into [master](https://codecov.io/gh/apache/arrow-datafusion/commit/eed77a286cac497683ca38fd75b6a134455cb1c2?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (eed77a2) will **decrease** coverage by `0.00%`.
> The diff coverage is `92.85%`.
> :exclamation: Current head bce7edb differs from pull request most recent head f13ee06. Consider uploading reports for the commit f13ee06 to get more accurate results
```diff
@@ Coverage Diff @@
## master #2892 +/- ##
==========================================
- Coverage 85.34% 85.34% -0.01%
==========================================
Files 276 276
Lines 49290 49302 +12
==========================================
+ Hits 42069 42077 +8
- Misses 7221 7225 +4
```
| [Impacted Files](https://codecov.io/gh/apache/arrow-datafusion/pull/2892?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [datafusion/core/tests/user\_defined\_plan.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2892/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9jb3JlL3Rlc3RzL3VzZXJfZGVmaW5lZF9wbGFuLnJz) | `87.79% <ø> (ø)` | |
| [datafusion/optimizer/src/optimizer.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2892/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9vcHRpbWl6ZXIvc3JjL29wdGltaXplci5ycw==) | `75.00% <0.00%> (-13.24%)` | :arrow_down: |
| [datafusion/optimizer/src/utils.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2892/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9vcHRpbWl6ZXIvc3JjL3V0aWxzLnJz) | `88.57% <ø> (ø)` | |
| [datafusion/core/src/execution/context.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2892/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9jb3JlL3NyYy9leGVjdXRpb24vY29udGV4dC5ycw==) | `78.82% <100.00%> (+0.02%)` | :arrow_up: |
| [...tafusion/optimizer/src/common\_subexpr\_eliminate.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2892/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9vcHRpbWl6ZXIvc3JjL2NvbW1vbl9zdWJleHByX2VsaW1pbmF0ZS5ycw==) | `94.11% <100.00%> (ø)` | |
| [datafusion/optimizer/src/eliminate\_filter.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2892/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9vcHRpbWl6ZXIvc3JjL2VsaW1pbmF0ZV9maWx0ZXIucnM=) | `100.00% <100.00%> (ø)` | |
| [datafusion/optimizer/src/eliminate\_limit.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2892/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9vcHRpbWl6ZXIvc3JjL2VsaW1pbmF0ZV9saW1pdC5ycw==) | `100.00% <100.00%> (ø)` | |
| [datafusion/optimizer/src/filter\_null\_join\_keys.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2892/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9vcHRpbWl6ZXIvc3JjL2ZpbHRlcl9udWxsX2pvaW5fa2V5cy5ycw==) | `92.85% <100.00%> (ø)` | |
| [datafusion/optimizer/src/filter\_push\_down.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2892/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9vcHRpbWl6ZXIvc3JjL2ZpbHRlcl9wdXNoX2Rvd24ucnM=) | `98.23% <100.00%> (ø)` | |
| [datafusion/optimizer/src/limit\_push\_down.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2892/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9vcHRpbWl6ZXIvc3JjL2xpbWl0X3B1c2hfZG93bi5ycw==) | `99.67% <100.00%> (ø)` | |
| ... and [8 more](https://codecov.io/gh/apache/arrow-datafusion/pull/2892/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/2892?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/2892?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [eed77a2...f13ee06](https://codecov.io/gh/apache/arrow-datafusion/pull/2892?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] avantgardnerio commented on a diff in pull request #2892: Adds optional serde support to datafusion-proto
Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on code in PR #2892:
URL: https://github.com/apache/arrow-datafusion/pull/2892#discussion_r920312561
##########
datafusion/proto/src/lib.rs:
##########
@@ -75,19 +78,32 @@ mod roundtrip_tests {
use std::fmt::Formatter;
use std::sync::Arc;
+ #[cfg(feature = "serde")]
+ fn roundtrip_serde_test(proto: &protobuf::LogicalExprNode) {
+ let string = serde_json::to_string(proto).unwrap();
+ let back: protobuf::LogicalExprNode = serde_json::from_str(&string).unwrap();
+ assert_eq!(proto, &back);
+ }
+
+ #[cfg(not(feature = "serde"))]
+ fn roundtrip_serde_test(_proto: &protobuf::LogicalExprNode) {}
+
// Given a DataFusion logical Expr, convert it to protobuf and back, using debug formatting to test
// equality.
- macro_rules! roundtrip_expr_test {
- ($initial_struct:ident, $ctx:ident) => {
- let proto: protobuf::LogicalExprNode = (&$initial_struct).try_into().unwrap();
+ fn roundtrip_expr_test<T, E>(initial_struct: T, ctx: SessionContext)
+ where
+ for<'a> &'a T: TryInto<protobuf::LogicalExprNode, Error = E> + Debug,
+ E: Debug,
+ {
+ let proto: protobuf::LogicalExprNode = (&initial_struct).try_into().unwrap();
+ let round_trip: Expr = parse_expr(&proto, &ctx).unwrap();
Review Comment:
Thanks for making this PR! I'm sure it's a step in the correct direction, but after cloning it, I'm having trouble figuring out how to extend it to include `LogicalPlan`s as well. There appears to be no equivalent `parse_plan()`?
If I'm reading this correctly, it looks like the protobuf generator generated an entirely separate copy of all the Expr & LogicalPlan enums, and we created some fairly large functions to convert back and forth between them like `parse_expr`? If so the next step would be to write `parse_plan`?
It does look from other tests like we are able to roundtrip plans to protobuf, so maybe I am missing something. When I tried:
```
let protobuf =
protobuf::LogicalPlanNode::try_from_logical_plan(&topk_plan, &extension_codec)?;
let string = serde_json::to_string(&protobuf).unwrap();
```
I get
```
139 | let string = serde_json::to_string(&protobuf).unwrap();
| --------------------- ^^^^^^^^^ the trait `serde::ser::Serialize` is not implemented for `LogicalPlanNode`
| |
| required by a bound introduced by this call
```
Sorry, I'm a bit lost. Any help is appreciated :)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] avantgardnerio commented on a diff in pull request #2892: Adds optional serde support to datafusion-proto
Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on code in PR #2892:
URL: https://github.com/apache/arrow-datafusion/pull/2892#discussion_r922394058
##########
datafusion/proto/Cargo.toml:
##########
@@ -33,18 +33,24 @@ name = "datafusion_proto"
path = "src/lib.rs"
[features]
+default = []
+json = ["pbjson", "pbjson-build", "serde", "serde_json"]
[dependencies]
arrow = { version = "18.0.0" }
datafusion = { path = "../core", version = "10.0.0" }
datafusion-common = { path = "../common", version = "10.0.0" }
datafusion-expr = { path = "../expr", version = "10.0.0" }
prost = "0.10"
-
+serde = { version = "1.0", optional = true }
+serde_json = { version = "1.0", optional = true }
+pbjson = { version = "0.3", optional = true }
+pbjson-types = { version = "0.3", optional = true }
[dev-dependencies]
doc-comment = "0.3"
tokio = "1.18"
[build-dependencies]
tonic-build = { version = "0.7" }
Review Comment:
```suggestion
pbjson-build = { version = "0.3", optional = true }
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #2892: Adds optional serde support to datafusion-proto
Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #2892:
URL: https://github.com/apache/arrow-datafusion/pull/2892#discussion_r919610004
##########
datafusion/proto/src/lib.rs:
##########
@@ -75,19 +78,32 @@ mod roundtrip_tests {
use std::fmt::Formatter;
use std::sync::Arc;
+ #[cfg(feature = "serde")]
+ fn roundtrip_serde_test(proto: &protobuf::LogicalExprNode) {
+ let string = serde_json::to_string(proto).unwrap();
+ let back: protobuf::LogicalExprNode = serde_json::from_str(&string).unwrap();
+ assert_eq!(proto, &back);
+ }
+
+ #[cfg(not(feature = "serde"))]
+ fn roundtrip_serde_test(_proto: &protobuf::LogicalExprNode) {}
+
// Given a DataFusion logical Expr, convert it to protobuf and back, using debug formatting to test
// equality.
- macro_rules! roundtrip_expr_test {
- ($initial_struct:ident, $ctx:ident) => {
- let proto: protobuf::LogicalExprNode = (&$initial_struct).try_into().unwrap();
+ fn roundtrip_expr_test<T, E>(initial_struct: T, ctx: SessionContext)
Review Comment:
Drive by refactor to eliminate an unnecessary macro, as they complicate development, refactoring, debugging, etc...
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] avantgardnerio commented on a diff in pull request #2892: Adds optional serde support to datafusion-proto
Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on code in PR #2892:
URL: https://github.com/apache/arrow-datafusion/pull/2892#discussion_r920318589
##########
datafusion/proto/src/lib.rs:
##########
@@ -75,19 +78,32 @@ mod roundtrip_tests {
use std::fmt::Formatter;
use std::sync::Arc;
+ #[cfg(feature = "serde")]
+ fn roundtrip_serde_test(proto: &protobuf::LogicalExprNode) {
+ let string = serde_json::to_string(proto).unwrap();
+ let back: protobuf::LogicalExprNode = serde_json::from_str(&string).unwrap();
+ assert_eq!(proto, &back);
+ }
+
+ #[cfg(not(feature = "serde"))]
+ fn roundtrip_serde_test(_proto: &protobuf::LogicalExprNode) {}
+
// Given a DataFusion logical Expr, convert it to protobuf and back, using debug formatting to test
// equality.
- macro_rules! roundtrip_expr_test {
- ($initial_struct:ident, $ctx:ident) => {
- let proto: protobuf::LogicalExprNode = (&$initial_struct).try_into().unwrap();
+ fn roundtrip_expr_test<T, E>(initial_struct: T, ctx: SessionContext)
+ where
+ for<'a> &'a T: TryInto<protobuf::LogicalExprNode, Error = E> + Debug,
+ E: Debug,
+ {
+ let proto: protobuf::LogicalExprNode = (&initial_struct).try_into().unwrap();
+ let round_trip: Expr = parse_expr(&proto, &ctx).unwrap();
Review Comment:
Oh, I was running my tests without the "serde" feature :facepalm:
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] avantgardnerio commented on a diff in pull request #2892: Adds optional serde support to datafusion-proto
Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on code in PR #2892:
URL: https://github.com/apache/arrow-datafusion/pull/2892#discussion_r920312561
##########
datafusion/proto/src/lib.rs:
##########
@@ -75,19 +78,32 @@ mod roundtrip_tests {
use std::fmt::Formatter;
use std::sync::Arc;
+ #[cfg(feature = "serde")]
+ fn roundtrip_serde_test(proto: &protobuf::LogicalExprNode) {
+ let string = serde_json::to_string(proto).unwrap();
+ let back: protobuf::LogicalExprNode = serde_json::from_str(&string).unwrap();
+ assert_eq!(proto, &back);
+ }
+
+ #[cfg(not(feature = "serde"))]
+ fn roundtrip_serde_test(_proto: &protobuf::LogicalExprNode) {}
+
// Given a DataFusion logical Expr, convert it to protobuf and back, using debug formatting to test
// equality.
- macro_rules! roundtrip_expr_test {
- ($initial_struct:ident, $ctx:ident) => {
- let proto: protobuf::LogicalExprNode = (&$initial_struct).try_into().unwrap();
+ fn roundtrip_expr_test<T, E>(initial_struct: T, ctx: SessionContext)
+ where
+ for<'a> &'a T: TryInto<protobuf::LogicalExprNode, Error = E> + Debug,
+ E: Debug,
+ {
+ let proto: protobuf::LogicalExprNode = (&initial_struct).try_into().unwrap();
+ let round_trip: Expr = parse_expr(&proto, &ctx).unwrap();
Review Comment:
Thanks for making this PR! I'm sure it's a step in the correct direction, but after cloning it, I'm having trouble figuring out how to extend it to include `LogicalPlan`s as well. There appears to be no equivalent `parse_plan()`?
If I'm reading this correctly, it looks like the protobuf generator generated an entirely separate copy of all the Expr & LogicalPlan enums, and we created some fairly large functions to convert back and forth between them like `parse_expr`? If so the next step would be to write `parse_plan`?
It does look from other tests like we are able to roundtrip plans to protobuf, so maybe I am missing something. When I tried:
```
let protobuf =
protobuf::LogicalPlanNode::try_from_logical_plan(&topk_plan, &extension_codec)?;
let string = serde_json::to_string(&protobuf).unwrap();
```
I get
```
| let string = serde_json::to_string(&protobuf).unwrap();
| --------------------- ^^^^^^^^^ the trait `serde::ser::Serialize` is not implemented for `LogicalPlanNode`
| |
| required by a bound introduced by this call
```
Sorry, I'm a bit lost. Any help is appreciated :)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] avantgardnerio commented on a diff in pull request #2892: Adds optional serde support to datafusion-proto
Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on code in PR #2892:
URL: https://github.com/apache/arrow-datafusion/pull/2892#discussion_r920390154
##########
datafusion/proto/src/lib.rs:
##########
@@ -75,19 +78,32 @@ mod roundtrip_tests {
use std::fmt::Formatter;
use std::sync::Arc;
+ #[cfg(feature = "serde")]
+ fn roundtrip_serde_test(proto: &protobuf::LogicalExprNode) {
+ let string = serde_json::to_string(proto).unwrap();
+ let back: protobuf::LogicalExprNode = serde_json::from_str(&string).unwrap();
+ assert_eq!(proto, &back);
+ }
+
+ #[cfg(not(feature = "serde"))]
+ fn roundtrip_serde_test(_proto: &protobuf::LogicalExprNode) {}
+
// Given a DataFusion logical Expr, convert it to protobuf and back, using debug formatting to test
// equality.
- macro_rules! roundtrip_expr_test {
- ($initial_struct:ident, $ctx:ident) => {
- let proto: protobuf::LogicalExprNode = (&$initial_struct).try_into().unwrap();
+ fn roundtrip_expr_test<T, E>(initial_struct: T, ctx: SessionContext)
+ where
+ for<'a> &'a T: TryInto<protobuf::LogicalExprNode, Error = E> + Debug,
+ E: Debug,
+ {
+ let proto: protobuf::LogicalExprNode = (&initial_struct).try_into().unwrap();
+ let round_trip: Expr = parse_expr(&proto, &ctx).unwrap();
Review Comment:
Okay, I got all that sorted, and was able to do what I was trying to use this PR for: https://github.com/tustvold/arrow-datafusion/pull/63
I was hoping (naively) to see something like:
```
{Projection: { field: "#employee_csv.id",
Filter: {expr: {left: "#employee_csv.state", op: "IN", right: {
Subquery: {
TableScan: {name: "employee_csv", projection=["state"]},
TableScan: {name: "employee_csv" projection=["id", "state"]
}}
```
Unfortunately (though serializable and deserializable) the result is not as human readable as I hoped, and probably not something I'd want to check into a test assertion: https://gist.github.com/avantgardnerio/35a04950ca768fdfe6579aea08126b74
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] avantgardnerio commented on a diff in pull request #2892: Adds optional serde support to datafusion-proto
Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on code in PR #2892:
URL: https://github.com/apache/arrow-datafusion/pull/2892#discussion_r920391480
##########
datafusion/proto/src/lib.rs:
##########
@@ -75,19 +78,32 @@ mod roundtrip_tests {
use std::fmt::Formatter;
use std::sync::Arc;
+ #[cfg(feature = "serde")]
+ fn roundtrip_serde_test(proto: &protobuf::LogicalExprNode) {
+ let string = serde_json::to_string(proto).unwrap();
+ let back: protobuf::LogicalExprNode = serde_json::from_str(&string).unwrap();
+ assert_eq!(proto, &back);
+ }
+
+ #[cfg(not(feature = "serde"))]
+ fn roundtrip_serde_test(_proto: &protobuf::LogicalExprNode) {}
+
// Given a DataFusion logical Expr, convert it to protobuf and back, using debug formatting to test
// equality.
- macro_rules! roundtrip_expr_test {
- ($initial_struct:ident, $ctx:ident) => {
- let proto: protobuf::LogicalExprNode = (&$initial_struct).try_into().unwrap();
+ fn roundtrip_expr_test<T, E>(initial_struct: T, ctx: SessionContext)
+ where
+ for<'a> &'a T: TryInto<protobuf::LogicalExprNode, Error = E> + Debug,
+ E: Debug,
+ {
+ let proto: protobuf::LogicalExprNode = (&initial_struct).try_into().unwrap();
+ let round_trip: Expr = parse_expr(&proto, &ctx).unwrap();
Review Comment:
The above being said, I can see other uses for JSON serialization that extend beyond my narrow use-case that could make this PR valuable enough to merge on its own merits.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] avantgardnerio commented on a diff in pull request #2892: Adds optional serde support to datafusion-proto
Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on code in PR #2892:
URL: https://github.com/apache/arrow-datafusion/pull/2892#discussion_r920412642
##########
datafusion/proto/src/lib.rs:
##########
@@ -75,19 +78,32 @@ mod roundtrip_tests {
use std::fmt::Formatter;
use std::sync::Arc;
+ #[cfg(feature = "serde")]
+ fn roundtrip_serde_test(proto: &protobuf::LogicalExprNode) {
+ let string = serde_json::to_string(proto).unwrap();
+ let back: protobuf::LogicalExprNode = serde_json::from_str(&string).unwrap();
+ assert_eq!(proto, &back);
+ }
+
+ #[cfg(not(feature = "serde"))]
+ fn roundtrip_serde_test(_proto: &protobuf::LogicalExprNode) {}
+
// Given a DataFusion logical Expr, convert it to protobuf and back, using debug formatting to test
// equality.
- macro_rules! roundtrip_expr_test {
- ($initial_struct:ident, $ctx:ident) => {
- let proto: protobuf::LogicalExprNode = (&$initial_struct).try_into().unwrap();
+ fn roundtrip_expr_test<T, E>(initial_struct: T, ctx: SessionContext)
+ where
+ for<'a> &'a T: TryInto<protobuf::LogicalExprNode, Error = E> + Debug,
+ E: Debug,
+ {
+ let proto: protobuf::LogicalExprNode = (&initial_struct).try_into().unwrap();
+ let round_trip: Expr = parse_expr(&proto, &ctx).unwrap();
Review Comment:
Running that raw JSON through [this formatter](https://github.com/lydell/json-stringify-pretty-compact) produces some encouraging output:
```
{
"projection": {
"input": {
"sort": {
"input": {
"projection": {
"input": {
"listingScan": {
"tableName": "alltypes_plain",
"paths": ["file:///home/bgardner/workspace/arrow-datafusion/parquet-testing/data/alltypes_plain.parquet"],
"fileExtension": ".parquet",
"schema": {
"columns": [
{"name": "id", "arrowType": {"INT32": {}}, "nullable": true},
{"name": "bool_col", "arrowType": {"BOOL": {}}, "nullable": true},
{"name": "tinyint_col", "arrowType": {"INT32": {}}, "nullable": true},
{"name": "smallint_col", "arrowType": {"INT32": {}}, "nullable": true},
{"name": "int_col", "arrowType": {"INT32": {}}, "nullable": true},
{"name": "bigint_col", "arrowType": {"INT64": {}}, "nullable": true},
{"name": "float_col", "arrowType": {"FLOAT32": {}}, "nullable": true},
{"name": "double_col", "arrowType": {"FLOAT64": {}}, "nullable": true},
{"name": "date_string_col", "arrowType": {"BINARY": {}}, "nullable": true},
{"name": "string_col", "arrowType": {"BINARY": {}}, "nullable": true},
{"name": "timestamp_col", "arrowType": {"TIMESTAMP": {"timeUnit": "Nanosecond"}}, "nullable": true}
]
},
"collectStat": true,
"targetPartitions": 16,
"parquet": {"enablePruning": true}
}
},
"expr": [
{"column": {"name": "id", "relation": {"relation": "alltypes_plain"}}},
{"column": {"name": "int_col", "relation": {"relation": "alltypes_plain"}}},
{"column": {"name": "double_col", "relation": {"relation": "alltypes_plain"}}}
]
}
},
"expr": [
{"sort": {"expr": {"column": {"name": "int_col", "relation": {"relation": "alltypes_plain"}}}, "asc": true}},
{"sort": {"expr": {"column": {"name": "double_col", "relation": {"relation": "alltypes_plain"}}}, "asc": true}}
]
}
},
"expr": [{"column": {"name": "id", "relation": {"relation": "alltypes_plain"}}}]
}
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #2892: Adds optional serde support to datafusion-proto
Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #2892:
URL: https://github.com/apache/arrow-datafusion/pull/2892#discussion_r922690494
##########
datafusion/proto/Cargo.toml:
##########
@@ -33,18 +33,24 @@ name = "datafusion_proto"
path = "src/lib.rs"
[features]
+default = []
+json = ["pbjson", "pbjson-build", "serde", "serde_json"]
[dependencies]
arrow = { version = "18.0.0" }
datafusion = { path = "../core", version = "10.0.0" }
datafusion-common = { path = "../common", version = "10.0.0" }
datafusion-expr = { path = "../expr", version = "10.0.0" }
prost = "0.10"
-
+serde = { version = "1.0", optional = true }
+serde_json = { version = "1.0", optional = true }
+pbjson = { version = "0.3", optional = true }
+pbjson-types = { version = "0.3", optional = true }
[dev-dependencies]
doc-comment = "0.3"
tokio = "1.18"
[build-dependencies]
-tonic-build = { version = "0.7" }
+pbjson-build = { version = "0.3", optional = true }
+prost-build = { version = "0.7" }
Review Comment:
There was actually no reason for this to depend on tonic-build, and not just prost-build. This is likely an orphan from when ballista was extracted
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org