You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/07/13 03:11:40 UTC

[GitHub] [arrow-datafusion] tustvold opened a new pull request, #2892: Adds optional serde support to datafusion-proto

tustvold opened a new pull request, #2892:
URL: https://github.com/apache/arrow-datafusion/pull/2892

   # Which issue does this PR close?
   
   Closes #2889
   
    # Rationale for this change
   
   This provides a way to serialize the datafusion-proto messages to any serde serializer, e.g. JSON, based on the protobuf [JSON specification](https://developers.google.com/protocol-buffers/docs/proto3#json). 
   
   # What changes are included in this PR?
   
   Uses pbjson to auto-generate the serde implementations for the protobuf messages. Full disclosure: I am the primary author and maintainer of this crate.
   
   # Are there any user-facing changes?
   
   No


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] tustvold merged pull request #2892: Adds optional serde support to datafusion-proto

Posted by GitBox <gi...@apache.org>.
tustvold merged PR #2892:
URL: https://github.com/apache/arrow-datafusion/pull/2892


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] avantgardnerio commented on a diff in pull request #2892: Adds optional serde support to datafusion-proto

Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on code in PR #2892:
URL: https://github.com/apache/arrow-datafusion/pull/2892#discussion_r922394203


##########
datafusion/proto/Cargo.toml:
##########
@@ -33,18 +33,24 @@ name = "datafusion_proto"
 path = "src/lib.rs"
 
 [features]
+default = []
+json = ["pbjson", "pbjson-build", "serde", "serde_json"]
 
 [dependencies]
 arrow = { version = "18.0.0" }
 datafusion = { path = "../core", version = "10.0.0" }
 datafusion-common = { path = "../common", version = "10.0.0" }
 datafusion-expr = { path = "../expr", version = "10.0.0" }
 prost = "0.10"
-
+serde = { version = "1.0", optional = true }
+serde_json = { version = "1.0", optional = true }
+pbjson = { version = "0.3", optional = true }
+pbjson-types = { version = "0.3", optional = true }
 
 [dev-dependencies]
 doc-comment = "0.3"
 tokio = "1.18"
 
 [build-dependencies]
 tonic-build = { version = "0.7" }
+pbjson-build = { version = "0.3", optional = true }

Review Comment:
   ```suggestion
   tonic-build = { version = "0.7" }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] avantgardnerio commented on a diff in pull request #2892: Adds optional serde support to datafusion-proto

Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on code in PR #2892:
URL: https://github.com/apache/arrow-datafusion/pull/2892#discussion_r920312561


##########
datafusion/proto/src/lib.rs:
##########
@@ -75,19 +78,32 @@ mod roundtrip_tests {
     use std::fmt::Formatter;
     use std::sync::Arc;
 
+    #[cfg(feature = "serde")]
+    fn roundtrip_serde_test(proto: &protobuf::LogicalExprNode) {
+        let string = serde_json::to_string(proto).unwrap();
+        let back: protobuf::LogicalExprNode = serde_json::from_str(&string).unwrap();
+        assert_eq!(proto, &back);
+    }
+
+    #[cfg(not(feature = "serde"))]
+    fn roundtrip_serde_test(_proto: &protobuf::LogicalExprNode) {}
+
     // Given a DataFusion logical Expr, convert it to protobuf and back, using debug formatting to test
     // equality.
-    macro_rules! roundtrip_expr_test {
-        ($initial_struct:ident, $ctx:ident) => {
-            let proto: protobuf::LogicalExprNode = (&$initial_struct).try_into().unwrap();
+    fn roundtrip_expr_test<T, E>(initial_struct: T, ctx: SessionContext)
+    where
+        for<'a> &'a T: TryInto<protobuf::LogicalExprNode, Error = E> + Debug,
+        E: Debug,
+    {
+        let proto: protobuf::LogicalExprNode = (&initial_struct).try_into().unwrap();
+        let round_trip: Expr = parse_expr(&proto, &ctx).unwrap();

Review Comment:
   Thanks for making this PR! I'm sure it's a step in the correct direction, but after cloning it, I'm having trouble figuring out how to extend it to include `LogicalPlan`s as well. There appears to be no equivalent `parse_plan()`?
   
   If I'm reading this correctly, it looks like the protobuf generator generated an entirely separate copy of all the Expr & LogicalPlan enums, and we created some fairly large functions to convert back and forth between them like `parse_expr`? If so the next step would be to write `parse_plan`?
   
   It does look from other tests like we are able to roundtrip plans to protobuf, so maybe I am missing something. When I tried: 
   
   ```
           let protobuf =
               protobuf::LogicalPlanNode::try_from_logical_plan(&topk_plan, &extension_codec)?;
           let string = serde_json::to_string(&protobuf).unwrap();
   ```
           
           I get 
           
   ```
           139  |         let string = serde_json::to_string(&protobuf).unwrap();
        |                      --------------------- ^^^^^^^^^ the trait `serde::ser::Serialize` is not implemented for `LogicalPlanNode`
        |                      |
        |                      required by a bound introduced by this call
   ```
   
   Sorry, I'm a bit lost. Any help is appreciated :)



##########
datafusion/proto/src/lib.rs:
##########
@@ -75,19 +78,32 @@ mod roundtrip_tests {
     use std::fmt::Formatter;
     use std::sync::Arc;
 
+    #[cfg(feature = "serde")]
+    fn roundtrip_serde_test(proto: &protobuf::LogicalExprNode) {
+        let string = serde_json::to_string(proto).unwrap();
+        let back: protobuf::LogicalExprNode = serde_json::from_str(&string).unwrap();
+        assert_eq!(proto, &back);
+    }
+
+    #[cfg(not(feature = "serde"))]
+    fn roundtrip_serde_test(_proto: &protobuf::LogicalExprNode) {}
+
     // Given a DataFusion logical Expr, convert it to protobuf and back, using debug formatting to test
     // equality.
-    macro_rules! roundtrip_expr_test {
-        ($initial_struct:ident, $ctx:ident) => {
-            let proto: protobuf::LogicalExprNode = (&$initial_struct).try_into().unwrap();
+    fn roundtrip_expr_test<T, E>(initial_struct: T, ctx: SessionContext)
+    where
+        for<'a> &'a T: TryInto<protobuf::LogicalExprNode, Error = E> + Debug,
+        E: Debug,
+    {
+        let proto: protobuf::LogicalExprNode = (&initial_struct).try_into().unwrap();
+        let round_trip: Expr = parse_expr(&proto, &ctx).unwrap();

Review Comment:
   Thanks for making this PR! I'm sure it's a step in the correct direction, but after cloning it, I'm having trouble figuring out how to extend it to include `LogicalPlan`s as well. There appears to be no equivalent `parse_plan()`?
   
   If I'm reading this correctly, it looks like the protobuf generator generated an entirely separate copy of all the Expr & LogicalPlan enums, and we created some fairly large functions to convert back and forth between them like `parse_expr`? If so the next step would be to write `parse_plan`?
   
   It does look from other tests like we are able to roundtrip plans to protobuf, so maybe I am missing something. When I tried: 
   
   ```
           let protobuf =
               protobuf::LogicalPlanNode::try_from_logical_plan(&topk_plan, &extension_codec)?;
           let string = serde_json::to_string(&protobuf).unwrap();
   ```
           
   I get 
           
   ```
           139  |         let string = serde_json::to_string(&protobuf).unwrap();
        |                      --------------------- ^^^^^^^^^ the trait `serde::ser::Serialize` is not implemented for `LogicalPlanNode`
        |                      |
        |                      required by a bound introduced by this call
   ```
   
   Sorry, I'm a bit lost. Any help is appreciated :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] avantgardnerio commented on a diff in pull request #2892: Adds optional serde support to datafusion-proto

Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on code in PR #2892:
URL: https://github.com/apache/arrow-datafusion/pull/2892#discussion_r922848101


##########
datafusion/proto/Cargo.toml:
##########
@@ -33,18 +33,24 @@ name = "datafusion_proto"
 path = "src/lib.rs"
 
 [features]
+default = []
+json = ["pbjson", "pbjson-build", "serde", "serde_json"]
 
 [dependencies]
 arrow = { version = "18.0.0" }
 datafusion = { path = "../core", version = "10.0.0" }
 datafusion-common = { path = "../common", version = "10.0.0" }
 datafusion-expr = { path = "../expr", version = "10.0.0" }

Review Comment:
   ```suggestion
   pbjson = { version = "0.3", optional = true }
   pbjson-types = { version = "0.3", optional = true }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] ursabot commented on pull request #2892: Adds optional serde support to datafusion-proto

Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #2892:
URL: https://github.com/apache/arrow-datafusion/pull/2892#issuecomment-1186585975

   Benchmark runs are scheduled for baseline = c528986d8c9df1340431644cd8b3f2f9e725abc8 and contender = c67161b97731d71ca545231890bc159fab6439be. c67161b97731d71ca545231890bc159fab6439be is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/9f68f0613f0f4caeb82e5ecc626a25f9...74d466055fba4e81a0f2f69d15bfebe8/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] [test-mac-arm](https://conbench.ursa.dev/compare/runs/729d0293f5cb4d0192a59732adc9fe28...514f15cd3f8147cf99413b9ef11be15e/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/e9ada9518ce34188af2ab1bbe9555010...049999d979634f12aa022c8f789f18bc/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/a69c2aa02cd04e5cb70a07e05417b79d...bfea40b07f234c2ea0526efd0d5e64e1/)
   Buildkite builds:
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] avantgardnerio commented on a diff in pull request #2892: Adds optional serde support to datafusion-proto

Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on code in PR #2892:
URL: https://github.com/apache/arrow-datafusion/pull/2892#discussion_r922848170


##########
datafusion/proto/Cargo.toml:
##########
@@ -33,18 +33,24 @@ name = "datafusion_proto"
 path = "src/lib.rs"
 
 [features]
+default = []
+json = ["pbjson", "pbjson-build", "serde", "serde_json"]
 
 [dependencies]
 arrow = { version = "18.0.0" }
 datafusion = { path = "../core", version = "10.0.0" }
 datafusion-common = { path = "../common", version = "10.0.0" }
 datafusion-expr = { path = "../expr", version = "10.0.0" }

Review Comment:
   ```suggestion
   datafusion-expr = { path = "../expr", version = "10.0.0" }
   pbjson = { version = "0.3", optional = true }
   pbjson-types = { version = "0.3", optional = true }
   ```



##########
datafusion/proto/Cargo.toml:
##########
@@ -33,18 +33,24 @@ name = "datafusion_proto"
 path = "src/lib.rs"
 
 [features]
+default = []
+json = ["pbjson", "pbjson-build", "serde", "serde_json"]
 
 [dependencies]
 arrow = { version = "18.0.0" }
 datafusion = { path = "../core", version = "10.0.0" }
 datafusion-common = { path = "../common", version = "10.0.0" }
 datafusion-expr = { path = "../expr", version = "10.0.0" }
 prost = "0.10"
-
+serde = { version = "1.0", optional = true }
+serde_json = { version = "1.0", optional = true }
+pbjson = { version = "0.3", optional = true }

Review Comment:
   ```suggestion
   ```



##########
datafusion/proto/Cargo.toml:
##########
@@ -33,18 +33,24 @@ name = "datafusion_proto"
 path = "src/lib.rs"
 
 [features]
+default = []
+json = ["pbjson", "pbjson-build", "serde", "serde_json"]
 
 [dependencies]
 arrow = { version = "18.0.0" }
 datafusion = { path = "../core", version = "10.0.0" }
 datafusion-common = { path = "../common", version = "10.0.0" }
 datafusion-expr = { path = "../expr", version = "10.0.0" }
 prost = "0.10"
-
+serde = { version = "1.0", optional = true }
+serde_json = { version = "1.0", optional = true }
+pbjson = { version = "0.3", optional = true }
+pbjson-types = { version = "0.3", optional = true }

Review Comment:
   ```suggestion
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #2892: Adds optional serde support to datafusion-proto

Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #2892:
URL: https://github.com/apache/arrow-datafusion/pull/2892#discussion_r919610004


##########
datafusion/proto/src/lib.rs:
##########
@@ -75,19 +78,32 @@ mod roundtrip_tests {
     use std::fmt::Formatter;
     use std::sync::Arc;
 
+    #[cfg(feature = "serde")]
+    fn roundtrip_serde_test(proto: &protobuf::LogicalExprNode) {
+        let string = serde_json::to_string(proto).unwrap();
+        let back: protobuf::LogicalExprNode = serde_json::from_str(&string).unwrap();
+        assert_eq!(proto, &back);
+    }
+
+    #[cfg(not(feature = "serde"))]
+    fn roundtrip_serde_test(_proto: &protobuf::LogicalExprNode) {}
+
     // Given a DataFusion logical Expr, convert it to protobuf and back, using debug formatting to test
     // equality.
-    macro_rules! roundtrip_expr_test {
-        ($initial_struct:ident, $ctx:ident) => {
-            let proto: protobuf::LogicalExprNode = (&$initial_struct).try_into().unwrap();
+    fn roundtrip_expr_test<T, E>(initial_struct: T, ctx: SessionContext)

Review Comment:
   Drive by refactor to eliminate an unnecessary macro



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] codecov-commenter commented on pull request #2892: Adds optional serde support to datafusion-proto

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #2892:
URL: https://github.com/apache/arrow-datafusion/pull/2892#issuecomment-1182733897

   # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/2892?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#2892](https://codecov.io/gh/apache/arrow-datafusion/pull/2892?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (bce7edb) into [master](https://codecov.io/gh/apache/arrow-datafusion/commit/eed77a286cac497683ca38fd75b6a134455cb1c2?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (eed77a2) will **decrease** coverage by `0.00%`.
   > The diff coverage is `92.85%`.
   
   > :exclamation: Current head bce7edb differs from pull request most recent head f13ee06. Consider uploading reports for the commit f13ee06 to get more accurate results
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #2892      +/-   ##
   ==========================================
   - Coverage   85.34%   85.34%   -0.01%     
   ==========================================
     Files         276      276              
     Lines       49290    49302      +12     
   ==========================================
   + Hits        42069    42077       +8     
   - Misses       7221     7225       +4     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-datafusion/pull/2892?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [datafusion/core/tests/user\_defined\_plan.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2892/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9jb3JlL3Rlc3RzL3VzZXJfZGVmaW5lZF9wbGFuLnJz) | `87.79% <ø> (ø)` | |
   | [datafusion/optimizer/src/optimizer.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2892/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9vcHRpbWl6ZXIvc3JjL29wdGltaXplci5ycw==) | `75.00% <0.00%> (-13.24%)` | :arrow_down: |
   | [datafusion/optimizer/src/utils.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2892/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9vcHRpbWl6ZXIvc3JjL3V0aWxzLnJz) | `88.57% <ø> (ø)` | |
   | [datafusion/core/src/execution/context.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2892/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9jb3JlL3NyYy9leGVjdXRpb24vY29udGV4dC5ycw==) | `78.82% <100.00%> (+0.02%)` | :arrow_up: |
   | [...tafusion/optimizer/src/common\_subexpr\_eliminate.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2892/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9vcHRpbWl6ZXIvc3JjL2NvbW1vbl9zdWJleHByX2VsaW1pbmF0ZS5ycw==) | `94.11% <100.00%> (ø)` | |
   | [datafusion/optimizer/src/eliminate\_filter.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2892/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9vcHRpbWl6ZXIvc3JjL2VsaW1pbmF0ZV9maWx0ZXIucnM=) | `100.00% <100.00%> (ø)` | |
   | [datafusion/optimizer/src/eliminate\_limit.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2892/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9vcHRpbWl6ZXIvc3JjL2VsaW1pbmF0ZV9saW1pdC5ycw==) | `100.00% <100.00%> (ø)` | |
   | [datafusion/optimizer/src/filter\_null\_join\_keys.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2892/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9vcHRpbWl6ZXIvc3JjL2ZpbHRlcl9udWxsX2pvaW5fa2V5cy5ycw==) | `92.85% <100.00%> (ø)` | |
   | [datafusion/optimizer/src/filter\_push\_down.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2892/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9vcHRpbWl6ZXIvc3JjL2ZpbHRlcl9wdXNoX2Rvd24ucnM=) | `98.23% <100.00%> (ø)` | |
   | [datafusion/optimizer/src/limit\_push\_down.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2892/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9vcHRpbWl6ZXIvc3JjL2xpbWl0X3B1c2hfZG93bi5ycw==) | `99.67% <100.00%> (ø)` | |
   | ... and [8 more](https://codecov.io/gh/apache/arrow-datafusion/pull/2892/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/2892?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/2892?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [eed77a2...f13ee06](https://codecov.io/gh/apache/arrow-datafusion/pull/2892?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] avantgardnerio commented on a diff in pull request #2892: Adds optional serde support to datafusion-proto

Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on code in PR #2892:
URL: https://github.com/apache/arrow-datafusion/pull/2892#discussion_r920312561


##########
datafusion/proto/src/lib.rs:
##########
@@ -75,19 +78,32 @@ mod roundtrip_tests {
     use std::fmt::Formatter;
     use std::sync::Arc;
 
+    #[cfg(feature = "serde")]
+    fn roundtrip_serde_test(proto: &protobuf::LogicalExprNode) {
+        let string = serde_json::to_string(proto).unwrap();
+        let back: protobuf::LogicalExprNode = serde_json::from_str(&string).unwrap();
+        assert_eq!(proto, &back);
+    }
+
+    #[cfg(not(feature = "serde"))]
+    fn roundtrip_serde_test(_proto: &protobuf::LogicalExprNode) {}
+
     // Given a DataFusion logical Expr, convert it to protobuf and back, using debug formatting to test
     // equality.
-    macro_rules! roundtrip_expr_test {
-        ($initial_struct:ident, $ctx:ident) => {
-            let proto: protobuf::LogicalExprNode = (&$initial_struct).try_into().unwrap();
+    fn roundtrip_expr_test<T, E>(initial_struct: T, ctx: SessionContext)
+    where
+        for<'a> &'a T: TryInto<protobuf::LogicalExprNode, Error = E> + Debug,
+        E: Debug,
+    {
+        let proto: protobuf::LogicalExprNode = (&initial_struct).try_into().unwrap();
+        let round_trip: Expr = parse_expr(&proto, &ctx).unwrap();

Review Comment:
   Thanks for making this PR! I'm sure it's a step in the correct direction, but after cloning it, I'm having trouble figuring out how to extend it to include `LogicalPlan`s as well. There appears to be no equivalent `parse_plan()`?
   
   If I'm reading this correctly, it looks like the protobuf generator generated an entirely separate copy of all the Expr & LogicalPlan enums, and we created some fairly large functions to convert back and forth between them like `parse_expr`? If so the next step would be to write `parse_plan`?
   
   It does look from other tests like we are able to roundtrip plans to protobuf, so maybe I am missing something. When I tried: 
   
   ```
           let protobuf =
               protobuf::LogicalPlanNode::try_from_logical_plan(&topk_plan, &extension_codec)?;
           let string = serde_json::to_string(&protobuf).unwrap();
           ```
           
           I get 
           
           ```
           139  |         let string = serde_json::to_string(&protobuf).unwrap();
        |                      --------------------- ^^^^^^^^^ the trait `serde::ser::Serialize` is not implemented for `LogicalPlanNode`
        |                      |
        |                      required by a bound introduced by this call
        ```
        Sorry, I'm a bit lost. Any help is appreciated :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] avantgardnerio commented on a diff in pull request #2892: Adds optional serde support to datafusion-proto

Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on code in PR #2892:
URL: https://github.com/apache/arrow-datafusion/pull/2892#discussion_r922394058


##########
datafusion/proto/Cargo.toml:
##########
@@ -33,18 +33,24 @@ name = "datafusion_proto"
 path = "src/lib.rs"
 
 [features]
+default = []
+json = ["pbjson", "pbjson-build", "serde", "serde_json"]
 
 [dependencies]
 arrow = { version = "18.0.0" }
 datafusion = { path = "../core", version = "10.0.0" }
 datafusion-common = { path = "../common", version = "10.0.0" }
 datafusion-expr = { path = "../expr", version = "10.0.0" }
 prost = "0.10"
-
+serde = { version = "1.0", optional = true }
+serde_json = { version = "1.0", optional = true }
+pbjson = { version = "0.3", optional = true }
+pbjson-types = { version = "0.3", optional = true }
 
 [dev-dependencies]
 doc-comment = "0.3"
 tokio = "1.18"
 
 [build-dependencies]
 tonic-build = { version = "0.7" }

Review Comment:
   ```suggestion
   pbjson-build = { version = "0.3", optional = true }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #2892: Adds optional serde support to datafusion-proto

Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #2892:
URL: https://github.com/apache/arrow-datafusion/pull/2892#discussion_r919610004


##########
datafusion/proto/src/lib.rs:
##########
@@ -75,19 +78,32 @@ mod roundtrip_tests {
     use std::fmt::Formatter;
     use std::sync::Arc;
 
+    #[cfg(feature = "serde")]
+    fn roundtrip_serde_test(proto: &protobuf::LogicalExprNode) {
+        let string = serde_json::to_string(proto).unwrap();
+        let back: protobuf::LogicalExprNode = serde_json::from_str(&string).unwrap();
+        assert_eq!(proto, &back);
+    }
+
+    #[cfg(not(feature = "serde"))]
+    fn roundtrip_serde_test(_proto: &protobuf::LogicalExprNode) {}
+
     // Given a DataFusion logical Expr, convert it to protobuf and back, using debug formatting to test
     // equality.
-    macro_rules! roundtrip_expr_test {
-        ($initial_struct:ident, $ctx:ident) => {
-            let proto: protobuf::LogicalExprNode = (&$initial_struct).try_into().unwrap();
+    fn roundtrip_expr_test<T, E>(initial_struct: T, ctx: SessionContext)

Review Comment:
   Drive by refactor to eliminate an unnecessary macro, as they complicate development, refactoring, debugging, etc...



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] avantgardnerio commented on a diff in pull request #2892: Adds optional serde support to datafusion-proto

Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on code in PR #2892:
URL: https://github.com/apache/arrow-datafusion/pull/2892#discussion_r920318589


##########
datafusion/proto/src/lib.rs:
##########
@@ -75,19 +78,32 @@ mod roundtrip_tests {
     use std::fmt::Formatter;
     use std::sync::Arc;
 
+    #[cfg(feature = "serde")]
+    fn roundtrip_serde_test(proto: &protobuf::LogicalExprNode) {
+        let string = serde_json::to_string(proto).unwrap();
+        let back: protobuf::LogicalExprNode = serde_json::from_str(&string).unwrap();
+        assert_eq!(proto, &back);
+    }
+
+    #[cfg(not(feature = "serde"))]
+    fn roundtrip_serde_test(_proto: &protobuf::LogicalExprNode) {}
+
     // Given a DataFusion logical Expr, convert it to protobuf and back, using debug formatting to test
     // equality.
-    macro_rules! roundtrip_expr_test {
-        ($initial_struct:ident, $ctx:ident) => {
-            let proto: protobuf::LogicalExprNode = (&$initial_struct).try_into().unwrap();
+    fn roundtrip_expr_test<T, E>(initial_struct: T, ctx: SessionContext)
+    where
+        for<'a> &'a T: TryInto<protobuf::LogicalExprNode, Error = E> + Debug,
+        E: Debug,
+    {
+        let proto: protobuf::LogicalExprNode = (&initial_struct).try_into().unwrap();
+        let round_trip: Expr = parse_expr(&proto, &ctx).unwrap();

Review Comment:
   Oh, I was running my tests without the "serde" feature :facepalm: 
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] avantgardnerio commented on a diff in pull request #2892: Adds optional serde support to datafusion-proto

Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on code in PR #2892:
URL: https://github.com/apache/arrow-datafusion/pull/2892#discussion_r920312561


##########
datafusion/proto/src/lib.rs:
##########
@@ -75,19 +78,32 @@ mod roundtrip_tests {
     use std::fmt::Formatter;
     use std::sync::Arc;
 
+    #[cfg(feature = "serde")]
+    fn roundtrip_serde_test(proto: &protobuf::LogicalExprNode) {
+        let string = serde_json::to_string(proto).unwrap();
+        let back: protobuf::LogicalExprNode = serde_json::from_str(&string).unwrap();
+        assert_eq!(proto, &back);
+    }
+
+    #[cfg(not(feature = "serde"))]
+    fn roundtrip_serde_test(_proto: &protobuf::LogicalExprNode) {}
+
     // Given a DataFusion logical Expr, convert it to protobuf and back, using debug formatting to test
     // equality.
-    macro_rules! roundtrip_expr_test {
-        ($initial_struct:ident, $ctx:ident) => {
-            let proto: protobuf::LogicalExprNode = (&$initial_struct).try_into().unwrap();
+    fn roundtrip_expr_test<T, E>(initial_struct: T, ctx: SessionContext)
+    where
+        for<'a> &'a T: TryInto<protobuf::LogicalExprNode, Error = E> + Debug,
+        E: Debug,
+    {
+        let proto: protobuf::LogicalExprNode = (&initial_struct).try_into().unwrap();
+        let round_trip: Expr = parse_expr(&proto, &ctx).unwrap();

Review Comment:
   Thanks for making this PR! I'm sure it's a step in the correct direction, but after cloning it, I'm having trouble figuring out how to extend it to include `LogicalPlan`s as well. There appears to be no equivalent `parse_plan()`?
   
   If I'm reading this correctly, it looks like the protobuf generator generated an entirely separate copy of all the Expr & LogicalPlan enums, and we created some fairly large functions to convert back and forth between them like `parse_expr`? If so the next step would be to write `parse_plan`?
   
   It does look from other tests like we are able to roundtrip plans to protobuf, so maybe I am missing something. When I tried: 
   
   ```
           let protobuf =
               protobuf::LogicalPlanNode::try_from_logical_plan(&topk_plan, &extension_codec)?;
           let string = serde_json::to_string(&protobuf).unwrap();
   ```
           
   I get 
           
   ```
        |         let string = serde_json::to_string(&protobuf).unwrap();
        |                      --------------------- ^^^^^^^^^ the trait `serde::ser::Serialize` is not implemented for `LogicalPlanNode`
        |                      |
        |                      required by a bound introduced by this call
   ```
   
   Sorry, I'm a bit lost. Any help is appreciated :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] avantgardnerio commented on a diff in pull request #2892: Adds optional serde support to datafusion-proto

Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on code in PR #2892:
URL: https://github.com/apache/arrow-datafusion/pull/2892#discussion_r920390154


##########
datafusion/proto/src/lib.rs:
##########
@@ -75,19 +78,32 @@ mod roundtrip_tests {
     use std::fmt::Formatter;
     use std::sync::Arc;
 
+    #[cfg(feature = "serde")]
+    fn roundtrip_serde_test(proto: &protobuf::LogicalExprNode) {
+        let string = serde_json::to_string(proto).unwrap();
+        let back: protobuf::LogicalExprNode = serde_json::from_str(&string).unwrap();
+        assert_eq!(proto, &back);
+    }
+
+    #[cfg(not(feature = "serde"))]
+    fn roundtrip_serde_test(_proto: &protobuf::LogicalExprNode) {}
+
     // Given a DataFusion logical Expr, convert it to protobuf and back, using debug formatting to test
     // equality.
-    macro_rules! roundtrip_expr_test {
-        ($initial_struct:ident, $ctx:ident) => {
-            let proto: protobuf::LogicalExprNode = (&$initial_struct).try_into().unwrap();
+    fn roundtrip_expr_test<T, E>(initial_struct: T, ctx: SessionContext)
+    where
+        for<'a> &'a T: TryInto<protobuf::LogicalExprNode, Error = E> + Debug,
+        E: Debug,
+    {
+        let proto: protobuf::LogicalExprNode = (&initial_struct).try_into().unwrap();
+        let round_trip: Expr = parse_expr(&proto, &ctx).unwrap();

Review Comment:
   Okay, I got all that sorted, and was able to do what I was trying to use this PR for: https://github.com/tustvold/arrow-datafusion/pull/63
   
   I was hoping (naively) to see something like:
   
   ```
   {Projection: { field: "#employee_csv.id",
     Filter: {expr: {left: "#employee_csv.state", op: "IN", right: {
       Subquery: {
         TableScan: {name: "employee_csv", projection=["state"]},
       TableScan: {name: "employee_csv" projection=["id", "state"]
   }}
   ```
   
   Unfortunately (though serializable and deserializable) the result is not as human readable as I hoped, and probably not something I'd want to check into a test assertion: https://gist.github.com/avantgardnerio/35a04950ca768fdfe6579aea08126b74



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] avantgardnerio commented on a diff in pull request #2892: Adds optional serde support to datafusion-proto

Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on code in PR #2892:
URL: https://github.com/apache/arrow-datafusion/pull/2892#discussion_r920391480


##########
datafusion/proto/src/lib.rs:
##########
@@ -75,19 +78,32 @@ mod roundtrip_tests {
     use std::fmt::Formatter;
     use std::sync::Arc;
 
+    #[cfg(feature = "serde")]
+    fn roundtrip_serde_test(proto: &protobuf::LogicalExprNode) {
+        let string = serde_json::to_string(proto).unwrap();
+        let back: protobuf::LogicalExprNode = serde_json::from_str(&string).unwrap();
+        assert_eq!(proto, &back);
+    }
+
+    #[cfg(not(feature = "serde"))]
+    fn roundtrip_serde_test(_proto: &protobuf::LogicalExprNode) {}
+
     // Given a DataFusion logical Expr, convert it to protobuf and back, using debug formatting to test
     // equality.
-    macro_rules! roundtrip_expr_test {
-        ($initial_struct:ident, $ctx:ident) => {
-            let proto: protobuf::LogicalExprNode = (&$initial_struct).try_into().unwrap();
+    fn roundtrip_expr_test<T, E>(initial_struct: T, ctx: SessionContext)
+    where
+        for<'a> &'a T: TryInto<protobuf::LogicalExprNode, Error = E> + Debug,
+        E: Debug,
+    {
+        let proto: protobuf::LogicalExprNode = (&initial_struct).try_into().unwrap();
+        let round_trip: Expr = parse_expr(&proto, &ctx).unwrap();

Review Comment:
   The above being said, I can see other uses for JSON serialization that extend beyond my narrow use-case that could make this PR valuable enough to merge on its own merits. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] avantgardnerio commented on a diff in pull request #2892: Adds optional serde support to datafusion-proto

Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on code in PR #2892:
URL: https://github.com/apache/arrow-datafusion/pull/2892#discussion_r920412642


##########
datafusion/proto/src/lib.rs:
##########
@@ -75,19 +78,32 @@ mod roundtrip_tests {
     use std::fmt::Formatter;
     use std::sync::Arc;
 
+    #[cfg(feature = "serde")]
+    fn roundtrip_serde_test(proto: &protobuf::LogicalExprNode) {
+        let string = serde_json::to_string(proto).unwrap();
+        let back: protobuf::LogicalExprNode = serde_json::from_str(&string).unwrap();
+        assert_eq!(proto, &back);
+    }
+
+    #[cfg(not(feature = "serde"))]
+    fn roundtrip_serde_test(_proto: &protobuf::LogicalExprNode) {}
+
     // Given a DataFusion logical Expr, convert it to protobuf and back, using debug formatting to test
     // equality.
-    macro_rules! roundtrip_expr_test {
-        ($initial_struct:ident, $ctx:ident) => {
-            let proto: protobuf::LogicalExprNode = (&$initial_struct).try_into().unwrap();
+    fn roundtrip_expr_test<T, E>(initial_struct: T, ctx: SessionContext)
+    where
+        for<'a> &'a T: TryInto<protobuf::LogicalExprNode, Error = E> + Debug,
+        E: Debug,
+    {
+        let proto: protobuf::LogicalExprNode = (&initial_struct).try_into().unwrap();
+        let round_trip: Expr = parse_expr(&proto, &ctx).unwrap();

Review Comment:
   Running that raw JSON through [this formatter](https://github.com/lydell/json-stringify-pretty-compact) produces some encouraging output:
   
   ```
   {
     "projection": {
       "input": {
         "sort": {
           "input": {
             "projection": {
               "input": {
                 "listingScan": {
                   "tableName": "alltypes_plain",
                   "paths": ["file:///home/bgardner/workspace/arrow-datafusion/parquet-testing/data/alltypes_plain.parquet"],
                   "fileExtension": ".parquet",
                   "schema": {
                     "columns": [
                       {"name": "id", "arrowType": {"INT32": {}}, "nullable": true},
                       {"name": "bool_col", "arrowType": {"BOOL": {}}, "nullable": true},
                       {"name": "tinyint_col", "arrowType": {"INT32": {}}, "nullable": true},
                       {"name": "smallint_col", "arrowType": {"INT32": {}}, "nullable": true},
                       {"name": "int_col", "arrowType": {"INT32": {}}, "nullable": true},
                       {"name": "bigint_col", "arrowType": {"INT64": {}}, "nullable": true},
                       {"name": "float_col", "arrowType": {"FLOAT32": {}}, "nullable": true},
                       {"name": "double_col", "arrowType": {"FLOAT64": {}}, "nullable": true},
                       {"name": "date_string_col", "arrowType": {"BINARY": {}}, "nullable": true},
                       {"name": "string_col", "arrowType": {"BINARY": {}}, "nullable": true},
                       {"name": "timestamp_col", "arrowType": {"TIMESTAMP": {"timeUnit": "Nanosecond"}}, "nullable": true}
                     ]
                   },
                   "collectStat": true,
                   "targetPartitions": 16,
                   "parquet": {"enablePruning": true}
                 }
               },
               "expr": [
                 {"column": {"name": "id", "relation": {"relation": "alltypes_plain"}}},
                 {"column": {"name": "int_col", "relation": {"relation": "alltypes_plain"}}},
                 {"column": {"name": "double_col", "relation": {"relation": "alltypes_plain"}}}
               ]
             }
           },
           "expr": [
             {"sort": {"expr": {"column": {"name": "int_col", "relation": {"relation": "alltypes_plain"}}}, "asc": true}},
             {"sort": {"expr": {"column": {"name": "double_col", "relation": {"relation": "alltypes_plain"}}}, "asc": true}}
           ]
         }
       },
       "expr": [{"column": {"name": "id", "relation": {"relation": "alltypes_plain"}}}]
     }
   }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #2892: Adds optional serde support to datafusion-proto

Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #2892:
URL: https://github.com/apache/arrow-datafusion/pull/2892#discussion_r922690494


##########
datafusion/proto/Cargo.toml:
##########
@@ -33,18 +33,24 @@ name = "datafusion_proto"
 path = "src/lib.rs"
 
 [features]
+default = []
+json = ["pbjson", "pbjson-build", "serde", "serde_json"]
 
 [dependencies]
 arrow = { version = "18.0.0" }
 datafusion = { path = "../core", version = "10.0.0" }
 datafusion-common = { path = "../common", version = "10.0.0" }
 datafusion-expr = { path = "../expr", version = "10.0.0" }
 prost = "0.10"
-
+serde = { version = "1.0", optional = true }
+serde_json = { version = "1.0", optional = true }
+pbjson = { version = "0.3", optional = true }
+pbjson-types = { version = "0.3", optional = true }
 
 [dev-dependencies]
 doc-comment = "0.3"
 tokio = "1.18"
 
 [build-dependencies]
-tonic-build = { version = "0.7" }
+pbjson-build = { version = "0.3", optional = true }
+prost-build = { version = "0.7" }

Review Comment:
   There was actually no reason for this to depend on tonic-build, and not just prost-build. This is likely an orphan from when ballista was extracted



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org