Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/05/28 14:47:32 UTC

[GitHub] [arrow-datafusion] andygrove opened a new pull request, #2639: WIP: Implement LogicalPlan serde in `datafusion-proto`

andygrove opened a new pull request, #2639:
URL: https://github.com/apache/arrow-datafusion/pull/2639

   # Which issue does this PR close?
   
   <!--
   We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123.
   -->
   
   Closes # - TBD
   
   # Rationale for this change
   <!--
    Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
    Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.  
   -->
   
   I want to be able to serialize logical plans.
   
   # What changes are included in this PR?
   <!--
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   -->
   
   - [x] Add LogicalPlan types to datafusion.proto (copied from Ballista)
   - [x] Add LogicalPlan serde code (copied from Ballista)
   - [ ] Expose via API
   - [ ] Write tests
   - [ ] Update documentation
   
   # Are there any user-facing changes?
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   TBD
   
   <!--
   If there are any breaking changes to public APIs, please add the `api change` label.
   -->
   
   # Does this PR break compatibility with Ballista?
   
   No
   
   <!--
   The CI checks will attempt to build [arrow-ballista](https://github.com/apache/arrow-ballista) against this PR. If 
   this check fails then it indicates that this PR makes a breaking change to the DataFusion API.
   
   If possible, try to make the change in a way that is not a breaking API change. For example, if code has moved 
    around, try adding `pub use` from the original location to preserve the current API.
   
   If it is not possible to avoid a breaking change (such as when adding enum variants) then follow this process:
   
   - Make a corresponding PR against `arrow-ballista` with the changes required there
   - Update `dev/build-arrow-ballista.sh` to clone the appropriate `arrow-ballista` repo & branch
   - Merge this PR when CI passes
   - Merge the Ballista PR
   - Create a new PR here to reset `dev/build-arrow-ballista.sh` to point to `arrow-ballista` master again
   
   _If you would like to help improve this process, please see https://github.com/apache/arrow-datafusion/issues/2583_
   -->


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] andygrove commented on pull request #2639: Implement `LogicalPlan` serde in `datafusion-proto`

Posted by GitBox <gi...@apache.org>.
andygrove commented on PR #2639:
URL: https://github.com/apache/arrow-datafusion/pull/2639#issuecomment-1140314567

   @thinkharderdev @yahoNanJing fyi




[GitHub] [arrow-datafusion] andygrove commented on a diff in pull request #2639: Implement `LogicalPlan` serde in `datafusion-proto`

Posted by GitBox <gi...@apache.org>.
andygrove commented on code in PR #2639:
URL: https://github.com/apache/arrow-datafusion/pull/2639#discussion_r884259746


##########
datafusion/proto/src/bytes/mod.rs:
##########
@@ -93,6 +101,69 @@ impl Serializeable for Expr {
     }
 }
 
+/// Serialize a LogicalPlan as bytes

Review Comment:
   Yes, I had been thinking about this too. We will need a unified API to implement subquery support (because expressions now reference plans), so I think we can make this API change as part of https://github.com/apache/arrow-datafusion/issues/2640





[GitHub] [arrow-datafusion] andygrove commented on a diff in pull request #2639: Implement `LogicalPlan` serde in `datafusion-proto`

Posted by GitBox <gi...@apache.org>.
andygrove commented on code in PR #2639:
URL: https://github.com/apache/arrow-datafusion/pull/2639#discussion_r884261214


##########
datafusion/proto/src/lib.rs:
##########
@@ -62,6 +84,18 @@ mod roundtrip_tests {
         Box::new(Field::new(name, dt, nullable))
     }
 
+    #[tokio::test]
+    async fn roundtrip_logical_plan() -> Result<(), DataFusionError> {

Review Comment:
   I added a test based on the TopK extension codec from Ballista





[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2639: Implement `LogicalPlan` serde in `datafusion-proto`

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #2639:
URL: https://github.com/apache/arrow-datafusion/pull/2639#discussion_r884249643


##########
datafusion/proto/src/lib.rs:
##########
@@ -23,24 +27,42 @@ pub mod protobuf {
 
 pub mod bytes;
 pub mod from_proto;
+pub mod logical_plan;
 pub mod to_proto;
 
+#[cfg(doctest)]
+doc_comment::doctest!("../README.md", readme_example_test);

Review Comment:
   👍 



##########
datafusion/proto/src/bytes/mod.rs:
##########
@@ -93,6 +101,69 @@ impl Serializeable for Expr {
     }
 }
 
+/// Serialize a LogicalPlan as bytes

Review Comment:
   I wonder if it is time to think about a slightly more encapsulated API, something that would allow
   
   ```rust
   let bytes = PlanSerializer::new()
       .with_extension_codec(&extension_codec)
       .serialize()?;
   ```
   
   Rather than having two separate free functions 🤷 
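
   To make the idea concrete, here is a minimal, self-contained sketch of what such a builder could look like. It is only an illustration under stated assumptions: `ToyPlan`, `SerdeError`, `ExtensionCodec`, `NoExtensions`, `Utf8Codec`, and the byte layout are all hypothetical stand-ins, not the types or wire format in `datafusion-proto`, and `serialize` takes the plan as an argument here, which the snippet above leaves unspecified.

   ```rust
   /// Toy stand-in for a plan tree; NOT DataFusion's `LogicalPlan`.
   #[derive(Debug, Clone, PartialEq)]
   pub enum ToyPlan {
       /// Built-in node the serializer understands natively.
       Scan(String),
       /// User-defined node that only an extension codec can encode.
       Extension(String),
   }

   /// Hypothetical error type for this sketch.
   #[derive(Debug, PartialEq)]
   pub struct SerdeError(pub String);

   /// Hypothetical codec trait for user-defined nodes.
   pub trait ExtensionCodec {
       fn encode(&self, node: &str, buf: &mut Vec<u8>) -> Result<(), SerdeError>;
       fn decode(&self, bytes: &[u8]) -> Result<String, SerdeError>;
   }

   /// Default codec: rejects every extension node.
   pub struct NoExtensions;

   impl ExtensionCodec for NoExtensions {
       fn encode(&self, node: &str, _buf: &mut Vec<u8>) -> Result<(), SerdeError> {
           Err(SerdeError(format!("no codec registered for node {}", node)))
       }
       fn decode(&self, _bytes: &[u8]) -> Result<String, SerdeError> {
           Err(SerdeError("no codec registered for extension nodes".to_string()))
       }
   }

   /// Example codec that stores the node name as UTF-8 bytes.
   pub struct Utf8Codec;

   impl ExtensionCodec for Utf8Codec {
       fn encode(&self, node: &str, buf: &mut Vec<u8>) -> Result<(), SerdeError> {
           buf.extend_from_slice(node.as_bytes());
           Ok(())
       }
       fn decode(&self, bytes: &[u8]) -> Result<String, SerdeError> {
           decode_utf8(bytes)
       }
   }

   fn decode_utf8(bytes: &[u8]) -> Result<String, SerdeError> {
       String::from_utf8(bytes.to_vec()).map_err(|e| SerdeError(e.to_string()))
   }

   static DEFAULT_CODEC: NoExtensions = NoExtensions;

   /// Builder that bundles the codec with both directions of the conversion,
   /// instead of exposing separate free functions.
   pub struct PlanSerializer<'a> {
       codec: &'a dyn ExtensionCodec,
   }

   impl<'a> PlanSerializer<'a> {
       pub fn new() -> Self {
           Self { codec: &DEFAULT_CODEC }
       }

       pub fn with_extension_codec(mut self, codec: &'a dyn ExtensionCodec) -> Self {
           self.codec = codec;
           self
       }

       /// Layout (made up for this sketch): one tag byte, then the payload.
       pub fn serialize(&self, plan: &ToyPlan) -> Result<Vec<u8>, SerdeError> {
           let mut buf = Vec::new();
           match plan {
               ToyPlan::Scan(table) => {
                   buf.push(0);
                   buf.extend_from_slice(table.as_bytes());
               }
               ToyPlan::Extension(node) => {
                   buf.push(1);
                   self.codec.encode(node, &mut buf)?;
               }
           }
           Ok(buf)
       }

       pub fn deserialize(&self, bytes: &[u8]) -> Result<ToyPlan, SerdeError> {
           match bytes.split_first() {
               Some((&0, rest)) => Ok(ToyPlan::Scan(decode_utf8(rest)?)),
               Some((&1, rest)) => Ok(ToyPlan::Extension(self.codec.decode(rest)?)),
               _ => Err(SerdeError("empty input or unknown tag".to_string())),
           }
       }
   }
   ```

   With this shape, a caller that needs extension support writes `PlanSerializer::new().with_extension_codec(&codec)` once and reuses it for both `serialize` and `deserialize`, while callers without extensions get the rejecting default and an error rather than silently dropped nodes.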
     



##########
datafusion/proto/src/lib.rs:
##########
@@ -62,6 +84,18 @@ mod roundtrip_tests {
         Box::new(Field::new(name, dt, nullable))
     }
 
+    #[tokio::test]
+    async fn roundtrip_logical_plan() -> Result<(), DataFusionError> {

Review Comment:
   It might be good to add a test that uses an extension codec, to prevent future regressions





[GitHub] [arrow-datafusion] andygrove merged pull request #2639: Implement `LogicalPlan` serde in `datafusion-proto`

Posted by GitBox <gi...@apache.org>.
andygrove merged PR #2639:
URL: https://github.com/apache/arrow-datafusion/pull/2639

