You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/05/31 16:52:56 UTC

[GitHub] [arrow-datafusion] andygrove opened a new pull request, #2670: MINOR: Improve `datafusion-proto` API

andygrove opened a new pull request, #2670:
URL: https://github.com/apache/arrow-datafusion/pull/2670

   # Which issue does this PR close?
   
   <!--
   We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123.
   -->
   
   N/A
   
    # Rationale for this change
   <!--
    Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
    Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.  
   -->
   
   New unified API for serializing plans and expressions.
   
   # What changes are included in this PR?
   <!--
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   -->
   
   ``` rust
   let serializer = Serializer::new()
     .with_function_registry(...)
     .with_extension_codec(...);
   
   let bytes = serializer.serialize_plan(...);
   
   let bytes = serializer.serialize_expr(...);
   ```
   
   # Are there any user-facing changes?
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   API change
   
   <!--
   If there are any breaking changes to public APIs, please add the `api change` label.
   -->
   
   # Does this PR break compatibility with Ballista?
   
   <!--
   The CI checks will attempt to build [arrow-ballista](https://github.com/apache/arrow-ballista) against this PR. If 
   this check fails then it indicates that this PR makes a breaking change to the DataFusion API.
   
   If possible, try to make the change in a way that is not a breaking API change. For example, if code has moved 
    around, try adding `pub use` from the original location to preserve the current API.
   
   If it is not possible to avoid a breaking change (such as when adding enum variants) then follow this process:
   
   - Make a corresponding PR against `arrow-ballista` with the changes required there
   - Update `dev/build-arrow-ballista.sh` to clone the appropriate `arrow-ballista` repo & branch
   - Merge this PR when CI passes
   - Merge the Ballista PR
   - Create a new PR here to reset `dev/build-arrow-ballista.sh` to point to `arrow-ballista` master again
   
   _If you would like to help improve this process, please see https://github.com/apache/arrow-datafusion/issues/2583_
   -->


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] andygrove closed pull request #2670: MINOR: Improve `datafusion-proto` API

Posted by GitBox <gi...@apache.org>.
andygrove closed pull request #2670: MINOR: Improve `datafusion-proto` API
URL: https://github.com/apache/arrow-datafusion/pull/2670


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] andygrove commented on pull request #2670: MINOR: Improve `datafusion-proto` API

Posted by GitBox <gi...@apache.org>.
andygrove commented on PR #2670:
URL: https://github.com/apache/arrow-datafusion/pull/2670#issuecomment-1143806756

   Thanks for the review @tustvold. Based on feedback from @thinkharderdev in https://github.com/apache/arrow-ballista/pull/47 it seems that it would be preferable to expose this as a trait, based on `AsLogicalPlan` but we also need it to support function registries.
   
   I can take a look at this again at the weekend unless @liukun4515 wants to pick this up as part of https://github.com/apache/arrow-datafusion/issues/2640.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #2670: MINOR: Improve `datafusion-proto` API

Posted by GitBox <gi...@apache.org>.
alamb commented on PR #2670:
URL: https://github.com/apache/arrow-datafusion/pull/2670#issuecomment-1144071422

   marking as draft as there is future work planned here


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #2670: MINOR: Improve `datafusion-proto` API

Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #2670:
URL: https://github.com/apache/arrow-datafusion/pull/2670#discussion_r886658567


##########
datafusion/proto/src/lib.rs:
##########
@@ -33,6 +42,122 @@ pub mod to_proto;
 #[cfg(doctest)]
 doc_comment::doctest!("../README.md", readme_example_test);
 
+/// Serialization for `LogicalPlan` and `Expr`
+#[derive(Default)]
+pub struct Serializer<'a> {
+    /// Optional function registry
+    function_registry: Option<&'a dyn FunctionRegistry>,
+    /// Optional extension codec
+    extension_codec: Option<&'a dyn LogicalExtensionCodec>,
+}
+
+impl<'a> Serializer<'a> {
+    /// Create a Serializer
+    pub fn new() -> Self {
+        Self {
+            function_registry: None,
+            extension_codec: None,
+        }
+    }
+
+    /// Create a new Serializer with the provided function registry
+    pub fn with_function_registry(
+        self,
+        function_registry: &'a dyn FunctionRegistry,
+    ) -> Self {
+        Self {
+            function_registry: Some(function_registry),
+            extension_codec: self.extension_codec,
+        }
+    }
+
+    /// Create a new Serializer with the provided function registry
+    pub fn with_extension_codec(
+        self,
+        extension_codec: &'a dyn LogicalExtensionCodec,
+    ) -> Self {
+        Self {
+            function_registry: self.function_registry,
+            extension_codec: Some(extension_codec),
+        }
+    }
+
+    /// Serialize a logical expression
+    pub fn serialize_expr(&self, expr: &Expr) -> Result<Vec<u8>> {
+        expr.to_bytes().map(|b| b.to_vec())
+    }
+
+    /// Deserialize a logical expression
+    pub fn deserialize_expr(&self, bytes: &[u8]) -> Result<Expr> {
+        //TODO note that we will need to also pass the extension_codec along once we
+        // add support for subquery expressions in https://github.com/apache/arrow-datafusion/issues/2640
+        match self.function_registry {
+            Some(r) => Serializeable::from_bytes_with_registry(bytes, r),
+            _ => Serializeable::from_bytes(bytes),
+        }
+    }
+
+    /// Serialize a logical plan
+    pub fn serialize_plan(&self, plan: &LogicalPlan) -> Result<Vec<u8>> {
+        //TODO note that we will need to also pass the function_registry along once we
+        // add support for subquery expressions in https://github.com/apache/arrow-datafusion/issues/2640
+        let protobuf = match self.extension_codec {
+            Some(codec) => protobuf::LogicalPlanNode::try_from_logical_plan(plan, codec)?,

Review Comment:
   Is your plan to move this logic onto Serializer or to pass function_registry down?



##########
datafusion/proto/src/logical_plan.rs:
##########
@@ -69,7 +69,10 @@ pub(crate) fn proto_error<S: Into<String>>(message: S) -> DataFusionError {
     DataFusionError::Internal(message.into())
 }
 
-pub trait AsLogicalPlan: Debug + Send + Sync + Clone {
+// this trait was inherited from Ballista and was intended to support custom logical plans
+// but that does not make sense in the context of this crate so would be good to remove
+// this at some point

Review Comment:
   ```suggestion
   /// This trait was inherited from Ballista and was intended to support custom logical plans
   /// but that does not make sense in the context of this crate so would be good to remove
   /// this at some point
   ```
   
   Perhaps a ticket also?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] andygrove commented on pull request #2670: MINOR: Improve `datafusion-proto` API

Posted by GitBox <gi...@apache.org>.
andygrove commented on PR #2670:
URL: https://github.com/apache/arrow-datafusion/pull/2670#issuecomment-1146467105

   I am closing this since we have a working solution over in Ballista now (https://github.com/apache/arrow-ballista/pull/57) and I will let someone else pick up this task of improving the API here. I can try again later if nobody does pick this up.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org