You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/25 21:55:10 UTC

[GitHub] [arrow-datafusion] alamb opened a new pull request, #2341: Add public Serialization/Deserialization API to/from bytes

alamb opened a new pull request, #2341:
URL: https://github.com/apache/arrow-datafusion/pull/2341

   WIP PR sketching out Serialization API
   
   # Which issue does this PR close?
   
   
   Closes https://github.com/apache/arrow-datafusion/issues/2340
   
    # Rationale for this change
   We would like to serialize `Expr`s in our own messages without worrying about how they are encoded or making `prost` generated code work nicely across crates
   
   Making it easier to transmit Exprs across the network / serialize to disk is also something that would be interesting to other people trying to build systems with DataFusion.
   
   # What changes are included in this PR?
   1. Proposed API for serialization
   2. Tests
   
   # Are there any user-facing changes?
   Yes, new serialization API


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] gaojun2048 commented on pull request #2341: Add public Serialization/Deserialization API for `Expr` to/from bytes

Posted by GitBox <gi...@apache.org>.
gaojun2048 commented on PR #2341:
URL: https://github.com/apache/arrow-datafusion/pull/2341#issuecomment-1110388086

   Dan Harris ***@***.***>于2022年4月27日 周三05:46写道:
   
   > I wonder if @thinkharderdev <https://github.com/thinkharderdev> or
   > @gaojun2048 <https://github.com/gaojun2048> has any thoughts on this
   > approach?
   >
   > Makes sense to me. It would be great to have a common serialization API
   > between DataFusion and Ballista.
   >
   > —
   > Reply to this email directly, view it on GitHub
   > <https://github.com/apache/arrow-datafusion/pull/2341#issuecomment-1110280327>,
   > or unsubscribe
   > <https://github.com/notifications/unsubscribe-auth/AHVTXMWDEU47G2K4YWA3SLTVHBP2XANCNFSM5UJ4OV3Q>
   > .
   > You are receiving this because you were mentioned.Message ID:
   > ***@***.***>
   >
   That's great
   -- 
   DolphinScheduler(Incubator)  PPMC
   Jun Gao 高俊
   ***@***.***
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb merged pull request #2341: Add public Serialization/Deserialization API for `Expr` to/from bytes

Posted by GitBox <gi...@apache.org>.
alamb merged PR #2341:
URL: https://github.com/apache/arrow-datafusion/pull/2341


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #2341: Add public Serialization/Deserialization API for `Expr` to/from bytes

Posted by GitBox <gi...@apache.org>.
alamb commented on PR #2341:
URL: https://github.com/apache/arrow-datafusion/pull/2341#issuecomment-1110244045

   I wonder if @thinkharderdev  or @gaojun2048  has any thoughts on this approach?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2341: Add public Serialization/Deserialization API to/from bytes

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #2341:
URL: https://github.com/apache/arrow-datafusion/pull/2341#discussion_r859084289


##########
datafusion/core/src/prelude.rs:
##########
@@ -36,5 +36,5 @@ pub use crate::logical_plan::{
     in_list, initcap, left, length, lit, lower, lpad, ltrim, max, md5, min, now,
     octet_length, random, regexp_match, regexp_replace, repeat, replace, reverse, right,
     rpad, rtrim, sha224, sha256, sha384, sha512, split_part, starts_with, strpos, substr,
-    sum, to_hex, translate, trim, upper, Column, JoinType, Partitioning,
+    sum, to_hex, translate, trim, upper, Column, Expr, JoinType, Partitioning,

Review Comment:
   The lack of `Expr` in the prelude has always surprised me. I will make a separate PR for this



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #2341: Add public Serialization/Deserialization API for `Expr` to/from bytes

Posted by GitBox <gi...@apache.org>.
alamb commented on PR #2341:
URL: https://github.com/apache/arrow-datafusion/pull/2341#issuecomment-1110841784

   Thanks for the comments! I am happy to iterate on this API / move it around if there are more comments or needs evolve over time


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2341: Add public Serialization/Deserialization API to/from bytes

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #2341:
URL: https://github.com/apache/arrow-datafusion/pull/2341#discussion_r859090519


##########
datafusion/core/src/prelude.rs:
##########
@@ -36,5 +36,5 @@ pub use crate::logical_plan::{
     in_list, initcap, left, length, lit, lower, lpad, ltrim, max, md5, min, now,
     octet_length, random, regexp_match, regexp_replace, repeat, replace, reverse, right,
     rpad, rtrim, sha224, sha256, sha384, sha512, split_part, starts_with, strpos, substr,
-    sum, to_hex, translate, trim, upper, Column, JoinType, Partitioning,
+    sum, to_hex, translate, trim, upper, Column, Expr, JoinType, Partitioning,

Review Comment:
   https://github.com/apache/arrow-datafusion/issues/2347



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2341: Add public Serialization/Deserialization API to/from bytes

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #2341:
URL: https://github.com/apache/arrow-datafusion/pull/2341#discussion_r859091593


##########
datafusion/proto/src/serde.rs:
##########
@@ -0,0 +1,102 @@
+use std::{collections::HashSet, sync::Arc};
+
+use crate::{from_proto::parse_expr, protobuf};
+use datafusion::{
+    common::{DataFusionError, Result},
+    logical_expr::{AggregateUDF, ScalarUDF},
+    logical_plan::{Expr, FunctionRegistry},
+};
+use prost::{bytes::BytesMut, Message};
+
+/// Encodes an [`Expr`] into a stream of bytes. See
+/// [`deserialize_expr`] to convert a stream of bytes back to an Expr
+///
+/// Open Questions:
+/// Should this be its own crate / API (aka datafusion-serde?) that can be implemented using proto?
+///
+///
+/// Example:
+///
+/// ```
+/// use datafusion::prelude::*;
+/// use datafusion_proto::serde::Serializeable;
+///
+/// // Create a new `Expr` a < 32
+/// let expr = col("a").lt(lit(5i32));
+///
+/// // Convert it to an opaque form
+/// let bytes = expr.serialize().unwrap();
+///
+/// // Decode bytes from somewhere (over network, etc.
+/// let decoded_expr = Expr::deserialize(&bytes).unwrap();
+/// assert_eq!(expr, decoded_expr);
+/// ```
+pub trait Serializeable: Sized {

Review Comment:
   Question for reviewers:
   1. Should this be its own crate / API (aka datafusion-serde?) that can be implemented using proto?
   
   Or perhaps we can make that change in a subsequent PR



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] thinkharderdev commented on pull request #2341: Add public Serialization/Deserialization API for `Expr` to/from bytes

Posted by GitBox <gi...@apache.org>.
thinkharderdev commented on PR #2341:
URL: https://github.com/apache/arrow-datafusion/pull/2341#issuecomment-1110280327

   > I wonder if @thinkharderdev or @gaojun2048 has any thoughts on this approach?
   
   Makes sense to me. It would be great to have a common serialization API between DataFusion and Ballista. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org