Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/08/09 05:09:29 UTC

[GitHub] [arrow-datafusion] houqp opened a new pull request #840: [ballista] support date_part and date_trunc, pass tpch 7

houqp opened a new pull request #840:
URL: https://github.com/apache/arrow-datafusion/pull/840


   # Rationale for this change
   
   Add ser/de for date_part and date_trunc so that TPC-H query 7 passes in Ballista.
   
   # What changes are included in this PR?
   
   * Add ser/de for the date_part and date_trunc function expr nodes to Ballista.
   * Rename `ScalarFunctionNode`'s `expr` field to `args` for consistency with the rest of the code base.
   
   # Are there any user-facing changes?
   
   No.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] houqp commented on a change in pull request #840: [ballista] support date_part and date_trunc, pass tpch 7

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #840:
URL: https://github.com/apache/arrow-datafusion/pull/840#discussion_r684913244



##########
File path: datafusion/src/physical_plan/functions.rs
##########
@@ -277,8 +277,8 @@ impl FromStr for BuiltinScalarFunction {
             "concat" => BuiltinScalarFunction::Concat,
             "concat_ws" => BuiltinScalarFunction::ConcatWithSeparator,
             "chr" => BuiltinScalarFunction::Chr,
-            "date_part" => BuiltinScalarFunction::DatePart,
-            "date_trunc" => BuiltinScalarFunction::DateTrunc,
+            "date_part" | "datepart" => BuiltinScalarFunction::DatePart,
+            "date_trunc" | "datetrunc" => BuiltinScalarFunction::DateTrunc,

Review comment:
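       Ballista's physical plan ser/de serializes the function name with `function.to_string()`, which is derived from the enum variant name, so the serialized name has no `_` (for example `datepart` rather than `date_part`). Accepting both spellings here lets the serialized name parse back into the same function.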
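
The mismatch described in this comment can be illustrated with a minimal sketch. It assumes that `Display` is implemented as the lowercased `Debug` name of the variant, which is consistent with the `datepart` alias in the diff above. `Func` is a hypothetical two-variant stand-in, not DataFusion's actual `BuiltinScalarFunction`, and `round_trip` is a helper invented for the illustration.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical stand-in for a builtin scalar function enum.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Func {
    DatePart,
    DateTrunc,
}

impl fmt::Display for Func {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Assumption: the display name is the lowercased variant name,
        // so `DatePart` becomes "datepart" (no underscore).
        write!(f, "{}", format!("{:?}", self).to_lowercase())
    }
}

impl FromStr for Func {
    type Err = String;

    fn from_str(name: &str) -> Result<Self, Self::Err> {
        match name {
            // Accept both the SQL spelling and the serialized spelling.
            "date_part" | "datepart" => Ok(Func::DatePart),
            "date_trunc" | "datetrunc" => Ok(Func::DateTrunc),
            other => Err(format!("unknown function: {}", other)),
        }
    }
}

/// Serialize a function by name, then parse the name back.
fn round_trip(func: Func) -> Result<Func, String> {
    func.to_string().parse()
}
```

With only the underscore spellings in the `match`, `round_trip(Func::DatePart)` would return an error in this sketch, which is the kind of failure the added aliases avoid.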







[GitHub] [arrow-datafusion] andygrove commented on a change in pull request #840: [ballista] support date_part and date_trunc ser/de, pass tpch 7

Posted by GitBox <gi...@apache.org>.
andygrove commented on a change in pull request #840:
URL: https://github.com/apache/arrow-datafusion/pull/840#discussion_r685234398



##########
File path: ballista/rust/core/proto/ballista.proto
##########
@@ -144,18 +144,19 @@ enum ScalarFunction {
   TOTIMESTAMP = 24;
   ARRAY = 25;
   NULLIF = 26;
-  DATETRUNC = 27;
-  MD5 = 28;
-  SHA224 = 29;
-  SHA256 = 30;
-  SHA384 = 31;
-  SHA512 = 32;
-  LN = 33;
+  DATEPART = 27;

Review comment:
       I agree that we don't need to worry too much about breaking changes at this point given the early and experimental nature of Ballista.







[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #840: [ballista] support date_part and date_trunc ser/de, pass tpch 7

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #840:
URL: https://github.com/apache/arrow-datafusion/pull/840#discussion_r684929275



##########
File path: ballista/rust/core/proto/ballista.proto
##########
@@ -144,18 +144,19 @@ enum ScalarFunction {
   TOTIMESTAMP = 24;
   ARRAY = 25;
   NULLIF = 26;
-  DATETRUNC = 27;
-  MD5 = 28;
-  SHA224 = 29;
-  SHA256 = 30;
-  SHA384 = 31;
-  SHA512 = 32;
-  LN = 33;
+  DATEPART = 27;

Review comment:
       Doesn't this make it a breaking change? Not that it matters much right now, I guess, but it would be good to know when it starts to matter.
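
Why renumbering is breaking can be sketched as follows. Protobuf enums are encoded on the wire as their numeric value, so a value written under the old numbering is read back under whatever name the new numbering assigns to that number. The two decoder functions below are hypothetical and model only the values visible in the diff hunk; they are not Ballista's generated code.

```rust
/// Hypothetical decoder for the enum numbering before this change.
fn decode_old(tag: i32) -> Option<&'static str> {
    match tag {
        26 => Some("NULLIF"),
        27 => Some("DATETRUNC"),
        28 => Some("MD5"),
        _ => None,
    }
}

/// Hypothetical decoder for the enum numbering after this change.
/// Only the reassigned value shown in the diff hunk is modeled.
fn decode_new(tag: i32) -> Option<&'static str> {
    match tag {
        26 => Some("NULLIF"),
        27 => Some("DATEPART"),
        _ => None,
    }
}

/// True when both numberings agree on what the tag means.
fn is_compatible(tag: i32) -> bool {
    decode_old(tag) == decode_new(tag)
}
```

In this sketch `is_compatible(26)` holds while `is_compatible(27)` does not: a plan serialized by an older build with `DATETRUNC` would be read as `DATEPART` by a newer one.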







[GitHub] [arrow-datafusion] Dandandan merged pull request #840: [ballista] support date_part and date_trunc ser/de, pass tpch 7

Posted by GitBox <gi...@apache.org>.
Dandandan merged pull request #840:
URL: https://github.com/apache/arrow-datafusion/pull/840


   





[GitHub] [arrow-datafusion] houqp commented on a change in pull request #840: [ballista] support date_part and date_trunc ser/de, pass tpch 7

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #840:
URL: https://github.com/apache/arrow-datafusion/pull/840#discussion_r684969916



##########
File path: ballista/rust/core/proto/ballista.proto
##########
@@ -144,18 +144,19 @@ enum ScalarFunction {
   TOTIMESTAMP = 24;
   ARRAY = 25;
   NULLIF = 26;
-  DATETRUNC = 27;
-  MD5 = 28;
-  SHA224 = 29;
-  SHA256 = 30;
-  SHA384 = 31;
-  SHA512 = 32;
-  LN = 33;
+  DATEPART = 27;

Review comment:
       Yep, there is another breaking change in `ScalarFunctionNode`, where I renamed the `expr` field to `args`. Both of these changes were made purely to improve the readability of the code base.
   
   I think we should start being careful about breaking changes once we start to receive interest in using Ballista for production PoCs. But if anyone thinks we shouldn't introduce breaking changes for readability even at the current stage, I am more than happy to revert these changes.







[GitHub] [arrow-datafusion] houqp commented on a change in pull request #840: [ballista] support date_part and date_trunc ser/de, pass tpch 7

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #840:
URL: https://github.com/apache/arrow-datafusion/pull/840#discussion_r684966869



##########
File path: benchmarks/src/bin/tpch.rs
##########
@@ -1140,6 +1140,7 @@ mod tests {
         test_round_trip!(q3, 3);
         test_round_trip!(q5, 5);
         test_round_trip!(q6, 6);
+        test_round_trip!(q7, 7);

Review comment:
       They are all broken in Ballista due to different ser/de errors. My plan is to fix them one at a time so that we are on par with DataFusion.
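
A round-trip check of the kind referred to here can be sketched as below: serialize a plan, deserialize it, and require that the result equals the original. Everything in the block is hypothetical and for illustration only, including the `Plan` struct, the toy text format, the `SUPPORTED` list, and the `round_trip_check!` macro; it is not the `test_round_trip!` macro from `benchmarks/src/bin/tpch.rs`.

```rust
/// Hypothetical stand-in for a query plan.
#[derive(Debug, Clone, PartialEq)]
struct Plan {
    query: usize,
    functions: Vec<String>,
}

/// Function names the toy deserializer knows about (an assumption).
const SUPPORTED: [&str; 2] = ["date_part", "date_trunc"];

/// Toy serializer: "<query>|<fn1>,<fn2>".
fn serialize(plan: &Plan) -> String {
    format!("{}|{}", plan.query, plan.functions.join(","))
}

/// Toy deserializer; fails on any function name it does not recognize.
fn deserialize(text: &str) -> Result<Plan, String> {
    let (query, names) = text
        .split_once('|')
        .ok_or_else(|| "missing separator".to_string())?;
    let query = query.parse::<usize>().map_err(|e| e.to_string())?;
    let mut functions = Vec::new();
    for name in names.split(',').filter(|n| !n.is_empty()) {
        if !SUPPORTED.contains(&name) {
            return Err(format!("cannot deserialize function: {}", name));
        }
        functions.push(name.to_string());
    }
    Ok(Plan { query, functions })
}

/// Generates one check function per query, so each query can be
/// enabled individually once its ser/de gaps are fixed.
macro_rules! round_trip_check {
    ($name:ident, $query:expr, $functions:expr) => {
        fn $name() -> Result<(), String> {
            let functions: Vec<String> =
                $functions.iter().map(|s| s.to_string()).collect();
            let plan = Plan { query: $query, functions };
            let decoded = deserialize(&serialize(&plan))?;
            if decoded == plan {
                Ok(())
            } else {
                Err(format!("query {} changed in round trip", $query))
            }
        }
    };
}

// A query whose functions are all supported, and one that is not.
round_trip_check!(q_supported, 7, ["date_part"]);
round_trip_check!(q_unsupported, 99, ["made_up_fn"]);
```

Generating one function per query keeps failures isolated, which fits the plan of fixing the remaining queries one at a time.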







[GitHub] [arrow-datafusion] Dandandan commented on pull request #840: [ballista] support date_part and date_trunc ser/de, pass tpch 7

Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #840:
URL: https://github.com/apache/arrow-datafusion/pull/840#issuecomment-895408746


   Thanks @houqp 





[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #840: [ballista] support date_part and date_trunc ser/de, pass tpch 7

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #840:
URL: https://github.com/apache/arrow-datafusion/pull/840#discussion_r685390622



##########
File path: ballista/rust/core/proto/ballista.proto
##########
@@ -144,18 +144,19 @@ enum ScalarFunction {
   TOTIMESTAMP = 24;
   ARRAY = 25;
   NULLIF = 26;
-  DATETRUNC = 27;
-  MD5 = 28;
-  SHA224 = 29;
-  SHA256 = 30;
-  SHA384 = 31;
-  SHA512 = 32;
-  LN = 33;
+  DATEPART = 27;

Review comment:
       👎 

##########
File path: ballista/rust/core/proto/ballista.proto
##########
@@ -144,18 +144,19 @@ enum ScalarFunction {
   TOTIMESTAMP = 24;
   ARRAY = 25;
   NULLIF = 26;
-  DATETRUNC = 27;
-  MD5 = 28;
-  SHA224 = 29;
-  SHA256 = 30;
-  SHA384 = 31;
-  SHA512 = 32;
-  LN = 33;
+  DATEPART = 27;

Review comment:
       👍 







[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #840: [ballista] support date_part and date_trunc ser/de, pass tpch 7

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #840:
URL: https://github.com/apache/arrow-datafusion/pull/840#discussion_r684930821



##########
File path: benchmarks/src/bin/tpch.rs
##########
@@ -1140,6 +1140,7 @@ mod tests {
         test_round_trip!(q3, 3);
         test_round_trip!(q5, 5);
         test_round_trip!(q6, 6);
+        test_round_trip!(q7, 7);

Review comment:
       What's the status of queries 8, 9, 13, 14, and 19 in Ballista?
   Those are now included in the DataFusion CI.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on a change in pull request #840: [ballista] support date_part and date_trunc ser/de, pass tpch 7

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #840:
URL: https://github.com/apache/arrow-datafusion/pull/840#discussion_r685093844



##########
File path: ballista/rust/core/proto/ballista.proto
##########
@@ -144,18 +144,19 @@ enum ScalarFunction {
   TOTIMESTAMP = 24;
   ARRAY = 25;
   NULLIF = 26;
-  DATETRUNC = 27;
-  MD5 = 28;
-  SHA224 = 29;
-  SHA256 = 30;
-  SHA384 = 31;
-  SHA512 = 32;
-  LN = 33;
+  DATEPART = 27;

Review comment:
       I think that until we know someone is using the Ballista codebase in a way that such a change would affect them, we should not spend extra time striving for backwards compatibility (i.e., in my opinion this change is fine).
   
   Ballista also looks to me like more of an overall system (i.e., something people could use directly rather than as a library), which is different from DataFusion. I wonder if internal changes such as these enum value changes are less of an issue with such a system.



