You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/06/29 18:38:50 UTC

[GitHub] [arrow] jvanstraten opened a new pull request, #13468: ARROW-16816: Upgrade Substrait to v0.3.0

jvanstraten opened a new pull request, #13468:
URL: https://github.com/apache/arrow/pull/13468

   Note: I actually upgraded to v0.6.0; it didn't make much sense to me to not just go to the latest release. I guess I'll downgrade if there was a specific reason for going to exactly v0.3.0 that I'm not aware of.
   
   Stuff that broke:
    - `relations.proto` and `expressions.proto` were merged into `algebra.proto` in https://github.com/substrait-io/substrait/pull/136
    - Breaking change in how file formats are specified in read relations: https://github.com/substrait-io/substrait/pull/169
    - Deprecation in specification of function arguments, switched to the new format (supporting the old one as well would be a bit more work, which I'm not sure is worthwhile at this stage): https://github.com/substrait-io/substrait/pull/161
    - Deprecation of `user_defined_type_reference` in `Type`, replacing it with a message that also supports nullability: https://github.com/substrait-io/substrait/pull/217


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rtpsw commented on a diff in pull request #13468: ARROW-16816: [C++] Upgrade Substrait to v0.6.0

Posted by GitBox <gi...@apache.org>.
rtpsw commented on code in PR #13468:
URL: https://github.com/apache/arrow/pull/13468#discussion_r914047237


##########
cpp/src/arrow/engine/substrait/expression_internal.cc:
##########
@@ -160,9 +160,18 @@ Result<compute::Expression> FromProto(const substrait::Expression& expr,
       ARROW_ASSIGN_OR_RAISE(auto decoded_function,
                             ext_set.DecodeFunction(scalar_fn.function_reference()));
 
-      std::vector<compute::Expression> arguments(scalar_fn.args_size());
-      for (int i = 0; i < scalar_fn.args_size(); ++i) {
-        ARROW_ASSIGN_OR_RAISE(arguments[i], FromProto(scalar_fn.args(i), ext_set));
+      std::vector<compute::Expression> arguments(scalar_fn.arguments_size());
+      for (int i = 0; i < scalar_fn.arguments_size(); ++i) {
+        const auto& argument = scalar_fn.arguments(i);
+        switch (argument.arg_type_case()) {
+          case substrait::FunctionArgument::kValue: {
+            ARROW_ASSIGN_OR_RAISE(arguments[i], FromProto(argument.value(), ext_set));
+            break;
+          }
+          default:
+            return Status::NotImplemented(
+                "only value arguments are currently supported for functions");

Review Comment:
   :ate to the party, but in case it helps, I see there is [EnumDescriptor](https://developers.google.com/protocol-buffers/docs/reference/cpp/google.protobuf.descriptor#EnumDescriptor).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #13468: ARROW-16816: Upgrade Substrait to v0.3.0

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #13468:
URL: https://github.com/apache/arrow/pull/13468#issuecomment-1170358013

   https://issues.apache.org/jira/browse/ARROW-16816


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] vibhatha commented on a diff in pull request #13468: ARROW-16816: [C++] Upgrade Substrait to v0.6.0

Posted by GitBox <gi...@apache.org>.
vibhatha commented on code in PR #13468:
URL: https://github.com/apache/arrow/pull/13468#discussion_r912347253


##########
cpp/src/arrow/engine/substrait/serde_test.cc:
##########
@@ -185,7 +185,8 @@ TEST(Substrait, SupportedExtensionTypes) {
     ASSERT_OK_AND_ASSIGN(
         auto buf,
         internal::SubstraitFromJSON(
-            "Type", "{\"user_defined_type_reference\": " + std::to_string(anchor) + "}"));
+            "Type", "{\"user_defined\": { \"type_reference\": " + std::to_string(anchor) +
+                        ", \"nullability\": \"NULLABILITY_NULLABLE\" } }"));

Review Comment:
   I see. Let’s go with what you have now. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace commented on a diff in pull request #13468: ARROW-16816: [C++] Upgrade Substrait to v0.6.0

Posted by GitBox <gi...@apache.org>.
westonpace commented on code in PR #13468:
URL: https://github.com/apache/arrow/pull/13468#discussion_r913972694


##########
cpp/src/arrow/engine/substrait/expression_internal.cc:
##########
@@ -160,9 +160,18 @@ Result<compute::Expression> FromProto(const substrait::Expression& expr,
       ARROW_ASSIGN_OR_RAISE(auto decoded_function,
                             ext_set.DecodeFunction(scalar_fn.function_reference()));
 
-      std::vector<compute::Expression> arguments(scalar_fn.args_size());
-      for (int i = 0; i < scalar_fn.args_size(); ++i) {
-        ARROW_ASSIGN_OR_RAISE(arguments[i], FromProto(scalar_fn.args(i), ext_set));
+      std::vector<compute::Expression> arguments(scalar_fn.arguments_size());
+      for (int i = 0; i < scalar_fn.arguments_size(); ++i) {
+        const auto& argument = scalar_fn.arguments(i);
+        switch (argument.arg_type_case()) {
+          case substrait::FunctionArgument::kValue: {
+            ARROW_ASSIGN_OR_RAISE(arguments[i], FromProto(argument.value(), ext_set));
+            break;
+          }
+          default:
+            return Status::NotImplemented(
+                "only value arguments are currently supported for functions");

Review Comment:
   Minor nit: If possible, it might be nice to include the string value of what was specified?  E.g. "Only `value` is supported but got "enum".  But I don't remember if it is easy to go from `arg_type_case` to a string.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jvanstraten commented on pull request #13468: ARROW-16816: [C++] Upgrade Substrait to v0.6.0

Posted by GitBox <gi...@apache.org>.
jvanstraten commented on PR #13468:
URL: https://github.com/apache/arrow/pull/13468#issuecomment-1172874441

   I noticed, but I have seen exactly 0 lines of R code in my life before, nor do I have a toolchain set up for it. If the R bindings are using Substrait already and this isn't some weird CI thing, I'm afraid I'll need considerable help from someone in that subcommunity. Do you know anyone I can ping for that?
   
   How does the Arrow community generally deal with upgrading dependencies past breaking changes that require interdisciplinary work? I don't think these will be the last breaking changes we'll see from Substrait in the foreseeable future, and considering the fact that Arrow is one of the leading projects that's requesting changes to Substrait it seems prudent to upgrade regularly.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace merged pull request #13468: ARROW-16816: [C++] Upgrade Substrait to v0.6.0

Posted by GitBox <gi...@apache.org>.
westonpace merged PR #13468:
URL: https://github.com/apache/arrow/pull/13468


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] vibhatha commented on pull request #13468: ARROW-16816: [C++] Upgrade Substrait to v0.6.0

Posted by GitBox <gi...@apache.org>.
vibhatha commented on PR #13468:
URL: https://github.com/apache/arrow/pull/13468#issuecomment-1172802233

   > @vibhatha do you mind taking a look at this PR?
   
   Of course 👍 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] vibhatha commented on a diff in pull request #13468: ARROW-16816: [C++] Upgrade Substrait to v0.6.0

Posted by GitBox <gi...@apache.org>.
vibhatha commented on code in PR #13468:
URL: https://github.com/apache/arrow/pull/13468#discussion_r912301625


##########
cpp/src/arrow/engine/substrait/type_internal.cc:
##########
@@ -196,10 +196,11 @@ Result<std::pair<std::shared_ptr<DataType>, bool>> FromProto(
           field("value", std::move(value_nullable.first), value_nullable.second));
     }
 
-    case ::substrait::Type::kUserDefinedTypeReference: {
-      uint32_t anchor = type.user_defined_type_reference();
+    case ::substrait::Type::kUserDefined: {
+      const auto& user_defined = type.user_defined();
+      uint32_t anchor = user_defined.type_reference();
       ARROW_ASSIGN_OR_RAISE(auto type_record, ext_set.DecodeType(anchor));
-      return std::make_pair(std::move(type_record.type), true);
+      return std::make_pair(std::move(type_record.type), IsNullable(user_defined));

Review Comment:
   @jvanstraten 
   Before it was set to true always. Not sure why. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] kou commented on pull request #13468: ARROW-16816: Upgrade Substrait to v0.3.0

Posted by GitBox <gi...@apache.org>.
kou commented on PR #13468:
URL: https://github.com/apache/arrow/pull/13468#issuecomment-1170576945

   Could you update `v0.3.0` in pull request and JIRA issue to `v0.6.0`?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jvanstraten commented on a diff in pull request #13468: ARROW-16816: [C++] Upgrade Substrait to v0.6.0

Posted by GitBox <gi...@apache.org>.
jvanstraten commented on code in PR #13468:
URL: https://github.com/apache/arrow/pull/13468#discussion_r914056004


##########
cpp/src/arrow/engine/substrait/expression_internal.cc:
##########
@@ -160,9 +160,18 @@ Result<compute::Expression> FromProto(const substrait::Expression& expr,
       ARROW_ASSIGN_OR_RAISE(auto decoded_function,
                             ext_set.DecodeFunction(scalar_fn.function_reference()));
 
-      std::vector<compute::Expression> arguments(scalar_fn.args_size());
-      for (int i = 0; i < scalar_fn.args_size(); ++i) {
-        ARROW_ASSIGN_OR_RAISE(arguments[i], FromProto(scalar_fn.args(i), ext_set));
+      std::vector<compute::Expression> arguments(scalar_fn.arguments_size());
+      for (int i = 0; i < scalar_fn.arguments_size(); ++i) {
+        const auto& argument = scalar_fn.arguments(i);
+        switch (argument.arg_type_case()) {
+          case substrait::FunctionArgument::kValue: {
+            ARROW_ASSIGN_OR_RAISE(arguments[i], FromProto(argument.value(), ext_set));
+            break;
+          }
+          default:
+            return Status::NotImplemented(
+                "only value arguments are currently supported for functions");

Review Comment:
   Ah, good to know that it does exist. It still wouldn't be very helpful to users if a new option is added since the last time Arrow bumped Substrait, but I guess that's shifting the goalposts a little.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace commented on pull request #13468: ARROW-16816: [C++] Upgrade Substrait to v0.6.0

Posted by GitBox <gi...@apache.org>.
westonpace commented on PR #13468:
URL: https://github.com/apache/arrow/pull/13468#issuecomment-1175242854

   @sanjibansg This PR will likely have impacts on function mapping work.
   
   @rtpsw You may be interested in these changes as well.  I'm not sure how you've been creating plans so far but this change from the old style of specifying arguments to the new style is likely to have impacts.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace commented on a diff in pull request #13468: ARROW-16816: [C++] Upgrade Substrait to v0.6.0

Posted by GitBox <gi...@apache.org>.
westonpace commented on code in PR #13468:
URL: https://github.com/apache/arrow/pull/13468#discussion_r912308625


##########
cpp/src/arrow/engine/substrait/relation_internal.cc:
##########
@@ -106,17 +106,16 @@ Result<compute::Declaration> FromProto(const substrait::Rel& rel,
           path = item.uri_path_glob();
         }
 
-        if (item.format() ==
-            substrait::ReadRel::LocalFiles::FileOrFiles::FILE_FORMAT_PARQUET) {
-          format = std::make_shared<dataset::ParquetFileFormat>();
-        } else if (util::string_view{path}.ends_with(".arrow")) {
-          format = std::make_shared<dataset::IpcFileFormat>();
-        } else if (util::string_view{path}.ends_with(".feather")) {
-          format = std::make_shared<dataset::IpcFileFormat>();
-        } else {
-          return Status::NotImplemented(
-              "substrait::ReadRel::LocalFiles::FileOrFiles::format "
-              "other than FILE_FORMAT_PARQUET");
+        switch (item.file_format_case()) {
+          case substrait::ReadRel_LocalFiles_FileOrFiles::kParquet:
+            format = std::make_shared<dataset::ParquetFileFormat>();
+            break;
+          case substrait::ReadRel_LocalFiles_FileOrFiles::kArrow:
+            format = std::make_shared<dataset::IpcFileFormat>();
+            break;
+          default:
+            return Status::NotImplemented(
+                "unknown substrait::ReadRel::LocalFiles::FileOrFiles::file_format");

Review Comment:
   I think we are ok.  The feather format (v2) and the arrow IPC format are the same thing.  Sometimes people use the extension `.arrow` and sometimes they use the extension `.feather`.  However, in both cases they should be specifying `kArrow` here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] vibhatha commented on pull request #13468: ARROW-16816: [C++] Upgrade Substrait to v0.6.0

Posted by GitBox <gi...@apache.org>.
vibhatha commented on PR #13468:
URL: https://github.com/apache/arrow/pull/13468#issuecomment-1172819882

   @jvanstraten some `R` test cases are failing. May be it need some work there. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] vibhatha commented on a diff in pull request #13468: ARROW-16816: [C++] Upgrade Substrait to v0.6.0

Posted by GitBox <gi...@apache.org>.
vibhatha commented on code in PR #13468:
URL: https://github.com/apache/arrow/pull/13468#discussion_r912301240


##########
cpp/src/arrow/engine/substrait/serde_test.cc:
##########
@@ -185,7 +185,8 @@ TEST(Substrait, SupportedExtensionTypes) {
     ASSERT_OK_AND_ASSIGN(
         auto buf,
         internal::SubstraitFromJSON(
-            "Type", "{\"user_defined_type_reference\": " + std::to_string(anchor) + "}"));
+            "Type", "{\"user_defined\": { \"type_reference\": " + std::to_string(anchor) +
+                        ", \"nullability\": \"NULLABILITY_NULLABLE\" } }"));

Review Comment:
   `nullability` did we missed including this last time? Or are we planning to expose this too? My knowledge is not quite adequate here, appreciate some comments. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] vibhatha commented on a diff in pull request #13468: ARROW-16816: [C++] Upgrade Substrait to v0.6.0

Posted by GitBox <gi...@apache.org>.
vibhatha commented on code in PR #13468:
URL: https://github.com/apache/arrow/pull/13468#discussion_r912347331


##########
cpp/src/arrow/engine/substrait/type_internal.cc:
##########
@@ -196,10 +196,11 @@ Result<std::pair<std::shared_ptr<DataType>, bool>> FromProto(
           field("value", std::move(value_nullable.first), value_nullable.second));
     }
 
-    case ::substrait::Type::kUserDefinedTypeReference: {
-      uint32_t anchor = type.user_defined_type_reference();
+    case ::substrait::Type::kUserDefined: {
+      const auto& user_defined = type.user_defined();
+      uint32_t anchor = user_defined.type_reference();
       ARROW_ASSIGN_OR_RAISE(auto type_record, ext_set.DecodeType(anchor));
-      return std::make_pair(std::move(type_record.type), true);
+      return std::make_pair(std::move(type_record.type), IsNullable(user_defined));

Review Comment:
   👍



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rtpsw commented on pull request #13468: ARROW-16816: [C++] Upgrade Substrait to v0.6.0

Posted by GitBox <gi...@apache.org>.
rtpsw commented on PR #13468:
URL: https://github.com/apache/arrow/pull/13468#issuecomment-1175336865

   Great! Thanks for letting me know, @westonpace.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jvanstraten commented on pull request #13468: ARROW-16816: [C++] Upgrade Substrait to v0.6.0

Posted by GitBox <gi...@apache.org>.
jvanstraten commented on PR #13468:
URL: https://github.com/apache/arrow/pull/13468#issuecomment-1174903417

   I spent the time to get R up and running on my machine and (depending on what CI will say) have fixed it already. The problem was just that the plan in the one R test case had to be upgraded to use the new file format specification.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rtpsw commented on a diff in pull request #13468: ARROW-16816: [C++] Upgrade Substrait to v0.6.0

Posted by GitBox <gi...@apache.org>.
rtpsw commented on code in PR #13468:
URL: https://github.com/apache/arrow/pull/13468#discussion_r914060552


##########
cpp/src/arrow/engine/substrait/expression_internal.cc:
##########
@@ -160,9 +160,18 @@ Result<compute::Expression> FromProto(const substrait::Expression& expr,
       ARROW_ASSIGN_OR_RAISE(auto decoded_function,
                             ext_set.DecodeFunction(scalar_fn.function_reference()));
 
-      std::vector<compute::Expression> arguments(scalar_fn.args_size());
-      for (int i = 0; i < scalar_fn.args_size(); ++i) {
-        ARROW_ASSIGN_OR_RAISE(arguments[i], FromProto(scalar_fn.args(i), ext_set));
+      std::vector<compute::Expression> arguments(scalar_fn.arguments_size());
+      for (int i = 0; i < scalar_fn.arguments_size(); ++i) {
+        const auto& argument = scalar_fn.arguments(i);
+        switch (argument.arg_type_case()) {
+          case substrait::FunctionArgument::kValue: {
+            ARROW_ASSIGN_OR_RAISE(arguments[i], FromProto(argument.value(), ext_set));
+            break;
+          }
+          default:
+            return Status::NotImplemented(
+                "only value arguments are currently supported for functions");

Review Comment:
   The Arrow code could include in the message the value converted to a string if it is known given the Substrait version the code was built with, and otherwise include the number of the value and explain it is unknown and coming from a different version of Substrait. All this enum-handling is for a separate issue, of course.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #13468: ARROW-16816: Upgrade Substrait to v0.3.0

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #13468:
URL: https://github.com/apache/arrow/pull/13468#issuecomment-1170358043

   :warning: Ticket **has no components in JIRA**, make sure you assign one.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] vibhatha commented on a diff in pull request #13468: ARROW-16816: [C++] Upgrade Substrait to v0.6.0

Posted by GitBox <gi...@apache.org>.
vibhatha commented on code in PR #13468:
URL: https://github.com/apache/arrow/pull/13468#discussion_r912310969


##########
cpp/src/arrow/engine/substrait/relation_internal.cc:
##########
@@ -106,17 +106,16 @@ Result<compute::Declaration> FromProto(const substrait::Rel& rel,
           path = item.uri_path_glob();
         }
 
-        if (item.format() ==
-            substrait::ReadRel::LocalFiles::FileOrFiles::FILE_FORMAT_PARQUET) {
-          format = std::make_shared<dataset::ParquetFileFormat>();
-        } else if (util::string_view{path}.ends_with(".arrow")) {
-          format = std::make_shared<dataset::IpcFileFormat>();
-        } else if (util::string_view{path}.ends_with(".feather")) {
-          format = std::make_shared<dataset::IpcFileFormat>();
-        } else {
-          return Status::NotImplemented(
-              "substrait::ReadRel::LocalFiles::FileOrFiles::format "
-              "other than FILE_FORMAT_PARQUET");
+        switch (item.file_format_case()) {
+          case substrait::ReadRel_LocalFiles_FileOrFiles::kParquet:
+            format = std::make_shared<dataset::ParquetFileFormat>();
+            break;
+          case substrait::ReadRel_LocalFiles_FileOrFiles::kArrow:
+            format = std::make_shared<dataset::IpcFileFormat>();
+            break;
+          default:
+            return Status::NotImplemented(
+                "unknown substrait::ReadRel::LocalFiles::FileOrFiles::file_format");

Review Comment:
   That make sense. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] vibhatha commented on a diff in pull request #13468: ARROW-16816: [C++] Upgrade Substrait to v0.6.0

Posted by GitBox <gi...@apache.org>.
vibhatha commented on code in PR #13468:
URL: https://github.com/apache/arrow/pull/13468#discussion_r912300942


##########
cpp/src/arrow/engine/substrait/relation_internal.cc:
##########
@@ -106,17 +106,16 @@ Result<compute::Declaration> FromProto(const substrait::Rel& rel,
           path = item.uri_path_glob();
         }
 
-        if (item.format() ==
-            substrait::ReadRel::LocalFiles::FileOrFiles::FILE_FORMAT_PARQUET) {
-          format = std::make_shared<dataset::ParquetFileFormat>();
-        } else if (util::string_view{path}.ends_with(".arrow")) {
-          format = std::make_shared<dataset::IpcFileFormat>();
-        } else if (util::string_view{path}.ends_with(".feather")) {
-          format = std::make_shared<dataset::IpcFileFormat>();
-        } else {
-          return Status::NotImplemented(
-              "substrait::ReadRel::LocalFiles::FileOrFiles::format "
-              "other than FILE_FORMAT_PARQUET");
+        switch (item.file_format_case()) {
+          case substrait::ReadRel_LocalFiles_FileOrFiles::kParquet:
+            format = std::make_shared<dataset::ParquetFileFormat>();
+            break;
+          case substrait::ReadRel_LocalFiles_FileOrFiles::kArrow:
+            format = std::make_shared<dataset::IpcFileFormat>();
+            break;
+          default:
+            return Status::NotImplemented(
+                "unknown substrait::ReadRel::LocalFiles::FileOrFiles::file_format");

Review Comment:
   What about `feather` format?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] vibhatha commented on pull request #13468: ARROW-16816: [C++] Upgrade Substrait to v0.6.0

Posted by GitBox <gi...@apache.org>.
vibhatha commented on PR #13468:
URL: https://github.com/apache/arrow/pull/13468#issuecomment-1172874760

   @jvanstraten I will take a look at the R component and see what’s exactly going on. Later on I will tag for support. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] vibhatha commented on pull request #13468: ARROW-16816: [C++] Upgrade Substrait to v0.6.0

Posted by GitBox <gi...@apache.org>.
vibhatha commented on PR #13468:
URL: https://github.com/apache/arrow/pull/13468#issuecomment-1172805189

   @jvanstraten thank you for working on this one. I added a few comments.
   
   @westonpace Looks good to me. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace commented on a diff in pull request #13468: ARROW-16816: [C++] Upgrade Substrait to v0.6.0

Posted by GitBox <gi...@apache.org>.
westonpace commented on code in PR #13468:
URL: https://github.com/apache/arrow/pull/13468#discussion_r912308625


##########
cpp/src/arrow/engine/substrait/relation_internal.cc:
##########
@@ -106,17 +106,16 @@ Result<compute::Declaration> FromProto(const substrait::Rel& rel,
           path = item.uri_path_glob();
         }
 
-        if (item.format() ==
-            substrait::ReadRel::LocalFiles::FileOrFiles::FILE_FORMAT_PARQUET) {
-          format = std::make_shared<dataset::ParquetFileFormat>();
-        } else if (util::string_view{path}.ends_with(".arrow")) {
-          format = std::make_shared<dataset::IpcFileFormat>();
-        } else if (util::string_view{path}.ends_with(".feather")) {
-          format = std::make_shared<dataset::IpcFileFormat>();
-        } else {
-          return Status::NotImplemented(
-              "substrait::ReadRel::LocalFiles::FileOrFiles::format "
-              "other than FILE_FORMAT_PARQUET");
+        switch (item.file_format_case()) {
+          case substrait::ReadRel_LocalFiles_FileOrFiles::kParquet:
+            format = std::make_shared<dataset::ParquetFileFormat>();
+            break;
+          case substrait::ReadRel_LocalFiles_FileOrFiles::kArrow:
+            format = std::make_shared<dataset::IpcFileFormat>();
+            break;
+          default:
+            return Status::NotImplemented(
+                "unknown substrait::ReadRel::LocalFiles::FileOrFiles::file_format");

Review Comment:
   I think we are ok.  The feather format (v2) and the arrow IPC format are the same thing.  Sometimes people use the extension .arrow and sometimes they use the extension .feather.  However, in both cases they should be specifying `kArrow` here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jvanstraten commented on a diff in pull request #13468: ARROW-16816: [C++] Upgrade Substrait to v0.6.0

Posted by GitBox <gi...@apache.org>.
jvanstraten commented on code in PR #13468:
URL: https://github.com/apache/arrow/pull/13468#discussion_r912346149


##########
cpp/src/arrow/engine/substrait/serde_test.cc:
##########
@@ -185,7 +185,8 @@ TEST(Substrait, SupportedExtensionTypes) {
     ASSERT_OK_AND_ASSIGN(
         auto buf,
         internal::SubstraitFromJSON(
-            "Type", "{\"user_defined_type_reference\": " + std::to_string(anchor) + "}"));
+            "Type", "{\"user_defined\": { \"type_reference\": " + std::to_string(anchor) +
+                        ", \"nullability\": \"NULLABILITY_NULLABLE\" } }"));

Review Comment:
   No, Substrait itself didn't consider user-defined types to conceptually have nullability, for no particular reason I can think of. See https://github.com/substrait-io/substrait/pull/217



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jvanstraten commented on a diff in pull request #13468: ARROW-16816: [C++] Upgrade Substrait to v0.6.0

Posted by GitBox <gi...@apache.org>.
jvanstraten commented on code in PR #13468:
URL: https://github.com/apache/arrow/pull/13468#discussion_r912346263


##########
cpp/src/arrow/engine/substrait/type_internal.cc:
##########
@@ -196,10 +196,11 @@ Result<std::pair<std::shared_ptr<DataType>, bool>> FromProto(
           field("value", std::move(value_nullable.first), value_nullable.second));
     }
 
-    case ::substrait::Type::kUserDefinedTypeReference: {
-      uint32_t anchor = type.user_defined_type_reference();
+    case ::substrait::Type::kUserDefined: {
+      const auto& user_defined = type.user_defined();
+      uint32_t anchor = user_defined.type_reference();
       ARROW_ASSIGN_OR_RAISE(auto type_record, ext_set.DecodeType(anchor));
-      return std::make_pair(std::move(type_record.type), true);
+      return std::make_pair(std::move(type_record.type), IsNullable(user_defined));

Review Comment:
   I think @bkietz just set it to true in the initial PR because nullable types are the default in Arrow and he had nowhere to get the flag from.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace commented on pull request #13468: ARROW-16816: Upgrade Substrait to v0.6.0

Posted by GitBox <gi...@apache.org>.
westonpace commented on PR #13468:
URL: https://github.com/apache/arrow/pull/13468#issuecomment-1171597202

   @vibhatha do you mind taking a look at this PR?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jvanstraten commented on a diff in pull request #13468: ARROW-16816: [C++] Upgrade Substrait to v0.6.0

Posted by GitBox <gi...@apache.org>.
jvanstraten commented on code in PR #13468:
URL: https://github.com/apache/arrow/pull/13468#discussion_r913980402


##########
cpp/src/arrow/engine/substrait/expression_internal.cc:
##########
@@ -160,9 +160,18 @@ Result<compute::Expression> FromProto(const substrait::Expression& expr,
       ARROW_ASSIGN_OR_RAISE(auto decoded_function,
                             ext_set.DecodeFunction(scalar_fn.function_reference()));
 
-      std::vector<compute::Expression> arguments(scalar_fn.args_size());
-      for (int i = 0; i < scalar_fn.args_size(); ++i) {
-        ARROW_ASSIGN_OR_RAISE(arguments[i], FromProto(scalar_fn.args(i), ext_set));
+      std::vector<compute::Expression> arguments(scalar_fn.arguments_size());
+      for (int i = 0; i < scalar_fn.arguments_size(); ++i) {
+        const auto& argument = scalar_fn.arguments(i);
+        switch (argument.arg_type_case()) {
+          case substrait::FunctionArgument::kValue: {
+            ARROW_ASSIGN_OR_RAISE(arguments[i], FromProto(argument.value(), ext_set));
+            break;
+          }
+          default:
+            return Status::NotImplemented(
+                "only value arguments are currently supported for functions");

Review Comment:
   I don't think that's a thing, at least not automatically. I could list the options, but if Substrait adds something later it's not going to upgrade automatically. The same applies to plenty of other of switch statements in the code currently. Also, if a plan with a function is passed to Arrow that uses argument features that Arrow doesn't even understand, the bigger issue will be that Arrow would have no idea what the function means to begin with. So I guess I could add the function name to the message.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace commented on a diff in pull request #13468: ARROW-16816: [C++] Upgrade Substrait to v0.6.0

Posted by GitBox <gi...@apache.org>.
westonpace commented on code in PR #13468:
URL: https://github.com/apache/arrow/pull/13468#discussion_r913993091


##########
cpp/src/arrow/engine/substrait/expression_internal.cc:
##########
@@ -160,9 +160,18 @@ Result<compute::Expression> FromProto(const substrait::Expression& expr,
       ARROW_ASSIGN_OR_RAISE(auto decoded_function,
                             ext_set.DecodeFunction(scalar_fn.function_reference()));
 
-      std::vector<compute::Expression> arguments(scalar_fn.args_size());
-      for (int i = 0; i < scalar_fn.args_size(); ++i) {
-        ARROW_ASSIGN_OR_RAISE(arguments[i], FromProto(scalar_fn.args(i), ext_set));
+      std::vector<compute::Expression> arguments(scalar_fn.arguments_size());
+      for (int i = 0; i < scalar_fn.arguments_size(); ++i) {
+        const auto& argument = scalar_fn.arguments(i);
+        switch (argument.arg_type_case()) {
+          case substrait::FunctionArgument::kValue: {
+            ARROW_ASSIGN_OR_RAISE(arguments[i], FromProto(argument.value(), ext_set));
+            break;
+          }
+          default:
+            return Status::NotImplemented(
+                "only value arguments are currently supported for functions");

Review Comment:
   If there is no easy enum->string then let's not worry about it.  I think this is fine.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org