You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/01 08:54:50 UTC

[GitHub] [arrow] westonpace commented on a diff in pull request #14558: ARROW-18205: [C++] Substrait consumer is not converting right side references correctly on joins

westonpace commented on code in PR #14558:
URL: https://github.com/apache/arrow/pull/14558#discussion_r1010218905


##########
cpp/src/arrow/engine/substrait/relation_internal.cc:
##########
@@ -479,22 +479,38 @@ Result<DeclarationInfo> FromProto(const substrait::Rel& rel, const ExtensionSet&
             callptr->function_name);
       }
 
-      // TODO: ARROW-16624 Add Suffix support for Substrait
-      const auto* left_keys = callptr->arguments[0].field_ref();
-      const auto* right_keys = callptr->arguments[1].field_ref();
-      if (!left_keys || !right_keys) {
-        return Status::Invalid("Left keys for join cannot be null");
-      }
-
       // Create output schema from left, right relations and join keys
       FieldVector combined_fields = left.output_schema->fields();
       const FieldVector& right_fields = right.output_schema->fields();
       combined_fields.insert(combined_fields.end(), right_fields.begin(),
                              right_fields.end());
       std::shared_ptr<Schema> join_schema = schema(std::move(combined_fields));
 
+      // adjust the join_keys according to Substrait definition where
+      // the join fields are defined by considering the `join_schema` which
+      // is the combination of the left and right relation schema.
+
+      // TODO: ARROW-16624 Add Suffix support for Substrait
+      const auto* left_keys = callptr->arguments[0].field_ref();
+      const auto* right_keys = callptr->arguments[1].field_ref();
+      // Validating JoinKeys
+      if (!left_keys || !right_keys) {
+        return Status::Invalid("Left keys for join cannot be null");
+      }
+      // TODO: verify this validation
+      if (!left_keys->IsFieldPath() || !right_keys->IsFieldPath()) {
+        return Status::Invalid("Join keys must be FieldPaths");
+      }
+      int num_left_fields = left.output_schema->num_fields();
+      const auto* right_field_path = right_keys->field_path();
+      std::vector<int> adjusted_field_indices;
+      adjusted_field_indices.reserve(join_schema->num_fields() - num_left_fields);
+      for (size_t idx = 0; idx < right_field_path->indices().size(); idx++) {
+        adjusted_field_indices.push_back((*right_field_path)[idx] - num_left_fields);
+      }
+      FieldPath adjusted_right_keys(adjusted_field_indices);

Review Comment:
   ```suggestion
         std::vector<int> adjusted_field_indices(right_field_path->indices());
         adjusted_field_indices[0] -= num_left_fields;
         FieldPath adjusted_right_keys(adjusted_field_indices);
   ```
   
   We only need to modify the top-level index.  The other indices, if any, refer to struct fields and I'm pretty sure they are relative.
   
   For example, if the fifth column (in the combined schema) is `struct<int32,int32,struct<int32>>` and we want the innermost `int32` as the key and there are 2 left fields then the field path will be `[4, 2, 0]`.  We want to change that to `[2, 2, 0]` and not `[2, 0, -2]`.



##########
cpp/src/arrow/engine/substrait/relation_internal.cc:
##########
@@ -479,22 +479,38 @@ Result<DeclarationInfo> FromProto(const substrait::Rel& rel, const ExtensionSet&
             callptr->function_name);
       }
 
-      // TODO: ARROW-16624 Add Suffix support for Substrait
-      const auto* left_keys = callptr->arguments[0].field_ref();
-      const auto* right_keys = callptr->arguments[1].field_ref();
-      if (!left_keys || !right_keys) {
-        return Status::Invalid("Left keys for join cannot be null");
-      }
-
       // Create output schema from left, right relations and join keys
       FieldVector combined_fields = left.output_schema->fields();
       const FieldVector& right_fields = right.output_schema->fields();
       combined_fields.insert(combined_fields.end(), right_fields.begin(),
                              right_fields.end());
       std::shared_ptr<Schema> join_schema = schema(std::move(combined_fields));
 
+      // adjust the join_keys according to Substrait definition where
+      // the join fields are defined by considering the `join_schema` which
+      // is the combination of the left and right relation schema.
+
+      // TODO: ARROW-16624 Add Suffix support for Substrait
+      const auto* left_keys = callptr->arguments[0].field_ref();
+      const auto* right_keys = callptr->arguments[1].field_ref();
+      // Validating JoinKeys
+      if (!left_keys || !right_keys) {
+        return Status::Invalid("Left keys for join cannot be null");
+      }
+      // TODO: verify this validation

Review Comment:
   Yes but I don't think it's possible at the moment for this to be false.  A `field_ref` from Substrait will always be a field path.



##########
cpp/src/arrow/engine/substrait/relation_internal.cc:
##########
@@ -479,22 +479,38 @@ Result<DeclarationInfo> FromProto(const substrait::Rel& rel, const ExtensionSet&
             callptr->function_name);
       }
 
-      // TODO: ARROW-16624 Add Suffix support for Substrait
-      const auto* left_keys = callptr->arguments[0].field_ref();
-      const auto* right_keys = callptr->arguments[1].field_ref();
-      if (!left_keys || !right_keys) {
-        return Status::Invalid("Left keys for join cannot be null");
-      }
-
       // Create output schema from left, right relations and join keys
       FieldVector combined_fields = left.output_schema->fields();
       const FieldVector& right_fields = right.output_schema->fields();
       combined_fields.insert(combined_fields.end(), right_fields.begin(),
                              right_fields.end());
       std::shared_ptr<Schema> join_schema = schema(std::move(combined_fields));
 
+      // adjust the join_keys according to Substrait definition where
+      // the join fields are defined by considering the `join_schema` which
+      // is the combination of the left and right relation schema.
+
+      // TODO: ARROW-16624 Add Suffix support for Substrait
+      const auto* left_keys = callptr->arguments[0].field_ref();
+      const auto* right_keys = callptr->arguments[1].field_ref();
+      // Validating JoinKeys
+      if (!left_keys || !right_keys) {
+        return Status::Invalid("Left keys for join cannot be null");

Review Comment:
   ```suggestion
           return Status::Invalid("join condition must include references to both left and right inputs");
   ```
   This condition checks left and right but the message is only left.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org