
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/07/06 00:23:23 UTC

[GitHub] [arrow] westonpace commented on a diff in pull request #13401: ARROW-16855: [C++] Adding Read Relation ToProto

westonpace commented on code in PR #13401:
URL: https://github.com/apache/arrow/pull/13401#discussion_r914300910


##########
cpp/src/arrow/engine/substrait/relation_internal.cc:
##########
@@ -316,5 +323,97 @@ Result<compute::Declaration> FromProto(const substrait::Rel& rel,
       rel.DebugString());
 }
 
+namespace {
+// TODO: add other types
+enum ArrowRelationType : uint8_t {
+  SCAN,
+  FILTER,
+  PROJECT,
+  JOIN,
+  AGGREGATE,
+};
+
+const std::map<std::string, ArrowRelationType> enum_map{
+    {"scan", ArrowRelationType::SCAN},           {"filter", ArrowRelationType::FILTER},
+    {"project", ArrowRelationType::PROJECT},     {"join", ArrowRelationType::JOIN},
+    {"aggregate", ArrowRelationType::AGGREGATE},
+};
+
+struct ExtractRelation {
+  explicit ExtractRelation(substrait::Rel* rel, ExtensionSet* ext_set)
+      : rel_(rel), ext_set_(ext_set) {}
+
+  Status AddRelation(const compute::Declaration& declaration) {
+    const std::string& rel_name = declaration.factory_name;
+    switch (enum_map.find(rel_name)->second) {
+      case ArrowRelationType::SCAN:
+        return AddReadRelation(declaration);
+      case ArrowRelationType::FILTER:
+        return Status::NotImplemented("Filter operator not supported.");
+      case ArrowRelationType::PROJECT:
+        return Status::NotImplemented("Project operator not supported.");
+      case ArrowRelationType::JOIN:
+        return Status::NotImplemented("Join operator not supported.");
+      case ArrowRelationType::AGGREGATE:
+        return Status::NotImplemented("Aggregate operator not supported.");
+      default:
+        return Status::Invalid("Unsupported factory name :", rel_name);
+    }
+  }

Review Comment:
   An enum would improve readability and robustness, but I don't think it's possible here: we need to stay open to users adding new node types and node factories beyond what Arrow provides.
   
   An enum would also be slightly faster, but converting a plan from Substrait to Acero happens only once per query, so it is fine if it takes a little time. If it happened per batch (or especially per row), speed might be more of a consideration.
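
To make the trade-off concrete, here is a minimal sketch of a string-keyed lookup that stays open to user-defined factory names. It is not Arrow code: `SketchStatus`, `Converter`, `ConverterRegistry`, and `MakeDefaultRegistry` are hypothetical names invented for illustration, and `SketchStatus` is only a stand-in for `arrow::Status` so that the example compiles on its own. It also checks the result of `find()` before using it, since the `enum_map.find(rel_name)->second` in the diff would dereference `end()` for a name that is not in the map.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for arrow::Status so the sketch builds without Arrow.
struct SketchStatus {
  bool ok;
  std::string message;
};

// Hypothetical converter signature; a real one would take the declaration.
using Converter = std::function<SketchStatus()>;

class ConverterRegistry {
 public:
  // Registers (or replaces) the converter for a factory name. Because the key
  // is a string, users can add names that a closed enum could not list.
  void Register(std::string factory_name, Converter converter) {
    assert(converter && "converter must be callable");
    converters_[std::move(factory_name)] = std::move(converter);
  }

  // Looks up a converter by name. An unknown name produces an error instead
  // of dereferencing the end() iterator.
  SketchStatus Convert(const std::string& factory_name) const {
    auto it = converters_.find(factory_name);
    if (it == converters_.end()) {
      return {false, "Unsupported factory name: " + factory_name};
    }
    return it->second();
  }

 private:
  std::unordered_map<std::string, Converter> converters_;
};

// Builds a registry holding two of the cases from the diff above.
inline ConverterRegistry MakeDefaultRegistry() {
  ConverterRegistry registry;
  registry.Register("scan", [] { return SketchStatus{true, "read relation"}; });
  registry.Register("filter", [] {
    return SketchStatus{false, "Filter operator not supported."};
  });
  return registry;
}
```

The hash lookup costs a little more than a `switch` on an enum, but as noted above that cost is paid once per plan conversion rather than per batch.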


