You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "benibus (via GitHub)" <gi...@apache.org> on 2023/03/17 22:11:27 UTC

[GitHub] [arrow] benibus commented on a diff in pull request #34537: GH-14939: [C++] Support Table lookups in FieldRef and FieldPath

benibus commented on code in PR #34537:
URL: https://github.com/apache/arrow/pull/34537#discussion_r1140678912


##########
cpp/src/arrow/type.cc:
##########
@@ -1159,6 +1161,20 @@ struct FieldPathGetImpl {
     });
   }
 
+  static Result<std::shared_ptr<ChunkedArray>> Get(const FieldPath* path,
+                                                   const ChunkedArrayVector& table) {
+    return FieldPathGetImpl::Get(
+      path, &table,
+      [](const std::shared_ptr<ChunkedArray>& data) -> const ChunkedArrayVector * {
+          auto chunkedArrayVector = data->Flatten();

Review Comment:
   We typically use `snake_case` for variable names - so `chunked_array_vector`. That includes test code, so you might want to revise the names there too.



##########
cpp/src/arrow/type_test.cc:
##########
@@ -379,6 +382,72 @@ TEST(TestFieldPath, Basics) {
                                   FieldPath({s.num_fields() * 2}).Get(s));
 }
 
+TEST(TestFieldPath, GetForTable) {
+  using testing::HasSubstr;
+
+  const int length = 100;
+  auto f0 = field("alpha", int32());
+  auto f1 = field("beta", int32());
+  auto f2 = field("alpha", int32());
+  auto f3 = field("beta", int32());
+  auto schema = arrow::schema({f0, f1, f2, f3});
+
+  arrow::random::RandomArrayGenerator gen_{42};
+  auto a0 = gen_.ArrayOf(int32(), length);
+  auto a1 = gen_.ArrayOf(int32(), length);
+  auto a2 = gen_.ArrayOf(int32(), length);
+  auto a3 = gen_.ArrayOf(int32(), length);
+  auto arrayVector = ArrayVector({a0, a1, a2, a3});
+
+  auto tablePtr = Table::Make(schema, {a0, a1, a2, a3});

Review Comment:
   It would be worth throwing in a `ASSERT_OK(tablePtr->ValidateFull())` after this (same for `RecordBatch::Make` i guess).



##########
cpp/src/arrow/type.cc:
##########
@@ -1159,6 +1161,20 @@ struct FieldPathGetImpl {
     });
   }
 
+  static Result<std::shared_ptr<ChunkedArray>> Get(const FieldPath* path,
+                                                   const ChunkedArrayVector& table) {
+    return FieldPathGetImpl::Get(
+      path, &table,
+      [](const std::shared_ptr<ChunkedArray>& data) -> const ChunkedArrayVector * {
+          auto chunkedArrayVector = data->Flatten();
+          if (!chunkedArrayVector.ok()) {
+            return nullptr;
+          }
+
+          return &(chunkedArrayVector.ValueUnsafe());

Review Comment:
   The returned pointer won't be valid after the call since `chunkedArrayVector` gets freed once the function's scope exits. (If it doesn't segfault locally, building with `ARROW_USE_ASAN` and `ARROW_USE_UBSAN` should probably catch it).
   
   Instead, you might want to pre-define the vector in the outer scope, then capture it by reference in the lambda.



##########
cpp/src/arrow/type_test.cc:
##########
@@ -379,6 +382,72 @@ TEST(TestFieldPath, Basics) {
                                   FieldPath({s.num_fields() * 2}).Get(s));
 }
 
+TEST(TestFieldPath, GetForTable) {

Review Comment:
   You should test nested fields here as well (i.e. struct fields in the schema) - preferably with multiple child fields. That would probably expose the aforementioned use-after-free issue but we're currently only testing the top nesting level.



##########
cpp/src/arrow/type.cc:
##########
@@ -1159,6 +1161,20 @@ struct FieldPathGetImpl {
     });
   }
 
+  static Result<std::shared_ptr<ChunkedArray>> Get(const FieldPath* path,
+                                                   const ChunkedArrayVector& table) {
+    return FieldPathGetImpl::Get(
+      path, &table,
+      [](const std::shared_ptr<ChunkedArray>& data) -> const ChunkedArrayVector * {
+          auto chunkedArrayVector = data->Flatten();
+          if (!chunkedArrayVector.ok()) {
+            return nullptr;
+          }

Review Comment:
   We probably want to propagate a non-ok status here rather than implicitly discard it. Perhaps you could capture a `Status` variable in the outer-scope which you could check after calling `FieldPathGetImpl::Get`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org