You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "benibus (via GitHub)" <gi...@apache.org> on 2023/04/17 18:49:11 UTC

[GitHub] [arrow] benibus opened a new pull request, #35197: GH-14946: [C++] Add flattening FieldPath/FieldRef::Get methods

benibus opened a new pull request, #35197:
URL: https://github.com/apache/arrow/pull/35197

   ### Rationale for this change
   
   The current `FieldPath::Get` methods - when extracting nested child values - don't combine the child's null bitmap with higher-level parent bitmaps. While this is often preferable (it allows for zero-copy), there are cases where higher level "flattening" version is useful - e.g. when pre-processing sort keys for structs.
   
   ### What changes are included in this PR?
   
   - Adds `FieldPath::GetFlattened` methods alongside the existing  `FieldPath::Get` overloads
   - Overhauls the `FieldPath` tests in an effort to improve coverage and generalize across the supported input types
   - Fixes a bug with nested `ArrayData` handling when the input array is a slice of a larger array (surfaced in the new tests)
   
   ### Are these changes tested?
   
   Yes (tests are included)
   
   ### Are there any user-facing changes?
   
   Yes, this adds methods to a public API


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] felipecrv commented on pull request #35197: GH-14946: [C++] Add flattening FieldPath/FieldRef::Get methods

Posted by "felipecrv (via GitHub)" <gi...@apache.org>.
felipecrv commented on PR #35197:
URL: https://github.com/apache/arrow/pull/35197#issuecomment-1532251167

   @benibus can you ping the people who understood the issue in this PR?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #35197: GH-14946: [C++] Add flattening FieldPath/FieldRef::Get methods

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #35197:
URL: https://github.com/apache/arrow/pull/35197#issuecomment-1511909454

   * Closes: #14946


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] felipecrv commented on a diff in pull request #35197: GH-14946: [C++] Add flattening FieldPath/FieldRef::Get methods

Posted by "felipecrv (via GitHub)" <gi...@apache.org>.
felipecrv commented on code in PR #35197:
URL: https://github.com/apache/arrow/pull/35197#discussion_r1175652162


##########
cpp/src/arrow/type.h:
##########
@@ -1698,10 +1698,41 @@ class ARROW_EXPORT FieldPath {
   /// \brief Retrieve the referenced child from a ChunkedArray
   Result<std::shared_ptr<ChunkedArray>> Get(const ChunkedArray& chunked_array) const;
 
+  /// \brief Retrieve the referenced child/column from an Array, ArrayData, ChunkedArray,
+  /// RecordBatch, or Table
+  ///
+  /// Unlike `FieldPath::Get`, these variants are not zero-copy and the retrieved child's
+  /// null bitmap is ANDed with its parent's
+  Result<std::shared_ptr<Array>> GetFlattened(const Array& array) const;
+  Result<std::shared_ptr<ArrayData>> GetFlattened(const ArrayData& data) const;
+  Result<std::shared_ptr<ChunkedArray>> GetFlattened(
+      const ChunkedArray& chunked_array) const;
+  Result<std::shared_ptr<Array>> GetFlattened(const RecordBatch& batch) const;
+  Result<std::shared_ptr<ChunkedArray>> GetFlattened(const Table& table) const;
+
  private:
   std::vector<int> indices_;
 };
 
+namespace internal {
+
+template <typename T>
+using FieldPathGetType =
+    decltype(std::declval<FieldPath>().Get(std::declval<T>()).ValueOrDie());

Review Comment:
   I have a hard time inferring what comes out of this. An explicit type trait struct with specializations would be more informative and produce cleaner error messages for users.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] felipecrv commented on a diff in pull request #35197: GH-14946: [C++] Add flattening FieldPath/FieldRef::Get methods

Posted by "felipecrv (via GitHub)" <gi...@apache.org>.
felipecrv commented on code in PR #35197:
URL: https://github.com/apache/arrow/pull/35197#discussion_r1175588223


##########
cpp/src/arrow/type.cc:
##########
@@ -1328,20 +1363,54 @@ Result<std::shared_ptr<Array>> FieldPath::Get(const Array& array) const {
 
 Result<std::shared_ptr<ArrayData>> FieldPath::Get(const ArrayData& data) const {
   if (data.type->id() != Type::STRUCT) {
-    return Status::NotImplemented("Get child data of non-struct array");
+    return FieldPathGetImpl::NonStructError();
   }
   return FieldPathGetImpl::Get(this, data.child_data);
 }
 
 Result<std::shared_ptr<ChunkedArray>> FieldPath::Get(
     const ChunkedArray& chunked_array) const {
   if (chunked_array.type()->id() != Type::STRUCT) {
-    return Status::NotImplemented("Get child data of non-struct chunked array");
+    return FieldPathGetImpl::NonStructError();
   }
-  auto columns = ChunkedArrayRef(chunked_array).Flatten();
+  auto columns = ChunkedArrayRef(chunked_array).FlattenZeroCopy();
   return FieldPathGetImpl::Get(this, columns);
 }
 
+Result<std::shared_ptr<Array>> FieldPath::GetFlattened(const Array& array) const {
+  if (array.type_id() != Type::STRUCT) {
+    return FieldPathGetImpl::NonStructError();
+  }
+  auto&& struct_array = checked_cast<const StructArray&>(array);

Review Comment:
   `&&`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] benibus commented on a diff in pull request #35197: GH-14946: [C++] Add flattening FieldPath/FieldRef::Get methods

Posted by "benibus (via GitHub)" <gi...@apache.org>.
benibus commented on code in PR #35197:
URL: https://github.com/apache/arrow/pull/35197#discussion_r1194117861


##########
cpp/src/arrow/type.h:
##########
@@ -1698,10 +1698,45 @@ class ARROW_EXPORT FieldPath {
   /// \brief Retrieve the referenced child from a ChunkedArray
   Result<std::shared_ptr<ChunkedArray>> Get(const ChunkedArray& chunked_array) const;
 
+  /// \brief Retrieve the referenced child/column from an Array, ArrayData, ChunkedArray,
+  /// RecordBatch, or Table
+  ///
+  /// Unlike `FieldPath::Get`, these variants are not zero-copy and the retrieved child's
+  /// null bitmap is ANDed with its parent's
+  Result<std::shared_ptr<Array>> GetFlattened(const Array& array,
+                                              MemoryPool* pool = NULLPTR) const;
+  Result<std::shared_ptr<ArrayData>> GetFlattened(const ArrayData& data,
+                                                  MemoryPool* pool = NULLPTR) const;
+  Result<std::shared_ptr<ChunkedArray>> GetFlattened(const ChunkedArray& chunked_array,
+                                                     MemoryPool* pool = NULLPTR) const;
+  Result<std::shared_ptr<Array>> GetFlattened(const RecordBatch& batch,
+                                              MemoryPool* pool = NULLPTR) const;
+  Result<std::shared_ptr<ChunkedArray>> GetFlattened(const Table& table,
+                                                     MemoryPool* pool = NULLPTR) const;
+
  private:
   std::vector<int> indices_;
 };
 
+namespace internal {
+
+template <typename T>
+using FieldPathGetType =
+    decltype(std::declval<FieldPath>().Get(std::declval<T>()).ValueOrDie());
+
+template <bool Flattened, typename T>
+std::enable_if_t<!Flattened, Result<FieldPathGetType<T>>> GetChild(
+    const T& root, const FieldPath& path, MemoryPool* = NULLPTR) {
+  return path.Get(root);
+}
+template <bool Flattened, typename T>
+std::enable_if_t<Flattened, Result<FieldPathGetType<T>>> GetChild(
+    const T& root, const FieldPath& path, MemoryPool* pool = NULLPTR) {
+  return path.GetFlattened(root, pool);
+}

Review Comment:
   Yeah, unfortunately `if constexpr` wouldn't work here since `GetFlattened` doesn't have overloads for `Schema`, `Field`, etc.
   
   But I'm fine with just duplicating the definitions as well (in that case, the helpers probably wouldn't be too useful outside of the tests so no need to expose them).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou merged pull request #35197: GH-14946: [C++] Add flattening FieldPath/FieldRef::Get methods

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou merged PR #35197:
URL: https://github.com/apache/arrow/pull/35197


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #35197: GH-14946: [C++] Add flattening FieldPath/FieldRef::Get methods

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #35197:
URL: https://github.com/apache/arrow/pull/35197#discussion_r1196528908


##########
cpp/src/arrow/type_test.cc:
##########
@@ -364,152 +365,341 @@ TEST(TestField, TestMerge) {
   }
 }
 
-TEST(TestFieldPath, Basics) {
-  auto f0 = field("alpha", int32());
-  auto f1 = field("beta", int32());
-  auto f2 = field("alpha", int32());
-  auto f3 = field("beta", int32());
-  Schema s({f0, f1, f2, f3});
+struct FieldPathTestCase {
+  struct OutputValues {
+    explicit OutputValues(std::vector<int> indices = {})
+        : path(FieldPath(std::move(indices))) {}
+
+    template <typename T>
+    const auto& OutputAs() const {
+      if constexpr (std::is_same_v<T, Field>) {
+        return field;
+      } else if constexpr (std::is_same_v<T, Array>) {
+        return array;
+      } else if constexpr (std::is_same_v<T, ArrayData>) {
+        return array->data();
+      } else if constexpr (std::is_same_v<T, ChunkedArray>) {
+        return chunked_array;
+      }
+    }
 
-  // retrieving a field with single-element FieldPath is equivalent to Schema::field
-  for (int index = 0; index < s.num_fields(); ++index) {
-    ASSERT_OK_AND_EQ(s.field(index), FieldPath({index}).Get(s));
+    FieldPath path;
+    std::shared_ptr<Field> field;
+    std::shared_ptr<Array> array;
+    std::shared_ptr<ChunkedArray> chunked_array;
+  };
+
+  static constexpr int kNumColumns = 2;
+  static constexpr int kNumRows = 100;
+  static constexpr int kRandomSeed = 0xbeef;
+
+  // Input for the FieldPath::Get functions in multiple forms
+  std::shared_ptr<Schema> schema;
+  std::shared_ptr<DataType> type;
+  std::shared_ptr<Array> array;
+  std::shared_ptr<RecordBatch> record_batch;
+  std::shared_ptr<ChunkedArray> chunked_array;
+  std::shared_ptr<Table> table;
+
+  template <typename T>
+  const auto& InputAs() const {
+    if constexpr (std::is_same_v<T, Schema>) {
+      return schema;
+    } else if constexpr (std::is_same_v<T, DataType>) {
+      return type;
+    } else if constexpr (std::is_same_v<T, Array>) {
+      return array;
+    } else if constexpr (std::is_same_v<T, ArrayData>) {
+      return array->data();
+    } else if constexpr (std::is_same_v<T, RecordBatch>) {
+      return record_batch;
+    } else if constexpr (std::is_same_v<T, ChunkedArray>) {
+      return chunked_array;
+    } else if constexpr (std::is_same_v<T, Table>) {
+      return table;
+    }
   }
-  EXPECT_RAISES_WITH_MESSAGE_THAT(Invalid,
-                                  testing::HasSubstr("empty indices cannot be traversed"),
-                                  FieldPath().Get(s));
-  EXPECT_RAISES_WITH_MESSAGE_THAT(IndexError, testing::HasSubstr("index out of range"),
-                                  FieldPath({s.num_fields() * 2}).Get(s));
-}
-
-TEST(TestFieldPath, GetForTable) {
-  using testing::HasSubstr;
-
-  constexpr int kNumRows = 4;
-  auto f0 = field("a", int32());
-  auto f1 = field("b", int32());
-  auto f2 = field("c", struct_({f1}));
-  auto f3 = field("d", struct_({f0, f2}));
-  auto table_schema = schema({f0, f1, f2, f3});
-
-  // Each column has a different chunking
-  ChunkedArrayVector columns(4);
-  columns[0] = ChunkedArrayFromJSON(f0->type(), {"[0,1,2,3]"});
-  columns[1] = ChunkedArrayFromJSON(f1->type(), {"[3,2,1]", "[0]"});
-  columns[2] =
-      ChunkedArrayFromJSON(f2->type(), {R"([{"b":3},{"b":2}])", R"([{"b":1},{"b":0}])"});
-  columns[3] = ChunkedArrayFromJSON(
-      f3->type(), {R"([{"a":0,"c":{"b":3}},{"a":1,"c":{"b":2}}])",
-                   R"([{"a":2,"c":{"b":1}}])", R"([{"a":3,"c":{"b":0}}])"});
-  auto table = Table::Make(table_schema, columns, kNumRows);
-  ASSERT_OK(table->ValidateFull());
 
-  ASSERT_OK_AND_ASSIGN(auto v0, FieldPath({0}).Get(*table));
-  ASSERT_OK_AND_ASSIGN(auto v1, FieldPath({1}).Get(*table));
-  ASSERT_OK_AND_ASSIGN(auto v2, FieldPath({2}).Get(*table));
-  ASSERT_OK_AND_ASSIGN(auto v2_0, FieldPath({2, 0}).Get(*table));
-  ASSERT_OK_AND_ASSIGN(auto v3, FieldPath({3}).Get(*table));
-  ASSERT_OK_AND_ASSIGN(auto v3_0, FieldPath({3, 0}).Get(*table));
-  ASSERT_OK_AND_ASSIGN(auto v3_1, FieldPath({3, 1}).Get(*table));
-  ASSERT_OK_AND_ASSIGN(auto v3_1_0, FieldPath({3, 1, 0}).Get(*table));
-
-  EXPECT_EQ(v0->num_chunks(), columns[0]->num_chunks());
-  EXPECT_EQ(v1->num_chunks(), columns[1]->num_chunks());
-  EXPECT_EQ(v2->num_chunks(), columns[2]->num_chunks());
-  EXPECT_EQ(v2_0->num_chunks(), columns[2]->num_chunks());
-  EXPECT_EQ(v3->num_chunks(), columns[3]->num_chunks());
-  EXPECT_EQ(v3_0->num_chunks(), columns[3]->num_chunks());
-  EXPECT_EQ(v3_1->num_chunks(), columns[3]->num_chunks());
-  EXPECT_EQ(v3_1_0->num_chunks(), columns[3]->num_chunks());
-
-  EXPECT_TRUE(columns[0]->Equals(v0));
-  EXPECT_TRUE(columns[0]->Equals(v3_0));
-
-  EXPECT_TRUE(columns[1]->Equals(v1));
-  EXPECT_TRUE(columns[1]->Equals(v2_0));
-  EXPECT_TRUE(columns[1]->Equals(v3_1_0));
-
-  EXPECT_TRUE(columns[2]->Equals(v2));
-  EXPECT_TRUE(columns[2]->Equals(v3_1));
-
-  EXPECT_TRUE(columns[3]->Equals(v3));
-
-  for (const auto& path :
-       {FieldPath({4, 1, 0}), FieldPath({3, 2, 0}), FieldPath{3, 1, 1}}) {
-    EXPECT_RAISES_WITH_MESSAGE_THAT(IndexError, HasSubstr("index out of range"),
-                                    path.Get(*table));
+  // Number of chunks for each column in the input Table
+  const std::array<int, kNumColumns> num_column_chunks = {15, 20};
+  // Number of chunks in the input ChunkedArray
+  const int num_chunks = 15;
+
+  // Expected outputs for each child;
+  OutputValues v0{{0}}, v1{{1}};
+  OutputValues v1_0{{1, 0}}, v1_1{{1, 1}};
+  OutputValues v1_1_0{{1, 1, 0}}, v1_1_1{{1, 1, 1}};
+  // Expected outputs for nested children with null flattening applied
+  OutputValues v1_0_flat{{1, 0}}, v1_1_flat{{1, 1}};
+  OutputValues v1_1_0_flat{{1, 1, 0}}, v1_1_1_flat{{1, 1, 1}};
+
+  static const FieldPathTestCase* Instance() {
+    static const auto maybe_instance = Make();
+    return &maybe_instance.ValueOrDie();
   }
-  EXPECT_RAISES_WITH_MESSAGE_THAT(Invalid, HasSubstr("empty indices cannot be traversed"),
-                                  FieldPath().Get(*table));
-}
-
-TEST(TestFieldPath, GetForChunkedArray) {
-  using testing::HasSubstr;
-
-  auto f0 = field("a", int32());
-  auto f1 = field("b", int32());
-  auto f2 = field("c", struct_({f1}));
-  auto f3 = field("d", struct_({f0, f2}));
-  auto type = struct_({f0, f1, f3});
-
-  auto column0 = ChunkedArrayFromJSON(f0->type(), {"[0,1,2,3]"});
-  auto column1 = ChunkedArrayFromJSON(f1->type(), {"[3,2,1,0]"});
-  auto column2_1 =
-      ChunkedArrayFromJSON(f2->type(), {R"([{"b":3},{"b":2},{"b":1},{"b":0}])"});
-  auto chunked_array = ChunkedArrayFromJSON(
-      type,
-      {
-          R"([{"a":0,"b":3,"d":{"a":0,"c":{"b":3}}}])",
-          R"([{"a":1,"b":2,"d":{"a":1,"c":{"b":2}}},{"a":2,"b":1,"d":{"a":2,"c":{"b":1}}}])",
-          R"([{"a":3,"b":0,"d":{"a":3,"c":{"b":0}}}])",
-      });
-  ASSERT_OK(chunked_array->ValidateFull());
-
-  ASSERT_OK_AND_ASSIGN(auto v0, FieldPath({0}).Get(*chunked_array));
-  ASSERT_OK_AND_ASSIGN(auto v1, FieldPath({1}).Get(*chunked_array));
-  ASSERT_OK_AND_ASSIGN(auto v2_0, FieldPath({2, 0}).Get(*chunked_array));
-  ASSERT_OK_AND_ASSIGN(auto v2_1, FieldPath({2, 1}).Get(*chunked_array));
-  ASSERT_OK_AND_ASSIGN(auto v2_1_0, FieldPath({2, 1, 0}).Get(*chunked_array));
-
-  for (const auto& v : {v0, v1, v2_0, v2_1, v2_1_0}) {
-    EXPECT_EQ(v->num_chunks(), chunked_array->num_chunks());
+
+  static Result<FieldPathTestCase> Make() {
+    // Generate test input based on a single schema. First by creating a StructArray,
+    // then deriving the other input types (ChunkedArray, RecordBatch, Table, etc) from
+    // it. We also compute the expected outputs for each child individually (for each
+    // output type).
+    FieldPathTestCase out;
+    random::RandomArrayGenerator gen(kRandomSeed);
+
+    // Define child fields and input schema
+
+    // Intentionally duplicated names for the FieldRef tests
+    out.v1_1_1.field = field("a", boolean());
+    out.v1_1_0.field = field("a", float32());
+
+    out.v1_1.field = field("b", struct_({out.v1_1_0.field, out.v1_1_1.field}));
+    out.v1_0.field = field("a", int32());
+    out.v1.field = field("b", struct_({out.v1_0.field, out.v1_1.field}));
+    out.v0.field = field("a", utf8());
+    out.schema = arrow::schema({out.v0.field, out.v1.field});
+    out.type = struct_(out.schema->fields());
+
+    // Create null bitmaps for the struct fields independent of its childrens'
+    // bitmaps. For FieldPath::GetFlattened, parent/child bitmaps should be combined
+    // - for FieldPath::Get, higher-level nulls are ignored.
+    auto bitmap1_1 = gen.NullBitmap(kNumRows, 0.15);
+    auto bitmap1 = gen.NullBitmap(kNumRows, 0.30);
+
+    // Generate raw leaf arrays
+    out.v1_1_1.array = gen.ArrayOf(out.v1_1_1.field->type(), kNumRows);
+    out.v1_1_0.array = gen.ArrayOf(out.v1_1_0.field->type(), kNumRows);
+    out.v1_0.array = gen.ArrayOf(out.v1_0.field->type(), kNumRows);
+    out.v0.array = gen.ArrayOf(out.v0.field->type(), kNumRows);
+    // Make struct fields from leaf arrays (we use the custom bitmaps here)
+    ARROW_ASSIGN_OR_RAISE(
+        out.v1_1.array,
+        StructArray::Make({out.v1_1_0.array, out.v1_1_1.array},
+                          {out.v1_1_0.field, out.v1_1_1.field}, bitmap1_1));
+    ARROW_ASSIGN_OR_RAISE(out.v1.array,
+                          StructArray::Make({out.v1_0.array, out.v1_1.array},
+                                            {out.v1_0.field, out.v1_1.field}, bitmap1));
+
+    // Not used to create the test input, but pre-compute flattened versions of nested
+    // arrays for comparisons in the GetFlattened tests.
+    ARROW_ASSIGN_OR_RAISE(
+        out.v1_0_flat.array,
+        checked_pointer_cast<StructArray>(out.v1.array)->GetFlattenedField(0));
+    ARROW_ASSIGN_OR_RAISE(
+        out.v1_1_flat.array,
+        checked_pointer_cast<StructArray>(out.v1.array)->GetFlattenedField(1));
+    ARROW_ASSIGN_OR_RAISE(
+        out.v1_1_0_flat.array,
+        checked_pointer_cast<StructArray>(out.v1_1_flat.array)->GetFlattenedField(0));
+    ARROW_ASSIGN_OR_RAISE(
+        out.v1_1_1_flat.array,
+        checked_pointer_cast<StructArray>(out.v1_1_flat.array)->GetFlattenedField(1));
+    // Sanity check
+    ARROW_CHECK(!out.v1_0_flat.array->Equals(out.v1_0.array));
+    ARROW_CHECK(!out.v1_1_flat.array->Equals(out.v1_1.array));
+    ARROW_CHECK(!out.v1_1_0_flat.array->Equals(out.v1_1_0.array));
+    ARROW_CHECK(!out.v1_1_1_flat.array->Equals(out.v1_1_1.array));
+
+    // Finalize the input Array
+    ARROW_ASSIGN_OR_RAISE(out.array, StructArray::Make({out.v0.array, out.v1.array},
+                                                       {out.v0.field, out.v1.field}));
+    ARROW_RETURN_NOT_OK(out.array->ValidateFull());
+    // Finalize the input RecordBatch
+    ARROW_ASSIGN_OR_RAISE(out.record_batch, RecordBatch::FromStructArray(out.array));
+    ARROW_RETURN_NOT_OK(out.record_batch->ValidateFull());
+    // Finalize the input ChunkedArray
+    out.chunked_array = SliceToChunkedArray(*out.array, out.num_chunks);
+    ARROW_RETURN_NOT_OK(out.chunked_array->ValidateFull());
+
+    // For each expected child array, create a chunked equivalent (we use a different
+    // chunk layout for each top-level column to make the Table test more interesting)
+    for (OutputValues* v :
+         {&out.v0, &out.v1, &out.v1_0, &out.v1_1, &out.v1_1_0, &out.v1_1_1,
+          &out.v1_0_flat, &out.v1_1_flat, &out.v1_1_0_flat, &out.v1_1_1_flat}) {
+      v->chunked_array =
+          SliceToChunkedArray(*v->array, out.num_column_chunks[v->path[0]]);
+    }
+    // Finalize the input Table
+    out.table =
+        Table::Make(out.schema, {out.v0.chunked_array, out.v1.chunked_array}, kNumRows);
+    ARROW_RETURN_NOT_OK(out.table->ValidateFull());
+
+    return std::move(out);
   }
 
-  EXPECT_TRUE(column0->Equals(v0));
-  EXPECT_TRUE(column0->Equals(v2_0));
+  static std::shared_ptr<ChunkedArray> SliceToChunkedArray(const Array& array,
+                                                           int num_chunks) {
+    ARROW_CHECK(num_chunks > 0 && array.length() >= num_chunks);
+    ArrayVector chunks;
+    chunks.reserve(num_chunks);
+    for (int64_t inc = array.length() / num_chunks, beg = 0,
+                 end = inc + array.length() % num_chunks;
+         end <= array.length(); beg = end, end += inc) {
+      chunks.push_back(array.SliceSafe(beg, end - beg).ValueOrDie());
+    }
+    ARROW_CHECK_EQ(static_cast<int>(chunks.size()), num_chunks);
+    return ChunkedArray::Make(std::move(chunks)).ValueOrDie();
+  }
+};
 
-  EXPECT_TRUE(column1->Equals(v1));
-  EXPECT_TRUE(column1->Equals(v2_1_0));
-  EXPECT_FALSE(column1->Equals(v2_1));
+class FieldPathTestFixture : public ::testing::Test {
+ public:
+  FieldPathTestFixture() : case_(FieldPathTestCase::Instance()) {}
 
-  EXPECT_TRUE(column2_1->Equals(v2_1));
+ protected:
+  template <typename T>
+  using OutputType = typename FieldRef::GetType<T>::element_type;
 
-  EXPECT_RAISES_WITH_MESSAGE_THAT(NotImplemented,
-                                  HasSubstr("Get child data of non-struct chunked array"),
-                                  FieldPath({0}).Get(*column0));
-}
+  template <typename I>
+  void AssertOutputsEqual(const std::shared_ptr<Field>& expected,
+                          const std::shared_ptr<Field>& actual) const {
+    AssertFieldEqual(*expected, *actual);
+  }
+  template <typename I>
+  void AssertOutputsEqual(const std::shared_ptr<Array>& expected,
+                          const std::shared_ptr<Array>& actual) const {
+    AssertArraysEqual(*expected, *actual);

Review Comment:
   I might have missed it somehow, but we probably want to call `ValidateFull` on the actual results here (and below).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #35197: GH-14946: [C++] Add flattening FieldPath/FieldRef::Get methods

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #35197:
URL: https://github.com/apache/arrow/pull/35197#discussion_r1196521220


##########
cpp/src/arrow/type.h:
##########
@@ -1905,6 +1935,15 @@ class ARROW_EXPORT FieldRef : public util::EqualityComparable<FieldRef> {
     }
     return match.Get(root).ValueOrDie();
   }
+  template <typename T>
+  Result<GetType<T>> GetOneOrNoneFlattened(const T& root,
+                                           MemoryPool* pool = NULLPTR) const {
+    ARROW_ASSIGN_OR_RAISE(auto match, FindOneOrNone(root));
+    if (match.empty()) {
+      return static_cast<GetType<T>>(NULLPTR);
+    }
+    return match.GetFlattened(root, pool).ValueOrDie();

Review Comment:
   Same here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ursabot commented on pull request #35197: GH-14946: [C++] Add flattening FieldPath/FieldRef::Get methods

Posted by "ursabot (via GitHub)" <gi...@apache.org>.
ursabot commented on PR #35197:
URL: https://github.com/apache/arrow/pull/35197#issuecomment-1569422562

   ['Python', 'R'] benchmarks have high level of regressions.
   [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/9cf73ac83f0a44179e6538b2c1c7babd...3d76cb5ffb8849bf8c3ea9b32d08b3b7/)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] benibus commented on a diff in pull request #35197: GH-14946: [C++] Add flattening FieldPath/FieldRef::Get methods

Posted by "benibus (via GitHub)" <gi...@apache.org>.
benibus commented on code in PR #35197:
URL: https://github.com/apache/arrow/pull/35197#discussion_r1175670935


##########
cpp/src/arrow/type.cc:
##########
@@ -1129,18 +1129,20 @@ class ChunkedArrayData : public ChunkedColumn {
 // Return a vector of ChunkedColumns - one for each struct field.
 // Unlike ChunkedArray::Flatten, this is zero-copy and doesn't merge parent/child
 // validity bitmaps.
-ChunkedColumnVector ChunkedColumn::Flatten() const {
+ChunkedColumnVector ChunkedColumn::FlattenZeroCopy() const {
   DCHECK_EQ(type()->id(), Type::STRUCT);
 
   ChunkedColumnVector columns(type()->num_fields());
   for (int column_idx = 0; column_idx < type()->num_fields(); ++column_idx) {
     const auto& child_type = type()->field(column_idx)->type();
     ArrayDataVector chunks(num_chunks());
     for (int chunk_idx = 0; chunk_idx < num_chunks(); ++chunk_idx) {
-      const auto& child_data = chunk(chunk_idx)->child_data;
-      DCHECK_EQ(columns.size(), child_data.size());
-      DCHECK(child_type->Equals(child_data[column_idx]->type));
-      chunks[chunk_idx] = child_data[column_idx];
+      const auto& parent = chunk(chunk_idx);
+      const auto& children = parent->child_data;
+      DCHECK_EQ(columns.size(), children.size());
+      auto child = children[column_idx]->Slice(parent->offset, parent->length);

Review Comment:
   Yes (a bug that I introduced in https://github.com/apache/arrow/pull/34537). It gives incorrect results when retrieving a nested child from an array sliced from another array.
   
   Although, now that I'm thinking about it, I'd imagine the same problem would occur for the non-chunked `FieldPath::Get` variants as well. I'll need to test it and create a separate issue.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #35197: GH-14946: [C++] Add flattening FieldPath/FieldRef::Get methods

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #35197:
URL: https://github.com/apache/arrow/pull/35197#discussion_r1194041108


##########
cpp/src/arrow/type.h:
##########
@@ -1698,10 +1698,45 @@ class ARROW_EXPORT FieldPath {
   /// \brief Retrieve the referenced child from a ChunkedArray
   Result<std::shared_ptr<ChunkedArray>> Get(const ChunkedArray& chunked_array) const;
 
+  /// \brief Retrieve the referenced child/column from an Array, ArrayData, ChunkedArray,
+  /// RecordBatch, or Table
+  ///
+  /// Unlike `FieldPath::Get`, these variants are not zero-copy and the retrieved child's
+  /// null bitmap is ANDed with its parent's
+  Result<std::shared_ptr<Array>> GetFlattened(const Array& array,
+                                              MemoryPool* pool = NULLPTR) const;
+  Result<std::shared_ptr<ArrayData>> GetFlattened(const ArrayData& data,
+                                                  MemoryPool* pool = NULLPTR) const;
+  Result<std::shared_ptr<ChunkedArray>> GetFlattened(const ChunkedArray& chunked_array,
+                                                     MemoryPool* pool = NULLPTR) const;
+  Result<std::shared_ptr<Array>> GetFlattened(const RecordBatch& batch,
+                                              MemoryPool* pool = NULLPTR) const;
+  Result<std::shared_ptr<ChunkedArray>> GetFlattened(const Table& table,
+                                                     MemoryPool* pool = NULLPTR) const;
+
  private:
   std::vector<int> indices_;
 };
 
+namespace internal {
+
+template <typename T>
+using FieldPathGetType =
+    decltype(std::declval<FieldPath>().Get(std::declval<T>()).ValueOrDie());
+
+template <bool Flattened, typename T>
+std::enable_if_t<!Flattened, Result<FieldPathGetType<T>>> GetChild(
+    const T& root, const FieldPath& path, MemoryPool* = NULLPTR) {
+  return path.Get(root);
+}
+template <bool Flattened, typename T>
+std::enable_if_t<Flattened, Result<FieldPathGetType<T>>> GetChild(
+    const T& root, const FieldPath& path, MemoryPool* pool = NULLPTR) {
+  return path.GetFlattened(root, pool);
+}

Review Comment:
   I'm not sure the indirection through these helpers and the `Flattened` templatization is useful (instead of simply duplicating the definition of e.g. `FieldRef::GetOne` into `FieldRef::GetOneFlattened`).
   
   But if it is, we can probably shorten this helper code with C++17 features:
   1) using `auto` and return type deduction to avoid the `decltype` dance
   2) using `if constexpr` to avoid SFINAE (not 100% it will work here, but worth a try)
   



##########
cpp/src/arrow/type.h:
##########
@@ -1698,10 +1698,45 @@ class ARROW_EXPORT FieldPath {
   /// \brief Retrieve the referenced child from a ChunkedArray
   Result<std::shared_ptr<ChunkedArray>> Get(const ChunkedArray& chunked_array) const;
 
+  /// \brief Retrieve the referenced child/column from an Array, ArrayData, ChunkedArray,
+  /// RecordBatch, or Table
+  ///
+  /// Unlike `FieldPath::Get`, these variants are not zero-copy and the retrieved child's
+  /// null bitmap is ANDed with its parent's

Review Comment:
   Perhaps "ancestors" plural?
   ```suggestion
     /// null bitmap is ANDed with its ancestors'
   ```



##########
cpp/src/arrow/type_test.cc:
##########
@@ -364,152 +365,344 @@ TEST(TestField, TestMerge) {
   }
 }
 
-TEST(TestFieldPath, Basics) {
-  auto f0 = field("alpha", int32());
-  auto f1 = field("beta", int32());
-  auto f2 = field("alpha", int32());
-  auto f3 = field("beta", int32());
-  Schema s({f0, f1, f2, f3});
+struct FieldPathTestCase {
+  struct OutputValues {
+    explicit OutputValues(std::vector<int> indices = {})
+        : path(FieldPath(std::move(indices))) {}
 
-  // retrieving a field with single-element FieldPath is equivalent to Schema::field
-  for (int index = 0; index < s.num_fields(); ++index) {
-    ASSERT_OK_AND_EQ(s.field(index), FieldPath({index}).Get(s));
+    template <typename T>
+    auto&& Get() const;
+
+    FieldPath path;
+    std::shared_ptr<Field> field;
+    std::shared_ptr<Array> array;
+    std::shared_ptr<ChunkedArray> chunked_array;
+  };
+
+  static constexpr int kNumColumns = 2;
+  static constexpr int kNumRows = 100;
+  static constexpr int kRandomSeed = 0xbeef;
+
+  // Input for the FieldPath::Get functions in multiple forms
+  std::shared_ptr<Schema> schema;
+  std::shared_ptr<DataType> type;
+  std::shared_ptr<Array> array;
+  std::shared_ptr<RecordBatch> record_batch;
+  std::shared_ptr<ChunkedArray> chunked_array;
+  std::shared_ptr<Table> table;
+
+  template <typename T>
+  auto&& GetInput() const;
+
+  // Number of chunks for each column in the input Table
+  const std::array<int, kNumColumns> num_column_chunks = {15, 20};
+  // Number of chunks in the input ChunkedArray
+  const int num_chunks = 15;
+
+  // Expected outputs for each child;
+  OutputValues v0{{0}}, v1{{1}};
+  OutputValues v1_0{{1, 0}}, v1_1{{1, 1}};
+  OutputValues v1_1_0{{1, 1, 0}}, v1_1_1{{1, 1, 1}};
+  // Expected outputs for nested children with null flattening applied
+  OutputValues v1_0_flat{{1, 0}}, v1_1_flat{{1, 1}};
+  OutputValues v1_1_0_flat{{1, 1, 0}}, v1_1_1_flat{{1, 1, 1}};
+
+  static const FieldPathTestCase* Instance() {
+    static const auto maybe_instance = Make();
+    return &maybe_instance.ValueOrDie();
   }
-  EXPECT_RAISES_WITH_MESSAGE_THAT(Invalid,
-                                  testing::HasSubstr("empty indices cannot be traversed"),
-                                  FieldPath().Get(s));
-  EXPECT_RAISES_WITH_MESSAGE_THAT(IndexError, testing::HasSubstr("index out of range"),
-                                  FieldPath({s.num_fields() * 2}).Get(s));
-}
-
-TEST(TestFieldPath, GetForTable) {
-  using testing::HasSubstr;
-
-  constexpr int kNumRows = 4;
-  auto f0 = field("a", int32());
-  auto f1 = field("b", int32());
-  auto f2 = field("c", struct_({f1}));
-  auto f3 = field("d", struct_({f0, f2}));
-  auto table_schema = schema({f0, f1, f2, f3});
-
-  // Each column has a different chunking
-  ChunkedArrayVector columns(4);
-  columns[0] = ChunkedArrayFromJSON(f0->type(), {"[0,1,2,3]"});
-  columns[1] = ChunkedArrayFromJSON(f1->type(), {"[3,2,1]", "[0]"});
-  columns[2] =
-      ChunkedArrayFromJSON(f2->type(), {R"([{"b":3},{"b":2}])", R"([{"b":1},{"b":0}])"});
-  columns[3] = ChunkedArrayFromJSON(
-      f3->type(), {R"([{"a":0,"c":{"b":3}},{"a":1,"c":{"b":2}}])",
-                   R"([{"a":2,"c":{"b":1}}])", R"([{"a":3,"c":{"b":0}}])"});
-  auto table = Table::Make(table_schema, columns, kNumRows);
-  ASSERT_OK(table->ValidateFull());
 
-  ASSERT_OK_AND_ASSIGN(auto v0, FieldPath({0}).Get(*table));
-  ASSERT_OK_AND_ASSIGN(auto v1, FieldPath({1}).Get(*table));
-  ASSERT_OK_AND_ASSIGN(auto v2, FieldPath({2}).Get(*table));
-  ASSERT_OK_AND_ASSIGN(auto v2_0, FieldPath({2, 0}).Get(*table));
-  ASSERT_OK_AND_ASSIGN(auto v3, FieldPath({3}).Get(*table));
-  ASSERT_OK_AND_ASSIGN(auto v3_0, FieldPath({3, 0}).Get(*table));
-  ASSERT_OK_AND_ASSIGN(auto v3_1, FieldPath({3, 1}).Get(*table));
-  ASSERT_OK_AND_ASSIGN(auto v3_1_0, FieldPath({3, 1, 0}).Get(*table));
-
-  EXPECT_EQ(v0->num_chunks(), columns[0]->num_chunks());
-  EXPECT_EQ(v1->num_chunks(), columns[1]->num_chunks());
-  EXPECT_EQ(v2->num_chunks(), columns[2]->num_chunks());
-  EXPECT_EQ(v2_0->num_chunks(), columns[2]->num_chunks());
-  EXPECT_EQ(v3->num_chunks(), columns[3]->num_chunks());
-  EXPECT_EQ(v3_0->num_chunks(), columns[3]->num_chunks());
-  EXPECT_EQ(v3_1->num_chunks(), columns[3]->num_chunks());
-  EXPECT_EQ(v3_1_0->num_chunks(), columns[3]->num_chunks());
-
-  EXPECT_TRUE(columns[0]->Equals(v0));
-  EXPECT_TRUE(columns[0]->Equals(v3_0));
-
-  EXPECT_TRUE(columns[1]->Equals(v1));
-  EXPECT_TRUE(columns[1]->Equals(v2_0));
-  EXPECT_TRUE(columns[1]->Equals(v3_1_0));
-
-  EXPECT_TRUE(columns[2]->Equals(v2));
-  EXPECT_TRUE(columns[2]->Equals(v3_1));
-
-  EXPECT_TRUE(columns[3]->Equals(v3));
-
-  for (const auto& path :
-       {FieldPath({4, 1, 0}), FieldPath({3, 2, 0}), FieldPath{3, 1, 1}}) {
-    EXPECT_RAISES_WITH_MESSAGE_THAT(IndexError, HasSubstr("index out of range"),
-                                    path.Get(*table));
+  static Result<FieldPathTestCase> Make() {
+    // Generate test input based on a single schema. First by creating a StructArray,
+    // then deriving the other input types (ChunkedArray, RecordBatch, Table, etc) from
+    // it. We also compute the expected outputs for each child individually (for each
+    // output type).
+    FieldPathTestCase out;
+    random::RandomArrayGenerator gen(kRandomSeed);
+
+    // Define child fields and input schema
+    out.v1_1_1.field = field("b", boolean());
+    out.v1_1_0.field = field("f", float32());
+    out.v1_1.field = field("s1", struct_({out.v1_1_0.field, out.v1_1_1.field}));
+    out.v1_0.field = field("i", int32());
+    out.v1.field = field("s0", struct_({out.v1_0.field, out.v1_1.field}));
+    out.v0.field = field("u", utf8());
+    out.schema = arrow::schema({out.v0.field, out.v1.field});
+    out.type = struct_(out.schema->fields());
+
+    // Create null bitmaps for the struct fields independent of its childrens'
+    // bitmaps. For FieldPath::GetFlattened, parent/child bitmaps should be combined
+    // - for FieldPath::Get, higher-level nulls are ignored.
+    auto bitmap1_1 = gen.NullBitmap(kNumRows, 0.15);
+    auto bitmap1 = gen.NullBitmap(kNumRows, 0.30);
+
+    // Generate raw leaf arrays
+    out.v1_1_1.array = gen.ArrayOf(out.v1_1_1.field->type(), kNumRows);
+    out.v1_1_0.array = gen.ArrayOf(out.v1_1_0.field->type(), kNumRows);
+    out.v1_0.array = gen.ArrayOf(out.v1_0.field->type(), kNumRows);
+    out.v0.array = gen.ArrayOf(out.v0.field->type(), kNumRows);
+    // Make struct fields from leaf arrays (we use the custom bitmaps here)
+    ARROW_ASSIGN_OR_RAISE(
+        out.v1_1.array,
+        StructArray::Make({out.v1_1_0.array, out.v1_1_1.array},
+                          {out.v1_1_0.field, out.v1_1_1.field}, bitmap1_1));
+    ARROW_ASSIGN_OR_RAISE(out.v1.array,
+                          StructArray::Make({out.v1_0.array, out.v1_1.array},
+                                            {out.v1_0.field, out.v1_1.field}, bitmap1));
+
+    // Not used to create the test input, but pre-compute flattened versions of nested
+    // arrays for comparisons in the GetFlattened tests.
+    ARROW_ASSIGN_OR_RAISE(
+        out.v1_0_flat.array,
+        checked_pointer_cast<StructArray>(out.v1.array)->GetFlattenedField(0));
+    ARROW_ASSIGN_OR_RAISE(
+        out.v1_1_flat.array,
+        checked_pointer_cast<StructArray>(out.v1.array)->GetFlattenedField(1));
+    ARROW_ASSIGN_OR_RAISE(
+        out.v1_1_0_flat.array,
+        checked_pointer_cast<StructArray>(out.v1_1_flat.array)->GetFlattenedField(0));
+    ARROW_ASSIGN_OR_RAISE(
+        out.v1_1_1_flat.array,
+        checked_pointer_cast<StructArray>(out.v1_1_flat.array)->GetFlattenedField(1));
+    // Sanity check
+    ARROW_CHECK(!out.v1_0_flat.array->Equals(out.v1_0.array));
+    ARROW_CHECK(!out.v1_1_flat.array->Equals(out.v1_1.array));
+    ARROW_CHECK(!out.v1_1_0_flat.array->Equals(out.v1_1_0.array));
+    ARROW_CHECK(!out.v1_1_1_flat.array->Equals(out.v1_1_1.array));
+
+    // Finalize the input Array
+    ARROW_ASSIGN_OR_RAISE(out.array, StructArray::Make({out.v0.array, out.v1.array},
+                                                       {out.v0.field, out.v1.field}));
+    ARROW_RETURN_NOT_OK(out.array->ValidateFull());
+    // Finalize the input RecordBatch
+    ARROW_ASSIGN_OR_RAISE(out.record_batch, RecordBatch::FromStructArray(out.array));
+    ARROW_RETURN_NOT_OK(out.record_batch->ValidateFull());
+    // Finalize the input ChunkedArray
+    out.chunked_array = SliceToChunkedArray(*out.array, out.num_chunks);
+    ARROW_RETURN_NOT_OK(out.chunked_array->ValidateFull());
+
+    // For each expected child array, create a chunked equivalent (we use a different
+    // chunk layout for each top-level column to make the Table test more interesting)
+    for (OutputValues* v :
+         {&out.v0, &out.v1, &out.v1_0, &out.v1_1, &out.v1_1_0, &out.v1_1_1,
+          &out.v1_0_flat, &out.v1_1_flat, &out.v1_1_0_flat, &out.v1_1_1_flat}) {
+      v->chunked_array =
+          SliceToChunkedArray(*v->array, out.num_column_chunks[v->path[0]]);
+    }
+    // Finalize the input Table
+    out.table =
+        Table::Make(out.schema, {out.v0.chunked_array, out.v1.chunked_array}, kNumRows);
+    ARROW_RETURN_NOT_OK(out.table->ValidateFull());
+
+    return std::move(out);
   }
-  EXPECT_RAISES_WITH_MESSAGE_THAT(Invalid, HasSubstr("empty indices cannot be traversed"),
-                                  FieldPath().Get(*table));
-}
-
-TEST(TestFieldPath, GetForChunkedArray) {
-  using testing::HasSubstr;
-
-  auto f0 = field("a", int32());
-  auto f1 = field("b", int32());
-  auto f2 = field("c", struct_({f1}));
-  auto f3 = field("d", struct_({f0, f2}));
-  auto type = struct_({f0, f1, f3});
-
-  auto column0 = ChunkedArrayFromJSON(f0->type(), {"[0,1,2,3]"});
-  auto column1 = ChunkedArrayFromJSON(f1->type(), {"[3,2,1,0]"});
-  auto column2_1 =
-      ChunkedArrayFromJSON(f2->type(), {R"([{"b":3},{"b":2},{"b":1},{"b":0}])"});
-  auto chunked_array = ChunkedArrayFromJSON(
-      type,
-      {
-          R"([{"a":0,"b":3,"d":{"a":0,"c":{"b":3}}}])",
-          R"([{"a":1,"b":2,"d":{"a":1,"c":{"b":2}}},{"a":2,"b":1,"d":{"a":2,"c":{"b":1}}}])",
-          R"([{"a":3,"b":0,"d":{"a":3,"c":{"b":0}}}])",
-      });
-  ASSERT_OK(chunked_array->ValidateFull());
-
-  ASSERT_OK_AND_ASSIGN(auto v0, FieldPath({0}).Get(*chunked_array));
-  ASSERT_OK_AND_ASSIGN(auto v1, FieldPath({1}).Get(*chunked_array));
-  ASSERT_OK_AND_ASSIGN(auto v2_0, FieldPath({2, 0}).Get(*chunked_array));
-  ASSERT_OK_AND_ASSIGN(auto v2_1, FieldPath({2, 1}).Get(*chunked_array));
-  ASSERT_OK_AND_ASSIGN(auto v2_1_0, FieldPath({2, 1, 0}).Get(*chunked_array));
-
-  for (const auto& v : {v0, v1, v2_0, v2_1, v2_1_0}) {
-    EXPECT_EQ(v->num_chunks(), chunked_array->num_chunks());
+
+  static std::shared_ptr<ChunkedArray> SliceToChunkedArray(const Array& array,
+                                                           int num_chunks) {
+    ARROW_CHECK(num_chunks > 0 && array.length() >= num_chunks);

Review Comment:
   Generally we should favor `ASSERT` or `EXPECT` macros in tests, since they should arrow the tests to proceed (unless the test would otherwise crash just after the failed check).



##########
cpp/src/arrow/type_test.cc:
##########
@@ -364,152 +365,344 @@ TEST(TestField, TestMerge) {
   }
 }
 
-TEST(TestFieldPath, Basics) {
-  auto f0 = field("alpha", int32());
-  auto f1 = field("beta", int32());
-  auto f2 = field("alpha", int32());
-  auto f3 = field("beta", int32());
-  Schema s({f0, f1, f2, f3});
+struct FieldPathTestCase {
+  struct OutputValues {
+    explicit OutputValues(std::vector<int> indices = {})
+        : path(FieldPath(std::move(indices))) {}
 
-  // retrieving a field with single-element FieldPath is equivalent to Schema::field
-  for (int index = 0; index < s.num_fields(); ++index) {
-    ASSERT_OK_AND_EQ(s.field(index), FieldPath({index}).Get(s));
+    template <typename T>
+    auto&& Get() const;
+
+    FieldPath path;
+    std::shared_ptr<Field> field;
+    std::shared_ptr<Array> array;
+    std::shared_ptr<ChunkedArray> chunked_array;
+  };
+
+  static constexpr int kNumColumns = 2;
+  static constexpr int kNumRows = 100;
+  static constexpr int kRandomSeed = 0xbeef;
+
+  // Input for the FieldPath::Get functions in multiple forms
+  std::shared_ptr<Schema> schema;
+  std::shared_ptr<DataType> type;
+  std::shared_ptr<Array> array;
+  std::shared_ptr<RecordBatch> record_batch;
+  std::shared_ptr<ChunkedArray> chunked_array;
+  std::shared_ptr<Table> table;
+
+  template <typename T>
+  auto&& GetInput() const;
+
+  // Number of chunks for each column in the input Table
+  const std::array<int, kNumColumns> num_column_chunks = {15, 20};
+  // Number of chunks in the input ChunkedArray
+  const int num_chunks = 15;
+
+  // Expected outputs for each child;
+  OutputValues v0{{0}}, v1{{1}};
+  OutputValues v1_0{{1, 0}}, v1_1{{1, 1}};
+  OutputValues v1_1_0{{1, 1, 0}}, v1_1_1{{1, 1, 1}};
+  // Expected outputs for nested children with null flattening applied
+  OutputValues v1_0_flat{{1, 0}}, v1_1_flat{{1, 1}};
+  OutputValues v1_1_0_flat{{1, 1, 0}}, v1_1_1_flat{{1, 1, 1}};
+
+  static const FieldPathTestCase* Instance() {
+    static const auto maybe_instance = Make();
+    return &maybe_instance.ValueOrDie();
   }
-  EXPECT_RAISES_WITH_MESSAGE_THAT(Invalid,
-                                  testing::HasSubstr("empty indices cannot be traversed"),
-                                  FieldPath().Get(s));
-  EXPECT_RAISES_WITH_MESSAGE_THAT(IndexError, testing::HasSubstr("index out of range"),
-                                  FieldPath({s.num_fields() * 2}).Get(s));
-}
-
-TEST(TestFieldPath, GetForTable) {
-  using testing::HasSubstr;
-
-  constexpr int kNumRows = 4;
-  auto f0 = field("a", int32());
-  auto f1 = field("b", int32());
-  auto f2 = field("c", struct_({f1}));
-  auto f3 = field("d", struct_({f0, f2}));
-  auto table_schema = schema({f0, f1, f2, f3});
-
-  // Each column has a different chunking
-  ChunkedArrayVector columns(4);
-  columns[0] = ChunkedArrayFromJSON(f0->type(), {"[0,1,2,3]"});
-  columns[1] = ChunkedArrayFromJSON(f1->type(), {"[3,2,1]", "[0]"});
-  columns[2] =
-      ChunkedArrayFromJSON(f2->type(), {R"([{"b":3},{"b":2}])", R"([{"b":1},{"b":0}])"});
-  columns[3] = ChunkedArrayFromJSON(
-      f3->type(), {R"([{"a":0,"c":{"b":3}},{"a":1,"c":{"b":2}}])",
-                   R"([{"a":2,"c":{"b":1}}])", R"([{"a":3,"c":{"b":0}}])"});
-  auto table = Table::Make(table_schema, columns, kNumRows);
-  ASSERT_OK(table->ValidateFull());
 
-  ASSERT_OK_AND_ASSIGN(auto v0, FieldPath({0}).Get(*table));
-  ASSERT_OK_AND_ASSIGN(auto v1, FieldPath({1}).Get(*table));
-  ASSERT_OK_AND_ASSIGN(auto v2, FieldPath({2}).Get(*table));
-  ASSERT_OK_AND_ASSIGN(auto v2_0, FieldPath({2, 0}).Get(*table));
-  ASSERT_OK_AND_ASSIGN(auto v3, FieldPath({3}).Get(*table));
-  ASSERT_OK_AND_ASSIGN(auto v3_0, FieldPath({3, 0}).Get(*table));
-  ASSERT_OK_AND_ASSIGN(auto v3_1, FieldPath({3, 1}).Get(*table));
-  ASSERT_OK_AND_ASSIGN(auto v3_1_0, FieldPath({3, 1, 0}).Get(*table));
-
-  EXPECT_EQ(v0->num_chunks(), columns[0]->num_chunks());
-  EXPECT_EQ(v1->num_chunks(), columns[1]->num_chunks());
-  EXPECT_EQ(v2->num_chunks(), columns[2]->num_chunks());
-  EXPECT_EQ(v2_0->num_chunks(), columns[2]->num_chunks());
-  EXPECT_EQ(v3->num_chunks(), columns[3]->num_chunks());
-  EXPECT_EQ(v3_0->num_chunks(), columns[3]->num_chunks());
-  EXPECT_EQ(v3_1->num_chunks(), columns[3]->num_chunks());
-  EXPECT_EQ(v3_1_0->num_chunks(), columns[3]->num_chunks());
-
-  EXPECT_TRUE(columns[0]->Equals(v0));
-  EXPECT_TRUE(columns[0]->Equals(v3_0));
-
-  EXPECT_TRUE(columns[1]->Equals(v1));
-  EXPECT_TRUE(columns[1]->Equals(v2_0));
-  EXPECT_TRUE(columns[1]->Equals(v3_1_0));
-
-  EXPECT_TRUE(columns[2]->Equals(v2));
-  EXPECT_TRUE(columns[2]->Equals(v3_1));
-
-  EXPECT_TRUE(columns[3]->Equals(v3));
-
-  for (const auto& path :
-       {FieldPath({4, 1, 0}), FieldPath({3, 2, 0}), FieldPath{3, 1, 1}}) {
-    EXPECT_RAISES_WITH_MESSAGE_THAT(IndexError, HasSubstr("index out of range"),
-                                    path.Get(*table));
+  static Result<FieldPathTestCase> Make() {
+    // Generate test input based on a single schema. First by creating a StructArray,
+    // then deriving the other input types (ChunkedArray, RecordBatch, Table, etc) from
+    // it. We also compute the expected outputs for each child individually (for each
+    // output type).
+    FieldPathTestCase out;
+    random::RandomArrayGenerator gen(kRandomSeed);
+
+    // Define child fields and input schema
+    out.v1_1_1.field = field("b", boolean());
+    out.v1_1_0.field = field("f", float32());
+    out.v1_1.field = field("s1", struct_({out.v1_1_0.field, out.v1_1_1.field}));
+    out.v1_0.field = field("i", int32());
+    out.v1.field = field("s0", struct_({out.v1_0.field, out.v1_1.field}));
+    out.v0.field = field("u", utf8());
+    out.schema = arrow::schema({out.v0.field, out.v1.field});
+    out.type = struct_(out.schema->fields());
+
+    // Create null bitmaps for the struct fields independent of its childrens'
+    // bitmaps. For FieldPath::GetFlattened, parent/child bitmaps should be combined
+    // - for FieldPath::Get, higher-level nulls are ignored.
+    auto bitmap1_1 = gen.NullBitmap(kNumRows, 0.15);
+    auto bitmap1 = gen.NullBitmap(kNumRows, 0.30);
+
+    // Generate raw leaf arrays
+    out.v1_1_1.array = gen.ArrayOf(out.v1_1_1.field->type(), kNumRows);
+    out.v1_1_0.array = gen.ArrayOf(out.v1_1_0.field->type(), kNumRows);
+    out.v1_0.array = gen.ArrayOf(out.v1_0.field->type(), kNumRows);
+    out.v0.array = gen.ArrayOf(out.v0.field->type(), kNumRows);
+    // Make struct fields from leaf arrays (we use the custom bitmaps here)
+    ARROW_ASSIGN_OR_RAISE(
+        out.v1_1.array,
+        StructArray::Make({out.v1_1_0.array, out.v1_1_1.array},
+                          {out.v1_1_0.field, out.v1_1_1.field}, bitmap1_1));
+    ARROW_ASSIGN_OR_RAISE(out.v1.array,
+                          StructArray::Make({out.v1_0.array, out.v1_1.array},
+                                            {out.v1_0.field, out.v1_1.field}, bitmap1));
+
+    // Not used to create the test input, but pre-compute flattened versions of nested
+    // arrays for comparisons in the GetFlattened tests.
+    ARROW_ASSIGN_OR_RAISE(
+        out.v1_0_flat.array,
+        checked_pointer_cast<StructArray>(out.v1.array)->GetFlattenedField(0));
+    ARROW_ASSIGN_OR_RAISE(
+        out.v1_1_flat.array,
+        checked_pointer_cast<StructArray>(out.v1.array)->GetFlattenedField(1));
+    ARROW_ASSIGN_OR_RAISE(
+        out.v1_1_0_flat.array,
+        checked_pointer_cast<StructArray>(out.v1_1_flat.array)->GetFlattenedField(0));
+    ARROW_ASSIGN_OR_RAISE(
+        out.v1_1_1_flat.array,
+        checked_pointer_cast<StructArray>(out.v1_1_flat.array)->GetFlattenedField(1));
+    // Sanity check
+    ARROW_CHECK(!out.v1_0_flat.array->Equals(out.v1_0.array));
+    ARROW_CHECK(!out.v1_1_flat.array->Equals(out.v1_1.array));
+    ARROW_CHECK(!out.v1_1_0_flat.array->Equals(out.v1_1_0.array));
+    ARROW_CHECK(!out.v1_1_1_flat.array->Equals(out.v1_1_1.array));
+
+    // Finalize the input Array
+    ARROW_ASSIGN_OR_RAISE(out.array, StructArray::Make({out.v0.array, out.v1.array},
+                                                       {out.v0.field, out.v1.field}));
+    ARROW_RETURN_NOT_OK(out.array->ValidateFull());
+    // Finalize the input RecordBatch
+    ARROW_ASSIGN_OR_RAISE(out.record_batch, RecordBatch::FromStructArray(out.array));
+    ARROW_RETURN_NOT_OK(out.record_batch->ValidateFull());
+    // Finalize the input ChunkedArray
+    out.chunked_array = SliceToChunkedArray(*out.array, out.num_chunks);
+    ARROW_RETURN_NOT_OK(out.chunked_array->ValidateFull());
+
+    // For each expected child array, create a chunked equivalent (we use a different
+    // chunk layout for each top-level column to make the Table test more interesting)
+    for (OutputValues* v :
+         {&out.v0, &out.v1, &out.v1_0, &out.v1_1, &out.v1_1_0, &out.v1_1_1,
+          &out.v1_0_flat, &out.v1_1_flat, &out.v1_1_0_flat, &out.v1_1_1_flat}) {
+      v->chunked_array =
+          SliceToChunkedArray(*v->array, out.num_column_chunks[v->path[0]]);
+    }
+    // Finalize the input Table
+    out.table =
+        Table::Make(out.schema, {out.v0.chunked_array, out.v1.chunked_array}, kNumRows);
+    ARROW_RETURN_NOT_OK(out.table->ValidateFull());
+
+    return std::move(out);
   }
-  EXPECT_RAISES_WITH_MESSAGE_THAT(Invalid, HasSubstr("empty indices cannot be traversed"),
-                                  FieldPath().Get(*table));
-}
-
-TEST(TestFieldPath, GetForChunkedArray) {
-  using testing::HasSubstr;
-
-  auto f0 = field("a", int32());
-  auto f1 = field("b", int32());
-  auto f2 = field("c", struct_({f1}));
-  auto f3 = field("d", struct_({f0, f2}));
-  auto type = struct_({f0, f1, f3});
-
-  auto column0 = ChunkedArrayFromJSON(f0->type(), {"[0,1,2,3]"});
-  auto column1 = ChunkedArrayFromJSON(f1->type(), {"[3,2,1,0]"});
-  auto column2_1 =
-      ChunkedArrayFromJSON(f2->type(), {R"([{"b":3},{"b":2},{"b":1},{"b":0}])"});
-  auto chunked_array = ChunkedArrayFromJSON(
-      type,
-      {
-          R"([{"a":0,"b":3,"d":{"a":0,"c":{"b":3}}}])",
-          R"([{"a":1,"b":2,"d":{"a":1,"c":{"b":2}}},{"a":2,"b":1,"d":{"a":2,"c":{"b":1}}}])",
-          R"([{"a":3,"b":0,"d":{"a":3,"c":{"b":0}}}])",
-      });
-  ASSERT_OK(chunked_array->ValidateFull());
-
-  ASSERT_OK_AND_ASSIGN(auto v0, FieldPath({0}).Get(*chunked_array));
-  ASSERT_OK_AND_ASSIGN(auto v1, FieldPath({1}).Get(*chunked_array));
-  ASSERT_OK_AND_ASSIGN(auto v2_0, FieldPath({2, 0}).Get(*chunked_array));
-  ASSERT_OK_AND_ASSIGN(auto v2_1, FieldPath({2, 1}).Get(*chunked_array));
-  ASSERT_OK_AND_ASSIGN(auto v2_1_0, FieldPath({2, 1, 0}).Get(*chunked_array));
-
-  for (const auto& v : {v0, v1, v2_0, v2_1, v2_1_0}) {
-    EXPECT_EQ(v->num_chunks(), chunked_array->num_chunks());
+
+  static std::shared_ptr<ChunkedArray> SliceToChunkedArray(const Array& array,
+                                                           int num_chunks) {
+    ARROW_CHECK(num_chunks > 0 && array.length() >= num_chunks);
+    ArrayVector chunks;
+    chunks.reserve(num_chunks);
+    for (int64_t inc = array.length() / num_chunks, beg = 0,
+                 end = inc + array.length() % num_chunks;
+         end <= array.length(); beg = end, end += inc) {
+      chunks.push_back(array.SliceSafe(beg, end - beg).ValueOrDie());
+    }
+    ARROW_CHECK_EQ(static_cast<int>(chunks.size()), num_chunks);
+    return ChunkedArray::Make(std::move(chunks)).ValueOrDie();
   }
+};
+
+template <>
+auto&& FieldPathTestCase::GetInput<Schema>() const {
+  return this->schema;
+}
+template <>
+auto&& FieldPathTestCase::GetInput<DataType>() const {
+  return this->type;
+}
+template <>
+auto&& FieldPathTestCase::GetInput<Array>() const {
+  return this->array;
+}
+template <>
+auto&& FieldPathTestCase::GetInput<ArrayData>() const {
+  return this->array->data();
+}
+template <>
+auto&& FieldPathTestCase::GetInput<ChunkedArray>() const {
+  return this->chunked_array;
+}
+template <>
+auto&& FieldPathTestCase::GetInput<RecordBatch>() const {
+  return this->record_batch;
+}
+template <>
+auto&& FieldPathTestCase::GetInput<Table>() const {
+  return this->table;
+}
 
-  EXPECT_TRUE(column0->Equals(v0));
-  EXPECT_TRUE(column0->Equals(v2_0));
+template <>
+auto&& FieldPathTestCase::OutputValues::Get<Field>() const {
+  return this->field;
+}
+template <>
+auto&& FieldPathTestCase::OutputValues::Get<Array>() const {
+  return this->array;
+}
+template <>
+auto&& FieldPathTestCase::OutputValues::Get<ArrayData>() const {
+  return this->array->data();
+}
+template <>
+auto&& FieldPathTestCase::OutputValues::Get<ChunkedArray>() const {
+  return this->chunked_array;
+}
 
-  EXPECT_TRUE(column1->Equals(v1));
-  EXPECT_TRUE(column1->Equals(v2_1_0));
-  EXPECT_FALSE(column1->Equals(v2_1));
+class FieldPathTestFixture : public ::testing::Test {
+ public:
+  FieldPathTestFixture() : case_(FieldPathTestCase::Instance()) {}
 
-  EXPECT_TRUE(column2_1->Equals(v2_1));
+ protected:
+  template <typename T>
+  using OutputType = typename internal::FieldPathGetType<T>::element_type;
 
-  EXPECT_RAISES_WITH_MESSAGE_THAT(NotImplemented,
-                                  HasSubstr("Get child data of non-struct chunked array"),
-                                  FieldPath({0}).Get(*column0));
-}
+  template <typename I>
+  void AssertOutputsEqual(const std::shared_ptr<Field>& expected,
+                          const std::shared_ptr<Field>& actual) const {
+    AssertFieldEqual(*expected, *actual);
+  }
+  template <typename I>
+  void AssertOutputsEqual(const std::shared_ptr<Array>& expected,
+                          const std::shared_ptr<Array>& actual) const {
+    AssertArraysEqual(*expected, *actual);
+  }
+  template <typename I>
+  void AssertOutputsEqual(const std::shared_ptr<ChunkedArray>& expected,
+                          const std::shared_ptr<ChunkedArray>& actual) const {
+    if constexpr (std::is_same_v<I, ChunkedArray>) {
+      EXPECT_EQ(case_->chunked_array->num_chunks(), actual->num_chunks());
+    } else {
+      EXPECT_EQ(expected->num_chunks(), actual->num_chunks());
+    }
+    AssertChunkedEquivalent(*expected, *actual);
+  }
+
+  const FieldPathTestCase* case_;
+};
+
+class TestFieldPath : public FieldPathTestFixture {
+ protected:
+  template <typename I, bool Flattened = false>
+  void TestGetWithInvalidIndex() const {
+    const auto& input = case_->GetInput<I>();
+    for (const auto& path :
+         {FieldPath({2, 1, 0}), FieldPath({1, 2, 0}), FieldPath{1, 1, 2}}) {
+      EXPECT_RAISES_WITH_MESSAGE_THAT(IndexError,
+                                      ::testing::HasSubstr("index out of range"),
+                                      internal::GetChild<Flattened>(*input, path));

Review Comment:
   Since the `IndexError` has a more detailed error message, is it possible to test it precisely at least for a hard-coded case?



##########
cpp/src/arrow/type_test.cc:
##########
@@ -693,25 +851,6 @@ TEST(TestFieldRef, DotPathRoundTrip) {
   check_roundtrip(FieldRef("foo", 1, FieldRef("bar", 2, 3), FieldRef()));
 }
 
-TEST(TestFieldPath, Nested) {
-  auto f0 = field("alpha", int32());
-  auto f1_0 = field("alpha", int32());
-  auto f1 = field("beta", struct_({f1_0}));
-  auto f2_0 = field("alpha", int32());
-  auto f2_1_0 = field("alpha", int32());
-  auto f2_1_1 = field("alpha", int32());
-  auto f2_1 = field("gamma", struct_({f2_1_0, f2_1_1}));
-  auto f2 = field("beta", struct_({f2_0, f2_1}));
-  Schema s({f0, f1, f2});
-
-  // retrieving fields with nested indices
-  EXPECT_EQ(FieldPath({0}).Get(s), f0);
-  EXPECT_EQ(FieldPath({1, 0}).Get(s), f1_0);
-  EXPECT_EQ(FieldPath({2, 0}).Get(s), f2_0);
-  EXPECT_EQ(FieldPath({2, 1, 0}).Get(s), f2_1_0);
-  EXPECT_EQ(FieldPath({2, 1, 1}).Get(s), f2_1_1);
-}
-
 TEST(TestFieldRef, Nested) {

Review Comment:
   It seems quite a lot of effort was expended to test `FieldPath` comprehensively (thanks a lot for that!), but the additional `FieldRef` methods do not seem to be exercised here? (am I mistaken?)



##########
cpp/src/arrow/type_test.cc:
##########
@@ -364,152 +365,344 @@ TEST(TestField, TestMerge) {
   }
 }
 
-TEST(TestFieldPath, Basics) {
-  auto f0 = field("alpha", int32());
-  auto f1 = field("beta", int32());
-  auto f2 = field("alpha", int32());
-  auto f3 = field("beta", int32());
-  Schema s({f0, f1, f2, f3});
+struct FieldPathTestCase {
+  struct OutputValues {
+    explicit OutputValues(std::vector<int> indices = {})
+        : path(FieldPath(std::move(indices))) {}
 
-  // retrieving a field with single-element FieldPath is equivalent to Schema::field
-  for (int index = 0; index < s.num_fields(); ++index) {
-    ASSERT_OK_AND_EQ(s.field(index), FieldPath({index}).Get(s));
+    template <typename T>
+    auto&& Get() const;
+
+    FieldPath path;
+    std::shared_ptr<Field> field;
+    std::shared_ptr<Array> array;
+    std::shared_ptr<ChunkedArray> chunked_array;
+  };
+
+  static constexpr int kNumColumns = 2;
+  static constexpr int kNumRows = 100;
+  static constexpr int kRandomSeed = 0xbeef;
+
+  // Input for the FieldPath::Get functions in multiple forms
+  std::shared_ptr<Schema> schema;
+  std::shared_ptr<DataType> type;
+  std::shared_ptr<Array> array;
+  std::shared_ptr<RecordBatch> record_batch;
+  std::shared_ptr<ChunkedArray> chunked_array;
+  std::shared_ptr<Table> table;
+
+  template <typename T>
+  auto&& GetInput() const;
+
+  // Number of chunks for each column in the input Table
+  const std::array<int, kNumColumns> num_column_chunks = {15, 20};
+  // Number of chunks in the input ChunkedArray
+  const int num_chunks = 15;
+
+  // Expected outputs for each child;
+  OutputValues v0{{0}}, v1{{1}};
+  OutputValues v1_0{{1, 0}}, v1_1{{1, 1}};
+  OutputValues v1_1_0{{1, 1, 0}}, v1_1_1{{1, 1, 1}};
+  // Expected outputs for nested children with null flattening applied
+  OutputValues v1_0_flat{{1, 0}}, v1_1_flat{{1, 1}};
+  OutputValues v1_1_0_flat{{1, 1, 0}}, v1_1_1_flat{{1, 1, 1}};
+
+  static const FieldPathTestCase* Instance() {
+    static const auto maybe_instance = Make();
+    return &maybe_instance.ValueOrDie();
   }
-  EXPECT_RAISES_WITH_MESSAGE_THAT(Invalid,
-                                  testing::HasSubstr("empty indices cannot be traversed"),
-                                  FieldPath().Get(s));
-  EXPECT_RAISES_WITH_MESSAGE_THAT(IndexError, testing::HasSubstr("index out of range"),
-                                  FieldPath({s.num_fields() * 2}).Get(s));
-}
-
-TEST(TestFieldPath, GetForTable) {
-  using testing::HasSubstr;
-
-  constexpr int kNumRows = 4;
-  auto f0 = field("a", int32());
-  auto f1 = field("b", int32());
-  auto f2 = field("c", struct_({f1}));
-  auto f3 = field("d", struct_({f0, f2}));
-  auto table_schema = schema({f0, f1, f2, f3});
-
-  // Each column has a different chunking
-  ChunkedArrayVector columns(4);
-  columns[0] = ChunkedArrayFromJSON(f0->type(), {"[0,1,2,3]"});
-  columns[1] = ChunkedArrayFromJSON(f1->type(), {"[3,2,1]", "[0]"});
-  columns[2] =
-      ChunkedArrayFromJSON(f2->type(), {R"([{"b":3},{"b":2}])", R"([{"b":1},{"b":0}])"});
-  columns[3] = ChunkedArrayFromJSON(
-      f3->type(), {R"([{"a":0,"c":{"b":3}},{"a":1,"c":{"b":2}}])",
-                   R"([{"a":2,"c":{"b":1}}])", R"([{"a":3,"c":{"b":0}}])"});
-  auto table = Table::Make(table_schema, columns, kNumRows);
-  ASSERT_OK(table->ValidateFull());
 
-  ASSERT_OK_AND_ASSIGN(auto v0, FieldPath({0}).Get(*table));
-  ASSERT_OK_AND_ASSIGN(auto v1, FieldPath({1}).Get(*table));
-  ASSERT_OK_AND_ASSIGN(auto v2, FieldPath({2}).Get(*table));
-  ASSERT_OK_AND_ASSIGN(auto v2_0, FieldPath({2, 0}).Get(*table));
-  ASSERT_OK_AND_ASSIGN(auto v3, FieldPath({3}).Get(*table));
-  ASSERT_OK_AND_ASSIGN(auto v3_0, FieldPath({3, 0}).Get(*table));
-  ASSERT_OK_AND_ASSIGN(auto v3_1, FieldPath({3, 1}).Get(*table));
-  ASSERT_OK_AND_ASSIGN(auto v3_1_0, FieldPath({3, 1, 0}).Get(*table));
-
-  EXPECT_EQ(v0->num_chunks(), columns[0]->num_chunks());
-  EXPECT_EQ(v1->num_chunks(), columns[1]->num_chunks());
-  EXPECT_EQ(v2->num_chunks(), columns[2]->num_chunks());
-  EXPECT_EQ(v2_0->num_chunks(), columns[2]->num_chunks());
-  EXPECT_EQ(v3->num_chunks(), columns[3]->num_chunks());
-  EXPECT_EQ(v3_0->num_chunks(), columns[3]->num_chunks());
-  EXPECT_EQ(v3_1->num_chunks(), columns[3]->num_chunks());
-  EXPECT_EQ(v3_1_0->num_chunks(), columns[3]->num_chunks());
-
-  EXPECT_TRUE(columns[0]->Equals(v0));
-  EXPECT_TRUE(columns[0]->Equals(v3_0));
-
-  EXPECT_TRUE(columns[1]->Equals(v1));
-  EXPECT_TRUE(columns[1]->Equals(v2_0));
-  EXPECT_TRUE(columns[1]->Equals(v3_1_0));
-
-  EXPECT_TRUE(columns[2]->Equals(v2));
-  EXPECT_TRUE(columns[2]->Equals(v3_1));
-
-  EXPECT_TRUE(columns[3]->Equals(v3));
-
-  for (const auto& path :
-       {FieldPath({4, 1, 0}), FieldPath({3, 2, 0}), FieldPath{3, 1, 1}}) {
-    EXPECT_RAISES_WITH_MESSAGE_THAT(IndexError, HasSubstr("index out of range"),
-                                    path.Get(*table));
+  static Result<FieldPathTestCase> Make() {
+    // Generate test input based on a single schema. First by creating a StructArray,
+    // then deriving the other input types (ChunkedArray, RecordBatch, Table, etc) from
+    // it. We also compute the expected outputs for each child individually (for each
+    // output type).
+    FieldPathTestCase out;
+    random::RandomArrayGenerator gen(kRandomSeed);
+
+    // Define child fields and input schema
+    out.v1_1_1.field = field("b", boolean());
+    out.v1_1_0.field = field("f", float32());
+    out.v1_1.field = field("s1", struct_({out.v1_1_0.field, out.v1_1_1.field}));
+    out.v1_0.field = field("i", int32());
+    out.v1.field = field("s0", struct_({out.v1_0.field, out.v1_1.field}));
+    out.v0.field = field("u", utf8());
+    out.schema = arrow::schema({out.v0.field, out.v1.field});
+    out.type = struct_(out.schema->fields());
+
+    // Create null bitmaps for the struct fields independent of its childrens'
+    // bitmaps. For FieldPath::GetFlattened, parent/child bitmaps should be combined
+    // - for FieldPath::Get, higher-level nulls are ignored.
+    auto bitmap1_1 = gen.NullBitmap(kNumRows, 0.15);
+    auto bitmap1 = gen.NullBitmap(kNumRows, 0.30);
+
+    // Generate raw leaf arrays
+    out.v1_1_1.array = gen.ArrayOf(out.v1_1_1.field->type(), kNumRows);
+    out.v1_1_0.array = gen.ArrayOf(out.v1_1_0.field->type(), kNumRows);
+    out.v1_0.array = gen.ArrayOf(out.v1_0.field->type(), kNumRows);
+    out.v0.array = gen.ArrayOf(out.v0.field->type(), kNumRows);
+    // Make struct fields from leaf arrays (we use the custom bitmaps here)
+    ARROW_ASSIGN_OR_RAISE(
+        out.v1_1.array,
+        StructArray::Make({out.v1_1_0.array, out.v1_1_1.array},
+                          {out.v1_1_0.field, out.v1_1_1.field}, bitmap1_1));
+    ARROW_ASSIGN_OR_RAISE(out.v1.array,
+                          StructArray::Make({out.v1_0.array, out.v1_1.array},
+                                            {out.v1_0.field, out.v1_1.field}, bitmap1));
+
+    // Not used to create the test input, but pre-compute flattened versions of nested
+    // arrays for comparisons in the GetFlattened tests.
+    ARROW_ASSIGN_OR_RAISE(
+        out.v1_0_flat.array,
+        checked_pointer_cast<StructArray>(out.v1.array)->GetFlattenedField(0));
+    ARROW_ASSIGN_OR_RAISE(
+        out.v1_1_flat.array,
+        checked_pointer_cast<StructArray>(out.v1.array)->GetFlattenedField(1));
+    ARROW_ASSIGN_OR_RAISE(
+        out.v1_1_0_flat.array,
+        checked_pointer_cast<StructArray>(out.v1_1_flat.array)->GetFlattenedField(0));
+    ARROW_ASSIGN_OR_RAISE(
+        out.v1_1_1_flat.array,
+        checked_pointer_cast<StructArray>(out.v1_1_flat.array)->GetFlattenedField(1));
+    // Sanity check
+    ARROW_CHECK(!out.v1_0_flat.array->Equals(out.v1_0.array));
+    ARROW_CHECK(!out.v1_1_flat.array->Equals(out.v1_1.array));
+    ARROW_CHECK(!out.v1_1_0_flat.array->Equals(out.v1_1_0.array));
+    ARROW_CHECK(!out.v1_1_1_flat.array->Equals(out.v1_1_1.array));
+
+    // Finalize the input Array
+    ARROW_ASSIGN_OR_RAISE(out.array, StructArray::Make({out.v0.array, out.v1.array},
+                                                       {out.v0.field, out.v1.field}));
+    ARROW_RETURN_NOT_OK(out.array->ValidateFull());
+    // Finalize the input RecordBatch
+    ARROW_ASSIGN_OR_RAISE(out.record_batch, RecordBatch::FromStructArray(out.array));
+    ARROW_RETURN_NOT_OK(out.record_batch->ValidateFull());
+    // Finalize the input ChunkedArray
+    out.chunked_array = SliceToChunkedArray(*out.array, out.num_chunks);
+    ARROW_RETURN_NOT_OK(out.chunked_array->ValidateFull());
+
+    // For each expected child array, create a chunked equivalent (we use a different
+    // chunk layout for each top-level column to make the Table test more interesting)
+    for (OutputValues* v :
+         {&out.v0, &out.v1, &out.v1_0, &out.v1_1, &out.v1_1_0, &out.v1_1_1,
+          &out.v1_0_flat, &out.v1_1_flat, &out.v1_1_0_flat, &out.v1_1_1_flat}) {
+      v->chunked_array =
+          SliceToChunkedArray(*v->array, out.num_column_chunks[v->path[0]]);
+    }
+    // Finalize the input Table
+    out.table =
+        Table::Make(out.schema, {out.v0.chunked_array, out.v1.chunked_array}, kNumRows);
+    ARROW_RETURN_NOT_OK(out.table->ValidateFull());
+
+    return std::move(out);
   }
-  EXPECT_RAISES_WITH_MESSAGE_THAT(Invalid, HasSubstr("empty indices cannot be traversed"),
-                                  FieldPath().Get(*table));
-}
-
-TEST(TestFieldPath, GetForChunkedArray) {
-  using testing::HasSubstr;
-
-  auto f0 = field("a", int32());
-  auto f1 = field("b", int32());
-  auto f2 = field("c", struct_({f1}));
-  auto f3 = field("d", struct_({f0, f2}));
-  auto type = struct_({f0, f1, f3});
-
-  auto column0 = ChunkedArrayFromJSON(f0->type(), {"[0,1,2,3]"});
-  auto column1 = ChunkedArrayFromJSON(f1->type(), {"[3,2,1,0]"});
-  auto column2_1 =
-      ChunkedArrayFromJSON(f2->type(), {R"([{"b":3},{"b":2},{"b":1},{"b":0}])"});
-  auto chunked_array = ChunkedArrayFromJSON(
-      type,
-      {
-          R"([{"a":0,"b":3,"d":{"a":0,"c":{"b":3}}}])",
-          R"([{"a":1,"b":2,"d":{"a":1,"c":{"b":2}}},{"a":2,"b":1,"d":{"a":2,"c":{"b":1}}}])",
-          R"([{"a":3,"b":0,"d":{"a":3,"c":{"b":0}}}])",
-      });
-  ASSERT_OK(chunked_array->ValidateFull());
-
-  ASSERT_OK_AND_ASSIGN(auto v0, FieldPath({0}).Get(*chunked_array));
-  ASSERT_OK_AND_ASSIGN(auto v1, FieldPath({1}).Get(*chunked_array));
-  ASSERT_OK_AND_ASSIGN(auto v2_0, FieldPath({2, 0}).Get(*chunked_array));
-  ASSERT_OK_AND_ASSIGN(auto v2_1, FieldPath({2, 1}).Get(*chunked_array));
-  ASSERT_OK_AND_ASSIGN(auto v2_1_0, FieldPath({2, 1, 0}).Get(*chunked_array));
-
-  for (const auto& v : {v0, v1, v2_0, v2_1, v2_1_0}) {
-    EXPECT_EQ(v->num_chunks(), chunked_array->num_chunks());
+
+  static std::shared_ptr<ChunkedArray> SliceToChunkedArray(const Array& array,
+                                                           int num_chunks) {
+    ARROW_CHECK(num_chunks > 0 && array.length() >= num_chunks);
+    ArrayVector chunks;
+    chunks.reserve(num_chunks);
+    for (int64_t inc = array.length() / num_chunks, beg = 0,
+                 end = inc + array.length() % num_chunks;
+         end <= array.length(); beg = end, end += inc) {
+      chunks.push_back(array.SliceSafe(beg, end - beg).ValueOrDie());
+    }
+    ARROW_CHECK_EQ(static_cast<int>(chunks.size()), num_chunks);
+    return ChunkedArray::Make(std::move(chunks)).ValueOrDie();
   }
+};
+
+template <>
+auto&& FieldPathTestCase::GetInput<Schema>() const {
+  return this->schema;
+}
+template <>
+auto&& FieldPathTestCase::GetInput<DataType>() const {
+  return this->type;
+}
+template <>
+auto&& FieldPathTestCase::GetInput<Array>() const {
+  return this->array;
+}
+template <>
+auto&& FieldPathTestCase::GetInput<ArrayData>() const {
+  return this->array->data();
+}
+template <>
+auto&& FieldPathTestCase::GetInput<ChunkedArray>() const {
+  return this->chunked_array;
+}
+template <>
+auto&& FieldPathTestCase::GetInput<RecordBatch>() const {
+  return this->record_batch;
+}
+template <>
+auto&& FieldPathTestCase::GetInput<Table>() const {
+  return this->table;
+}
 
-  EXPECT_TRUE(column0->Equals(v0));
-  EXPECT_TRUE(column0->Equals(v2_0));
+template <>
+auto&& FieldPathTestCase::OutputValues::Get<Field>() const {
+  return this->field;
+}
+template <>
+auto&& FieldPathTestCase::OutputValues::Get<Array>() const {
+  return this->array;
+}
+template <>
+auto&& FieldPathTestCase::OutputValues::Get<ArrayData>() const {
+  return this->array->data();
+}
+template <>
+auto&& FieldPathTestCase::OutputValues::Get<ChunkedArray>() const {
+  return this->chunked_array;
+}
 
-  EXPECT_TRUE(column1->Equals(v1));
-  EXPECT_TRUE(column1->Equals(v2_1_0));
-  EXPECT_FALSE(column1->Equals(v2_1));
+class FieldPathTestFixture : public ::testing::Test {
+ public:
+  FieldPathTestFixture() : case_(FieldPathTestCase::Instance()) {}
 
-  EXPECT_TRUE(column2_1->Equals(v2_1));
+ protected:
+  template <typename T>
+  using OutputType = typename internal::FieldPathGetType<T>::element_type;
 
-  EXPECT_RAISES_WITH_MESSAGE_THAT(NotImplemented,
-                                  HasSubstr("Get child data of non-struct chunked array"),
-                                  FieldPath({0}).Get(*column0));
-}
+  template <typename I>
+  void AssertOutputsEqual(const std::shared_ptr<Field>& expected,
+                          const std::shared_ptr<Field>& actual) const {
+    AssertFieldEqual(*expected, *actual);
+  }
+  template <typename I>
+  void AssertOutputsEqual(const std::shared_ptr<Array>& expected,
+                          const std::shared_ptr<Array>& actual) const {
+    AssertArraysEqual(*expected, *actual);
+  }
+  template <typename I>
+  void AssertOutputsEqual(const std::shared_ptr<ChunkedArray>& expected,
+                          const std::shared_ptr<ChunkedArray>& actual) const {
+    if constexpr (std::is_same_v<I, ChunkedArray>) {

Review Comment:
   Can you comment to explain this to the reader?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] benibus commented on pull request #35197: GH-14946: [C++] Add flattening FieldPath/FieldRef::Get methods

Posted by "benibus (via GitHub)" <gi...@apache.org>.
benibus commented on PR #35197:
URL: https://github.com/apache/arrow/pull/35197#issuecomment-1542314410

   cc @pitrou 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] benibus commented on a diff in pull request #35197: GH-14946: [C++] Add flattening FieldPath/FieldRef::Get methods

Posted by "benibus (via GitHub)" <gi...@apache.org>.
benibus commented on code in PR #35197:
URL: https://github.com/apache/arrow/pull/35197#discussion_r1194131632


##########
cpp/src/arrow/type_test.cc:
##########
@@ -693,25 +851,6 @@ TEST(TestFieldRef, DotPathRoundTrip) {
   check_roundtrip(FieldRef("foo", 1, FieldRef("bar", 2, 3), FieldRef()));
 }
 
-TEST(TestFieldPath, Nested) {
-  auto f0 = field("alpha", int32());
-  auto f1_0 = field("alpha", int32());
-  auto f1 = field("beta", struct_({f1_0}));
-  auto f2_0 = field("alpha", int32());
-  auto f2_1_0 = field("alpha", int32());
-  auto f2_1_1 = field("alpha", int32());
-  auto f2_1 = field("gamma", struct_({f2_1_0, f2_1_1}));
-  auto f2 = field("beta", struct_({f2_0, f2_1}));
-  Schema s({f0, f1, f2});
-
-  // retrieving fields with nested indices
-  EXPECT_EQ(FieldPath({0}).Get(s), f0);
-  EXPECT_EQ(FieldPath({1, 0}).Get(s), f1_0);
-  EXPECT_EQ(FieldPath({2, 0}).Get(s), f2_0);
-  EXPECT_EQ(FieldPath({2, 1, 0}).Get(s), f2_1_0);
-  EXPECT_EQ(FieldPath({2, 1, 1}).Get(s), f2_1_1);
-}
-
 TEST(TestFieldRef, Nested) {

Review Comment:
   Agreed. I wrote the tests before adding the `FieldRef` methods and then neglected to come back to them... Fortunately, it should be possible to use the same test case (for the most part).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] benibus commented on a diff in pull request #35197: GH-14946: [C++] Add flattening FieldPath/FieldRef::Get methods

Posted by "benibus (via GitHub)" <gi...@apache.org>.
benibus commented on code in PR #35197:
URL: https://github.com/apache/arrow/pull/35197#discussion_r1196711632


##########
cpp/src/arrow/type.h:
##########
@@ -1886,6 +1902,15 @@ class ARROW_EXPORT FieldRef : public util::EqualityComparable<FieldRef> {
     }
     return out;
   }
+  template <typename T>
+  std::vector<GetType<T>> GetAllFlattened(const T& root,
+                                          MemoryPool* pool = NULLPTR) const {
+    std::vector<GetType<T>> out;
+    for (const auto& match : FindAll(root)) {
+      out.push_back(match.GetFlattened(root, pool).ValueOrDie());

Review Comment:
   In that case, I'm a little confused as to why the non-flattened variants don't forward those errors either - since the standard `FieldPath::Get` methods can also fail (which was true prior to this PR).
   
   I'm mostly referring to `GetOne` and `GetOneAndNone` here, as they already return a `Result`. Regardless, I'll propagate those errors for the new methods.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] benibus commented on pull request #35197: GH-14946: [C++] Add flattening FieldPath/FieldRef::Get methods

Posted by "benibus (via GitHub)" <gi...@apache.org>.
benibus commented on PR #35197:
URL: https://github.com/apache/arrow/pull/35197#issuecomment-1533247021

   @westonpace I feel you're probably most qualified to look at this given your comments on the original issue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #35197: GH-14946: [C++] Add flattening FieldPath/FieldRef::Get methods

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #35197:
URL: https://github.com/apache/arrow/pull/35197#issuecomment-1511909577

   :warning: GitHub issue #14946 **has been automatically assigned in GitHub** to PR creator.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] benibus commented on pull request #35197: GH-14946: [C++] Add flattening FieldPath/FieldRef::Get methods

Posted by "benibus (via GitHub)" <gi...@apache.org>.
benibus commented on PR #35197:
URL: https://github.com/apache/arrow/pull/35197#issuecomment-1513787621

   Probably worth mentioning that this (indirectly) addresses most of https://github.com/apache/arrow/issues/34830 as well - at least on the `FieldPath` side of things.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] felipecrv commented on a diff in pull request #35197: GH-14946: [C++] Add flattening FieldPath/FieldRef::Get methods

Posted by "felipecrv (via GitHub)" <gi...@apache.org>.
felipecrv commented on code in PR #35197:
URL: https://github.com/apache/arrow/pull/35197#discussion_r1175578267


##########
cpp/src/arrow/type.cc:
##########
@@ -1129,18 +1129,20 @@ class ChunkedArrayData : public ChunkedColumn {
 // Return a vector of ChunkedColumns - one for each struct field.
 // Unlike ChunkedArray::Flatten, this is zero-copy and doesn't merge parent/child
 // validity bitmaps.
-ChunkedColumnVector ChunkedColumn::Flatten() const {
+ChunkedColumnVector ChunkedColumn::FlattenZeroCopy() const {
   DCHECK_EQ(type()->id(), Type::STRUCT);
 
   ChunkedColumnVector columns(type()->num_fields());
   for (int column_idx = 0; column_idx < type()->num_fields(); ++column_idx) {
     const auto& child_type = type()->field(column_idx)->type();
     ArrayDataVector chunks(num_chunks());
     for (int chunk_idx = 0; chunk_idx < num_chunks(); ++chunk_idx) {
-      const auto& child_data = chunk(chunk_idx)->child_data;
-      DCHECK_EQ(columns.size(), child_data.size());
-      DCHECK(child_type->Equals(child_data[column_idx]->type));
-      chunks[chunk_idx] = child_data[column_idx];
+      const auto& parent = chunk(chunk_idx);
+      const auto& children = parent->child_data;
+      DCHECK_EQ(columns.size(), children.size());
+      auto child = children[column_idx]->Slice(parent->offset, parent->length);

Review Comment:
   This is a bug-fix, right?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #35197: GH-14946: [C++] Add flattening FieldPath/FieldRef::Get methods

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #35197:
URL: https://github.com/apache/arrow/pull/35197#discussion_r1196520800


##########
cpp/src/arrow/type.h:
##########
@@ -1894,6 +1919,11 @@ class ARROW_EXPORT FieldRef : public util::EqualityComparable<FieldRef> {
     ARROW_ASSIGN_OR_RAISE(auto match, FindOne(root));
     return match.Get(root).ValueOrDie();
   }
+  template <typename T>
+  Result<GetType<T>> GetOneFlattened(const T& root, MemoryPool* pool = NULLPTR) const {
+    ARROW_ASSIGN_OR_RAISE(auto match, FindOne(root));
+    return match.GetFlattened(root, pool).ValueOrDie();

Review Comment:
   Same here (and this method is already returning a `Result`!).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ursabot commented on pull request #35197: GH-14946: [C++] Add flattening FieldPath/FieldRef::Get methods

Posted by "ursabot (via GitHub)" <gi...@apache.org>.
ursabot commented on PR #35197:
URL: https://github.com/apache/arrow/pull/35197#issuecomment-1569124096

   Benchmark runs are scheduled for baseline = 2216a0a483b6e5f9633bbee85c5a623360ca8a94 and contender = f3500f65c0b15e285cba2b1822385248b3424f84. f3500f65c0b15e285cba2b1822385248b3424f84 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/91139e85e6464be18b0a24bac5525570...d1c4c1f03c294ee4ba6354fc910f5618/)
   [Finished :arrow_down:0.45% :arrow_up:0.06%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/ff5e21efe68842d6b2268e5ec1eebb79...63878470fbfa44ea83079d73af2c6549/)
   [Finished :arrow_down:2.94% :arrow_up:1.63%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/9cf73ac83f0a44179e6538b2c1c7babd...3d76cb5ffb8849bf8c3ea9b32d08b3b7/)
   [Finished :arrow_down:1.82% :arrow_up:0.73%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/c6981bebc1e345819fbbd6e1a07ee06d...385dcbb492f44244b6edae507d6d8018/)
   Buildkite builds:
   [Finished] [`f3500f65` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2915)
   [Finished] [`f3500f65` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2951)
   [Finished] [`f3500f65` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2916)
   [Finished] [`f3500f65` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2941)
   [Finished] [`2216a0a4` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2914)
   [Finished] [`2216a0a4` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2950)
   [Finished] [`2216a0a4` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2915)
   [Finished] [`2216a0a4` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2940)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on pull request #35197: GH-14946: [C++] Add flattening FieldPath/FieldRef::Get methods

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on PR #35197:
URL: https://github.com/apache/arrow/pull/35197#issuecomment-1557364648

   CI failures are unrelated.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #35197: GH-14946: [C++] Add flattening FieldPath/FieldRef::Get methods

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #35197:
URL: https://github.com/apache/arrow/pull/35197#discussion_r1196519921


##########
cpp/src/arrow/type.h:
##########
@@ -1886,6 +1902,15 @@ class ARROW_EXPORT FieldRef : public util::EqualityComparable<FieldRef> {
     }
     return out;
   }
+  template <typename T>
+  std::vector<GetType<T>> GetAllFlattened(const T& root,
+                                          MemoryPool* pool = NULLPTR) const {
+    std::vector<GetType<T>> out;
+    for (const auto& match : FindAll(root)) {
+      out.push_back(match.GetFlattened(root, pool).ValueOrDie());

Review Comment:
   Uh. Unlike `GetAll`, this can fail for various reasons such as failure to allocate enough memory. In that case, we'd probably want to return an error instead of dying out.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] felipecrv commented on a diff in pull request #35197: GH-14946: [C++] Add flattening FieldPath/FieldRef::Get methods

Posted by "felipecrv (via GitHub)" <gi...@apache.org>.
felipecrv commented on code in PR #35197:
URL: https://github.com/apache/arrow/pull/35197#discussion_r1175657421


##########
cpp/src/arrow/type.h:
##########
@@ -1698,10 +1698,41 @@ class ARROW_EXPORT FieldPath {
   /// \brief Retrieve the referenced child from a ChunkedArray
   Result<std::shared_ptr<ChunkedArray>> Get(const ChunkedArray& chunked_array) const;
 
+  /// \brief Retrieve the referenced child/column from an Array, ArrayData, ChunkedArray,
+  /// RecordBatch, or Table
+  ///
+  /// Unlike `FieldPath::Get`, these variants are not zero-copy and the retrieved child's
+  /// null bitmap is ANDed with its parent's
+  Result<std::shared_ptr<Array>> GetFlattened(const Array& array) const;
+  Result<std::shared_ptr<ArrayData>> GetFlattened(const ArrayData& data) const;
+  Result<std::shared_ptr<ChunkedArray>> GetFlattened(
+      const ChunkedArray& chunked_array) const;
+  Result<std::shared_ptr<Array>> GetFlattened(const RecordBatch& batch) const;
+  Result<std::shared_ptr<ChunkedArray>> GetFlattened(const Table& table) const;
+
  private:
   std::vector<int> indices_;
 };
 
+namespace internal {
+
+template <typename T>
+using FieldPathGetType =
+    decltype(std::declval<FieldPath>().Get(std::declval<T>()).ValueOrDie());

Review Comment:
   Only now I noticed you've just moved this code here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] benibus commented on pull request #35197: GH-14946: [C++] Add flattening FieldPath/FieldRef::Get methods

Posted by "benibus (via GitHub)" <gi...@apache.org>.
benibus commented on PR #35197:
URL: https://github.com/apache/arrow/pull/35197#issuecomment-1514935196

   @zeroshade @felipecrv 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] benibus commented on a diff in pull request #35197: GH-14946: [C++] Add flattening FieldPath/FieldRef::Get methods

Posted by "benibus (via GitHub)" <gi...@apache.org>.
benibus commented on code in PR #35197:
URL: https://github.com/apache/arrow/pull/35197#discussion_r1194117861


##########
cpp/src/arrow/type.h:
##########
@@ -1698,10 +1698,45 @@ class ARROW_EXPORT FieldPath {
   /// \brief Retrieve the referenced child from a ChunkedArray
   Result<std::shared_ptr<ChunkedArray>> Get(const ChunkedArray& chunked_array) const;
 
+  /// \brief Retrieve the referenced child/column from an Array, ArrayData, ChunkedArray,
+  /// RecordBatch, or Table
+  ///
+  /// Unlike `FieldPath::Get`, these variants are not zero-copy and the retrieved child's
+  /// null bitmap is ANDed with its parent's
+  Result<std::shared_ptr<Array>> GetFlattened(const Array& array,
+                                              MemoryPool* pool = NULLPTR) const;
+  Result<std::shared_ptr<ArrayData>> GetFlattened(const ArrayData& data,
+                                                  MemoryPool* pool = NULLPTR) const;
+  Result<std::shared_ptr<ChunkedArray>> GetFlattened(const ChunkedArray& chunked_array,
+                                                     MemoryPool* pool = NULLPTR) const;
+  Result<std::shared_ptr<Array>> GetFlattened(const RecordBatch& batch,
+                                              MemoryPool* pool = NULLPTR) const;
+  Result<std::shared_ptr<ChunkedArray>> GetFlattened(const Table& table,
+                                                     MemoryPool* pool = NULLPTR) const;
+
  private:
   std::vector<int> indices_;
 };
 
+namespace internal {
+
+template <typename T>
+using FieldPathGetType =
+    decltype(std::declval<FieldPath>().Get(std::declval<T>()).ValueOrDie());
+
+template <bool Flattened, typename T>
+std::enable_if_t<!Flattened, Result<FieldPathGetType<T>>> GetChild(
+    const T& root, const FieldPath& path, MemoryPool* = NULLPTR) {
+  return path.Get(root);
+}
+template <bool Flattened, typename T>
+std::enable_if_t<Flattened, Result<FieldPathGetType<T>>> GetChild(
+    const T& root, const FieldPath& path, MemoryPool* pool = NULLPTR) {
+  return path.GetFlattened(root, pool);
+}

Review Comment:
   ~Yeah, unfortunately `if constexpr` wouldn't work here since `GetFlattened` doesn't have overloads for `Schema`, `Field`, etc.~ Disregard that... I was mistaken.
   
   But I'm fine with just duplicating the definitions as well (in that case, the helpers probably wouldn't be too useful outside of the tests so no need to expose them).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] benibus commented on a diff in pull request #35197: GH-14946: [C++] Add flattening FieldPath/FieldRef::Get methods

Posted by "benibus (via GitHub)" <gi...@apache.org>.
benibus commented on code in PR #35197:
URL: https://github.com/apache/arrow/pull/35197#discussion_r1194154670


##########
cpp/src/arrow/type_test.cc:
##########
@@ -364,152 +365,344 @@ TEST(TestField, TestMerge) {
   }
 }
 
-TEST(TestFieldPath, Basics) {
-  auto f0 = field("alpha", int32());
-  auto f1 = field("beta", int32());
-  auto f2 = field("alpha", int32());
-  auto f3 = field("beta", int32());
-  Schema s({f0, f1, f2, f3});
+struct FieldPathTestCase {
+  struct OutputValues {
+    explicit OutputValues(std::vector<int> indices = {})
+        : path(FieldPath(std::move(indices))) {}
 
-  // retrieving a field with single-element FieldPath is equivalent to Schema::field
-  for (int index = 0; index < s.num_fields(); ++index) {
-    ASSERT_OK_AND_EQ(s.field(index), FieldPath({index}).Get(s));
+    template <typename T>
+    auto&& Get() const;
+
+    FieldPath path;
+    std::shared_ptr<Field> field;
+    std::shared_ptr<Array> array;
+    std::shared_ptr<ChunkedArray> chunked_array;
+  };
+
+  static constexpr int kNumColumns = 2;
+  static constexpr int kNumRows = 100;
+  static constexpr int kRandomSeed = 0xbeef;
+
+  // Input for the FieldPath::Get functions in multiple forms
+  std::shared_ptr<Schema> schema;
+  std::shared_ptr<DataType> type;
+  std::shared_ptr<Array> array;
+  std::shared_ptr<RecordBatch> record_batch;
+  std::shared_ptr<ChunkedArray> chunked_array;
+  std::shared_ptr<Table> table;
+
+  template <typename T>
+  auto&& GetInput() const;
+
+  // Number of chunks for each column in the input Table
+  const std::array<int, kNumColumns> num_column_chunks = {15, 20};
+  // Number of chunks in the input ChunkedArray
+  const int num_chunks = 15;
+
+  // Expected outputs for each child;
+  OutputValues v0{{0}}, v1{{1}};
+  OutputValues v1_0{{1, 0}}, v1_1{{1, 1}};
+  OutputValues v1_1_0{{1, 1, 0}}, v1_1_1{{1, 1, 1}};
+  // Expected outputs for nested children with null flattening applied
+  OutputValues v1_0_flat{{1, 0}}, v1_1_flat{{1, 1}};
+  OutputValues v1_1_0_flat{{1, 1, 0}}, v1_1_1_flat{{1, 1, 1}};
+
+  static const FieldPathTestCase* Instance() {
+    static const auto maybe_instance = Make();
+    return &maybe_instance.ValueOrDie();
   }
-  EXPECT_RAISES_WITH_MESSAGE_THAT(Invalid,
-                                  testing::HasSubstr("empty indices cannot be traversed"),
-                                  FieldPath().Get(s));
-  EXPECT_RAISES_WITH_MESSAGE_THAT(IndexError, testing::HasSubstr("index out of range"),
-                                  FieldPath({s.num_fields() * 2}).Get(s));
-}
-
-TEST(TestFieldPath, GetForTable) {
-  using testing::HasSubstr;
-
-  constexpr int kNumRows = 4;
-  auto f0 = field("a", int32());
-  auto f1 = field("b", int32());
-  auto f2 = field("c", struct_({f1}));
-  auto f3 = field("d", struct_({f0, f2}));
-  auto table_schema = schema({f0, f1, f2, f3});
-
-  // Each column has a different chunking
-  ChunkedArrayVector columns(4);
-  columns[0] = ChunkedArrayFromJSON(f0->type(), {"[0,1,2,3]"});
-  columns[1] = ChunkedArrayFromJSON(f1->type(), {"[3,2,1]", "[0]"});
-  columns[2] =
-      ChunkedArrayFromJSON(f2->type(), {R"([{"b":3},{"b":2}])", R"([{"b":1},{"b":0}])"});
-  columns[3] = ChunkedArrayFromJSON(
-      f3->type(), {R"([{"a":0,"c":{"b":3}},{"a":1,"c":{"b":2}}])",
-                   R"([{"a":2,"c":{"b":1}}])", R"([{"a":3,"c":{"b":0}}])"});
-  auto table = Table::Make(table_schema, columns, kNumRows);
-  ASSERT_OK(table->ValidateFull());
 
-  ASSERT_OK_AND_ASSIGN(auto v0, FieldPath({0}).Get(*table));
-  ASSERT_OK_AND_ASSIGN(auto v1, FieldPath({1}).Get(*table));
-  ASSERT_OK_AND_ASSIGN(auto v2, FieldPath({2}).Get(*table));
-  ASSERT_OK_AND_ASSIGN(auto v2_0, FieldPath({2, 0}).Get(*table));
-  ASSERT_OK_AND_ASSIGN(auto v3, FieldPath({3}).Get(*table));
-  ASSERT_OK_AND_ASSIGN(auto v3_0, FieldPath({3, 0}).Get(*table));
-  ASSERT_OK_AND_ASSIGN(auto v3_1, FieldPath({3, 1}).Get(*table));
-  ASSERT_OK_AND_ASSIGN(auto v3_1_0, FieldPath({3, 1, 0}).Get(*table));
-
-  EXPECT_EQ(v0->num_chunks(), columns[0]->num_chunks());
-  EXPECT_EQ(v1->num_chunks(), columns[1]->num_chunks());
-  EXPECT_EQ(v2->num_chunks(), columns[2]->num_chunks());
-  EXPECT_EQ(v2_0->num_chunks(), columns[2]->num_chunks());
-  EXPECT_EQ(v3->num_chunks(), columns[3]->num_chunks());
-  EXPECT_EQ(v3_0->num_chunks(), columns[3]->num_chunks());
-  EXPECT_EQ(v3_1->num_chunks(), columns[3]->num_chunks());
-  EXPECT_EQ(v3_1_0->num_chunks(), columns[3]->num_chunks());
-
-  EXPECT_TRUE(columns[0]->Equals(v0));
-  EXPECT_TRUE(columns[0]->Equals(v3_0));
-
-  EXPECT_TRUE(columns[1]->Equals(v1));
-  EXPECT_TRUE(columns[1]->Equals(v2_0));
-  EXPECT_TRUE(columns[1]->Equals(v3_1_0));
-
-  EXPECT_TRUE(columns[2]->Equals(v2));
-  EXPECT_TRUE(columns[2]->Equals(v3_1));
-
-  EXPECT_TRUE(columns[3]->Equals(v3));
-
-  for (const auto& path :
-       {FieldPath({4, 1, 0}), FieldPath({3, 2, 0}), FieldPath{3, 1, 1}}) {
-    EXPECT_RAISES_WITH_MESSAGE_THAT(IndexError, HasSubstr("index out of range"),
-                                    path.Get(*table));
+  static Result<FieldPathTestCase> Make() {
+    // Generate test input based on a single schema. First by creating a StructArray,
+    // then deriving the other input types (ChunkedArray, RecordBatch, Table, etc) from
+    // it. We also compute the expected outputs for each child individually (for each
+    // output type).
+    FieldPathTestCase out;
+    random::RandomArrayGenerator gen(kRandomSeed);
+
+    // Define child fields and input schema
+    out.v1_1_1.field = field("b", boolean());
+    out.v1_1_0.field = field("f", float32());
+    out.v1_1.field = field("s1", struct_({out.v1_1_0.field, out.v1_1_1.field}));
+    out.v1_0.field = field("i", int32());
+    out.v1.field = field("s0", struct_({out.v1_0.field, out.v1_1.field}));
+    out.v0.field = field("u", utf8());
+    out.schema = arrow::schema({out.v0.field, out.v1.field});
+    out.type = struct_(out.schema->fields());
+
+    // Create null bitmaps for the struct fields independent of its childrens'
+    // bitmaps. For FieldPath::GetFlattened, parent/child bitmaps should be combined
+    // - for FieldPath::Get, higher-level nulls are ignored.
+    auto bitmap1_1 = gen.NullBitmap(kNumRows, 0.15);
+    auto bitmap1 = gen.NullBitmap(kNumRows, 0.30);
+
+    // Generate raw leaf arrays
+    out.v1_1_1.array = gen.ArrayOf(out.v1_1_1.field->type(), kNumRows);
+    out.v1_1_0.array = gen.ArrayOf(out.v1_1_0.field->type(), kNumRows);
+    out.v1_0.array = gen.ArrayOf(out.v1_0.field->type(), kNumRows);
+    out.v0.array = gen.ArrayOf(out.v0.field->type(), kNumRows);
+    // Make struct fields from leaf arrays (we use the custom bitmaps here)
+    ARROW_ASSIGN_OR_RAISE(
+        out.v1_1.array,
+        StructArray::Make({out.v1_1_0.array, out.v1_1_1.array},
+                          {out.v1_1_0.field, out.v1_1_1.field}, bitmap1_1));
+    ARROW_ASSIGN_OR_RAISE(out.v1.array,
+                          StructArray::Make({out.v1_0.array, out.v1_1.array},
+                                            {out.v1_0.field, out.v1_1.field}, bitmap1));
+
+    // Not used to create the test input, but pre-compute flattened versions of nested
+    // arrays for comparisons in the GetFlattened tests.
+    ARROW_ASSIGN_OR_RAISE(
+        out.v1_0_flat.array,
+        checked_pointer_cast<StructArray>(out.v1.array)->GetFlattenedField(0));
+    ARROW_ASSIGN_OR_RAISE(
+        out.v1_1_flat.array,
+        checked_pointer_cast<StructArray>(out.v1.array)->GetFlattenedField(1));
+    ARROW_ASSIGN_OR_RAISE(
+        out.v1_1_0_flat.array,
+        checked_pointer_cast<StructArray>(out.v1_1_flat.array)->GetFlattenedField(0));
+    ARROW_ASSIGN_OR_RAISE(
+        out.v1_1_1_flat.array,
+        checked_pointer_cast<StructArray>(out.v1_1_flat.array)->GetFlattenedField(1));
+    // Sanity check
+    ARROW_CHECK(!out.v1_0_flat.array->Equals(out.v1_0.array));
+    ARROW_CHECK(!out.v1_1_flat.array->Equals(out.v1_1.array));
+    ARROW_CHECK(!out.v1_1_0_flat.array->Equals(out.v1_1_0.array));
+    ARROW_CHECK(!out.v1_1_1_flat.array->Equals(out.v1_1_1.array));
+
+    // Finalize the input Array
+    ARROW_ASSIGN_OR_RAISE(out.array, StructArray::Make({out.v0.array, out.v1.array},
+                                                       {out.v0.field, out.v1.field}));
+    ARROW_RETURN_NOT_OK(out.array->ValidateFull());
+    // Finalize the input RecordBatch
+    ARROW_ASSIGN_OR_RAISE(out.record_batch, RecordBatch::FromStructArray(out.array));
+    ARROW_RETURN_NOT_OK(out.record_batch->ValidateFull());
+    // Finalize the input ChunkedArray
+    out.chunked_array = SliceToChunkedArray(*out.array, out.num_chunks);
+    ARROW_RETURN_NOT_OK(out.chunked_array->ValidateFull());
+
+    // For each expected child array, create a chunked equivalent (we use a different
+    // chunk layout for each top-level column to make the Table test more interesting)
+    for (OutputValues* v :
+         {&out.v0, &out.v1, &out.v1_0, &out.v1_1, &out.v1_1_0, &out.v1_1_1,
+          &out.v1_0_flat, &out.v1_1_flat, &out.v1_1_0_flat, &out.v1_1_1_flat}) {
+      v->chunked_array =
+          SliceToChunkedArray(*v->array, out.num_column_chunks[v->path[0]]);
+    }
+    // Finalize the input Table
+    out.table =
+        Table::Make(out.schema, {out.v0.chunked_array, out.v1.chunked_array}, kNumRows);
+    ARROW_RETURN_NOT_OK(out.table->ValidateFull());
+
+    return std::move(out);
   }
-  EXPECT_RAISES_WITH_MESSAGE_THAT(Invalid, HasSubstr("empty indices cannot be traversed"),
-                                  FieldPath().Get(*table));
-}
-
-TEST(TestFieldPath, GetForChunkedArray) {
-  using testing::HasSubstr;
-
-  auto f0 = field("a", int32());
-  auto f1 = field("b", int32());
-  auto f2 = field("c", struct_({f1}));
-  auto f3 = field("d", struct_({f0, f2}));
-  auto type = struct_({f0, f1, f3});
-
-  auto column0 = ChunkedArrayFromJSON(f0->type(), {"[0,1,2,3]"});
-  auto column1 = ChunkedArrayFromJSON(f1->type(), {"[3,2,1,0]"});
-  auto column2_1 =
-      ChunkedArrayFromJSON(f2->type(), {R"([{"b":3},{"b":2},{"b":1},{"b":0}])"});
-  auto chunked_array = ChunkedArrayFromJSON(
-      type,
-      {
-          R"([{"a":0,"b":3,"d":{"a":0,"c":{"b":3}}}])",
-          R"([{"a":1,"b":2,"d":{"a":1,"c":{"b":2}}},{"a":2,"b":1,"d":{"a":2,"c":{"b":1}}}])",
-          R"([{"a":3,"b":0,"d":{"a":3,"c":{"b":0}}}])",
-      });
-  ASSERT_OK(chunked_array->ValidateFull());
-
-  ASSERT_OK_AND_ASSIGN(auto v0, FieldPath({0}).Get(*chunked_array));
-  ASSERT_OK_AND_ASSIGN(auto v1, FieldPath({1}).Get(*chunked_array));
-  ASSERT_OK_AND_ASSIGN(auto v2_0, FieldPath({2, 0}).Get(*chunked_array));
-  ASSERT_OK_AND_ASSIGN(auto v2_1, FieldPath({2, 1}).Get(*chunked_array));
-  ASSERT_OK_AND_ASSIGN(auto v2_1_0, FieldPath({2, 1, 0}).Get(*chunked_array));
-
-  for (const auto& v : {v0, v1, v2_0, v2_1, v2_1_0}) {
-    EXPECT_EQ(v->num_chunks(), chunked_array->num_chunks());
+
+  static std::shared_ptr<ChunkedArray> SliceToChunkedArray(const Array& array,
+                                                           int num_chunks) {
+    ARROW_CHECK(num_chunks > 0 && array.length() >= num_chunks);

Review Comment:
   This function is only used to generate the shared test case so I figured that aborting here would be appropriate (otherwise, anything could happen).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #35197: GH-14946: [C++] Add flattening FieldPath/FieldRef::Get methods

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #35197:
URL: https://github.com/apache/arrow/pull/35197#discussion_r1196531763


##########
cpp/src/arrow/type_test.cc:
##########
@@ -364,152 +365,341 @@ TEST(TestField, TestMerge) {
   }
 }
 
-TEST(TestFieldPath, Basics) {
-  auto f0 = field("alpha", int32());
-  auto f1 = field("beta", int32());
-  auto f2 = field("alpha", int32());
-  auto f3 = field("beta", int32());
-  Schema s({f0, f1, f2, f3});
+struct FieldPathTestCase {

Review Comment:
   Given that this is adding quite a lot of test code, and this file is already long, I think it may be time to move the `Field{Ref,Path}` tests to a separate test module.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] benibus commented on a diff in pull request #35197: GH-14946: [C++] Add flattening FieldPath/FieldRef::Get methods

Posted by "benibus (via GitHub)" <gi...@apache.org>.
benibus commented on code in PR #35197:
URL: https://github.com/apache/arrow/pull/35197#discussion_r1175671354


##########
cpp/src/arrow/type.cc:
##########
@@ -1328,20 +1363,54 @@ Result<std::shared_ptr<Array>> FieldPath::Get(const Array& array) const {
 
 Result<std::shared_ptr<ArrayData>> FieldPath::Get(const ArrayData& data) const {
   if (data.type->id() != Type::STRUCT) {
-    return Status::NotImplemented("Get child data of non-struct array");
+    return FieldPathGetImpl::NonStructError();
   }
   return FieldPathGetImpl::Get(this, data.child_data);
 }
 
 Result<std::shared_ptr<ChunkedArray>> FieldPath::Get(
     const ChunkedArray& chunked_array) const {
   if (chunked_array.type()->id() != Type::STRUCT) {
-    return Status::NotImplemented("Get child data of non-struct chunked array");
+    return FieldPathGetImpl::NonStructError();
   }
-  auto columns = ChunkedArrayRef(chunked_array).Flatten();
+  auto columns = ChunkedArrayRef(chunked_array).FlattenZeroCopy();
   return FieldPathGetImpl::Get(this, columns);
 }
 
+Result<std::shared_ptr<Array>> FieldPath::GetFlattened(const Array& array) const {
+  if (array.type_id() != Type::STRUCT) {
+    return FieldPathGetImpl::NonStructError();
+  }
+  auto&& struct_array = checked_cast<const StructArray&>(array);

Review Comment:
   Equivalent to `const auto&` in this case, but yeah... probably less clear and not exactly necessary (same for some of the test code).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org