Posted to github@arrow.apache.org by "benibus (via GitHub)" <gi...@apache.org> on 2023/04/24 18:55:44 UTC

[GitHub] [arrow] benibus commented on a diff in pull request #35197: GH-14946: [C++] Add flattening FieldPath/FieldRef::Get methods

benibus commented on code in PR #35197:
URL: https://github.com/apache/arrow/pull/35197#discussion_r1175670935


##########
cpp/src/arrow/type.cc:
##########
@@ -1129,18 +1129,20 @@ class ChunkedArrayData : public ChunkedColumn {
 // Return a vector of ChunkedColumns - one for each struct field.
 // Unlike ChunkedArray::Flatten, this is zero-copy and doesn't merge parent/child
 // validity bitmaps.
-ChunkedColumnVector ChunkedColumn::Flatten() const {
+ChunkedColumnVector ChunkedColumn::FlattenZeroCopy() const {
   DCHECK_EQ(type()->id(), Type::STRUCT);
 
   ChunkedColumnVector columns(type()->num_fields());
   for (int column_idx = 0; column_idx < type()->num_fields(); ++column_idx) {
     const auto& child_type = type()->field(column_idx)->type();
     ArrayDataVector chunks(num_chunks());
     for (int chunk_idx = 0; chunk_idx < num_chunks(); ++chunk_idx) {
-      const auto& child_data = chunk(chunk_idx)->child_data;
-      DCHECK_EQ(columns.size(), child_data.size());
-      DCHECK(child_type->Equals(child_data[column_idx]->type));
-      chunks[chunk_idx] = child_data[column_idx];
+      const auto& parent = chunk(chunk_idx);
+      const auto& children = parent->child_data;
+      DCHECK_EQ(columns.size(), children.size());
+      auto child = children[column_idx]->Slice(parent->offset, parent->length);

Review Comment:
   Yes, it's a bug that I introduced in https://github.com/apache/arrow/pull/34537. It gives incorrect results when retrieving a nested child from an array that was sliced from another array, since the parent's offset and length were never applied to the child.
   
   Now that I think about it, though, I'd expect the same problem to occur in the non-chunked `FieldPath::Get` variants as well. I'll need to test that and create a separate issue.
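
For illustration, here is a minimal, self-contained sketch of the offset problem described above. It does not use Arrow itself: `ToyData`, `GetChildNaive`, `GetChildSliced`, and `MakeSlicedParent` are hypothetical stand-ins, assumed only to mimic how a zero-copy slice of a struct parent moves the parent's offset/length while leaving its children unsliced.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Toy stand-in (not arrow::ArrayData): a window [offset, offset + length)
// over shared storage, plus optional children for a struct-like parent.
struct ToyData {
  std::shared_ptr<std::vector<int>> values;  // leaf storage; null for the parent
  int64_t offset = 0;
  int64_t length = 0;
  std::vector<std::shared_ptr<ToyData>> child_data;

  // Zero-copy slice: only this node's window moves, children are untouched.
  std::shared_ptr<ToyData> Slice(int64_t off, int64_t len) const {
    auto out = std::make_shared<ToyData>(*this);
    out->offset = offset + off;
    out->length = len;
    return out;
  }

  // Copy out the visible window of a leaf node.
  std::vector<int> Materialize() const {
    return std::vector<int>(values->begin() + offset,
                            values->begin() + offset + length);
  }
};

// Buggy variant: returns the child as stored, ignoring the parent's window.
std::shared_ptr<ToyData> GetChildNaive(const ToyData& parent, int i) {
  return parent.child_data[i];
}

// Fixed variant: applies the parent's offset and length to the child.
std::shared_ptr<ToyData> GetChildSliced(const ToyData& parent, int i) {
  return parent.child_data[i]->Slice(parent.offset, parent.length);
}

// Builds a parent with one child {10, 20, 30, 40, 50}, then slices the
// parent to rows [2, 4).
std::shared_ptr<ToyData> MakeSlicedParent() {
  auto child = std::make_shared<ToyData>();
  child->values =
      std::make_shared<std::vector<int>>(std::vector<int>{10, 20, 30, 40, 50});
  child->length = 5;

  auto parent = std::make_shared<ToyData>();
  parent->length = 5;
  parent->child_data.push_back(child);
  return parent->Slice(2, 2);
}
```

In this toy model, `GetChildNaive` on the sliced parent still exposes all five child values, whereas `GetChildSliced` yields only the two rows that the sliced parent actually covers.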


