You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "benibus (via GitHub)" <gi...@apache.org> on 2023/06/19 01:44:35 UTC

[GitHub] [arrow] benibus commented on a diff in pull request #35727: GH-33206: [C++] Add support for StructArray sorting and nested sort keys

benibus commented on code in PR #35727:
URL: https://github.com/apache/arrow/pull/35727#discussion_r1233439604


##########
cpp/src/arrow/compute/kernels/vector_sort_internal.h:
##########
@@ -737,17 +757,32 @@ struct ResolvedTableSortKey {
   static Result<std::vector<ResolvedTableSortKey>> Make(
       const Table& table, const RecordBatchVector& batches,
       const std::vector<SortKey>& sort_keys) {
-    auto factory = [&](const SortField& f) {
-      const auto& type = table.schema()->field(f.field_index)->type();
+    auto factory = [&](const SortField& f) -> Result<ResolvedTableSortKey> {
+      std::shared_ptr<DataType> type;
+      int64_t null_count = 0;
+      ArrayVector chunks;
+      chunks.reserve(batches.size());
+
       // We must expose a homogenous chunking for all ResolvedSortKey,
-      // so we can't simply pass `table.column(f.field_index)`
-      ArrayVector chunks(batches.size());
-      std::transform(batches.begin(), batches.end(), chunks.begin(),
-                     [&](const std::shared_ptr<RecordBatch>& batch) {
-                       return batch->column(f.field_index);
-                     });
-      return ResolvedTableSortKey(type, std::move(chunks), f.order,
-                                  table.column(f.field_index)->null_count());
+      // so we can't simply access the column from the table directly.
+      if (f.is_nested()) {
+        ARROW_ASSIGN_OR_RAISE(auto schema_field, f.path.Get(*table.schema()));
+        type = schema_field->type();
+        for (const auto& batch : batches) {
+          ARROW_ASSIGN_OR_RAISE(auto child, f.path.GetFlattened(*batch));
+          null_count += child->null_count();
+          chunks.push_back(std::move(child));
+        }
+      } else {
+        null_count = table.column(f.path[0])->null_count();

Review Comment:
   No, probably not a meaningful difference in this case (`GetFlattened` should be mostly trivial for top-level fields). Removed the non-nested branch.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org