You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/03/04 13:20:46 UTC

[GitHub] [arrow] lidavidm commented on a change in pull request #12560: ARROW-15281: [C++] Implement ability to retrieve fragment filename

lidavidm commented on a change in pull request #12560:
URL: https://github.com/apache/arrow/pull/12560#discussion_r819561503



##########
File path: cpp/src/arrow/dataset/scanner.cc
##########
@@ -877,6 +882,15 @@ Result<compute::ExecNode*> MakeScanNode(compute::ExecPlan* plan,
         batch->values.emplace_back(partial.fragment.index);
         batch->values.emplace_back(partial.record_batch.index);
         batch->values.emplace_back(partial.record_batch.last);
+
+        auto filefragment_casted =
+            dynamic_cast<const FileFragment*>(partial.fragment.value.get());

Review comment:
       Use `arrow::internal::checked_cast`? This will be dynamic_cast in debug mode and static cast otherwise

##########
File path: cpp/src/arrow/dataset/scanner.cc
##########
@@ -877,6 +882,15 @@ Result<compute::ExecNode*> MakeScanNode(compute::ExecPlan* plan,
         batch->values.emplace_back(partial.fragment.index);
         batch->values.emplace_back(partial.record_batch.index);
         batch->values.emplace_back(partial.record_batch.last);
+
+        auto filefragment_casted =
+            dynamic_cast<const FileFragment*>(partial.fragment.value.get());
+        if (filefragment_casted != nullptr) {
+          batch->values.emplace_back(filefragment_casted->source().path());
+        } else {
+          batch->values.emplace_back("");

Review comment:
       Maybe something more descriptive? "(not a file)" or something? Can we call fragment->ToString() or something instead?

##########
File path: cpp/src/arrow/dataset/scanner.cc
##########
@@ -708,8 +709,12 @@ Result<ProjectionDescr> ProjectionDescr::FromNames(std::vector<std::string> name
   for (size_t i = 0; i < exprs.size(); ++i) {
     exprs[i] = compute::field_ref(names[i]);
   }
+  auto fields = dataset_schema.fields();
+  for (const auto& aug_field : kAugmentedFields) {
+    fields.push_back(aug_field);
+  }
   return ProjectionDescr::FromExpressions(std::move(exprs), std::move(names),
-                                          dataset_schema);
+                                          Schema(fields));

Review comment:
       Why is this needed now but not before? I wouldn't expect the projection to have access to these fields




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org