Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/06/25 10:48:13 UTC

[GitHub] [arrow] bkietz commented on a change in pull request #7534: ARROW-8729: [C++][Dataset] Ensure non-empty batches when only virtual columns are projected

bkietz commented on a change in pull request #7534:
URL: https://github.com/apache/arrow/pull/7534#discussion_r445470685



##########
File path: cpp/src/parquet/arrow/reader.cc
##########
@@ -338,22 +348,37 @@ class RowGroupRecordBatchReader : public ::arrow::RecordBatchReader {
     // TODO (hatemhelal): Consider refactoring this to share logic with ReadTable as this
     // does not currently honor the use_threads option.
     std::vector<std::shared_ptr<ChunkedArray>> columns(field_readers_.size());
-    for (size_t i = 0; i < field_readers_.size(); ++i) {
-      RETURN_NOT_OK(field_readers_[i]->NextBatch(batch_size_, &columns[i]));
-      if (columns[i]->num_chunks() > 1) {
-        return Status::NotImplemented("This class cannot yet iterate chunked arrays");
+    int64_t num_rows = -1;
+
+    if (columns.empty()) {
+      num_rows = std::min(batch_size_, *row_group_remaining_size_);

Review comment:
```suggestion
      // num_rows cannot be derived from field_readers_ so compute it using
      // row group sizes cached from metadata
      num_rows = std::min(batch_size_, *row_group_remaining_size_);
```
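The suggested comment explains why the `columns.empty()` branch falls back to metadata. When only virtual columns are projected there are no field readers, so nothing can report how many rows a batch holds. A minimal sketch of that idea follows. It assumes per-row-group row counts are already available from file metadata. `MetadataBatchSizer` and `CollectBatchSizes` are hypothetical names, not part of the Arrow or Parquet C++ API, and the sketch makes no claim about how the PR actually advances `row_group_remaining_size_`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper (not Arrow/Parquet API): sizes batches for a scan in
// which only virtual columns are projected, so there are no column readers
// from which a row count could be derived. Per-row-group row counts are
// assumed to come from file metadata.
class MetadataBatchSizer {
 public:
  MetadataBatchSizer(std::vector<int64_t> row_group_sizes, int64_t batch_size)
      : remaining_(std::move(row_group_sizes)), batch_size_(batch_size) {
    assert(batch_size_ > 0);
  }

  // Returns the row count of the next batch, or 0 once every row group is
  // exhausted. A batch never spans two row groups.
  int64_t Next() {
    // Skip row groups that are already drained (or empty in the metadata),
    // so a non-zero return value always describes a non-empty batch.
    while (current_ < remaining_.size() && remaining_[current_] <= 0) {
      ++current_;
    }
    if (current_ == remaining_.size()) return 0;
    // Same shape as the suggested line: min(batch size, rows left here).
    int64_t num_rows = std::min(batch_size_, remaining_[current_]);
    remaining_[current_] -= num_rows;
    return num_rows;
  }

 private:
  std::vector<int64_t> remaining_;  // rows left in each row group
  std::size_t current_ = 0;         // index of the row group being read
  int64_t batch_size_;
};

// Drains a sizer and returns every batch length it produced.
std::vector<int64_t> CollectBatchSizes(MetadataBatchSizer sizer) {
  std::vector<int64_t> sizes;
  for (int64_t n = sizer.Next(); n > 0; n = sizer.Next()) sizes.push_back(n);
  return sizes;
}
```

With row groups of 5 and 3 rows and a batch size of 4, this yields batches of 4, 1 and 3 rows. Taking the minimum against the current row group's remaining count, rather than the file's total, keeps each batch inside one row group, and skipping zero-row groups is what keeps every emitted batch non-empty.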




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org