Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/07/13 11:23:14 UTC

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7704: ARROW-9297: [C++][Parquet] Support chunked row groups in RowGroupRecordBatchReader

jorisvandenbossche commented on a change in pull request #7704:
URL: https://github.com/apache/arrow/pull/7704#discussion_r453580484



##########
File path: cpp/src/parquet/arrow/reader.cc
##########
@@ -780,11 +741,29 @@ Status GetReader(const SchemaField& field, const std::shared_ptr<ReaderContext>&
 Status FileReaderImpl::GetRecordBatchReader(const std::vector<int>& row_group_indices,
                                             const std::vector<int>& column_indices,
                                             std::unique_ptr<RecordBatchReader>* out) {
-  // column indices check
-  for (auto row_group_index : row_group_indices) {
+  // row group indices check
+  for (int row_group_index : row_group_indices) {
     RETURN_NOT_OK(BoundsCheckRowGroup(row_group_index));
   }
 
+  // column indices check
+  ARROW_ASSIGN_OR_RAISE(std::vector<int> field_indices,
+                        manifest_.GetFieldIndices(column_indices));
+
+  std::shared_ptr<::arrow::Schema> batch_schema;
+  RETURN_NOT_OK(GetSchema(&batch_schema));
+
+  // filter to only arrow::Fields which contain the selected physical columns
+  {
+    ::arrow::FieldVector selected_fields;
+
+    for (int field_idx : field_indices) {
+      selected_fields.push_back(batch_schema->field(field_idx));
+    }
+
+    batch_schema = ::arrow::schema(std::move(selected_fields));

Review comment:
       Should this preserve the metadata of the original `batch_schema`? (I don't know where `batch_schema` is used further on, though.)
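
       To illustrate the concern, here is a minimal sketch using standard-library stand-ins rather than the real Arrow classes. `Field`, `Schema`, `Metadata` and `SelectFields` below are hypothetical names made up for this example. The point is only that the projected schema is built from the selected fields plus the original metadata pointer, instead of from the fields alone. In Arrow itself, I believe passing `batch_schema->metadata()` as the second argument to `::arrow::schema(...)` would have the same effect, but I haven't checked that against this PR.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for arrow::Field / arrow::Schema; NOT the Arrow API.
struct Field {
  std::string name;
};

using Metadata = std::map<std::string, std::string>;

struct Schema {
  std::vector<Field> fields;
  std::shared_ptr<const Metadata> metadata;  // may be null, as in a schema without metadata
};

// Build a schema containing only the selected fields while carrying the
// original schema-level metadata over to the result.
Schema SelectFields(const Schema& original, const std::vector<int>& field_indices) {
  std::vector<Field> selected;
  selected.reserve(field_indices.size());
  for (int idx : field_indices) {
    // at() throws std::out_of_range on an invalid index.
    selected.push_back(original.fields.at(static_cast<std::size_t>(idx)));
  }
  // Rebuilding from the fields alone would leave `metadata` null here.
  return Schema{std::move(selected), original.metadata};
}
```

       With this shape, a caller that projects fields 0 and 2 out of a three-field schema gets back two fields and the same metadata pointer as the input.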



