Posted to issues@arrow.apache.org by "CaoZB (via GitHub)" <gi...@apache.org> on 2023/04/19 14:17:16 UTC

[GitHub] [arrow] CaoZB opened a new issue, #35236: params about FileReaderImpl::ReadRowGroups

CaoZB opened a new issue, #35236:
URL: https://github.com/apache/arrow/issues/35236

   ### Describe the usage question you have. Please include as many useful details as possible.
   
   
   The interface `::arrow::Status ReadRowGroups` in src/parquet/arrow/reader.h (https://github.com/apache/arrow/blob/main/cpp/src/parquet/arrow/reader.cc#L1219) takes an input parameter `const std::vector<int>& column_indices`. I have two questions:
   1. From reading the code, I believe `column_indices` should hold leaf column indices. Is that right? If so, how can I get a leaf's column index? I couldn't find an interface for it.
   2. Line 1254 of src/parquet/arrow/reader.cc (https://github.com/apache/arrow/blob/main/cpp/src/parquet/arrow/reader.cc#L1254) reads `RETURN_NOT_OK(ReadColumn(static_cast<int>(i), row_groups, reader.get(), &column));`. I think the first argument, `static_cast<int>(i)`, should be an index into `std::vector<SchemaField> schema_fields` in struct `SchemaManifest` (https://github.com/apache/arrow/blob/main/cpp/src/parquet/arrow/schema.h#L115). Am I right? If so, I believe the value currently passed is not what is expected.
   
   ### Component(s)
   
   C++


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] mapleFU commented on issue #35236: [C++] params about FileReaderImpl::ReadRowGroups

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on issue #35236:
URL: https://github.com/apache/arrow/issues/35236#issuecomment-1515674378

   You can also tag the issue with "parquet".
   
   1. Yes, it is the leaf column index. You can get it from `SchemaManifest`.
   2. No. Without nested structs, the two indices should be close. With nested structs, `schema_fields` is a tree, while the column indices are flattened.
   
   You can take a look at the comment below:
   
   ```
     /// To get the index for a particular leaf field, one can use
     /// manifest().schema_fields to get the top level fields, and then walk the
     /// tree to identify the relevant leaf fields and access its column_index.
     /// To get the total number of leaf fields, use FileMetadata.num_columns().
   ```
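   
   The walk that comment describes could be sketched roughly as follows. This is only an illustration, not Arrow code: `FieldNode` is a hypothetical stand-in for `parquet::arrow::SchemaField` that models just a name, the children, and `column_index` (assumed to be -1 on non-leaf nodes), and `Leaf`, `Group`, `CollectLeafIndices`, `LeafIndicesForField`, and `ExampleSchemaFields` are made-up helper names. In a real program the top-level fields would come from `manifest().schema_fields` rather than being built by hand.
   
   ```cpp
   #include <string>
   #include <utility>
   #include <vector>
   
   // Hypothetical stand-in for parquet::arrow::SchemaField. Only the members
   // needed for the tree walk are modelled; the real struct holds more.
   struct FieldNode {
     std::string name;                 // simplification: a plain name string
     std::vector<FieldNode> children;  // empty for leaf fields
     int column_index = -1;            // assumed: -1 means "not a leaf"
   
     bool is_leaf() const { return column_index != -1; }
   };
   
   // Builds a leaf node carrying its flattened Parquet column index.
   FieldNode Leaf(std::string name, int column_index) {
     FieldNode node;
     node.name = std::move(name);
     node.column_index = column_index;
     return node;
   }
   
   // Builds a nested (struct-like) node that has children but no column index.
   FieldNode Group(std::string name, std::vector<FieldNode> children) {
     FieldNode node;
     node.name = std::move(name);
     node.children = std::move(children);
     return node;
   }
   
   // Depth-first walk: appends the column_index of every leaf under `node`.
   void CollectLeafIndices(const FieldNode& node, std::vector<int>* out) {
     if (node.is_leaf()) {
       out->push_back(node.column_index);
       return;
     }
     for (const FieldNode& child : node.children) {
       CollectLeafIndices(child, out);
     }
   }
   
   // Returns the leaf column indices below the top-level field called `name`,
   // or an empty vector if no top-level field has that name.
   std::vector<int> LeafIndicesForField(const std::vector<FieldNode>& schema_fields,
                                        const std::string& name) {
     std::vector<int> indices;
     for (const FieldNode& field : schema_fields) {
       if (field.name == name) {
         CollectLeafIndices(field, &indices);
         break;
       }
     }
     return indices;
   }
   
   // Sample schema: { a: int32, b: struct<x: int32, y: int32>, c: int32 }.
   // "c" is the third top-level field (position 2) but the fourth leaf (index 3).
   std::vector<FieldNode> ExampleSchemaFields() {
     return {Leaf("a", 0), Group("b", {Leaf("x", 1), Leaf("y", 2)}), Leaf("c", 3)};
   }
   ```
   
   With the sample schema, `LeafIndicesForField(ExampleSchemaFields(), "b")` collects the indices of `x` and `y`, which is the kind of list one would then pass as `column_indices`. The field `c` shows why a top-level position and a leaf column index diverge once a struct appears earlier in the schema.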
   
   




[GitHub] [arrow] mapleFU closed issue #35236: [C++][Parquet] params about FileReaderImpl::ReadRowGroups

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU closed issue #35236: [C++][Parquet] params about FileReaderImpl::ReadRowGroups
URL: https://github.com/apache/arrow/issues/35236




[GitHub] [arrow] mapleFU commented on issue #35236: [C++][Parquet] params about FileReaderImpl::ReadRowGroups

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on issue #35236:
URL: https://github.com/apache/arrow/issues/35236#issuecomment-1528824489

   Does this solve your problem @CaoZB ?




[GitHub] [arrow] CaoZB commented on issue #35236: [C++][Parquet] params about FileReaderImpl::ReadRowGroups

Posted by "CaoZB (via GitHub)" <gi...@apache.org>.
CaoZB commented on issue #35236:
URL: https://github.com/apache/arrow/issues/35236#issuecomment-1562195220

   Yes, thank you so much.

