Posted to github@arrow.apache.org by "Hor911 (via GitHub)" <gi...@apache.org> on 2023/03/06 20:47:23 UTC

[GitHub] [arrow] Hor911 commented on a diff in pull request #34461: GH-34460: [C++][Parquet] Split arrow::FileReader::ReadRowGroups() for flexible async IO

Hor911 commented on code in PR #34461:
URL: https://github.com/apache/arrow/pull/34461#discussion_r1127016579


##########
cpp/src/parquet/arrow/reader.h:
##########
@@ -249,6 +249,13 @@ class PARQUET_EXPORT FileReader {
 
   virtual ::arrow::Status ReadRowGroup(int i, std::shared_ptr<::arrow::Table>* out) = 0;
 
+  virtual ::arrow::Status WillNeedRowGroups(const std::vector<int>& row_groups,
+                                            const std::vector<int>& column_indices) = 0;

Review Comment:
   It can't be expressed in this API. This method is translated into a call to arrow::io::RandomAccessFile::WillNeed().
   
   A no-op is the default, and a valid, implementation of WillNeed. It means that the RandomAccessFile implementation provides no preload/prefetch, and all the work is done when ReadAt or ReadAsync is called.
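   
   The call chain described above can be sketched as follows. This is a minimal model, not Arrow code: `ReadRange`, `RandomAccessFileModel`, `InMemoryFile`, `ChunkLayout` and the free function `WillNeedRowGroups` are hypothetical stand-ins, and the byte range of each column chunk is assumed to be already known (in a real Parquet file it would come from the footer metadata). The requested column chunks are gathered into one `WillNeed` hint; with the default no-op `WillNeed`, that hint does nothing and `ReadAt` still does all the work.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for arrow::io::ReadRange (NOT the real declaration).
struct ReadRange {
  int64_t offset;
  int64_t length;
};

// Hypothetical stand-in for arrow::io::RandomAccessFile, reduced to two methods.
class RandomAccessFileModel {
 public:
  virtual ~RandomAccessFileModel() = default;
  // Default hint handling: a no-op. Nothing is prefetched.
  virtual void WillNeed(const std::vector<ReadRange>& /*ranges*/) {}
  // All actual I/O happens here.
  virtual std::string ReadAt(int64_t position, int64_t nbytes) = 0;
};

// A file that keeps the default no-op WillNeed and counts ReadAt calls.
class InMemoryFile : public RandomAccessFileModel {
 public:
  explicit InMemoryFile(std::string data) : data_(std::move(data)) {}
  std::string ReadAt(int64_t position, int64_t nbytes) override {
    ++read_calls;
    return data_.substr(static_cast<std::size_t>(position),
                        static_cast<std::size_t>(nbytes));
  }
  int read_calls = 0;

 private:
  std::string data_;
};

// Hypothetical layout: chunks[row_group][column] is the byte range of that
// column chunk, assumed to be already read from the Parquet footer.
using ChunkLayout = std::vector<std::vector<ReadRange>>;

// Sketch of the translation: collect the requested column chunks and pass
// them to the file as one WillNeed hint. No data is read here.
void WillNeedRowGroups(RandomAccessFileModel& file, const ChunkLayout& chunks,
                       const std::vector<int>& row_groups,
                       const std::vector<int>& column_indices) {
  std::vector<ReadRange> ranges;
  for (int rg : row_groups) {
    for (int col : column_indices) {
      ranges.push_back(chunks.at(rg).at(col));
    }
  }
  file.WillNeed(ranges);
}
```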
   
   The current Arrow API expects tight coupling between FileReader, ParquetFileReader and the intermediate Cache. True async decoupling is not possible without significant API changes (this was discussed elsewhere).
   
   For my technique to work, one has to provide a special implementation of arrow::io::RandomAccessFile that receives the WillNeed call, downloads the data, and signals its availability in some "hidden" way. It is not perfect, but it achieves what I needed without API changes or any other side effects.
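   
   One way such a RandomAccessFile could look is sketched below, under stated assumptions: `ReadRange`, `RandomAccessFileModel`, `FakeRemote` and `PrefetchingFile` are hypothetical stand-ins, not Arrow classes, and `FakeRemote::Fetch` plays the role of a slow network read. `WillNeed` starts one background download per hinted range with `std::async`, and the resulting `std::shared_future` is the "hidden" signal: a later `ReadAt` that falls inside a hinted range waits on that future instead of issuing its own request, while unhinted reads are fetched on demand.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <future>
#include <map>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins (NOT the real arrow::io declarations).
struct ReadRange {
  int64_t offset;
  int64_t length;
};

class RandomAccessFileModel {
 public:
  virtual ~RandomAccessFileModel() = default;
  virtual void WillNeed(const std::vector<ReadRange>& /*ranges*/) {}  // no-op
  virtual std::string ReadAt(int64_t position, int64_t nbytes) = 0;
};

// Simulated remote object store; Fetch() stands in for a network request.
class FakeRemote {
 public:
  explicit FakeRemote(std::string data) : data_(std::move(data)) {}
  std::string Fetch(int64_t offset, int64_t length) {
    std::lock_guard<std::mutex> lock(mu_);
    ++fetches_;
    return data_.substr(static_cast<std::size_t>(offset),
                        static_cast<std::size_t>(length));
  }
  int fetches() {
    std::lock_guard<std::mutex> lock(mu_);
    return fetches_;
  }

 private:
  std::mutex mu_;
  std::string data_;
  int fetches_ = 0;
};

// WillNeed starts downloads in the background; ReadAt consumes them. The
// shared_future is the "hidden" channel: the Parquet reader never sees it.
class PrefetchingFile : public RandomAccessFileModel {
 public:
  explicit PrefetchingFile(FakeRemote& remote) : remote_(remote) {}

  void WillNeed(const std::vector<ReadRange>& ranges) override {
    std::lock_guard<std::mutex> lock(mu_);
    for (const ReadRange& r : ranges) {
      FakeRemote* remote = &remote_;
      std::shared_future<std::string> data =
          std::async(std::launch::async,
                     [remote, r] { return remote->Fetch(r.offset, r.length); })
              .share();
      pending_[r.offset] = Pending{r.length, data};
    }
  }

  std::string ReadAt(int64_t position, int64_t nbytes) override {
    std::shared_future<std::string> prefetched;
    int64_t start = 0;
    {
      std::lock_guard<std::mutex> lock(mu_);
      // Last hinted range starting at or before `position`.
      auto it = pending_.upper_bound(position);
      if (it != pending_.begin()) {
        --it;
        if (position + nbytes <= it->first + it->second.length) {
          prefetched = it->second.data;
          start = position - it->first;
        }
      }
    }
    if (prefetched.valid()) {
      // Blocks only until the background download of this range finishes.
      return prefetched.get().substr(static_cast<std::size_t>(start),
                                     static_cast<std::size_t>(nbytes));
    }
    return remote_.Fetch(position, nbytes);  // not hinted: fetch on demand
  }

 private:
  struct Pending {
    int64_t length;
    std::shared_future<std::string> data;
  };
  FakeRemote& remote_;
  std::mutex mu_;
  std::map<int64_t, Pending> pending_;
};
```

   A production version would also need to evict consumed ranges, handle overlapping hints and report download failures as an error `Status`; the sketch leaves all three out.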
   
   I think I'll be able to send you a link to a real use case tomorrow.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org