Posted to user@arrow.apache.org by Xinyu Z <xz...@gmail.com> on 2022/07/18 04:13:14 UTC

[C++][Parquet]async low-level reader?

It seems the Arrow dataset API already has an async IO layer.
However, I want to use the low-level Parquet API with async IO. That
is, the decoded values are consumed by a user-defined function rather
than converted to an Arrow table. Something similar to ScanFileContents:
https://github.com/apache/arrow/blob/master/cpp/src/parquet/file_reader.cc#L818

The current async IO interface inside ParquetFileReader seems to be
designed for the Arrow dataset API. I was wondering if there is any code
snippet that implements an async version of ScanFileContents? If there
is none, one way for me to approach this is to use
ParquetFileReader::PreBuffer and ParquetFileReader::WhenBuffered and
refer to the dataset API implementation.
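
To make the question concrete, here is a library-free sketch of the control flow I have in mind. It is not Arrow code: FetchRowGroup, RowGroupValues, Consumer and ScanAsync are hypothetical names made up for illustration, and std::async only plays the role that PreBuffer (start the read) and WhenBuffered (wait for it) would play in a real implementation. The fake "row groups" simply hold three integers each.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <future>
#include <vector>

// Hypothetical stand-in for the decoded values of one row group. In real code
// these would come from column readers over data cached by PreBuffer.
using RowGroupValues = std::vector<int64_t>;

// Hypothetical stand-in for the IO step: pretends that row group `rg`
// contains the values rg*10, rg*10 + 1 and rg*10 + 2.
RowGroupValues FetchRowGroup(int rg) {
  const int64_t base = static_cast<int64_t>(rg) * 10;
  return {base, base + 1, base + 2};
}

// User-defined function that receives each value (instead of building a table).
using Consumer = std::function<void(int64_t)>;

// Scans all row groups in order, overlapping the fetch of row group rg + 1
// with the consumption of row group rg. Returns the number of values seen.
int64_t ScanAsync(int num_row_groups, const Consumer& consume) {
  assert(consume && "a consumer callback is required");
  if (num_row_groups <= 0) return 0;

  int64_t total = 0;
  // Start the first fetch before entering the loop (the "PreBuffer" step).
  std::future<RowGroupValues> pending =
      std::async(std::launch::async, FetchRowGroup, 0);

  for (int rg = 0; rg < num_row_groups; ++rg) {
    // Wait until the current row group is available (the "WhenBuffered" step).
    RowGroupValues current = pending.get();

    // Kick off the next fetch before consuming the current values.
    if (rg + 1 < num_row_groups) {
      pending = std::async(std::launch::async, FetchRowGroup, rg + 1);
    }

    for (int64_t v : current) {
      consume(v);
      ++total;
    }
  }
  return total;
}
```

Calling ScanAsync(3, callback) hands the callback the values of row group 0, then 1, then 2, while the next row group is being fetched in the background. Whether the real PreBuffer/WhenBuffered pair can be driven per row group in this way is exactly what I am unsure about.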