Posted to dev@parquet.apache.org by "David Li (Jira)" <ji...@apache.org> on 2021/03/02 13:02:00 UTC
[jira] [Comment Edited] (PARQUET-1993) [C++] Expose when prefetching completes
[ https://issues.apache.org/jira/browse/PARQUET-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17293682#comment-17293682 ]
David Li edited comment on PARQUET-1993 at 3/2/21, 1:01 PM:
------------------------------------------------------------
This is so that we can separate the I/O-bound and CPU-bound stages of Parquet reading: we can do the I/O in one context and, once it completes, transfer to another executor to decode the file. Right now, although I/O is dispatched to a dedicated thread pool, a consumer still has to try to read batches and block a thread (on what was presumably intended to be an executor for CPU-bound work) until the I/O is done.
Also cc [~westonpace]
> [C++] Expose when prefetching completes
> ---------------------------------------
>
> Key: PARQUET-1993
> URL: https://issues.apache.org/jira/browse/PARQUET-1993
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-cpp
> Reporter: David Li
> Assignee: David Li
> Priority: Major
>
> As a follow-up to PARQUET-1820, we should let an application be notified when pre-buffering has completed (e.g. PreBuffer() should return Future<void>). This would let an application pre-buffer some amount of data (across multiple files and/or row groups) and then decode data as it becomes available instead of blocking.
> A more ergonomic API would be to expose Future<RecordBatchReader> or something along those lines.