
Posted to dev@parquet.apache.org by "David Li (Jira)" <ji...@apache.org> on 2021/03/02 13:02:00 UTC

[jira] [Comment Edited] (PARQUET-1993) [C++] Expose when prefetching completes

    [ https://issues.apache.org/jira/browse/PARQUET-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17293682#comment-17293682 ] 

David Li edited comment on PARQUET-1993 at 3/2/21, 1:01 PM:
------------------------------------------------------------

This is so that we can separate the I/O-bound and CPU-bound stages of Parquet reading: we can do the I/O in one context and, when that's done, transfer to another executor to decode the file. Right now, although I/O is dispatched to a dedicated thread pool, a consumer has to try to read batches and block (on what was presumably intended to be an executor for CPU-bound work) until the I/O is done.
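
To illustrate the handoff described above, here is a minimal sketch using only the C++ standard library. Everything in it is hypothetical: `Executor` is a toy single-threaded queue (not Arrow's thread pool), the "read" just fills a buffer, and the "decode" just sums it. The point is only that the I/O task itself submits the decode work to the CPU executor when it finishes, so no CPU-side thread sits blocked on I/O.

```cpp
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <numeric>
#include <queue>
#include <thread>
#include <vector>

// Toy single-threaded executor (an assumption for this sketch, not an Arrow class).
class Executor {
 public:
  Executor() : worker_([this] { Run(); }) {}
  ~Executor() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      done_ = true;
    }
    cv_.notify_one();
    worker_.join();  // remaining queued tasks are drained before the join returns
  }
  void Submit(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      tasks_.push(std::move(task));
    }
    cv_.notify_one();
  }

 private:
  void Run() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return done_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // shutting down and nothing left to run
        task = std::move(tasks_.front());
        tasks_.pop();
      }
      task();
    }
  }
  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> tasks_;
  bool done_ = false;
  std::thread worker_;  // declared last so the members above exist before it starts
};

// Hypothetical read path: I/O on one executor, decode on another.
std::uint64_t ReadThenDecode(std::size_t num_bytes) {
  Executor cpu;  // declared first, destroyed last: outlives tasks handed over by `io`
  Executor io;
  auto promise = std::make_shared<std::promise<std::uint64_t>>();
  std::future<std::uint64_t> decoded = promise->get_future();
  io.Submit([&cpu, promise, num_bytes] {
    // I/O stage: pretend to read `num_bytes` bytes, each with value 1.
    auto buffer = std::make_shared<std::vector<std::uint8_t>>(
        num_bytes, std::uint8_t{1});
    // I/O is done: transfer the buffered data to the CPU executor.
    cpu.Submit([promise, buffer] {
      // CPU stage: pretend to decode by summing the bytes.
      promise->set_value(std::accumulate(buffer->begin(), buffer->end(),
                                         std::uint64_t{0}));
    });
  });
  return decoded.get();  // only the final caller waits, not the CPU executor
}
```

With this fake data, `ReadThenDecode(10)` returns 10. A real implementation would presumably chain a continuation on the future returned by the pre-buffering call rather than hand-rolling promises.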

Also cc [~westonpace]



> [C++] Expose when prefetching completes
> ---------------------------------------
>
>                 Key: PARQUET-1993
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1993
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-cpp
>            Reporter: David Li
>            Assignee: David Li
>            Priority: Major
>
> As a follow up to PARQUET-1820, we should let an application be notified when pre-buffering has completed (e.g. PreBuffer() should return Future<void>). This would let an application pre-buffer some amount of data (across multiple files and/or row groups) and then decode data as it becomes available instead of blocking.
> A more ergonomic API would be to expose Future<RecordBatchReader> or something along those lines.
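
As a rough sketch of the usage pattern in the description (pre-buffer several row groups, then decode each once its data is ready), the following uses `std::future` as a stand-in for a `Future<void>`-style completion signal. `PreBufferRowGroup`, `Decode`, and `ReadRowGroups` are hypothetical names invented for illustration and are not part of parquet-cpp; the "read" and "decode" steps are fakes.

```cpp
#include <cstddef>
#include <cstdint>
#include <future>
#include <numeric>
#include <vector>

using Buffer = std::vector<std::uint8_t>;

// Hypothetical stand-in for a PreBuffer() that reports completion: starts
// "reading" one row group in the background and returns a future for its bytes.
std::future<Buffer> PreBufferRowGroup(int row_group, std::size_t num_bytes) {
  return std::async(std::launch::async, [row_group, num_bytes] {
    // Fake file contents: every byte holds the row group index.
    return Buffer(num_bytes, static_cast<std::uint8_t>(row_group));
  });
}

// Hypothetical decode step; here it just sums the buffered bytes.
std::uint64_t Decode(const Buffer& buffer) {
  return std::accumulate(buffer.begin(), buffer.end(), std::uint64_t{0});
}

std::vector<std::uint64_t> ReadRowGroups(int num_row_groups,
                                         std::size_t bytes_per_group) {
  // Kick off all pre-buffering up front so the reads can overlap.
  std::vector<std::future<Buffer>> pending;
  for (int i = 0; i < num_row_groups; ++i) {
    pending.push_back(PreBufferRowGroup(i, bytes_per_group));
  }
  // Decode each row group once its bytes are ready; reads for later
  // row groups keep running in the background meanwhile.
  std::vector<std::uint64_t> decoded;
  for (auto& buffered : pending) {
    decoded.push_back(Decode(buffered.get()));
  }
  return decoded;
}
```

With this fake data, `ReadRowGroups(3, 4)` yields the values 0, 4, and 8. Note that `std::future::get()` still blocks the calling thread for the row group it is waiting on; a future type that supports continuations would let the decode step be scheduled without any waiting thread.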


