Posted to jira@arrow.apache.org by "Weston Pace (Jira)" <ji...@apache.org> on 2021/04/13 21:01:00 UTC
[jira] [Created] (ARROW-12371) [C++] Allow EnumeratingGenerator to be async-reentrant
Weston Pace created ARROW-12371:
-----------------------------------
Summary: [C++] Allow EnumeratingGenerator to be async-reentrant
Key: ARROW-12371
URL: https://issues.apache.org/jira/browse/ARROW-12371
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Weston Pace
The combination of EnumeratingGenerator and ResequencingGenerator can be used to process items in a "first available" fashion. This is currently used in the scanner to compensate for fragments that are intermittently slow.
A potential further improvement would be to use this same pattern for out-of-order readahead. For example, when reading a Parquet file or an IPC file from S3, the reader may request multiple batches in parallel. If the next batch is slow but the later batches are fast, we could start processing the later batches while we wait for the next batch.
This would be a pretty minor improvement to latency (it probably won't affect throughput much), so I don't think it is a very high priority fix. It may be best to wait until profiling shows this is an issue.
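For illustration, the enumerate-then-resequence pattern described above can be sketched in standalone C++. This is a synchronous simulation and not Arrow's generator API: Enumerated, Resequencer, and ProcessFirstAvailable are hypothetical names made up for this sketch, and the "arrival order" is supplied by hand instead of coming from real parallel reads. It only shows the idea that each batch is tagged with its index, processed as soon as it arrives, and then released in the original order.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for what an enumerating step emits:
// the payload plus its position in the original sequence.
template <typename T>
struct Enumerated {
  std::size_t index;
  T value;
};

// Hypothetical resequencing buffer: accepts items in any order and
// releases them strictly in index order.
template <typename T>
class Resequencer {
 public:
  // Stores one item and returns every item that is now deliverable in order
  // (possibly none, if an earlier index is still outstanding).
  std::vector<T> Push(Enumerated<T> item) {
    // Each index must be delivered exactly once and must not already be released.
    assert(item.index >= next_index_ && pending_.count(item.index) == 0);
    pending_.emplace(item.index, std::move(item.value));
    std::vector<T> ready;
    for (auto it = pending_.find(next_index_); it != pending_.end();
         it = pending_.find(next_index_)) {
      ready.push_back(std::move(it->second));
      pending_.erase(it);
      ++next_index_;
    }
    return ready;
  }

  // Number of items held back while waiting for an earlier index.
  std::size_t buffered() const { return pending_.size(); }

 private:
  std::size_t next_index_ = 0;
  std::map<std::size_t, T> pending_;
};

// Simulates "first available" processing: batches are processed in the order
// they arrive (arrival_order holds batch indices), then put back in sequence.
inline std::vector<std::string> ProcessFirstAvailable(
    const std::vector<std::string>& batches,
    const std::vector<std::size_t>& arrival_order) {
  Resequencer<std::string> resequencer;
  std::vector<std::string> output;
  for (std::size_t index : arrival_order) {
    // The work happens as soon as the batch is available, not in index order.
    Enumerated<std::string> processed{index, "processed(" + batches[index] + ")"};
    for (auto& item : resequencer.Push(std::move(processed))) {
      output.push_back(std::move(item));
    }
  }
  return output;
}
```

With batches {"a", "b", "c"} arriving in the order 2, 1, 0, the later batches are processed first and held in the buffer, and nothing is emitted until batch 0 shows up, after which all three are released in their original order. The trade-off in this sketch is the memory used by the buffer while the slow batch is outstanding.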
--
This message was sent by Atlassian Jira
(v8.3.4#803005)