Posted to jira@arrow.apache.org by "Weston Pace (Jira)" <ji...@apache.org> on 2021/04/13 21:01:00 UTC

[jira] [Created] (ARROW-12371) [C++] Allow EnumeratingGenerator to be async-reentrant

Weston Pace created ARROW-12371:
-----------------------------------

             Summary: [C++] Allow EnumeratingGenerator to be async-reentrant
                 Key: ARROW-12371
                 URL: https://issues.apache.org/jira/browse/ARROW-12371
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: Weston Pace


The combination of EnumeratingGenerator and ResequencingGenerator can be used to process items in a "first available" fashion.  This is currently used in the scanner to compensate for intermittent fragment performance.

A potential further improvement would be to use the same pattern for out-of-order readahead.  For example, when reading a Parquet file or an IPC file via S3, the reader may request multiple batches in parallel.  If the next batch is slow but later batches are fast, we could start processing the later batches while we wait for the next one.
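
To make the idea concrete, here is a minimal synchronous sketch of the enumerate-then-resequence pattern.  It does not use Arrow's actual generator API: Enumerated, Resequencer and ProcessFirstAvailable are hypothetical names invented for illustration, and the "arrivals" vector simply stands in for batches completing out of order.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an enumerated item: a value tagged with its
// position in the original stream.
template <typename T>
struct Enumerated {
  std::size_t index;
  T value;
};

// Hypothetical resequencer: accepts items in any order and releases them
// strictly in index order, buffering anything that arrives early.
template <typename T>
class Resequencer {
 public:
  // Buffers `item` and returns every value that is now ready to be emitted.
  std::vector<T> Push(Enumerated<T> item) {
    pending_.emplace(item.index, std::move(item.value));
    std::vector<T> ready;
    for (auto it = pending_.find(next_); it != pending_.end();
         it = pending_.find(next_)) {
      ready.push_back(std::move(it->second));
      pending_.erase(it);
      ++next_;
    }
    return ready;
  }

 private:
  std::size_t next_ = 0;              // index of the next value to emit
  std::map<std::size_t, T> pending_;  // early arrivals, keyed by index
};

// Processes each batch as soon as it "arrives" (here: in vector order), then
// restores the original order.  `processed_order` records which index was
// processed at each step.
inline std::vector<std::string> ProcessFirstAvailable(
    const std::vector<Enumerated<std::string>>& arrivals,
    std::vector<std::size_t>* processed_order) {
  Resequencer<std::string> resequencer;
  std::vector<std::string> output;
  for (const auto& batch : arrivals) {
    // The "work" happens immediately, without waiting for earlier batches.
    Enumerated<std::string> processed{batch.index, batch.value + "!"};
    processed_order->push_back(processed.index);
    for (auto& value : resequencer.Push(std::move(processed))) {
      output.push_back(std::move(value));
    }
  }
  return output;
}
```

If batch 0 arrives last, batches 1 and 2 are still processed first; only the final emission waits for batch 0.  A real implementation would be asynchronous and would need to bound how many early batches it buffers.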

This would be a fairly minor improvement to latency (it probably won't affect throughput much), so I don't think it is a high-priority fix.  It may be best to wait until profiling shows this is an issue.


