Posted to jira@arrow.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/05/10 20:12:00 UTC

[jira] [Updated] (ARROW-12371) [C++] Allow batch readahead to be processed in a "first-available" fashion instead of an "in order" fashion

     [ https://issues.apache.org/jira/browse/ARROW-12371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-12371:
-----------------------------------
    Labels: pull-request-available  (was: )

> [C++] Allow batch readahead to be processed in a "first-available" fashion instead of an "in order" fashion
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-12371
>                 URL: https://issues.apache.org/jira/browse/ARROW-12371
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Weston Pace
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The combination of EnumeratingGenerator and ResequencingGenerator can be used to process items in a "first available" fashion.  This is currently used in the scanner to compensate for intermittent fragment performance.
> A potential further improvement would be to use this same pattern for out-of-order readahead.  For example, when reading a Parquet file or an IPC file via S3, the reader may request multiple batches in parallel.  If the next batch is slow but the later batches are fast, we could start processing the later batches while we wait for the next batch.
> This would be a pretty minor improvement to latency (it probably won't affect throughput much), so I don't know that it is a very high priority fix.  It may be best to wait until profiling shows this is an issue.
> This will require the EnumeratingGenerator to be made async-reentrant.
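A minimal sketch of the enumerate-then-resequence idea described above, written in plain standard C++ rather than against Arrow's generator API. The names Enumerated, Resequencer and ProcessFirstAvailable are hypothetical and exist only for illustration; the arrival order is passed in as a vector so the example stays deterministic instead of relying on real parallel reads.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an enumerated batch: a payload tagged with the
// position it had in the original stream.
struct Enumerated {
  std::size_t index;
  std::string value;
};

// Hypothetical resequencer: accepts items in any arrival order and releases
// them in index order, buffering anything that arrives early.
class Resequencer {
 public:
  // Returns the items that become releasable after this arrival (possibly none).
  std::vector<Enumerated> Push(Enumerated item) {
    const std::size_t index = item.index;
    pending_.emplace(index, std::move(item));
    std::vector<Enumerated> ready;
    // Drain every buffered item that is now contiguous with the last released one.
    for (auto it = pending_.find(next_); it != pending_.end();
         it = pending_.find(next_)) {
      ready.push_back(std::move(it->second));
      pending_.erase(it);
      ++next_;
    }
    return ready;
  }

 private:
  std::size_t next_ = 0;                       // index of the next item to release
  std::map<std::size_t, Enumerated> pending_;  // early arrivals, keyed by index
};

// Does the per-batch work as soon as each batch arrives ("first available"),
// then restores the original order before emitting results.
std::vector<std::string> ProcessFirstAvailable(
    const std::vector<Enumerated>& arrivals) {
  Resequencer resequencer;
  std::vector<std::string> output;
  for (const auto& batch : arrivals) {
    // The work happens here, on arrival, regardless of the batch's position.
    Enumerated processed{batch.index, "processed:" + batch.value};
    for (auto& item : resequencer.Push(std::move(processed))) {
      output.push_back(std::move(item.value));
    }
  }
  return output;
}
```

In this sketch a slow batch 0 only delays the final emission: batches 1 and 2 are already processed and buffered by the time it arrives, which is the latency benefit the description is after. It does not address the async-reentrancy requirement mentioned above.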



--
This message was sent by Atlassian Jira
(v8.3.4#803005)