Posted to jira@arrow.apache.org by "Weston Pace (Jira)" <ji...@apache.org> on 2021/05/06 17:38:00 UTC

[jira] [Commented] (ARROW-8527) [C++][CSV] Add support for ReadOptions::skip_rows >= block_size

    [ https://issues.apache.org/jira/browse/ARROW-8527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17340338#comment-17340338 ] 

Weston Pace commented on ARROW-8527:
------------------------------------

This behavior could be useful for ARROW-12598.  Also, in a recent discussion, n3world (no Jira I can find) pointed out that `skip_rows` is probably not the best tool for this.  This sort of "paging" would require skipping data rows, so it would be nice if "skip header rows" (a constant parameter determined by the tool that generated the data) were distinct from "skip data rows" (a per-query parameter determined by paging needs).

> [C++][CSV] Add support for ReadOptions::skip_rows >= block_size
> ---------------------------------------------------------------
>
>                 Key: ARROW-8527
>                 URL: https://issues.apache.org/jira/browse/ARROW-8527
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Ravil Bikbulatov
>            Priority: Major
>
> The current implementation throws an error at reader.cc:286 when skip_rows > header. However, in some workloads skip_rows is used not only for skipping the header but also for simply skipping the first n rows. In this case the block-size constraint greatly interferes. I think this constraint could be removed without reducing performance.


