Posted to jira@arrow.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/05/07 21:02:00 UTC

[jira] [Updated] (ARROW-12598) [C++][Dataset] Implement row-count for CSV or allow selecting 0 columns from CSV

     [ https://issues.apache.org/jira/browse/ARROW-12598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-12598:
-----------------------------------
    Labels: dataset datasets pull-request-available  (was: dataset datasets)

> [C++][Dataset] Implement row-count for CSV or allow selecting 0 columns from CSV
> --------------------------------------------------------------------------------
>
>                 Key: ARROW-12598
>                 URL: https://issues.apache.org/jira/browse/ARROW-12598
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: David Li
>            Priority: Major
>              Labels: dataset, datasets, pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> For ARROW-9697, file formats can implement a fast path to count rows in a fragment, but CSV does not implement one yet. There are two options. We could do the equivalent of {{wc -l}} for CSV (using the lexing boundary finder as needed) and adjust the row count based on the header options. Alternatively, we could change the CSV reader options to allow selecting no columns (right now, passing no columns to the reader implies you want to read all columns). The former is likely faster, but the latter is more robust and less work.
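
As a rough illustration of the {{wc -l}} approach described above, here is a minimal standalone sketch. It is not Arrow code: the function name CountCsvRows and the has_header flag are hypothetical, and it assumes no quoted field contains an embedded newline (the case where the lexing boundary finder would be needed). It also counts empty lines as rows, which a real CSV parser may be configured to skip.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>

// Hypothetical helper, not part of Arrow: counts data rows in an in-memory
// CSV buffer by counting '\n' characters, in the spirit of `wc -l`.
// Assumption: no quoted field contains an embedded newline.
std::int64_t CountCsvRows(const std::string& data, bool has_header) {
  std::int64_t lines = std::count(data.begin(), data.end(), '\n');
  // A final line without a trailing newline still counts as a row.
  if (!data.empty() && data.back() != '\n') {
    ++lines;
  }
  // The header line is not a data row, so subtract it when present.
  if (has_header && lines > 0) {
    --lines;
  }
  return lines;
}
```

For example, under these assumptions a buffer holding one header line and two data lines yields a count of 2 whether or not the last line ends with a newline. The header adjustment is the only part that depends on reader options here; a real implementation would also need to account for options such as skipped rows.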


