Posted to jira@arrow.apache.org by "David Li (Jira)" <ji...@apache.org> on 2021/04/29 13:39:00 UTC

[jira] [Created] (ARROW-12598) [C++][Dataset] Implement row-count for CSV or allow selecting 0 columns from CSV

David Li created ARROW-12598:
--------------------------------

             Summary: [C++][Dataset] Implement row-count for CSV or allow selecting 0 columns from CSV
                 Key: ARROW-12598
                 URL: https://issues.apache.org/jira/browse/ARROW-12598
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: David Li


With ARROW-9697, file formats can implement a fast path to count rows in a fragment, but CSV does not implement one yet. There are two options. We could do the equivalent of {{wc -l}} for CSV (using the lexing boundary finder as needed) and adjust the row count based on the header options. Alternatively, we could change the CSV reader options to allow selecting no columns (right now, passing no columns to the reader implies you want to read all columns). The former is likely faster, but the latter would be more robust and less work.
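
To make the first option concrete, here is a minimal sketch of {{wc -l}}-style counting with a header adjustment. It is not Arrow code: CountOptions and CountCsvRows are hypothetical names invented for illustration, and the sketch assumes no quoted field contains a newline and that there are no blank lines to skip. Handling quoted newlines is where the lexing boundary finder would presumably come in.

```cpp
#include <cstdint>
#include <string>

// Hypothetical options for this sketch; not Arrow's CSV reader options.
struct CountOptions {
  int64_t skip_rows = 0;   // leading lines to discard before the header
  bool has_header = true;  // whether one line holds the column names
};

// Counts data rows by counting line terminators, wc -l style.
// Assumption: no quoted field contains a newline, and blank lines
// are counted like any other line.
int64_t CountCsvRows(const std::string& data, const CountOptions& opts) {
  int64_t lines = 0;
  for (char c : data) {
    if (c == '\n') ++lines;
  }
  // A final line without a trailing newline still counts as a line.
  if (!data.empty() && data.back() != '\n') ++lines;

  // Subtract skipped lines and the header line, clamping at zero.
  const int64_t overhead = opts.skip_rows + (opts.has_header ? 1 : 0);
  return lines > overhead ? lines - overhead : 0;
}
```

The attraction of this approach is that it never parses or converts any column values. The cost is that every reader option affecting which lines count as rows (header, skipped rows, quoting, blank-line handling) has to be mirrored in the counting logic, which is why allowing zero selected columns may be the more robust route.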



--
This message was sent by Atlassian Jira
(v8.3.4#803005)