Posted to issues@arrow.apache.org by "David Li (Jira)" <ji...@apache.org> on 2021/04/29 13:39:00 UTC
[jira] [Created] (ARROW-12598) [C++][Dataset] Implement row-count for CSV or allow selecting 0 columns from CSV
David Li created ARROW-12598:
--------------------------------
Summary: [C++][Dataset] Implement row-count for CSV or allow selecting 0 columns from CSV
Key: ARROW-12598
URL: https://issues.apache.org/jira/browse/ARROW-12598
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: David Li
With ARROW-9697, file formats can implement a fast path to count rows in a fragment, but the CSV format does not implement one yet. There are two options. We could do the equivalent of {{wc -l}} for CSV (using the lexing boundary finder as needed) and adjust the row count based on the header options. Alternatively, we could change the CSV reader options to allow selecting no columns (right now, passing no columns to the reader implies you want to read all columns). The former is likely faster, but the latter would be more robust and less work.
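To make the {{wc -l}} option concrete, here is a minimal standalone sketch of counting records in a CSV buffer and adjusting for a header row. It deliberately does not use any Arrow API: CountCsvRows and its parameters are hypothetical names chosen for illustration, and a real implementation would presumably reuse the CSV reader's own boundary finder and parse options. The sketch assumes double-quote quoting with doubled quotes as the escape, LF or CRLF line endings, exactly one header line when has_header is set, and that blank lines are skipped.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper for illustration only; not part of the Arrow API.
// Counts data rows in a CSV buffer by counting record-terminating newlines,
// ignoring newlines that appear inside quoted fields.
std::size_t CountCsvRows(const std::string& data, bool has_header,
                         char quote = '"') {
  std::size_t records = 0;
  bool in_quotes = false;           // currently inside a quoted field
  bool record_has_content = false;  // current line has at least one character

  for (char c : data) {
    if (c == quote) {
      // A doubled quote ("") toggles twice, so the state is unchanged.
      in_quotes = !in_quotes;
      record_has_content = true;
    } else if (c == '\n' && !in_quotes) {
      // End of a record; blank lines are not counted (an assumption).
      if (record_has_content) ++records;
      record_has_content = false;
    } else if (c != '\r' || in_quotes) {
      // Any other character is content; a bare '\r' outside quotes is
      // treated as part of a CRLF line ending and ignored.
      record_has_content = true;
    }
  }

  // Count a final record that is not terminated by a newline.
  if (record_has_content) ++records;

  // Adjust for the header row, as the header options would require.
  if (has_header && records > 0) --records;
  return records;
}
```

For example, CountCsvRows("a,b\n1,2\n3,4\n", true) counts three records and subtracts the header, giving two rows. The quote tracking is what separates this from a plain newline count: a quoted field containing an embedded newline still counts as one row. Other reader options (skipped rows, custom escape characters, lone CR line endings) are not handled here, which is part of why the "select no columns" route may be the more robust one.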