Posted to jira@arrow.apache.org by "Neal Richardson (Jira)" <ji...@apache.org> on 2020/10/08 15:00:00 UTC
[jira] [Resolved] (ARROW-6537) [R] Pass column_types to CSV reader
[ https://issues.apache.org/jira/browse/ARROW-6537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Neal Richardson resolved ARROW-6537.
------------------------------------
Resolution: Fixed
Issue resolved by pull request 7807
[https://github.com/apache/arrow/pull/7807]
> [R] Pass column_types to CSV reader
> -----------------------------------
>
> Key: ARROW-6537
> URL: https://issues.apache.org/jira/browse/ARROW-6537
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++, R
> Reporter: Neal Richardson
> Assignee: Romain Francois
> Priority: Major
> Labels: csv, dataset, pull-request-available
> Fix For: 2.0.0
>
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> See also ARROW-6536. The CSV reader may now accept a Schema (I think I saw that); otherwise it takes an unordered_map.
> {{read_csv_arrow}} should accept any of the following for {{col_types}}: a Schema, a named list of Types, or the "compact string representation" that {{readr}} supports. Per the readr docs, "c = character, i = integer, n = number, d = double, l = logical, f = factor, D = date, T = date time, t = time, ? = guess, or _/- to skip the column."
> The corresponding Arrow types would be: c = utf8(), i = int32(), d = float64(), l = bool(), f = dictionary(int32(), utf8()), D = date32(), T = timestamp(), t = time32(), etc.
> I'm not sure whether ? and - are supported, or what exactly happens if you don't specify types for all columns, but we'll find out, and we can open JIRAs if important features are missing.
> Following the existing conventions in csv.R, that compact string representation would be encapsulated in {{read_csv_arrow}}, so CsvTableReader and the various Csv*Options would only deal with the Arrow C++ interface.
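
Illustration (not part of the original issue or of pull request 7807): the compact-string expansion described above could look roughly like the sketch below. It is written in plain Python so it runs without Arrow installed. The names `COMPACT_TO_ARROW` and `expand_col_types` are hypothetical, Arrow types are represented as strings rather than real type objects, and the handling of `?`, `_`/`-`, and `n` is an assumption, since the issue leaves those open.

```python
# Hypothetical table: readr compact code -> Arrow type, as listed in the issue.
# Types are plain strings here so the sketch needs no Arrow installation.
COMPACT_TO_ARROW = {
    "c": "utf8()",
    "i": "int32()",
    "d": "float64()",
    "l": "bool()",
    "f": "dictionary(int32(), utf8())",
    "D": "date32()",
    "T": "timestamp()",
    "t": "time32()",
}

SKIP_CODES = {"_", "-"}  # readr: skip the column
GUESS_CODE = "?"         # readr: let the reader infer the type


def expand_col_types(spec, col_names):
    """Expand a compact string such as "ci?-" into explicit column types.

    Returns (types, skipped): a dict of column name -> Arrow type string,
    and a list of column names marked to be skipped. Columns marked "?"
    appear in neither, which is assumed to leave them to type inference.
    """
    if len(spec) != len(col_names):
        raise ValueError("compact spec must have one code per column")

    types = {}
    skipped = []
    for code, name in zip(spec, col_names):
        if code in SKIP_CODES:
            skipped.append(name)
        elif code == GUESS_CODE:
            continue
        elif code in COMPACT_TO_ARROW:
            types[name] = COMPACT_TO_ARROW[code]
        else:
            # Includes "n": the issue gives no Arrow mapping for it.
            raise ValueError(f"unsupported compact type code: {code!r}")
    return types, skipped


if __name__ == "__main__":
    types, skipped = expand_col_types("ci?-", ["name", "count", "score", "notes"])
    print(types)
    print(skipped)
```

Keeping this expansion in one helper matches the convention stated in the issue: only the user-facing function would understand the compact string, and the reader and options classes would receive explicit column-name-to-type pairs.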
--
This message was sent by Atlassian Jira
(v8.3.4#803005)