Posted to jira@arrow.apache.org by "Neal Richardson (Jira)" <ji...@apache.org> on 2020/10/08 15:00:00 UTC

[jira] [Resolved] (ARROW-6537) [R] Pass column_types to CSV reader

     [ https://issues.apache.org/jira/browse/ARROW-6537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neal Richardson resolved ARROW-6537.
------------------------------------
    Resolution: Fixed

Issue resolved by pull request 7807
[https://github.com/apache/arrow/pull/7807]

> [R] Pass column_types to CSV reader
> -----------------------------------
>
>                 Key: ARROW-6537
>                 URL: https://issues.apache.org/jira/browse/ARROW-6537
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, R
>            Reporter: Neal Richardson
>            Assignee: Romain Francois
>            Priority: Major
>              Labels: csv, dataset, pull-request-available
>             Fix For: 2.0.0
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> See also ARROW-6536. The CSV reader may accept a Schema now (I think I saw that); otherwise it takes an unordered_map.
> {{read_csv_arrow}} should accept for {{col_types}} either a Schema, a named list of Types, or the "compact string representation" that {{readr}} supports. Per its docs, "c = character, i = integer, n = number, d = double, l = logical, f = factor, D = date, T = date time, t = time, ? = guess, or _/- to skip the column." So c = utf8(), i = int32(), d = float64(), l = bool(), f = dictionary(int32(), utf8()), D = date32(), T = timestamp(), t = time32(), etc. I'm not sure whether ? and - are supported, or what happens if you don't specify types for all columns, but we'll find out, and we can open JIRAs if important features are missing.
> Following the existing conventions in csv.R, that compact string representation would be encapsulated in {{read_csv_arrow}}, so CsvTableReader and the various Csv*Options would only deal with the Arrow C++ interface. 
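
The compact-string handling described in the ticket could look roughly like the following. This is a minimal Python sketch, not the R package's code. `parse_col_types` and `COMPACT_TO_ARROW` are hypothetical names. Arrow types are written as strings so the sketch runs without Arrow installed. Treating `?` as "let the reader infer" and `_`/`-` as "skip" follows readr's documented meaning, not confirmed Arrow CSV reader behavior. `n` is left unmapped because the ticket does not say which Arrow type it should become.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical mapping from readr compact codes to the Arrow type
# constructors listed in the ticket (spelled as strings for this sketch).
COMPACT_TO_ARROW: Dict[str, str] = {
    "c": "utf8()",
    "i": "int32()",
    "d": "float64()",
    "l": "bool()",
    "f": "dictionary(int32(), utf8())",
    "D": "date32()",
    "T": "timestamp()",
    "t": "time32()",
}

GUESS = "?"         # assumption: leave the type unset so the reader infers it
SKIP = ("_", "-")   # assumption: drop the column entirely


def parse_col_types(
    names: List[str], spec: str
) -> Tuple[Dict[str, Optional[str]], List[str]]:
    """Split a compact spec into (column -> Arrow type or None, skipped columns)."""
    # Assumption: one code per column. The ticket leaves open what should
    # happen when not every column is given a type.
    if len(spec) != len(names):
        raise ValueError(
            f"compact spec has {len(spec)} codes but there are {len(names)} columns"
        )
    types: Dict[str, Optional[str]] = {}
    skipped: List[str] = []
    for name, code in zip(names, spec):
        if code in SKIP:
            skipped.append(name)
        elif code == GUESS:
            types[name] = None
        elif code in COMPACT_TO_ARROW:
            types[name] = COMPACT_TO_ARROW[code]
        else:
            # 'n' (number) and any other code have no agreed Arrow mapping here
            raise ValueError(f"unsupported compact type code: {code!r}")
    return types, skipped


if __name__ == "__main__":
    types, skipped = parse_col_types(["id", "name", "score", "note"], "ic?-")
    print(types)    # {'id': 'int32()', 'name': 'utf8()', 'score': None}
    print(skipped)  # ['note']
```

Keeping this parsing in the top-level reader function, as the ticket proposes, means the lower-level options objects only ever see a plain column-name-to-type mapping.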



--
This message was sent by Atlassian Jira
(v8.3.4#803005)