Posted to jira@arrow.apache.org by "Dewey Dunnington (Jira)" <ji...@apache.org> on 2021/12/17 20:49:00 UTC

[jira] [Commented] (ARROW-9186) [R] Allow specifying CSV file encoding

    [ https://issues.apache.org/jira/browse/ARROW-9186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461663#comment-17461663 ] 

Dewey Dunnington commented on ARROW-9186:
-----------------------------------------

This doesn't look like it's implemented as part of the [C++ ReadOptions|https://github.com/apache/arrow/blob/master/cpp/src/arrow/csv/options.h#L138-L171]; instead, Python uses its ReadOptions class to carry this information (with a Python-only {{.encoding}} attribute). We could/should do this, too.

It looks from the Python PR that we'd have to provide our own {{iconv}} and wrap a {{TransformInputStream}}. R provides an {{iconv}} at the C level, so we shouldn't need to call into R for this. I [wrote about this a while ago|https://fishandwhistle.net/post/2021/using-rs-cross-platform-iconv-wrapper-from-cpp11/] and I think there's another example in either readr or vroom where this bit of code is wrapped up in a helper class.
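
For illustration only, here is a rough sketch of the kind of chunk-wise transcoder a stream transform would need. It is not Arrow's or R's API: {{ChunkTranscoder}}, {{Convert}} and {{HasPendingBytes}} are made-up names, and it uses POSIX {{iconv}} from {{<iconv.h>}} as a stand-in for R's C-level wrapper ({{Riconv_open}}/{{Riconv}}/{{Riconv_close}} in {{R_ext/Riconv.h}}). The main wrinkle it shows is that a multibyte character can be split across two chunks, so the trailing bytes have to be carried over to the next call.

```cpp
#include <iconv.h>

#include <cerrno>
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of Arrow or R: transcodes a byte stream
// chunk by chunk, the way a stream transform callback would be invoked.
class ChunkTranscoder {
 public:
  ChunkTranscoder(const char* from, const char* to) : cd_(iconv_open(to, from)) {
    if (cd_ == reinterpret_cast<iconv_t>(-1)) {
      throw std::runtime_error("iconv_open failed: unsupported conversion");
    }
  }
  ~ChunkTranscoder() { iconv_close(cd_); }
  ChunkTranscoder(const ChunkTranscoder&) = delete;
  ChunkTranscoder& operator=(const ChunkTranscoder&) = delete;

  // Converts one chunk. Bytes of a multibyte character that is split across
  // chunks are held back and prepended to the next chunk.
  std::string Convert(const std::string& chunk) {
    std::string input = pending_ + chunk;
    pending_.clear();

    std::string output;
    char buffer[4096];
    char* in_ptr = &input[0];
    size_t in_left = input.size();

    while (in_left > 0) {
      char* out_ptr = buffer;
      size_t out_left = sizeof(buffer);
      const size_t result = iconv(cd_, &in_ptr, &in_left, &out_ptr, &out_left);
      const int err = errno;  // save before any other call can change it
      output.append(buffer, sizeof(buffer) - out_left);
      if (result != static_cast<size_t>(-1)) continue;  // all input consumed
      if (err == EINVAL) {
        // Incomplete multibyte sequence at the end of the chunk: keep it.
        pending_.assign(in_ptr, in_left);
        break;
      }
      if (err != E2BIG) {
        throw std::runtime_error("iconv failed: invalid input sequence");
      }
      // E2BIG: the output buffer was full, so loop again with a fresh one.
    }
    return output;
  }

  // True if the last chunk ended in the middle of a multibyte character.
  bool HasPendingBytes() const { return !pending_.empty(); }

 private:
  iconv_t cd_;
  std::string pending_;
};
```

With something like this, {{ChunkTranscoder("ISO-8859-1", "UTF-8")}} could be applied to each block read from the file before it reaches the CSV reader. Pending bytes left over at end of stream would indicate a truncated file and should probably be an error. Note that on some platforms {{iconv}} declares its input argument as {{const char**}}, which is part of why going through R's wrapper is attractive.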

Where Python creates its wrapper around {{TransformInputStream}}: https://github.com/apache/arrow/blob/master/cpp/src/arrow/python/io.cc#L340-L371

Where R creates the {{CsvReadOptions}}: https://github.com/apache/arrow/blob/master/r/R/csv.R#L402-L416

> [R] Allow specifying CSV file encoding
> --------------------------------------
>
>                 Key: ARROW-9186
>                 URL: https://issues.apache.org/jira/browse/ARROW-9186
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Neal Richardson
>            Priority: Major
>             Fix For: 7.0.0
>
>
> ARROW-9106 did this for Python and we should have the same in R


