Posted to jira@arrow.apache.org by "Dewey Dunnington (Jira)" <ji...@apache.org> on 2021/12/17 20:49:00 UTC
[jira] [Commented] (ARROW-9186) [R] Allow specifying CSV file encoding
[ https://issues.apache.org/jira/browse/ARROW-9186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461663#comment-17461663 ]
Dewey Dunnington commented on ARROW-9186:
-----------------------------------------
It doesn't look like this is implemented as part of the [C++ ReadOptions|https://github.com/apache/arrow/blob/master/cpp/src/arrow/csv/options.h#L138-L171]; instead, Python uses its own ReadOptions class to carry this information (with a Python-only {{.encoding}} attribute). We could/should do this, too.
It looks from the Python PR that we'd have to provide our own {{iconv}} and wrap a {{TransformInputStream}}. R provides an {{iconv}} at the C level, so we shouldn't need to call into R for this. I [wrote about this a while ago|https://fishandwhistle.net/post/2021/using-rs-cross-platform-iconv-wrapper-from-cpp11/] and I think there's another example in either readr or vroom where this bit of code is wrapped up in a helper class.
Where Python creates its wrapper around {{TransformInputStream}}: https://github.com/apache/arrow/blob/master/cpp/src/arrow/python/io.cc#L340-L371
Where R creates the {{CsvReadOptions}}: https://github.com/apache/arrow/blob/master/r/R/csv.R#L402-L416
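A rough sketch of the kind of per-chunk transcoding step that a {{TransformInputStream}} wrapper would need, to make the idea above concrete. This is not the Arrow or R implementation: {{ChunkTranscoder}} is a hypothetical helper, it uses plain POSIX {{iconv}} rather than R's C-level wrapper, and it passes {{std::string}} chunks instead of Arrow buffers. It also assumes the platform declares {{iconv()}} with a {{char**}} input argument (some platforms use {{const char**}}). The main point is that the converter has to be stateful, because a multibyte character can be split across two chunks.

```cpp
#include <iconv.h>

#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of Arrow or R): converts chunks of bytes from
// a source encoding to UTF-8, carrying incomplete multibyte sequences over to
// the next chunk.
class ChunkTranscoder {
 public:
  explicit ChunkTranscoder(const char* from_encoding)
      : cd_(iconv_open("UTF-8", from_encoding)) {
    if (cd_ == (iconv_t)-1) {
      throw std::runtime_error("iconv_open failed: unsupported encoding");
    }
  }

  ~ChunkTranscoder() { iconv_close(cd_); }

  ChunkTranscoder(const ChunkTranscoder&) = delete;
  ChunkTranscoder& operator=(const ChunkTranscoder&) = delete;

  // Transcode one chunk. Bytes that form an incomplete character at the end
  // of the chunk are held back and prepended to the next call.
  std::string Transform(const std::string& chunk) {
    std::string input = pending_ + chunk;
    pending_.clear();

    std::string output;
    char* in = &input[0];
    std::size_t in_left = input.size();
    char buf[4096];

    while (in_left > 0) {
      char* out = buf;
      std::size_t out_left = sizeof(buf);
      std::size_t res = iconv(cd_, &in, &in_left, &out, &out_left);
      int err = errno;  // save before any other call can overwrite it
      output.append(buf, sizeof(buf) - out_left);

      if (res == (std::size_t)-1) {
        if (err == E2BIG) continue;  // output buffer full: flush and go on
        if (err == EINVAL) {         // incomplete sequence at end of chunk
          pending_.assign(in, in_left);
          break;
        }
        throw std::runtime_error("iconv failed: invalid byte sequence");
      }
    }
    return output;
  }

 private:
  iconv_t cd_;
  std::string pending_;
};
```

In a real binding, {{Transform()}} would be adapted to whatever callable the stream wrapper expects, and any bytes still pending at end-of-stream should probably be reported as an error rather than dropped silently.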
> [R] Allow specifying CSV file encoding
> --------------------------------------
>
> Key: ARROW-9186
> URL: https://issues.apache.org/jira/browse/ARROW-9186
> Project: Apache Arrow
> Issue Type: Improvement
> Components: R
> Reporter: Neal Richardson
> Priority: Major
> Fix For: 7.0.0
>
>
> ARROW-9106 did this for Python and we should have the same in R
--
This message was sent by Atlassian Jira
(v8.20.1#820001)