Posted to jira@arrow.apache.org by "Neal Richardson (Jira)" <ji...@apache.org> on 2021/02/18 02:35:00 UTC
[jira] [Commented] (ARROW-11682) [R] Regression from 2.0.0 -> 3.0.0: Null character in string prevents dataset from loading
[ https://issues.apache.org/jira/browse/ARROW-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17286241#comment-17286241 ]
Neal Richardson commented on ARROW-11682:
-----------------------------------------
It loaded in 2.0.0 but the string was silently truncated, which is (arguably) worse.
https://arrow.apache.org/docs/r/news/index.html#enhancements mentions the solution, which is to set `options(arrow.skip_nul = TRUE)` to read in files with embedded nuls. I don't recommend this as a global setting though because it will likely be significantly slower.
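One way to avoid making this a global setting is to turn the option on only around the read that needs it and restore the previous value afterwards. The following is a minimal sketch, not part of the arrow package: `with_skip_nul` is a hypothetical helper name, and it assumes the option is consulted at the moment the data is converted to R, so the read has to happen inside the wrapped expression.

```r
# Hypothetical helper (not an arrow function): evaluate `expr` with
# options(arrow.skip_nul = TRUE) set, then restore the previous value.
with_skip_nul <- function(expr) {
  old <- options(arrow.skip_nul = TRUE)  # options() returns the old values
  on.exit(options(old), add = TRUE)      # restore even if `expr` errors
  force(expr)                            # lazy argument is evaluated here
}

# Usage sketch; "data.feather" is a placeholder path:
# df <- with_skip_nul(arrow::read_feather("data.feather"))
```

Because the option is restored on exit, other reads in the same session keep the default behavior and do not pay the extra cost.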
There's some discussion on ARROW-11478 about improving this experience; please feel free to chime in there if you have opinions. See ARROW-6582 and the linked pull request if you're interested in more details on how we got here.
> [R] Regression from 2.0.0 -> 3.0.0: Null character in string prevents dataset from loading
> -------------------------------------------------------------------------------------------
>
> Key: ARROW-11682
> URL: https://issues.apache.org/jira/browse/ARROW-11682
> Project: Apache Arrow
> Issue Type: New Feature
> Affects Versions: 3.0.0
> Reporter: Kyle Kavanagh
> Priority: Major
>
> When a feather file contains a valid string which happens to contain the appearance of a null character, R fails to read the file. Example string: '#\001200\01'
> Pyarrow is able to successfully read the file and correctly display the string.
> This dataset was previously able to be loaded in 2.0.0 but fails in 3.0.0 with the error:
> Error in Table__to_dataframe(x, use_threads = option_use_threads()) :
> embedded nul in string: '#\001200\01'
--
This message was sent by Atlassian Jira
(v8.3.4#803005)