Posted to jira@arrow.apache.org by "Neal Richardson (Jira)" <ji...@apache.org> on 2021/02/18 02:35:00 UTC

[jira] [Commented] (ARROW-11682) [R] Regression from 2.0.0 -> 3.0.0: Null character in string prevents dataset from loading

    [ https://issues.apache.org/jira/browse/ARROW-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17286241#comment-17286241 ] 

Neal Richardson commented on ARROW-11682:
-----------------------------------------

It loaded in 2.0.0 but the string was silently truncated, which is (arguably) worse. 

https://arrow.apache.org/docs/r/news/index.html#enhancements mentions the solution, which is to set `options(arrow.skip_nul = TRUE)` to read in files with embedded nuls. I don't recommend this as a global setting though because it will likely be significantly slower. 
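One way to avoid setting the option globally is to enable it only around the read that needs it. The following is a minimal sketch in base R, not something from the arrow package: `with_skip_nul()` is a hypothetical helper name, and `"data.feather"` is a placeholder path. It assumes the option is consulted when the data is converted to a data.frame, so the conversion has to happen inside the wrapped expression.

```r
# Hypothetical helper (not part of arrow): evaluate `expr` with
# arrow.skip_nul enabled, then restore the previous option value.
with_skip_nul <- function(expr) {
  old <- options(arrow.skip_nul = TRUE)  # options() returns the prior values
  on.exit(options(old), add = TRUE)      # restore on exit, even after an error
  force(expr)                            # lazy argument is evaluated here
}

# Example use (assumes the arrow package is installed and the file exists):
# df <- with_skip_nul(arrow::read_feather("data.feather"))
```

This keeps the slower code path limited to the files known to contain embedded nuls, while every other read uses the default behaviour.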

There's some discussion on ARROW-11478 about improving this experience; please feel free to chime in there if you have opinions. See ARROW-6582 and the linked pull request if you're interested in more details on how we got here.



> [R] Regression from 2.0.0 -> 3.0.0:  Null character in string prevents dataset from loading
> -------------------------------------------------------------------------------------------
>
>                 Key: ARROW-11682
>                 URL: https://issues.apache.org/jira/browse/ARROW-11682
>             Project: Apache Arrow
>          Issue Type: New Feature
>    Affects Versions: 3.0.0
>            Reporter: Kyle Kavanagh
>            Priority: Major
>
> When a feather file contains a valid string which happens to contain the appearance of a null character, R fails to read the file.  Example string: '#\001200\01'
> Pyarrow is able to successfully read the file and correctly display the string.
> This dataset was previously able to be loaded in 2.0.0 but fails in 3.0.0 with the error:
> Error in Table__to_dataframe(x, use_threads = option_use_threads()) : 
>   embedded nul in string: '#\001200\01'


