Posted to jira@arrow.apache.org by "Nicola Crane (Jira)" <ji...@apache.org> on 2022/09/07 11:09:00 UTC

[jira] [Commented] (ARROW-17639) [R] arrow::write_parquet fails when column first element is null

    [ https://issues.apache.org/jira/browse/ARROW-17639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17601250#comment-17601250 ] 

Nicola Crane commented on ARROW-17639:
--------------------------------------

This is indeed a bug; thanks for reporting it, [~dmedw01].  It's caused by how we infer the types of lists; I will get a PR up to fix this soon.  A temporary workaround would be to reorder the rows so that the first element of the list column is never NULL, though I can see that this is not ideal.
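
For illustration only, the reordering workaround could be sketched in base R roughly as follows. The helper name non_null_first and the small stand-in data frame are assumptions made for this sketch and are not part of arrow; the sketch also assumes that NULL entries in the list column appear as R NULL values.

```r
# Hypothetical helper (not part of arrow): move rows whose list-column
# entry is non-NULL to the front, keeping the original order otherwise.
non_null_first <- function(df, col) {
  is_null <- vapply(df[[col]], is.null, logical(1))
  # FALSE sorts before TRUE, so non-NULL rows come first.
  df[order(is_null), , drop = FALSE]
}

# Stand-in for the data frame from the report (first element is NULL).
df2 <- data.frame(id = 1:3)
df2$col1 <- list(NULL, c(1, 2), c(3, 4))

reordered <- non_null_first(df2, "col1")

# The reordered data frame would then be written as in the report:
# arrow::write_parquet(reordered, tempfile(fileext = ".parquet"))
```

Note that this changes the row order of the written file, and it does not help if every entry in the column is NULL.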

> [R] arrow::write_parquet fails when column first element is null
> ----------------------------------------------------------------
>
>                 Key: ARROW-17639
>                 URL: https://issues.apache.org/jira/browse/ARROW-17639
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>    Affects Versions: 9.0.0
>         Environment: Ubuntu 18.04; R 4.1.1; arrow 9.0
>            Reporter: David
>            Priority: Major
>
> * Works
> reticulate::py_run_string("
> import pandas as pd
> df = pd.DataFrame({'col1': [[1,2], None, [3,4]]})
> df.to_parquet('/tmp/test1.parquet')
> ")
> df1 <- arrow::read_parquet("/tmp/test1.parquet")
> arrow::write_parquet(df1, tempfile(fileext = ".parquet"))
> * Fails in arrow 9.0; works in arrow 5.0
> reticulate::py_run_string("
> import pandas as pd
> df = pd.DataFrame({'col1': [None, [1,2], [3,4]]})
> df.to_parquet('/tmp/test2.parquet')
> ")
> df2 <- arrow::read_parquet("/tmp/test2.parquet")
> arrow::write_parquet(df2, tempfile(fileext = ".parquet"))



--
This message was sent by Atlassian Jira
(v8.20.10#820010)