Posted to jira@arrow.apache.org by "David (Jira)" <ji...@apache.org> on 2022/09/07 01:09:00 UTC

[jira] [Updated] (ARROW-17639) arrow::write_parquet fails when column first element is null

     [ https://issues.apache.org/jira/browse/ARROW-17639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David updated ARROW-17639:
--------------------------
    Description: 
* Works
reticulate::py_run_string("
import pandas as pd
df = pd.DataFrame({'col1': [[1,2], None, [3,4]]})
df.to_parquet('/tmp/test1.parquet')
")
df1 <- arrow::read_parquet("/tmp/test1.parquet")
arrow::write_parquet(df1, tempfile(fileext = ".parquet"))

* Fails in arrow 9.0; works in arrow 5.0
reticulate::py_run_string("
import pandas as pd
df = pd.DataFrame({'col1': [None, [1,2], [3,4]]})
df.to_parquet('/tmp/test2.parquet')
")
df2 <- arrow::read_parquet("/tmp/test2.parquet")
arrow::write_parquet(df2, tempfile(fileext = ".parquet"))

  was:

* Works
df1 <- tibble::tibble(
  column_no_nulls = list(1L, 2L, 3L)
)
arrow::write_parquet(df1, tempfile(fileext = ".parquet"))

* Works
df2 <- tibble::tibble(
  column_with_nonnull_first_element = list(1L, NULL, 3L)
)
arrow::write_parquet(df2, tempfile(fileext = ".parquet"))

* Fails in arrow 9.0; works in arrow 5.0
df3 <- tibble::tibble(
  column_with_null_first_element = list(NULL, 1L, 3L)
)
arrow::write_parquet(df3, tempfile(fileext = ".parquet"))


> arrow::write_parquet fails when column first element is null
> ------------------------------------------------------------
>
>                 Key: ARROW-17639
>                 URL: https://issues.apache.org/jira/browse/ARROW-17639
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>    Affects Versions: 9.0.0
>         Environment: Ubuntu 18.04; R 4.1.1; arrow 9.0
>            Reporter: David
>            Priority: Major
>
> * Works
> reticulate::py_run_string("
> import pandas as pd
> df = pd.DataFrame({'col1': [[1,2], None, [3,4]]})
> df.to_parquet('/tmp/test1.parquet')
> ")
> df1 <- arrow::read_parquet("/tmp/test1.parquet")
> arrow::write_parquet(df1, tempfile(fileext = ".parquet"))
>
> * Fails in arrow 9.0; works in arrow 5.0
> reticulate::py_run_string("
> import pandas as pd
> df = pd.DataFrame({'col1': [None, [1,2], [3,4]]})
> df.to_parquet('/tmp/test2.parquet')
> ")
> df2 <- arrow::read_parquet("/tmp/test2.parquet")
> arrow::write_parquet(df2, tempfile(fileext = ".parquet"))


