Posted to issues@arrow.apache.org by "Neal Richardson (Jira)" <ji...@apache.org> on 2020/02/01 18:20:00 UTC

[jira] [Updated] (ARROW-7740) [R] Crash/bad data in converting Arrow list struct type

     [ https://issues.apache.org/jira/browse/ARROW-7740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neal Richardson updated ARROW-7740:
-----------------------------------
    Summary: [R] Crash/bad data in converting Arrow list struct type  (was: R arrow::read_json_arrow aborts session with nested ndjson and default as_data_frame=TRUE)

> [R] Crash/bad data in converting Arrow list struct type
> -------------------------------------------------------
>
>                 Key: ARROW-7740
>                 URL: https://issues.apache.org/jira/browse/ARROW-7740
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, R
>            Reporter: John Sheffield
>            Priority: Minor
>             Fix For: 1.0.0
>
>
> Reading a nested ndjson file with arrow::read_json_arrow and the default `as_data_frame=TRUE` crashes the R session immediately. With `as_data_frame=FALSE`, the read succeeds and the resulting Arrow object has the correct schema.
> {code:java}
> library(tidyr)
> library(arrow)
> library(jsonlite)
> # Create two test datasets: long_df and a variant that nests long_df into
> # a dataframe with a list-column 'nest_level1' containing a dataframe
> long_df <- tidyr::expand_grid(ABC = LETTERS[1:3], xyz = letters[24:26], num = 1:3)
> long_df[["ftr1"]] <- runif(nrow(long_df))
> long_df[["ftr2"]] <- rpois(nrow(long_df), 100)
> nested_frame_level1 <- tidyr::nest(long_df, nest_level1 = c(num, ftr1, ftr2))
> # Write and validate nested ndjson
> jsonlite::stream_out(nested_frame_level1, con = file("nested_frame_level1.json"))
> readLines("nested_frame_level1.json", n = 2) # check we have valid ndjson here
> # This does not cause a session crash
> nested_arrow <- arrow::read_json_arrow(file = "nested_frame_level1.json", as_data_frame = FALSE)
> nested_arrow$schema # correctly interprets `nest_level1` as `list<item: struct<num: int64, ftr1: double, ftr2: int64>>`
> # This causes a session crash
> nested_df <- arrow::read_json_arrow(file = "nested_frame_level1.json", as_data_frame = TRUE)
>  
> {code}
> The arrow R package is the latest CRAN release (arrow * 0.15.1.1, 2019-11-05, CRAN (R 3.5.2)). I'm running this code in a slightly older R version (3.5.1) on macOS 10.14.6 (x86_64, darwin15.6.0) via RStudio 1.2.5001.
> [edit: formatting fix]
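
Until the list-of-struct conversion is fixed, one possible interim workaround is to read the ndjson with jsonlite instead of arrow. This is an assumption rather than something proposed in the report, and it yields plain R types rather than Arrow types. The sketch below is self-contained: the small `nested` data frame and the temporary file are hypothetical stand-ins for the reporter's `nested_frame_level1` and `nested_frame_level1.json`.

```r
library(jsonlite)

# Hypothetical stand-in for the reporter's data: a data frame with a
# list-column 'nest_level1' whose elements are data frames.
nested <- data.frame(ABC = c("A", "B"), stringsAsFactors = FALSE)
nested$nest_level1 <- list(
  data.frame(num = 1:2, ftr1 = c(0.1, 0.2)),
  data.frame(num = 1:2, ftr1 = c(0.3, 0.4))
)

# Write ndjson the same way the report does (one JSON record per line).
path <- tempfile(fileext = ".json")
jsonlite::stream_out(nested, con = file(path), verbose = FALSE)

# Read it back with jsonlite rather than arrow::read_json_arrow.
nested_df <- jsonlite::stream_in(file(path), verbose = FALSE)

# Each element of the list-column should again be a data frame.
str(nested_df$nest_level1[[1]])
```

This sidesteps the Arrow-to-R conversion entirely, so it says nothing about the underlying bug; it only shows that the same file can be loaded into a nested data frame by another route.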


