Posted to jira@arrow.apache.org by "Neal Richardson (Jira)" <ji...@apache.org> on 2020/11/17 16:44:00 UTC

[jira] [Commented] (ARROW-10623) [R] Arrow 1.0.1 cannot read parquet file written by arrow 2.0.0

    [ https://issues.apache.org/jira/browse/ARROW-10623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17233724#comment-17233724 ] 

Neal Richardson commented on ARROW-10623:
-----------------------------------------

The problem is that arrow 2.0 knows how to preserve data.frame attributes in Arrow schema metadata, but arrow 1.0 did not; it only preserved column attributes. Unfortunately, the code in arrow 1.0 sees the new metadata written by 2.0, erases the existing data.frame attributes, and then doesn't know how to correctly restore the new ones.
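The failure mode can be imitated in base R. The sketch below is a hypothetical simulation, not arrow 1.0's actual code. It assumes the reader clears every attribute of the data.frame and then restores only {{class}}. That alone is enough to reproduce the "Actual" output reported in the issue:

{code:r}
# Hypothetical simulation (base R only, no arrow code involved):
# mimic a reader that wipes a data.frame's attributes and then restores
# only the one it recognizes, here just `class`.
df <- data.frame(col1 = 1:100)

broken <- df
saved_class <- class(broken)   # "data.frame"
attributes(broken) <- NULL     # drops names, row.names and class
class(broken) <- saved_class   # only the class comes back

dput(broken)
# prints: structure(list(1:100), class = "data.frame")
{code}

The column data survives, but the column name and row.names are gone, which matches what the reporter saw.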

A workaround for now is to drop the R metadata before converting to a data.frame, like this:

{code:r}
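# Using arrow 1.0.x
library(arrow)
# Read as an Arrow Table instead of converting straight to a data.frame
tab <- read_parquet("file-written-by-arrow-2-0-0.parquet", as_data_frame = FALSE)
# Remove the R-specific metadata that arrow 1.0 cannot interpret correctly
tab$metadata$r <- NULL
df <- as.data.frame(tab)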
{code}

> [R] Arrow 1.0.1 cannot read parquet file written by arrow 2.0.0
> ---------------------------------------------------------------
>
>                 Key: ARROW-10623
>                 URL: https://issues.apache.org/jira/browse/ARROW-10623
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>    Affects Versions: 1.0.1, 2.0.0
>            Reporter: Fleur Kelpin
>            Priority: Major
>             Fix For: 3.0.0, 2.0.1
>
>
> h4. How to reproduce
>  * Create a data frame:
> {noformat}
> df <- data.frame(col1 = 1:100){noformat}
>  * Write it to a parquet file using arrow 2.0.0. The demo uses R 3.6, but the same happens with R 4.0.
>  * Read the parquet file using arrow 1.0.1. I only tried this in R 3.6.
> h4. Expected
> The data frame is the same as it was before:
> {noformat}
> structure(list(col1 = 1:100), row.names = c(NA, 100L), class = "data.frame"){noformat}
> h4. Actual
> The data frame has lost its column name and row.names:
> {noformat}
> structure(list(1:100), class = "data.frame"){noformat}
> h4. Demo
> I'm not sure what the easiest way to set up a demo project for this is, since you need to switch between arrow installations, but I've created this Docker-based demo:
> [https://github.com/fdlk/arrow2/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)