Posted to issues@arrow.apache.org by "Neal Richardson (Jira)" <ji...@apache.org> on 2020/05/20 21:55:00 UTC

[jira] [Assigned] (ARROW-8703) [R] schema$metadata should be properly typed

     [ https://issues.apache.org/jira/browse/ARROW-8703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neal Richardson reassigned ARROW-8703:
--------------------------------------

    Assignee: Neal Richardson

> [R] schema$metadata should be properly typed
> --------------------------------------------
>
>                 Key: ARROW-8703
>                 URL: https://issues.apache.org/jira/browse/ARROW-8703
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>    Affects Versions: 0.17.0
>            Reporter: René Rex
>            Assignee: Neal Richardson
>            Priority: Major
>
> I am trying to export numeric data plus some metadata from Python to a Parquet file and read it in R. However, the metadata is a dict in Python but appears as a single string in R. I would have expected a list (which is roughly R's equivalent of a Python dict). Am I missing something? Here is the code to demonstrate the issue:
> {{import sys}}
> {{import numpy as np}}
> {{import pyarrow as pa}}
> {{import pyarrow.parquet as pq}}
> {{print(sys.version)}}
> {{print(pa.__version__)}}
> {{x = np.random.randint(0, 10, (10, 3))}}
> {{arrays = [pa.array(x[:, i]) for i in range(x.shape[1])]}}
> {{table = pa.Table.from_arrays(arrays=arrays, names=['A', 'B', 'C'],}}
> {{ metadata={'foo': '42'})}}
> {{pq.write_table(table, 'array.parquet', compression='snappy')}}
> {{table = pq.read_table('array.parquet')}}
> {{metadata = table.schema.metadata}}
> {{print(metadata)}}
> {{print(type(metadata))}}
>  
> And in R:
>  
> {{library(arrow)}}
> {{print(R.version)}}
> {{print(packageVersion("arrow"))}}
> {{table <- read_parquet("array.parquet", as_data_frame = FALSE)}}
> {{metadata <- table$schema$metadata}}
> {{print(metadata)}}
> {{print(is(metadata))}}
> {{print(metadata["foo"])}}
>  
> Output Python:
> {{3.6.8 (default, Aug 7 2019, 17:28:10) }}
> {{[GCC 4.8.5 20150623 (Red Hat 4.8.5-39)]}}
> {{0.13.0}}
> {{OrderedDict([(b'foo', b'42')])}}
> {{<class 'collections.OrderedDict'>}}
>  
> Output R:
> {{[1] ‘0.17.0’}}
> {{[1] "\n-- metadata --\nfoo: 42"}}
> {{[1] "character" "vector" "data.frameRowLabels"}}
> {{[4] "SuperClassMethod" }}
> {{[1] NA}}
>  
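For comparison with the R result, the Python output quoted above is a mapping with bytes keys and bytes values, which is the kind of key/value structure the reporter expected to get back in R. The following is a minimal sketch of turning such a mapping into a plain dict of strings. It assumes the keys and values are UTF-8 text, and `decode_metadata` is a hypothetical helper written for illustration, not part of pyarrow.

```python
from collections import OrderedDict


def decode_metadata(raw):
    """Return a plain str -> str dict from a bytes -> bytes metadata mapping.

    Hypothetical helper for illustration; assumes UTF-8 encoded keys and values.
    """
    if raw is None:
        # A schema without metadata may report None rather than an empty mapping.
        return {}
    return {key.decode("utf-8"): value.decode("utf-8") for key, value in raw.items()}


# The mapping printed in the "Output Python" section of the report.
raw_metadata = OrderedDict([(b"foo", b"42")])

metadata = decode_metadata(raw_metadata)
print(metadata["foo"])
```

Lookup by key works on the decoded dict, which is the behaviour the report asks for from `schema$metadata` in R.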


