Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/12/13 13:28:01 UTC

[GitHub] [arrow] jorisvandenbossche commented on issue #11935: Pre-Feature-request - Is there a reason why metadata isn't accessible between language implementations?

jorisvandenbossche commented on issue #11935:
URL: https://github.com/apache/arrow/issues/11935#issuecomment-992476427


   > attributes set in R don't appear to be easily accessible via the python implementation and vice versa.
   
   Can you give a more concrete code example?
   
   As far as I know, on the Python side the metadata stored in the table's schema is written to the Parquet FileMetaData `key_value_metadata`, which should be a standard place to put this.
   
   I am less familiar with the R side, but it seems this is similarly available in the R arrow table's metadata:
   
   ```python
   >>> import pyarrow as pa
   # create a table with some top-level metadata
   >>> table = pa.table({"a": [1, 2, 3], "b": [4, 5, 6]})
   >>> table = table.replace_schema_metadata({"a": "long name"})
   # in python this is exposed as a dict
   >>> table.schema.metadata
   {b'a': b'long name'}
   
   >>> import pyarrow.parquet as pq
   >>> pq.write_table(table, "test_metadata.parquet")
   # this metadata is stored in the Parquet FileMetaData "key_value_metadata", in the python interface again exposed as a dict
   >>> file_metadata = pq.read_metadata("test_metadata.parquet")
   >>> file_metadata.metadata
   {b'ARROW:schema': b'/////+gAAAAQAAAAAAAKAA4ABgAFAAgACgAAAAABBAAQAAAAAA....',
    b'a': b'long name'}
   # and it is also available after reading the file back
   >>> pq.read_table("test_metadata.parquet").schema.metadata
   {b'a': b'long name'}
   ```
   
   Reading the same file from R:
   
   ```R
   > library(arrow)
   > table <- read_parquet("test_metadata.parquet", as_data_frame = FALSE)
   > table
   Table
   3 rows x 2 columns
   $a <int64>
   $b <int64>
   
   See $metadata for additional Schema metadata
   > table$metadata
   $a
   [1] "long name"
   ```
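
One visible difference between the two outputs above is that Python exposes the metadata as a dict of bytes, while R shows plain character strings. To compare the two side by side, the Python dict can be decoded. Below is a minimal sketch: `decode_metadata` is a hypothetical helper (not part of pyarrow), and it assumes every key and value is valid UTF-8, which holds for the example here but not necessarily for arbitrary metadata.

```python
# Hypothetical helper (not a pyarrow API): turn the bytes -> bytes dict that
# pyarrow exposes for schema / file metadata into a str -> str dict.
def decode_metadata(metadata, skip_keys=(b"ARROW:schema",)):
    """Decode keys and values as UTF-8, dropping the keys in skip_keys."""
    if metadata is None:  # schema.metadata is None when no metadata is set
        return {}
    return {
        key.decode("utf-8"): value.decode("utf-8")
        for key, value in metadata.items()
        if key not in skip_keys
    }


# Sample input mirroring the file-level metadata shown above. The real
# ARROW:schema value is truncated there, so a placeholder stands in for it.
file_level = {b"ARROW:schema": b"...", b"a": b"long name"}
decoded = decode_metadata(file_level)
print(decoded)
```

With a real file, the same helper could be applied to `pq.read_metadata("test_metadata.parquet").metadata` or to `table.schema.metadata`. Skipping `ARROW:schema` by default is a design choice made for this sketch, since that entry holds the serialized Arrow schema rather than user-provided metadata.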

