Posted to dev@arrow.apache.org by "Jörn Horstmann (Jira)" <ji...@apache.org> on 2020/03/26 17:04:00 UTC

[jira] [Created] (ARROW-8231) Parse key_value_metadata from parquet FileMetaData into arrow schema metadata

Jörn Horstmann created ARROW-8231:
-------------------------------------

             Summary: Parse key_value_metadata from parquet FileMetaData into arrow schema metadata
                 Key: ARROW-8231
                 URL: https://issues.apache.org/jira/browse/ARROW-8231
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Rust
            Reporter: Jörn Horstmann


The parquet-format FileMetaData struct contains an optional list of key-value pairs that carry additional metadata about the schema:

[https://docs.rs/parquet-format/2.6.0/src/parquet_format/parquet_format.rs.html#3821]

For example, when a Parquet file is generated with the Java Avro Parquet writer, this metadata contains the original Avro schema under the `parquet.avro.schema` or `avro.schema` key.

It would be nice if this metadata were accessible through the `arrow::datatypes::Schema.metadata` field.
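
A rough sketch of the conversion I have in mind is below. It is not taken from the existing code: `KeyValue` is a local stand-in for the Thrift-generated struct in parquet-format (assumed to have a required key and an optional value), `key_value_metadata_to_map` is a hypothetical helper name, and the result is assumed to be the `HashMap<String, String>` shape used for Arrow schema metadata.

```rust
use std::collections::HashMap;

/// Local stand-in for the Thrift-generated `KeyValue` struct in parquet-format.
/// Assumed shape: a required key and an optional value.
#[derive(Debug, Clone)]
pub struct KeyValue {
    pub key: String,
    pub value: Option<String>,
}

/// Hypothetical helper: flatten the optional list of key-value pairs into a
/// string-to-string map suitable for use as schema metadata.
/// Entries without a value are skipped; that is a choice made in this sketch.
pub fn key_value_metadata_to_map(kv: Option<&[KeyValue]>) -> HashMap<String, String> {
    kv.into_iter()
        .flatten()
        .filter_map(|entry| {
            entry
                .value
                .as_ref()
                .map(|value| (entry.key.clone(), value.clone()))
        })
        .collect()
}
```

Whether entries with a missing value should be dropped or mapped to an empty string is open for discussion; the sketch drops them so that the map only contains keys that actually carry data.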

I'm willing to implement and create a pull request for this feature.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)