Posted to dev@arrow.apache.org by "Joris Van den Bossche (JIRA)" <ji...@apache.org> on 2019/08/09 12:48:00 UTC

[jira] [Created] (ARROW-6187) [C++] fallback to storage type when writing ExtensionType to Parquet

Joris Van den Bossche created ARROW-6187:
--------------------------------------------

             Summary: [C++] fallback to storage type when writing ExtensionType to Parquet
                 Key: ARROW-6187
                 URL: https://issues.apache.org/jira/browse/ARROW-6187
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: Joris Van den Bossche


Writing a table that contains an ExtensionType array to a Parquet file is not yet implemented. It currently raises "ArrowNotImplementedError: Unhandled type for Arrow to Parquet schema conversion: extension<arrow.py_extension_type>" (for a PyExtensionType in this case).

I think minimal support could consist of falling back to the storage type, i.e. writing the extension array's underlying storage array.

We might also want to save the extension name and metadata in the Parquet FileMetadata.

Later on, this could potentially be used to restore the extension type when reading. This is related to other issues that need to save the Arrow schema (categorical: ARROW-5480, time zones: ARROW-5888). In this case, however, we probably want to store the serialised type in addition to the schema (which only has the extension type's name).
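
To make the proposal concrete, here is a rough sketch of the idea in plain Python. It is only a toy model and does not use any Arrow or Parquet API: ExtensionType, Field, to_parquet_schema, from_parquet_schema and the "example.extension." metadata key prefix are all hypothetical names made up for illustration. On write, an extension type is replaced by its storage type and its name plus serialised form are recorded in key/value metadata; on read, that metadata (if present) is used to rebuild the extension type.

```python
import json
from dataclasses import dataclass

# Hypothetical stand-ins for Arrow concepts; none of these are real Arrow APIs.


@dataclass(frozen=True)
class ExtensionType:
    name: str          # extension name, e.g. "example.uuid"
    storage_type: str  # type the Parquet writer already supports
    serialized: str    # opaque serialised form of the type's parameters


@dataclass(frozen=True)
class Field:
    name: str
    type: object  # a plain type name (str) or an ExtensionType


# Hypothetical key prefix for entries stored in the file-level metadata.
EXT_KEY_PREFIX = "example.extension."


def to_parquet_schema(fields):
    """Fall back to the storage type and record extension info as metadata."""
    columns = []
    metadata = {}
    for f in fields:
        if isinstance(f.type, ExtensionType):
            # Write the column as its storage type.
            columns.append((f.name, f.type.storage_type))
            # Keep the name and the serialised type so a reader can restore it.
            metadata[EXT_KEY_PREFIX + f.name] = json.dumps(
                {"name": f.type.name, "serialized": f.type.serialized}
            )
        else:
            columns.append((f.name, f.type))
    return columns, metadata


def from_parquet_schema(columns, metadata):
    """Restore extension types when metadata exists, else keep storage types."""
    fields = []
    for name, storage in columns:
        raw = metadata.get(EXT_KEY_PREFIX + name)
        if raw is None:
            fields.append(Field(name, storage))
        else:
            info = json.loads(raw)
            fields.append(
                Field(name, ExtensionType(info["name"], storage, info["serialized"]))
            )
    return fields


schema = [
    Field("id", "int64"),
    Field("uuid", ExtensionType("example.uuid", "fixed_size_binary[16]", "v1")),
]
columns, metadata = to_parquet_schema(schema)
restored = from_parquet_schema(columns, metadata)
plain = from_parquet_schema(columns, {})  # reader without the extra metadata
```

In this sketch a reader that ignores (or lacks) the extra metadata still gets a usable column of the storage type, which is the minimal support described above; restoring the extension type is an optional step on top.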



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)