Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/04/23 11:52:53 UTC

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #6959: ARROW-5649: [Integration][C++] Create integration test for extension types

jorisvandenbossche commented on a change in pull request #6959:
URL: https://github.com/apache/arrow/pull/6959#discussion_r413750283



##########
File path: python/pyarrow/tests/test_extension_type.py
##########
@@ -445,22 +445,28 @@ def test_parquet(tmpdir, registered_period_type):
     import base64
     decoded_schema = base64.b64decode(meta.metadata[b"ARROW:schema"])
     schema = pa.ipc.read_schema(pa.BufferReader(decoded_schema))
-    assert schema.field("ext").metadata == {
-        b'ARROW:extension:metadata': b'freq=D',
-        b'ARROW:extension:name': b'pandas.period'}
+    # Since the type could be reconstructed, the extension type metadata is
+    # absent.
+    assert schema.field("ext").metadata == {}

Review comment:
       I don't have a good sense of what people do with this metadata, but I suppose that when the type is recognized, it should not be a problem if the metadata is no longer present (since you can always retrieve it again from the type instance).
   
   I know Micah mentioned they were using this metadata in BigQuery, but without a registered extension type, so that use case should not be affected.
   
   This test covers the Parquet roundtrip, but was a similar change made for the plain IPC roundtrip as well? (That currently also preserves the field metadata, even when the type was recognized.)
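
   To make the two cases above concrete, here is a toy model in plain Python. It is only a sketch of the behaviour being discussed, not pyarrow's implementation: `PeriodType`, `read_field`, and the registry dict are hypothetical stand-ins, and only the two `ARROW:extension:*` keys and their values come from the diff above.

```python
# Toy model only: a dataclass and plain dicts stand in for an Arrow extension
# type and field metadata. None of these names are pyarrow APIs.
from dataclasses import dataclass

EXT_NAME_KEY = b"ARROW:extension:name"
EXT_META_KEY = b"ARROW:extension:metadata"


@dataclass(frozen=True)
class PeriodType:
    """Hypothetical stand-in for a registered extension type."""
    freq: str

    def serialize(self) -> bytes:
        # The metadata can always be regenerated from the type instance.
        return b"freq=" + self.freq.encode()

    @classmethod
    def deserialize(cls, serialized: bytes) -> "PeriodType":
        return cls(serialized.decode().split("=", 1)[1])


def read_field(metadata, registry):
    """Return (extension_type_or_None, remaining_metadata) for one field."""
    ext_cls = registry.get(metadata.get(EXT_NAME_KEY))
    if ext_cls is None:
        # Unregistered name: keep the raw keys so consumers can still read them.
        return None, dict(metadata)
    ext_type = ext_cls.deserialize(metadata[EXT_META_KEY])
    # Recognized type: the extension keys are consumed and dropped.
    remaining = {k: v for k, v in metadata.items()
                 if k not in (EXT_NAME_KEY, EXT_META_KEY)}
    return ext_type, remaining


stored = {EXT_NAME_KEY: b"pandas.period", EXT_META_KEY: b"freq=D"}

# Case 1: the type is registered, so the field metadata comes back empty.
recognized, meta_registered = read_field(stored, {b"pandas.period": PeriodType})

# Case 2: nothing is registered, so the metadata is left untouched.
unrecognized, meta_unregistered = read_field(stored, {})
```

   In the registered case nothing is lost, because `recognized.serialize()` rebuilds the same bytes that were stored. In the unregistered case the dictionary is passed through unchanged, which is why a consumer that reads the raw keys without registering a type should not notice the change.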



