Posted to dev@parquet.apache.org by "Balajee Nagasubramaniam (Jira)" <ji...@apache.org> on 2019/09/18 18:15:00 UTC

[jira] [Created] (PARQUET-1656) Schema change results in exception - java.lang.ClassCastException

Balajee Nagasubramaniam created PARQUET-1656:
------------------------------------------------

             Summary: Schema change results in exception - java.lang.ClassCastException
                 Key: PARQUET-1656
                 URL: https://issues.apache.org/jira/browse/PARQUET-1656
             Project: Parquet
          Issue Type: Bug
          Components: parquet-avro
    Affects Versions: 1.8.1, 1.12.0
         Environment: Hoodie/Parquet/Avro

Parquet-1.8.1

Avro-1.7.6
            Reporter: Balajee Nagasubramaniam


The following exception was seen with Parquet 1.8.1, and again with Parquet 1.12.0 when trying to reproduce it.

Exception in thread "main" java.lang.ClassCastException: optional binary phone_number (STRING) is not a group
at com.uber.komondor.shaded.org.apache.parquet.schema.Type.asGroupType(Type.java:250)
at com.uber.komondor.shaded.org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:279)
at com.uber.komondor.shaded.org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:232)
at com.uber.komondor.shaded.org.apache.parquet.avro.AvroRecordConverter.access$100(AvroRecordConverter.java:78)
at com.uber.komondor.shaded.org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter$ElementConverter.<init>(AvroRecordConverter.java:536)
at com.uber.komondor.shaded.org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter.<init>(AvroRecordConverter.java:486)
at com.uber.komondor.shaded.org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:289)
at com.uber.komondor.shaded.org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:141)
at com.uber.komondor.shaded.org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:279)
at com.uber.komondor.shaded.org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:141)
at com.uber.komondor.shaded.org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:95)
at com.uber.komondor.shaded.org.apache.parquet.avro.AvroRecordMaterializer.<init>(AvroRecordMaterializer.java:33)
at com.uber.komondor.shaded.org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:138)
at com.uber.komondor.shaded.org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:183)
at com.uber.komondor.shaded.org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:156)
at com.uber.komondor.shaded.org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
at com.uber.komondor.util.ParquetToAvroSchemaConverter$.convert(ParquetToAvroSchemaConverter.scala:46)
at com.uber.komondor.util.ParquetToAvroSchemaConverter$.main(ParquetToAvroSchemaConverter.scala:20)
at com.uber.komondor.util.ParquetToAvroSchemaConverter.main(ParquetToAvroSchemaConverter.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)


The original exception was triggered by the following schema change.
Schema before change:
                         {
                            "default": null,
                            "name": "master_cluster",
                            "type": [
                                "null",
                                {
                                    "fields": [
                                        {
                                            "name": "uuid",
                                            "type": "string"
                                        },
                                        {
                                            "name": "namespace",
                                            "type": "string"
                                        },
                                        {
                                            "name": "version",
                                            "type": "long"
                                        }
                                    ],
                                    "name": "master_cluster",
                                    "type": "record"
                                }
                            ]
                        },

Schema after change:
                        {
                            "default": null,
                            "name": "master_cluster",
                            "type": [
                                "null",
                                {
                                    "fields": [
                                        {
                                            "default": null,
                                            "name": "uuid",
                                            "type": [
                                                "null",
                                                "string"
                                            ]
                                        },
                                        {
                                            "default": null,
                                            "name": "namespace",
                                            "type": [
                                                "null",
                                                "string"
                                            ]
                                        },
                                        {
                                            "default": null,
                                            "name": "version",
                                            "type": [
                                                "null",
                                                "long"
                                            ]
                                        }
                                    ],
                                    "name": "VENUE_ORGANIZATIONmaster_cluster",
                                    "type": "record"
                                }
                            ]
                        },

We suspected that PARQUET-1441 could be in play, so we tried to reproduce the issue on parquet-1.12.0 and saw the same exception.

During the repro, we noticed that the issue could be in the Avro schema conversion (the field name was substituted with the generic name "array"). While we look into this further, we would like community input on whether this is a known issue, and any thoughts on the path forward.

19/09/12 22:34:37 DEBUG avro.SchemaCompatibility: Checking compatibility of reader {"type":"record","name":"IDENTITYphones_items","fields":[{"name":"phone_number","type":["null","string"],"default":null}]} with writer {"type":"record","name":"array","fields":[{"name":"phone_number","type":["null","string"],"default":null}]}
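
To make the observation from that log line concrete, here is a minimal sketch in plain Java with no Avro or Parquet dependency. It only models the two record schemas from the DEBUG message as a record name plus a field map, and shows that they declare identical fields and differ only in the record name ("IDENTITYphones_items" vs. "array"). The names RecordNameCheck, RecordShape, sameFields and sameName are hypothetical and exist only for this illustration; this is not how avro.SchemaCompatibility is implemented, and it does not claim to be the root cause.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RecordNameCheck {

    // Hypothetical, simplified stand-in for an Avro record schema:
    // a record name plus field name -> field type (kept as JSON text).
    static final class RecordShape {
        final String name;
        final Map<String, String> fields = new LinkedHashMap<>();

        RecordShape(String name) {
            this.name = name;
        }

        RecordShape field(String fieldName, String typeJson) {
            fields.put(fieldName, typeJson);
            return this;
        }
    }

    // True when both records declare the same fields with the same types.
    static boolean sameFields(RecordShape reader, RecordShape writer) {
        return reader.fields.equals(writer.fields);
    }

    // True when the record names are identical.
    static boolean sameName(RecordShape reader, RecordShape writer) {
        return reader.name.equals(writer.name);
    }

    // Reader schema as it appears in the DEBUG log line.
    static RecordShape loggedReader() {
        return new RecordShape("IDENTITYphones_items")
                .field("phone_number", "[\"null\",\"string\"]");
    }

    // Writer schema as it appears in the DEBUG log line.
    static RecordShape loggedWriter() {
        return new RecordShape("array")
                .field("phone_number", "[\"null\",\"string\"]");
    }

    public static void main(String[] args) {
        RecordShape reader = loggedReader();
        RecordShape writer = loggedWriter();
        System.out.println("same fields: " + sameFields(reader, writer));
        System.out.println("same name:   " + sameName(reader, writer));
    }
}
```

If a compatibility check treats a record-name mismatch as significant, these two schemas would be reported as different even though their fields line up. Whether that is what happens inside parquet-avro here is exactly the open question.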


