Posted to dev@parquet.apache.org by "Andreas Hailu (Jira)" <ji...@apache.org> on 2022/06/16 12:27:00 UTC

[jira] [Updated] (PARQUET-2051) AvroWriteSupport does not pass Configuration to AvroSchemaConverter on Creation

     [ https://issues.apache.org/jira/browse/PARQUET-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Hailu updated PARQUET-2051:
-----------------------------------
    Fix Version/s: 1.12.3

> AvroWriteSupport does not pass Configuration to AvroSchemaConverter on Creation
> -------------------------------------------------------------------------------
>
>                 Key: PARQUET-2051
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2051
>             Project: Parquet
>          Issue Type: Bug
>            Reporter: Andreas Hailu
>            Assignee: Andreas Hailu
>            Priority: Major
>             Fix For: 1.12.3
>
>
> Because AvroWriteSupport does not pass its Configuration to the AvroSchemaConverter it creates, we cannot fully use the ThreeLevelListWriter when writing Avro lists to Parquet through AvroParquetOutputFormat.
> The following record is used for testing:
> Schema:
> {
>   "type": "record",
>   "name": "NullLists",
>   "namespace": "com.test",
>   "fields": [
>     { "name": "KeyID", "type": "string" },
>     { "name": "NullableList", "type": [ "null", { "type": "array", "items": [ "null", "string" ] } ], "default": null }
>   ]
> }
> Record (using basic JSON just for display purposes):
> { "KeyID": "0", "NullableList": [ "foo", null, "baz" ] }
> During testing, we see the following exception:
> Caused by: java.lang.ClassCastException: repeated binary array (STRING) is not a group
>     at org.apache.parquet.schema.Type.asGroupType(Type.java:250)
>     at org.apache.parquet.avro.AvroWriteSupport$ThreeLevelListWriter.writeCollection(AvroWriteSupport.java:612)
>     at org.apache.parquet.avro.AvroWriteSupport$ListWriter.writeList(AvroWriteSupport.java:397)
>     at org.apache.parquet.avro.AvroWriteSupport.writeValueWithoutConversion(AvroWriteSupport.java:355)
>     at org.apache.parquet.avro.AvroWriteSupport.writeValue(AvroWriteSupport.java:278)
>     at org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:191)
>     at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:165)
>     at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:128)
> Upon review, we found that the configuration option AvroWriteSupport reads to select the ThreeLevelListWriter (parquet.avro.write-old-list-structure set to false) was never passed on to the AvroSchemaConverter.
> After changing AvroWriteSupport to pass the configuration to the AvroSchemaConverter and testing locally, we observed that the record with nulls in the array is written successfully by AvroParquetOutputFormat.
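
The mismatch described above can be illustrated with a small toy model. This is a sketch only, not parquet-avro code: the class and its methods are hypothetical stand-ins for the converter and the writer, the schema strings are simplified, and the default value of the flag (true, meaning the old list structure) is an assumption. Only the property name parquet.avro.write-old-list-structure comes from the issue. The idea is that the writer sees the flag set to false and expects a three-level list, while a converter built without the configuration falls back to the default and emits the old layout.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model: hypothetical stand-ins, NOT the real parquet-avro classes.
final class ListStructureMismatchSketch {
    // Property name taken from the issue text.
    static final String WRITE_OLD_LIST_STRUCTURE = "parquet.avro.write-old-list-structure";
    // Assumption for this sketch: the old structure is the default.
    static final boolean DEFAULT_OLD_STRUCTURE = true;

    // Reads the flag, falling back to the default when it is not set.
    static boolean useOldStructure(Map<String, String> conf) {
        String value = conf.get(WRITE_OLD_LIST_STRUCTURE);
        return value == null ? DEFAULT_OLD_STRUCTURE : Boolean.parseBoolean(value);
    }

    // Stand-in for the schema converter: returns a simplified list layout.
    static String convertListSchema(Map<String, String> conf) {
        return useOldStructure(conf)
            ? "repeated binary array"                             // old layout: repeated primitive
            : "repeated group list { optional binary element }";  // three-level layout
    }

    // Stand-in for the writer: the three-level path needs a repeated group.
    static void writeList(Map<String, String> writerConf, String listSchema) {
        boolean threeLevel = !useOldStructure(writerConf);
        if (threeLevel && !listSchema.startsWith("repeated group")) {
            throw new ClassCastException(listSchema + " is not a group");
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(WRITE_OLD_LIST_STRUCTURE, "false");

        // Reported behaviour: the converter never sees the configuration.
        try {
            writeList(conf, convertListSchema(new HashMap<>()));
        } catch (ClassCastException e) {
            System.out.println("mismatch: " + e.getMessage());
        }

        // Proposed behaviour: both sides read the same configuration.
        writeList(conf, convertListSchema(conf));
        System.out.println("consistent");
    }
}
```

In this model, passing the same configuration to both sides is enough to make the schema and the writer agree, which matches the fix described in the issue.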



--
This message was sent by Atlassian Jira
(v8.20.7#820007)