Posted to dev@parquet.apache.org by "Andreas Hailu (Jira)" <ji...@apache.org> on 2021/05/15 01:35:00 UTC

[jira] [Created] (PARQUET-2051) AvroWriteSupport does not pass Configuration to AvroSchemaConverter on Creation

Andreas Hailu created PARQUET-2051:
--------------------------------------

             Summary: AvroWriteSupport does not pass Configuration to AvroSchemaConverter on Creation
                 Key: PARQUET-2051
                 URL: https://issues.apache.org/jira/browse/PARQUET-2051
             Project: Parquet
          Issue Type: Bug
            Reporter: Andreas Hailu


Because of this, we cannot fully use the ThreeLevelListWriter functionality when writing Avro lists to Parquet.

During testing, we see the following exception:
{noformat}
Caused by: java.lang.ClassCastException: repeated binary array (STRING) is not a group
        at org.apache.parquet.schema.Type.asGroupType(Type.java:250)
        at org.apache.parquet.avro.AvroWriteSupport$ThreeLevelListWriter.writeCollection(AvroWriteSupport.java:612)
        at org.apache.parquet.avro.AvroWriteSupport$ListWriter.writeList(AvroWriteSupport.java:397)
        at org.apache.parquet.avro.AvroWriteSupport.writeValueWithoutConversion(AvroWriteSupport.java:355)
        at org.apache.parquet.avro.AvroWriteSupport.writeValue(AvroWriteSupport.java:278)
        at org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:191)
        at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:165)
        at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:128)
{noformat}
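On review, we found that the option that enables the ThreeLevelListWriter in `AvroWriteSupport` (`parquet.avro.write-old-list-structure` set to false) is never passed to the `AvroSchemaConverter` when the converter is created. The converter therefore falls back to its default, the old two-level list structure, so the generated Parquet schema contains `repeated binary array (STRING)` where the ThreeLevelListWriter expects a repeated group, and its call to `asGroupType()` fails.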
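To make the mismatch concrete, here is a simplified, self-contained Java model of it. `SchemaConverterSketch`, `initSchema` and `threeLevelWriterCanWrite` are hypothetical stand-ins, not parquet-avro code; only the option name and its default of `true` are taken from parquet-avro, and the schema strings only approximate the layouts Parquet actually generates.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Simplified model of PARQUET-2051 (hypothetical classes, not parquet-avro code).
 * The write support reads a list-layout flag from its configuration, but builds its
 * schema converter without that configuration, so the two disagree on the layout.
 */
public class ListLayoutMismatch {

  // Same key as the real option; defaulting to "true" mirrors parquet-avro's default.
  static final String WRITE_OLD_LIST_STRUCTURE = "parquet.avro.write-old-list-structure";

  /** Hypothetical stand-in for AvroSchemaConverter: renders a list<string> field. */
  static final class SchemaConverterSketch {
    private final boolean writeOldListStructure;

    SchemaConverterSketch() {
      this(new HashMap<>()); // no configuration supplied: fall back to the default
    }

    SchemaConverterSketch(Map<String, String> conf) {
      this.writeOldListStructure =
          Boolean.parseBoolean(conf.getOrDefault(WRITE_OLD_LIST_STRUCTURE, "true"));
    }

    String convertStringList(String name) {
      if (writeOldListStructure) {
        // Two-level layout: the repeated field is the element itself (a primitive).
        return "optional group " + name + " (LIST) { repeated binary array (STRING); }";
      }
      // Three-level layout: the repeated field is a group wrapping an optional element.
      return "optional group " + name
          + " (LIST) { repeated group list { optional binary element (STRING); } }";
    }
  }

  /** Hypothetical stand-in for the schema setup done in the write support's init. */
  static String initSchema(Map<String, String> conf, boolean passConf) {
    SchemaConverterSketch converter =
        passConf ? new SchemaConverterSketch(conf) : new SchemaConverterSketch();
    return converter.convertStringList("values");
  }

  /** The three-level writer needs the repeated field to be a group. */
  static boolean threeLevelWriterCanWrite(String schema) {
    return schema.contains("repeated group list");
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(WRITE_OLD_LIST_STRUCTURE, "false"); // request the three-level writer

    String buggy = initSchema(conf, false); // converter created without conf
    String fixed = initSchema(conf, true);  // converter created with conf
    System.out.println("without conf: " + buggy + " -> writable=" + threeLevelWriterCanWrite(buggy));
    System.out.println("with conf:    " + fixed + " -> writable=" + threeLevelWriterCanWrite(fixed));
  }
}
```

In the model, the converter built without the configuration ignores the `false` setting and produces the two-level `repeated binary array (STRING)` shape that the three-level writer rejects. In parquet-avro terms, the corresponding fix would be to construct the converter with `new AvroSchemaConverter(configuration)` rather than the no-argument constructor.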

After passing the Configuration to `AvroSchemaConverter` and testing locally, we observed that records with null elements in their arrays are written successfully by `AvroParquetOutputFormat`.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)