Posted to dev@parquet.apache.org by "Øyvind Strømmen (Jira)" <ji...@apache.org> on 2020/07/22 13:07:00 UTC

[jira] [Created] (PARQUET-1887) Exception thrown by AvroParquetWriter#write causes all subsequent calls to it to fail

Øyvind Strømmen created PARQUET-1887:
----------------------------------------

             Summary: Exception thrown by AvroParquetWriter#write causes all subsequent calls to it to fail
                 Key: PARQUET-1887
                 URL: https://issues.apache.org/jira/browse/PARQUET-1887
             Project: Parquet
          Issue Type: Bug
          Components: parquet-avro
    Affects Versions: 1.8.3, 1.11.0
            Reporter: Øyvind Strømmen
         Attachments: person1_11_0.parquet, person1_8_3.parquet

Please see sample code below:
{code:java}
// Imports needed to compile this snippet (the schema text block requires Java 15+):
import java.util.Arrays;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.GenericRecordBuilder;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;

Schema schema = new Schema.Parser().parse("""
        {
          "type": "record",
          "name": "person",
          "fields": [
            {
              "name": "address",
              "type": [
                "null",
                {
                  "type": "array",
                  "items": "string"
                }
              ],
              "default": null
            }
          ]
        }
        """
);

ParquetWriter<GenericRecord> writer = AvroParquetWriter.<GenericRecord>builder(new org.apache.hadoop.fs.Path("/tmp/person.parquet"))
        .withSchema(schema)
        .build();

try {
    // To trigger the exception, write an array containing a null element.
    writer.write(new GenericRecordBuilder(schema).set("address", Arrays.asList("first", null, "last")).build());
} catch (Exception e) {
    e.printStackTrace(); // "java.lang.NullPointerException: Array contains a null element at 1"
}

try {
    // At this point all future calls to writer.write will fail
    writer.write(new GenericRecordBuilder(schema).set("address", Arrays.asList("foo", "bar")).build());
} catch (Exception e) {
    e.printStackTrace(); // "org.apache.parquet.io.InvalidRecordException: 1(r) > 0 ( schema r)"
}

writer.close();
{code}
It appears this is caused by writer state not being reset after a failed write. Is this the intended behavior of the writer? And if so, does one have to create a new writer whenever a write fails?
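Until this is clarified, one workaround is to validate records before handing them to the writer, so a bad record never corrupts the writer's state. A minimal sketch of such a pre-write check follows; the `containsNullElement` helper and the plain-`List` representation of the array field are illustrative assumptions, not part of the Parquet API:

```java
import java.util.Arrays;
import java.util.List;

public class NullElementCheck {
    // Returns true if the list contains a null element. Writing such a
    // record is what triggers the NullPointerException described above,
    // so callers can skip (or fix) the record instead of calling write().
    static boolean containsNullElement(List<?> values) {
        if (values == null) {
            return false; // the union type permits the array itself to be null
        }
        return values.stream().anyMatch(v -> v == null);
    }

    public static void main(String[] args) {
        System.out.println(containsNullElement(Arrays.asList("first", null, "last"))); // true
        System.out.println(containsNullElement(Arrays.asList("foo", "bar")));          // false
        System.out.println(containsNullElement(null));                                 // false
    }
}
```

With a check like this in front of each write, the problematic record in the first try block would be rejected up front and the second write would succeed.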

I'm able to reproduce this with both Parquet 1.8.3 and 1.11.0, and have attached a sample Parquet file for each version.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)