Posted to dev@parquet.apache.org by "Øyvind Strømmen (Jira)" <ji...@apache.org> on 2020/07/22 13:07:00 UTC
[jira] [Created] (PARQUET-1887) Exception thrown by AvroParquetWriter#write causes all subsequent calls to it to fail
Øyvind Strømmen created PARQUET-1887:
----------------------------------------
Summary: Exception thrown by AvroParquetWriter#write causes all subsequent calls to it to fail
Key: PARQUET-1887
URL: https://issues.apache.org/jira/browse/PARQUET-1887
Project: Parquet
Issue Type: Bug
Components: parquet-avro
Affects Versions: 1.8.3, 1.11.0
Reporter: Øyvind Strømmen
Attachments: person1_11_0.parquet, person1_8_3.parquet
Please see sample code below:
{code:java}
import java.util.Arrays;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.GenericRecordBuilder;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;

Schema schema = new Schema.Parser().parse("""
    {
      "type": "record",
      "name": "person",
      "fields": [
        {
          "name": "address",
          "type": [
            "null",
            {
              "type": "array",
              "items": "string"
            }
          ],
          "default": null
        }
      ]
    }
    """
);
ParquetWriter<GenericRecord> writer = AvroParquetWriter.<GenericRecord>builder(new org.apache.hadoop.fs.Path("/tmp/person.parquet"))
    .withSchema(schema)
    .build();
try {
    // To trigger the exception, write an array that contains a null element.
    writer.write(new GenericRecordBuilder(schema).set("address", Arrays.asList("first", null, "last")).build());
} catch (Exception e) {
    e.printStackTrace(); // "java.lang.NullPointerException: Array contains a null element at 1"
}
try {
    // From this point on, every call to writer.write fails, even for a valid record.
    writer.write(new GenericRecordBuilder(schema).set("address", Arrays.asList("foo", "bar")).build());
} catch (Exception e) {
    e.printStackTrace(); // "org.apache.parquet.io.InvalidRecordException: 1(r) > 0 ( schema r)"
}
writer.close();
{code}
It seems to me this is caused by the writer's internal state not being reset after a failed write. Is this the intended behavior of the writer? If so, does one have to create a new writer whenever a write fails?
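To illustrate the suspected mechanism, here is a toy model. `ToyRecordWriter` is hypothetical and is not Parquet's implementation: it only tracks a nesting depth, loosely analogous to the level that the second error message compares against the schema ("1(r) > 0"). An exception thrown in the middle of a record skips the step that closes the record, so the depth stays at 1 and every later write is rejected.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model, not Parquet's implementation: the writer tracks how
// deeply it is nested inside the current record, loosely like the level that
// the "1(r) > 0" message compares against the schema.
class ToyRecordWriter {
    private final boolean resetOnFailure;          // true = undo partial state when write throws
    private final List<List<String>> written = new ArrayList<>();
    private int depth = 0;                         // 0 means "between records"

    ToyRecordWriter(boolean resetOnFailure) {
        this.resetOnFailure = resetOnFailure;
    }

    void write(List<String> array) {
        if (depth != 0) {
            // Leftover state from an earlier failed write: reject everything from now on.
            throw new IllegalStateException(depth + " > 0: writer is still inside a previous record");
        }
        depth++; // enter the array field
        try {
            List<String> buffered = new ArrayList<>();
            for (int i = 0; i < array.size(); i++) {
                String element = array.get(i);
                if (element == null) {
                    // Thrown before depth is decremented, so the record is left "open".
                    throw new NullPointerException("Array contains a null element at " + i);
                }
                buffered.add(element);
            }
            written.add(buffered); // commit only a fully checked record
            depth--;               // leave the array field
        } catch (RuntimeException e) {
            if (resetOnFailure) {
                depth = 0; // discard the half-written record's state
            }
            throw e;
        }
    }

    int recordCount() {
        return written.size();
    }
}
```

With `new ToyRecordWriter(false)`, writing `["first", null, "last"]` and then `["foo", "bar"]` reproduces the pattern above: a `NullPointerException`, then a rejection of the valid record. With `new ToyRecordWriter(true)` the second write succeeds. Whether a reset like this would be safe in the real writer depends on whether column data from the failed record has already been buffered; the toy sidesteps that by committing only complete records.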
I'm able to reproduce this with both Parquet 1.8.3 and 1.11.0, and have attached a sample Parquet file for each version.
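If a new writer is indeed required after every failure, one way to avoid the failure in the first place, assuming a null array element is the only problem expected in the data, is to check each array before calling `writer.write` and skip or log records that would be rejected. The `NullElementCheck.firstNullIndex` helper below is hypothetical (it is not part of parquet-avro); it returns the same index that the `NullPointerException` reports, so `["first", null, "last"]` gives 1.

```java
import java.util.List;

// Hypothetical helper, not part of parquet-avro: finds null elements that
// would make an Avro array with "items": "string" invalid before the record
// ever reaches the Parquet writer.
final class NullElementCheck {
    private NullElementCheck() {}

    // Returns the index of the first null element, or -1 if there is none.
    // A null list itself is accepted, because the field is a ["null", array] union.
    static int firstNullIndex(List<?> items) {
        if (items == null) {
            return -1;
        }
        int index = 0;
        for (Object item : items) {
            if (item == null) {
                return index;
            }
            index++;
        }
        return -1;
    }
}
```

For the sample schema, one could call `NullElementCheck.firstNullIndex((List<?>) record.get("address"))` and pass the record to `writer.write` only when the result is -1. This covers only this one schema shape; other invalid values may still trigger the same problem.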
--
This message was sent by Atlassian Jira
(v8.3.4#803005)