Posted to dev@parquet.apache.org by Øyvind Strømmen <os...@udp.no> on 2020/07/17 12:22:02 UTC
Exception thrown by AvroParquetWriter#write causes all subsequent calls to it to fail
Hi,
Please see code below that reproduces the scenario:
import java.util.Arrays;
import java.util.Collections;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.GenericRecordBuilder;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;

Schema schema = new Schema.Parser().parse("""
        {
          "type": "record",
          "name": "person",
          "fields": [
            {
              "name": "address",
              "type": [
                "null",
                {
                  "type": "array",
                  "items": "string"
                }
              ],
              "default": null
            }
          ]
        }
        """
);

ParquetWriter<GenericRecord> writer =
        AvroParquetWriter.<GenericRecord>builder(
                new org.apache.hadoop.fs.Path("/tmp/person.parquet"))
            .withSchema(schema)
            .build();

// First write: the array contains a null element, so the write fails as expected.
try {
    writer.write(new GenericRecordBuilder(schema)
            .set("address", Arrays.asList("first", null, "last"))
            .build());
} catch (Exception e) {
    e.printStackTrace();
}

// Second write: a valid record, yet it fails as well.
try {
    writer.write(new GenericRecordBuilder(schema)
            .set("address", Collections.singletonList("first"))
            .build());
} catch (Exception e) {
    e.printStackTrace();
}
The first call to AvroParquetWriter#write attempts to write an array containing a
null element and fails, as expected, with "java.lang.NullPointerException:
Array contains a null element at 1". However, from that point on, all subsequent
calls to AvroParquetWriter#write (with valid records) fail with
"org.apache.parquet.io.InvalidRecordException:
1(r) > 0 ( schema r)", apparently because the state within the RecordConsumer isn't
reset between writes.
Is this the intended behavior of the writer? And if so, does one have to
create a new writer whenever a write fails?
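The only workaround I can think of so far is to validate each record against the
schema before handing it to the writer, so the writer never starts consuming a
record it will abort on. Just a sketch, and it assumes Avro's GenericData
validator catches every record the writer would reject (the helper below is
hypothetical, not part of parquet-mr):

import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.parquet.hadoop.ParquetWriter;

// Hypothetical helper: reject invalid records up front instead of letting
// writer.write() throw half-way through a record, which is what seems to
// leave the RecordConsumer in an inconsistent state.
static void writeValidated(ParquetWriter<GenericRecord> writer, Schema schema,
                           GenericRecord record) throws IOException {
    if (!GenericData.get().validate(schema, record)) {
        throw new IllegalArgumentException("Record does not match schema: " + record);
    }
    writer.write(record);
}

But that feels more like working around the writer than fixing it.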
Best Regards,
Øyvind Strømmen
Re: Exception thrown by AvroParquetWriter#write causes all subsequent calls to it to fail
Posted by "Driesprong, Fokko" <fo...@driesprong.frl>.
I've added an answer to the ticket:
https://issues.apache.org/jira/browse/PARQUET-1887
And created a PR for whoever is interested:
https://github.com/apache/parquet-mr/pull/804
Cheers, Fokko
Re: Exception thrown by AvroParquetWriter#write causes all subsequent calls to it to fail
Posted by "Driesprong, Fokko" <fo...@driesprong.frl>.
Thanks for reaching out, Øyvind.
Which version of Parquet are you using? Would it be possible to open a
ticket and attach the person.parquet file to it? I don't see anything weird
in your schema or code.
Cheers, Fokko