Posted to dev@parquet.apache.org by Øyvind Strømmen <os...@udp.no> on 2020/07/17 12:22:02 UTC

Exception thrown by AvroParquetWriter#write causes all subsequent calls to it to fail

Hi,

Please see code below that reproduces the scenario:

Schema schema = new Schema.Parser().parse("""
  {
    "type": "record",
    "name": "person",
    "fields": [
      {
        "name": "address",
        "type": [
          "null",
          {
            "type": "array",
            "items": "string"
          }
        ],
        "default": null
      }
    ]
  }
"""
);

ParquetWriter<GenericRecord> writer =
    AvroParquetWriter.<GenericRecord>builder(
            new org.apache.hadoop.fs.Path("/tmp/person.parquet"))
        .withSchema(schema)
        .build();

try {
  writer.write(new GenericRecordBuilder(schema)
      .set("address", Arrays.asList("first", null, "last"))
      .build());
} catch (Exception e) {
  e.printStackTrace();
}

try {
  writer.write(new GenericRecordBuilder(schema)
      .set("address", Collections.singletonList("first"))
      .build());
} catch (Exception e) {
  e.printStackTrace();
}


The first call to AvroParquetWriter#write attempts to add an array with a
null element and fails, as expected, with "java.lang.NullPointerException:
Array contains a null element at 1". However, from that point on every
subsequent call to AvroParquetWriter#write, even with valid records, fails
with "org.apache.parquet.io.InvalidRecordException: 1(r) > 0 ( schema r)",
apparently because the state within the RecordConsumer is not reset
between writes.

Is this the intended behavior of the writer? And if so, does one have to
create a new writer whenever a write fails?
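
For reference, a workaround that avoids recreating the writer is to keep
invalid records away from it in the first place by pre-validating them with
Avro. A rough sketch only (the helper name is illustrative, and it assumes
GenericData#validate rejects the null array element, which has not been
verified against a released Parquet version):

import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.parquet.hadoop.ParquetWriter;

// Sketch: only hand records to the writer once Avro's own validation has
// accepted them, so a write never fails halfway through a record and the
// RecordConsumer state stays consistent.
static void writeIfValid(ParquetWriter<GenericRecord> writer,
                         Schema schema,
                         GenericRecord record) throws IOException {
  if (GenericData.get().validate(schema, record)) {
    writer.write(record);
  } else {
    // The record whose address array contains a null element would be
    // rejected here instead of failing inside write().
    System.err.println("Skipping invalid record: " + record);
  }
}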

Best Regards,
Øyvind Strømmen

Re: Exception thrown by AvroParquetWriter#write causes all subsequent calls to it to fail

Posted by "Driesprong, Fokko" <fo...@driesprong.frl>.
I've added an answer to the Ticket:
https://issues.apache.org/jira/browse/PARQUET-1887

And created a PR for anyone who's interested:
https://github.com/apache/parquet-mr/pull/804
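
Until that change is released, rebuilding the writer after a failed write, as
the original question suggests, could look roughly like the sketch below. The
helper name and the OVERWRITE mode are illustrative only, and note that rows
written before the failure are lost, so pre-validating records is usually
preferable:

import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetFileWriter;
import org.apache.parquet.hadoop.ParquetWriter;

// Sketch: discard the writer whose RecordConsumer is in a bad state and
// start over. OVERWRITE is needed because the first writer already created
// the file, which also means previously written rows are dropped.
static ParquetWriter<GenericRecord> rebuildWriter(ParquetWriter<GenericRecord> broken,
                                                  Schema schema,
                                                  Path path) throws IOException {
  try {
    broken.close();
  } catch (Exception ignored) {
    // the broken writer may not close cleanly; rebuild regardless
  }
  return AvroParquetWriter.<GenericRecord>builder(path)
      .withSchema(schema)
      .withWriteMode(ParquetFileWriter.Mode.OVERWRITE)
      .build();
}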

Cheers, Fokko


Re: Exception thrown by AvroParquetWriter#write causes all subsequent calls to it to fail

Posted by "Driesprong, Fokko" <fo...@driesprong.frl>.
Thanks for reaching out, Øyvind.

Which version of Parquet are you using? Would it be possible to open a
ticket and attach the person.parquet file to it? I don't see anything weird
in your schema or code.

Cheers, Fokko



