Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/06/28 00:25:53 UTC

[GitHub] [arrow-rs] hohav commented on issue #385: Panic when writing Parquet from non-nullable ListArray

hohav commented on issue #385:
URL: https://github.com/apache/arrow-rs/issues/385#issuecomment-869250321


   I think there may be a more fundamental issue with `ListArray`. I pushed a new version of my repro [here](https://github.com/hohav/arrow-parquet-list-test/tree/v2) that builds a very simple `ListArray`: `[[1], [], [2]]`. Writing it to a Parquet file with `ArrowWriter` succeeds, but `parquet meta` then shows incorrect information:
   ```
   $ parquet meta test.parquet 
   
   File path:  test.parquet
   Created by: parquet-rs version 5.0.0-SNAPSHOT (build de62168a4f428e3c334e1cfa5c5db23272f313d7)
   Properties:
     ARROW:schema: /////7gAAAAQAAAAAAAKAA4ADAALAAQACgAAABQAAAAAAAABBAAKAAwAAAAIAAQACgAAAAgAAAAIAAAAAAAAAAEAAAAEAAAA3P///xwAAAAMAAAAAAABDFwAAAABAAAAHAAAAAQABAAEAAAAEAAUABAADgAPAAQAAAAIABAAAAAYAAAAIAAAAAAAAQIcAAAACAAMAAQACwAIAAAAIAAAAAAAAAEAAAAABAAAAGl0ZW0AAAAABgAAAHZhbHVlcwAA
   Schema:
   message arrow_schema {
     optional group values (LIST) {
       repeated group list {
         optional int32 item;
       }
     }
   }
   
   
   Row group 0:  count: 3  23.67 B records  start: 4  total: 71 B
   --------------------------------------------------------------------------------
                     type      encodings count     avg size   nulls   min / max
   values.list.item  INT32     _ RR_     3         23.67 B    1       "1" / "2"
   ```
   Notice `nulls 1`, which AFAICT is incorrect: there are no null items, only one empty list. And `parquet cat` fails entirely:
   ```
   $ parquet cat test.parquet 
   Unknown error
   java.lang.RuntimeException: Failed on record 0
   	at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:86)
   	at org.apache.parquet.cli.Main.run(Main.java:155)
   	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
   	at org.apache.parquet.cli.Main.main(Main.java:185)
   Caused by: java.lang.ClassCastException: optional int32 item is not a group
   	at org.apache.parquet.schema.Type.asGroupType(Type.java:248)
   	at org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
   	at org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:228)
   	at org.apache.parquet.avro.AvroRecordConverter.access$100(AvroRecordConverter.java:74)
   	at org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter$ElementConverter.<init>(AvroRecordConverter.java:539)
   	at org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter.<init>(AvroRecordConverter.java:489)
   	at org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:293)
   	at org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:137)
   	at org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:91)
   	at org.apache.parquet.avro.AvroRecordMaterializer.<init>(AvroRecordMaterializer.java:33)
   	at org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:142)
   	at org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:185)
   	at org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:156)
   	at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
   	at org.apache.parquet.cli.BaseCommand$1$1.advance(BaseCommand.java:363)
   	at org.apache.parquet.cli.BaseCommand$1$1.<init>(BaseCommand.java:344)
   	at org.apache.parquet.cli.BaseCommand$1.iterator(BaseCommand.java:342)
   	at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:73)
   	... 3 more
   ```
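
   For reference, here is a rough, std-only sketch of the repetition/definition levels I'd expect for `[[1], [], [2]]` under the schema printed above (optional `values`, repeated `list`, optional `item`, so a max definition level of 3). It does not use arrow-rs or parquet at all; `Entry`, `list_levels` and `MAX_DEF` are hypothetical names made up for this illustration, and this is my reading of the Dremel-style encoding, not the writer's actual code path:

   ```rust
   /// Max definition level for:
   ///   optional group values (LIST) { repeated group list { optional int32 item; } }
   /// One level each for `values` (optional), `list` (repeated), `item` (optional).
   const MAX_DEF: i16 = 3;

   /// One slot in the leaf column `values.list.item` (hypothetical helper type).
   #[derive(Debug, PartialEq)]
   struct Entry {
       rep: i16,
       def: i16,
       value: Option<i32>,
   }

   /// Compute the expected levels for a column of optional lists of optional i32.
   fn list_levels(rows: &[Option<Vec<Option<i32>>>]) -> Vec<Entry> {
       let mut out = Vec::new();
       for row in rows {
           match row {
               // The whole list is null: nothing on the path is defined.
               None => out.push(Entry { rep: 0, def: 0, value: None }),
               // The list exists but is empty: only `values` is defined.
               Some(items) if items.is_empty() => {
                   out.push(Entry { rep: 0, def: 1, value: None })
               }
               Some(items) => {
                   for (i, item) in items.iter().enumerate() {
                       // The first element starts a new record; later ones repeat at level 1.
                       let rep = if i == 0 { 0 } else { 1 };
                       match item {
                           Some(v) => out.push(Entry { rep, def: MAX_DEF, value: Some(*v) }),
                           // Null element inside a non-empty list.
                           None => out.push(Entry { rep, def: 2, value: None }),
                       }
                   }
               }
           }
       }
       out
   }

   fn main() {
       // The ListArray from the repro: [[1], [], [2]]
       let rows = vec![Some(vec![Some(1)]), Some(vec![]), Some(vec![Some(2)])];
       for entry in list_levels(&rows) {
           println!("{:?}", entry);
       }
   }
   ```

   Under these assumptions there are three level entries but only two stored values, and the empty list is the single entry whose definition level is below the maximum. That lines up with `count 3`, and may be where the `1` in the `nulls` column comes from if the stats count every such entry, though I haven't confirmed that.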

