Posted to dev@parquet.apache.org by "J Y (Jira)" <ji...@apache.org> on 2022/08/31 05:38:00 UTC

[jira] [Created] (PARQUET-2181) parquet-cli fails at supporting parquet-protobuf generated schemas that have repeated primitives in them

J Y created PARQUET-2181:
----------------------------

             Summary: parquet-cli fails at supporting parquet-protobuf generated schemas that have repeated primitives in them
                 Key: PARQUET-2181
                 URL: https://issues.apache.org/jira/browse/PARQUET-2181
             Project: Parquet
          Issue Type: Bug
          Components: parquet-cli
            Reporter: J Y


I generated a Parquet file with parquet-protobuf from this proto definition:

{quote}message IndexPath {
  // Index of item in path.
  repeated int32 index = 1;
}

message SomeEvent {
  // truncated/obfuscated wrapper
  optional IndexPath client_position = 1;
}
{quote}

This is translated to the following Parquet schema, which uses the new spec-compliant layout for lists:

{quote}message SomeEvent {
  optional group client_position = 24 {
    optional group index (LIST) = 1 {
      repeated group list {
        required int32 element;
      }
    }
  }
}{quote}

As a result, parquet-cli cat fails on a file containing these events:

{quote}java.lang.RuntimeException: Failed on record 0
        at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:86)
        at org.apache.parquet.cli.Main.run(Main.java:157)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.parquet.cli.Main.main(Main.java:187)
Caused by: java.lang.ClassCastException: required int32 element is not a group
        at org.apache.parquet.schema.Type.asGroupType(Type.java:248)
        at org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
        at org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:228)
        at org.apache.parquet.avro.AvroRecordConverter.access$100(AvroRecordConverter.java:74)
        at org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter$ElementConverter.<init>(AvroRecordConverter.java:539)
        at org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter.<init>(AvroRecordConverter.java:489)
        at org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:293)
        at org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:137)
        at org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
        at org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:137)
        at org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:91)
        at org.apache.parquet.avro.AvroRecordMaterializer.<init>(AvroRecordMaterializer.java:33)
        at org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:142)
        at org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:190)
        at org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:166)
        at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
        at org.apache.parquet.cli.BaseCommand$1$1.advance(BaseCommand.java:363)
        at org.apache.parquet.cli.BaseCommand$1$1.<init>(BaseCommand.java:344)
        at org.apache.parquet.cli.BaseCommand$1.iterator(BaseCommand.java:342)
        at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:73)
        ... 3 more{quote}

Using the old parquet-tools binary to cat the same file works fine.
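
For illustration only, here is a toy model of what the trace appears to show: the element converter treats the list element as a group, but in this schema the element is `required int32 element`, so the cast fails. The classes below (`Node`, `Primitive`, `Group`) and both reader methods are hypothetical stand-ins written for this sketch. They are not the real parquet-column or parquet-avro classes, and the sketch makes no claim about where the actual fix belongs.

```java
import java.util.Arrays;
import java.util.List;

public class ListElementSketch {
    /** Hypothetical stand-in for a Parquet schema node (not the real API). */
    static abstract class Node {
        final String repetition;
        final String name;

        Node(String repetition, String name) {
            this.repetition = repetition;
            this.name = name;
        }

        boolean isPrimitive() {
            return this instanceof Primitive;
        }

        /** Models the failing cast: only group nodes can be viewed as groups. */
        Group asGroup() {
            if (isPrimitive()) {
                throw new ClassCastException(this + " is not a group");
            }
            return (Group) this;
        }
    }

    static final class Primitive extends Node {
        final String type;

        Primitive(String repetition, String type, String name) {
            super(repetition, name);
            this.type = type;
        }

        @Override
        public String toString() {
            return repetition + " " + type + " " + name;
        }
    }

    static final class Group extends Node {
        final List<Node> fields;

        Group(String repetition, String name, Node... fields) {
            super(repetition, name);
            this.fields = Arrays.asList(fields);
        }

        @Override
        public String toString() {
            return repetition + " group " + name;
        }
    }

    /** The "index" field from the schema above: LIST, repeated group "list", int32 "element". */
    static Group indexList() {
        return new Group("optional", "index",
                new Group("repeated", "list",
                        new Primitive("required", "int32", "element")));
    }

    /** Assumes the list element is always a record; throws for a primitive element. */
    static String readAssumingRecordElement(Group list) {
        Node element = list.fields.get(0).asGroup().fields.get(0);
        return element.asGroup().name;
    }

    /** Checks the element kind before casting, so primitives are handled too. */
    static String readCheckingElementKind(Group list) {
        Node element = list.fields.get(0).asGroup().fields.get(0);
        return (element.isPrimitive() ? "primitive:" : "group:") + element.name;
    }

    public static void main(String[] args) {
        Group index = indexList();
        System.out.println(readCheckingElementKind(index));
        try {
            readAssumingRecordElement(index);
        } catch (ClassCastException e) {
            // Same message shape as the "Caused by" line in the trace.
            System.out.println(e.getMessage());
        }
    }
}
```

Running `main` prints `primitive:element` followed by `required int32 element is not a group`, which matches the `Caused by` line in the trace.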


