Posted to dev@parquet.apache.org by "Tristan Davolt (Jira)" <ji...@apache.org> on 2020/09/22 23:21:00 UTC

[jira] [Commented] (PARQUET-112) RunLengthBitPackingHybridDecoder: Reading past RLE/BitPacking stream.

    [ https://issues.apache.org/jira/browse/PARQUET-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200419#comment-17200419 ] 

Tristan Davolt commented on PARQUET-112:
----------------------------------------

I am facing the same issue with Parquet 1.10.0. Data is written using AvroParquetWriter with Snappy compression. Occasionally, and seemingly at random, one of the many files we write this way throws an error like the one above when read by any Parquet reader. I have not yet found a workaround. The exception is thrown for the final value of a random column, and it is not limited to null fields; our schema defines every field as optional.
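
For reference, the write path is essentially the following (a minimal sketch; the schema, field name, and output path here are hypothetical stand-ins for our real ones):

{code:java}
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

public class SnappyAvroWriteExample {
  public static void main(String[] args) throws Exception {
    // Hypothetical schema: a single optional (nullable) field, mirroring
    // our setup where every field is declared optional.
    Schema schema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
        + "{\"name\":\"action\",\"type\":[\"null\",\"string\"],\"default\":null}]}");

    // AvroParquetWriter with Snappy compression, as described above.
    try (ParquetWriter<GenericRecord> writer = AvroParquetWriter
        .<GenericRecord>builder(new Path("/tmp/example.snappy.parquet"))
        .withSchema(schema)
        .withCompressionCodec(CompressionCodecName.SNAPPY)
        .build()) {
      GenericRecord record = new GenericData.Record(schema);
      record.put("action", "click");
      writer.write(record);
    }
  }
}
{code}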


{code:java}
java.lang.IllegalArgumentException: Reading past RLE/BitPacking stream.
	at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:53)
	at org.apache.parquet.column.values.rle.RunLengthBitPackingHybridDecoder.readNext(RunLengthBitPackingHybridDecoder.java:80)
	at org.apache.parquet.column.values.rle.RunLengthBitPackingHybridDecoder.readInt(RunLengthBitPackingHybridDecoder.java:62)
	at org.apache.parquet.column.values.rle.RunLengthBitPackingHybridValuesReader.readInteger(RunLengthBitPackingHybridValuesReader.java:53)
	at org.apache.parquet.column.impl.ColumnReaderBase$ValuesReaderIntIterator.nextInt(ColumnReaderBase.java:733)
	at org.apache.parquet.column.impl.ColumnReaderBase.checkRead(ColumnReaderBase.java:568)
	at org.apache.parquet.column.impl.ColumnReaderBase.consume(ColumnReaderBase.java:705)
	at org.apache.parquet.column.impl.ColumnReaderImpl.consume(ColumnReaderImpl.java:30)
	at org.apache.parquet.tools.command.DumpCommand.dump(DumpCommand.java:358)
	at org.apache.parquet.tools.command.DumpCommand.dump(DumpCommand.java:231)
	at org.apache.parquet.tools.command.DumpCommand.execute(DumpCommand.java:148)
	at org.apache.parquet.tools.Main.main(Main.java:223)
{code}
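
For anyone digging into the decoder: the trace above is from parquet-tools dump, which walks every column chunk, so an affected file fails the same way every time. If I am reading RunLengthBitPackingHybridDecoder correctly, readNext() refuses to start a new run once the page's RLE/bit-packing stream has no bytes left, so this exception means the page metadata promises more values than the stream actually encodes. Below is a stripped-down sketch of the hybrid run framing from the Parquet encoding spec, not the real decoder code; it assumes a bit width of 8 so every value occupies one byte, and it only exists to show where that condition fires:

{code:java}
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Simplified sketch of Parquet's RLE/bit-packing hybrid run framing.
// A run header is an unsigned varint; the low bit selects the run type.
public class HybridRunSketch {

  // Walks run headers until `expectedValues` values are accounted for;
  // throws if the stream runs dry first -- the situation behind this JIRA.
  static void checkRuns(InputStream in, int expectedValues) throws IOException {
    int seen = 0;
    while (seen < expectedValues) {
      if (in.available() <= 0) {
        // The page claimed more values than the stream encodes.
        throw new IllegalArgumentException("Reading past RLE/BitPacking stream.");
      }
      int header = readUnsignedVarInt(in);
      if ((header & 1) == 0) {
        seen += header >>> 1;      // RLE run: header = count << 1
        skipBytes(in, 1);          // one repeated value (bitWidth 8 -> 1 byte)
      } else {
        int groups = header >>> 1; // bit-packed run: header = (numGroups << 1) | 1
        seen += groups * 8;        // each group holds 8 values
        skipBytes(in, groups * 8); // 8 values * 8 bits per group = 8 bytes
      }
    }
  }

  static int readUnsignedVarInt(InputStream in) throws IOException {
    int value = 0, shift = 0, b;
    while (((b = in.read()) & 0x80) != 0) {
      value |= (b & 0x7F) << shift;
      shift += 7;
    }
    return value | (b << shift);
  }

  static void skipBytes(InputStream in, int n) throws IOException {
    for (int i = 0; i < n; i++) in.read();
  }

  public static void main(String[] args) throws IOException {
    // RLE run header (3 << 1 = 6) plus one repeated value byte: 3 values total.
    byte[] stream = {0x06, 0x01};
    checkRuns(new ByteArrayInputStream(stream), 3); // fine
    checkRuns(new ByteArrayInputStream(stream), 4); // throws: stream exhausted
  }
}
{code}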

> RunLengthBitPackingHybridDecoder: Reading past RLE/BitPacking stream.
> ---------------------------------------------------------------------
>
>                 Key: PARQUET-112
>                 URL: https://issues.apache.org/jira/browse/PARQUET-112
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>         Environment: Java 1.7 Linux Debian
>            Reporter: Kristoffer Sjögren
>            Assignee: Reuben Kuhnert
>            Priority: Major
>
> I am using Avro and Crunch 0.11 to write data into Hadoop CDH 4.6 in Parquet format. This works fine for a few gigabytes but blows up in RunLengthBitPackingHybridDecoder when reading a few thousand gigabytes.
> {code}
> parquet.io.ParquetDecodingException: Can not read value at 19453 in block 0 in file hdfs://nn-ix01.se-ix.delta.prod:8020/user/stoffe/parquet/dogfight/2014/09/29/part-m-00153.snappy.parquet
> 	at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:177)
> 	at parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:130)
> 	at org.apache.crunch.impl.mr.run.CrunchRecordReader.nextKeyValue(CrunchRecordReader.java:157)
> 	at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:483)
> 	at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:76)
> 	at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:85)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:139)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:396)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:262)
> Caused by: parquet.io.ParquetDecodingException: Can't read value in column [action] BINARY at value 697332 out of 872236, 96921 out of 96921 in currentPage. repetition level: 0, definition level: 1
> 	at parquet.column.impl.ColumnReaderImpl.readValue(ColumnReaderImpl.java:466)
> 	at parquet.column.impl.ColumnReaderImpl.getBinary(ColumnReaderImpl.java:414)
> 	at parquet.filter.ColumnPredicates$1.apply(ColumnPredicates.java:64)
> 	at parquet.filter.ColumnRecordFilter.isMatch(ColumnRecordFilter.java:69)
> 	at parquet.io.FilteredRecordReader.skipToMatch(FilteredRecordReader.java:71)
> 	at parquet.io.FilteredRecordReader.read(FilteredRecordReader.java:57)
> 	at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:173)
> 	... 13 more
> Caused by: java.lang.IllegalArgumentException: Reading past RLE/BitPacking stream.
> 	at parquet.Preconditions.checkArgument(Preconditions.java:47)
> 	at parquet.column.values.rle.RunLengthBitPackingHybridDecoder.readNext(RunLengthBitPackingHybridDecoder.java:80)
> 	at parquet.column.values.rle.RunLengthBitPackingHybridDecoder.readInt(RunLengthBitPackingHybridDecoder.java:62)
> 	at parquet.column.values.dictionary.DictionaryValuesReader.readBytes(DictionaryValuesReader.java:73)
> 	at parquet.column.impl.ColumnReaderImpl$2$7.read(ColumnReaderImpl.java:311)
> 	at parquet.column.impl.ColumnReaderImpl.readValue(ColumnReaderImpl.java:462)
> 	... 19 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)