Posted to dev@parquet.apache.org by "Kristoffer Sjögren (JIRA)" <ji...@apache.org> on 2014/10/02 09:46:34 UTC

[jira] [Commented] (PARQUET-112) RunLengthBitPackingHybridDecoder: Reading past RLE/BitPacking stream.

    [ https://issues.apache.org/jira/browse/PARQUET-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156190#comment-14156190 ] 

Kristoffer Sjögren commented on PARQUET-112:
--------------------------------------------

I should add that the data is written using AvroParquetFileTarget with SNAPPY compression, and read back using AvroParquetFileSource with an UnboundRecordFilter and includeField.
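
For reference, the job is wired up roughly like the sketch below. Event, the output path and the predicate value "login" are stand-ins for my real record type and values, and I am quoting the builder-style AvroParquetFileSource API from memory, so treat this as a sketch rather than the exact code. "action" is the BINARY column named in the stack trace.

import org.apache.crunch.PCollection;
import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.mr.MRPipeline;
import org.apache.crunch.io.parquet.AvroParquetFileSource;
import org.apache.crunch.io.parquet.AvroParquetFileTarget;
import org.apache.hadoop.fs.Path;
import parquet.column.ColumnReader;
import parquet.filter.ColumnPredicates;
import parquet.filter.ColumnRecordFilter;
import parquet.filter.RecordFilter;
import parquet.filter.UnboundRecordFilter;

// The filter is handed over as a class and instantiated reflectively on
// the task side, so it needs a no-arg constructor.
public class ActionFilter implements UnboundRecordFilter {
  @Override
  public RecordFilter bind(Iterable<ColumnReader> readers) {
    // Keep only records whose "action" column equals the target value.
    return ColumnRecordFilter.column("action",
        ColumnPredicates.equalTo("login")).bind(readers);
  }
}

class Repro {
  public static void main(String[] args) {
    Pipeline pipeline = new MRPipeline(Repro.class);

    // Read side: project just the "action" field and apply the filter.
    // Event is a stand-in for my Avro specific record class.
    AvroParquetFileSource<Event> source = AvroParquetFileSource
        .builder(Event.class)
        .includeField("action")
        .filterClass(ActionFilter.class)
        .build(new Path("/user/stoffe/parquet/dogfight/2014/09/29"));
    PCollection<Event> filtered = pipeline.read(source);

    // Write side: same target type as the producing job, with SNAPPY
    // requested through the standard Parquet output config key.
    pipeline.write(filtered, new AvroParquetFileTarget(new Path("/tmp/out"))
        .outputConf("parquet.compression", "SNAPPY"));

    pipeline.done();
  }
}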

> RunLengthBitPackingHybridDecoder: Reading past RLE/BitPacking stream.
> ---------------------------------------------------------------------
>
>                 Key: PARQUET-112
>                 URL: https://issues.apache.org/jira/browse/PARQUET-112
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>         Environment: Java 1.7 Linux Debian
>            Reporter: Kristoffer Sjögren
>
> I am using Avro and Crunch 0.11 to write data into Hadoop CDH 4.6 in Parquet format. This works fine for a few gigabytes but blows up in the RunLengthBitPackingHybridDecoder when reading a few thousand gigabytes.
> parquet.io.ParquetDecodingException: Can not read value at 19453 in block 0 in file hdfs://nn-ix01.se-ix.delta.prod:8020/user/stoffe/parquet/dogfight/2014/09/29/part-m-00153.snappy.parquet
> 	at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:177)
> 	at parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:130)
> 	at org.apache.crunch.impl.mr.run.CrunchRecordReader.nextKeyValue(CrunchRecordReader.java:157)
> 	at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:483)
> 	at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:76)
> 	at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:85)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:139)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:396)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:262)
> Caused by: parquet.io.ParquetDecodingException: Can't read value in column [action] BINARY at value 697332 out of 872236, 96921 out of 96921 in currentPage. repetition level: 0, definition level: 1
> 	at parquet.column.impl.ColumnReaderImpl.readValue(ColumnReaderImpl.java:466)
> 	at parquet.column.impl.ColumnReaderImpl.getBinary(ColumnReaderImpl.java:414)
> 	at parquet.filter.ColumnPredicates$1.apply(ColumnPredicates.java:64)
> 	at parquet.filter.ColumnRecordFilter.isMatch(ColumnRecordFilter.java:69)
> 	at parquet.io.FilteredRecordReader.skipToMatch(FilteredRecordReader.java:71)
> 	at parquet.io.FilteredRecordReader.read(FilteredRecordReader.java:57)
> 	at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:173)
> 	... 13 more
> Caused by: java.lang.IllegalArgumentException: Reading past RLE/BitPacking stream.
> 	at parquet.Preconditions.checkArgument(Preconditions.java:47)
> 	at parquet.column.values.rle.RunLengthBitPackingHybridDecoder.readNext(RunLengthBitPackingHybridDecoder.java:80)
> 	at parquet.column.values.rle.RunLengthBitPackingHybridDecoder.readInt(RunLengthBitPackingHybridDecoder.java:62)
> 	at parquet.column.values.dictionary.DictionaryValuesReader.readBytes(DictionaryValuesReader.java:73)
> 	at parquet.column.impl.ColumnReaderImpl$2$7.read(ColumnReaderImpl.java:311)
> 	at parquet.column.impl.ColumnReaderImpl.readValue(ColumnReaderImpl.java:462)
> 	... 19 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)