Posted to dev@parquet.apache.org by "Ryan Blue (JIRA)" <ji...@apache.org> on 2015/07/09 19:34:05 UTC
[jira] [Commented] (PARQUET-246) ArrayIndexOutOfBoundsException with Parquet write version v2
[ https://issues.apache.org/jira/browse/PARQUET-246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620902#comment-14620902 ]
Ryan Blue commented on PARQUET-246:
-----------------------------------
Merged [#235|https://github.com/apache/parquet-mr/pull/235], which makes it possible to read data written with this bug. Reading affected files will fail in MR jobs unless "parquet.split.files" is set to false, because the data must be read sequentially from the start of the file.
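For jobs launched from the command line, one way to pass that setting is Hadoop's generic "-D property=value" option. The following is only a minimal sketch: the class and helper names are hypothetical, the paths are placeholders, and it assumes the job's driver parses generic options (for example by running through ToolRunner). Only the property name comes from the comment above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitFilesArgs {
    // Property name quoted in the comment above.
    static final String SPLIT_FILES_KEY = "parquet.split.files";

    // Hypothetical helper: puts "-D parquet.split.files=false" in front of the
    // job's own arguments, since generic options must precede application arguments.
    static List<String> withSplittingDisabled(List<String> jobArgs) {
        List<String> args = new ArrayList<>();
        args.add("-D");
        args.add(SPLIT_FILES_KEY + "=false");
        args.addAll(jobArgs);
        return args;
    }

    public static void main(String[] argv) {
        // Placeholder input and output paths, for illustration only.
        List<String> args = withSplittingDisabled(Arrays.asList("/data/in", "/data/out"));
        System.out.println(String.join(" ", args));
    }
}
```

A driver that builds its own Hadoop Configuration could instead set the same key there, for example with conf.setBoolean("parquet.split.files", false).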
> ArrayIndexOutOfBoundsException with Parquet write version v2
> ------------------------------------------------------------
>
> Key: PARQUET-246
> URL: https://issues.apache.org/jira/browse/PARQUET-246
> Project: Parquet
> Issue Type: Bug
> Affects Versions: 1.6.0
> Reporter: Konstantin Shaposhnikov
> Fix For: 1.8.0
>
>
> I am getting the following exception when reading a parquet file that was created using Avro WriteSupport and Parquet write version v2.0:
> {noformat}
> Caused by: parquet.io.ParquetDecodingException: Can't read value in column [colName, rows, array, name] BINARY at value 313601 out of 428260, 1 out of 39200 in currentPage. repetition level: 0, definition level: 2
>     at parquet.column.impl.ColumnReaderImpl.readValue(ColumnReaderImpl.java:462)
>     at parquet.column.impl.ColumnReaderImpl.writeCurrentValueToConverter(ColumnReaderImpl.java:364)
>     at parquet.io.RecordReaderImplementation.read(RecordReaderImplementation.java:405)
>     at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:209)
>     ... 27 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException
>     at parquet.column.values.deltastrings.DeltaByteArrayReader.readBytes(DeltaByteArrayReader.java:70)
>     at parquet.column.impl.ColumnReaderImpl$2$6.read(ColumnReaderImpl.java:307)
>     at parquet.column.impl.ColumnReaderImpl.readValue(ColumnReaderImpl.java:458)
>     ... 30 more
> {noformat}
> The file is quite big (500 MB), so I cannot upload it here, but there may be enough information in the exception message to understand the cause of the error.