Posted to user@sqoop.apache.org by pratik khadloya <ti...@gmail.com> on 2014/09/02 21:50:49 UTC

Re: Issue with reading parquet file exported by sqoop

Yes, they seem to be valid. Thank you, Xu.
I will be validating the data in 35 tables next week and will report back when
I have the results.

Regards,
Pratik


On Sun, Aug 31, 2014 at 8:26 AM, Xu, Qian A <qi...@intel.com> wrote:

>  Hi Pratik,
>
>
>
> If reopening the file reader solves the problem, can I conclude that the
> exported Parquet files are valid?
>
>
>
> Best regards
>
> --Qian Xu (Stanley)
>
> *From:* pratik khadloya [mailto:tispratik@gmail.com]
> *Sent:* Friday, August 29, 2014 3:46 AM
> *To:* user@sqoop.apache.org
> *Subject:* Re: Issue with reading parquet file exported by sqoop
>
>
>
> Strangely enough, another version of my reader works:
> https://gist.github.com/tispratik/f7a66f6a40b7ae3b98ad
>
> The difference is that I have to reopen the file each time I read a new
> column.
>
> The reopening happens through the following line:
>
> ParquetFileReader fileReader = new ParquetFileReader(conf, filePath,
> blocks, schema.getColumns());
>
>
>
> which I am calling in a loop over the column descriptors.
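
A note on the pattern described above: if a reader only moves forward through
the file, as one that hands out row groups one at a time would, then a second
pass for the next column needs a fresh reader. The sketch below is a toy model
under that assumption. `RowGroupReader`, `readColumn`, and the in-memory "file"
are hypothetical stand-ins invented for illustration; this is not the
parquet-mr API and not the code in the gist, and it does not claim to explain
the exception reported in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Toy model only: RowGroupReader is a hypothetical stand-in for a forward-only
// file reader. It is not the parquet-mr API and not the code from the gist.
class ReopenPerColumnSketch {

    /** Hands out row groups once, front to back, then returns null. */
    static final class RowGroupReader {
        private final Iterator<String[][]> groups;

        RowGroupReader(List<String[][]> file) {
            this.groups = file.iterator();
        }

        /** Returns the next row group (indexed [column][row]) or null when exhausted. */
        String[][] readNextRowGroup() {
            return groups.hasNext() ? groups.next() : null;
        }
    }

    /** Reads every value of one column using a reader opened just for that column. */
    static List<String> readColumn(List<String[][]> file, int column) {
        RowGroupReader reader = new RowGroupReader(file); // fresh reader = "reopen"
        List<String> values = new ArrayList<>();
        String[][] group;
        while ((group = reader.readNextRowGroup()) != null) {
            values.addAll(Arrays.asList(group[column]));
        }
        return values;
    }

    /** Reuses one reader for every column; only the first column sees any data. */
    static List<List<String>> readAllWithSharedReader(List<String[][]> file, int columnCount) {
        RowGroupReader shared = new RowGroupReader(file);
        List<List<String>> out = new ArrayList<>();
        for (int c = 0; c < columnCount; c++) {
            List<String> values = new ArrayList<>();
            String[][] group;
            while ((group = shared.readNextRowGroup()) != null) {
                values.addAll(Arrays.asList(group[c]));
            }
            out.add(values);
        }
        return out;
    }

    /** Two row groups, two columns each (made-up sample data). */
    static List<String[][]> sampleFile() {
        List<String[][]> file = new ArrayList<>();
        file.add(new String[][] {{"1", "2"}, {"a", "b"}}); // row group 0
        file.add(new String[][] {{"3"}, {"c"}});           // row group 1
        return file;
    }

    public static void main(String[] args) {
        List<String[][]> file = sampleFile();
        System.out.println(readColumn(file, 0));
        System.out.println(readColumn(file, 1));
        System.out.println(readAllWithSharedReader(file, 2));
    }
}
```

In this model the shared reader is already exhausted when the loop reaches the
second column, so that column comes back empty. Looping over row groups on the
outside and columns on the inside would also avoid the reopen, at the cost of
restructuring the loop.
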
>
>
>
>
>
> ~Pratik
>
>
>
> On Thu, Aug 28, 2014 at 11:49 AM, pratik khadloya <ti...@gmail.com>
> wrote:
>
> This issue occurs only for some columns, and only after reading a few
> thousand records.
>
>
>
> ~Pratik
>
>
>
> On Thu, Aug 28, 2014 at 11:48 AM, pratik khadloya <ti...@gmail.com>
> wrote:
>
> Hello,
>
>
>
> I am getting the following exception when reading a Parquet file exported
> by Sqoop.
>
> My parquet column reader code is at
> https://gist.github.com/tispratik/f0044dd84dc8d8c6cbcf
>
>
>
> Exception in thread "main" parquet.io.ParquetDecodingException: Can't read value in column [description] BINARY at value 44899 out of 57096, 44899 out of 57096 in currentPage. repetition level: 0, definition level: 1
>     at parquet.column.impl.ColumnReaderImpl.readValue(ColumnReaderImpl.java:450)
>     at parquet.column.impl.ColumnReaderImpl.getBinary(ColumnReaderImpl.java:398)
>     at com.rocketfuel.grid.lookup_new.RfiParquetFileReader.load(RfiParquetFileReader.java:147)
>     at com.rocketfuel.grid.lookup_new.RfiParquetFileReader.<init>(RfiParquetFileReader.java:87)
>     at com.rocketfuel.grid.lookup_new.RfiParquetFileReader.main(RfiParquetFileReader.java:114)
> Caused by: java.lang.IllegalArgumentException: Reading past RLE/BitPacking stream.
>     at parquet.Preconditions.checkArgument(Preconditions.java:47)
>     at parquet.column.values.rle.RunLengthBitPackingHybridDecoder.readNext(RunLengthBitPackingHybridDecoder.java:80)
>     at parquet.column.values.rle.RunLengthBitPackingHybridDecoder.readInt(RunLengthBitPackingHybridDecoder.java:62)
>     at parquet.column.values.dictionary.DictionaryValuesReader.readBytes(DictionaryValuesReader.java:82)
>     at parquet.column.impl.ColumnReaderImpl$2$6.read(ColumnReaderImpl.java:295)
>     at parquet.column.impl.ColumnReaderImpl.readValue(ColumnReaderImpl.java:446)
>     ... 4 more
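
The "Caused by" line in the trace suggests the decoder was asked for more
values than the encoded stream holds. The sketch below is a simplified model of
how that kind of error can arise in a run-length decoder. `RleRunSketch` is a
hypothetical class written for this note: it handles only RLE runs (a varint
header of `count << 1` followed by the repeated value in little-endian bytes),
rejects bit-packed runs, and is not parquet-mr's
RunLengthBitPackingHybridDecoder. It illustrates the failure mode only and says
nothing about why the extra read happened in this case.

```java
import java.io.ByteArrayInputStream;

// Simplified model of a run-length decoder. NOT parquet-mr's implementation;
// bit-packed runs are deliberately left out to keep the sketch short.
class RleRunSketch {
    private final ByteArrayInputStream in;
    private final int bytesPerValue;
    private int remainingInRun = 0;
    private int currentValue = 0;

    RleRunSketch(int bitWidth, byte[] encoded) {
        this.in = new ByteArrayInputStream(encoded);
        this.bytesPerValue = (bitWidth + 7) / 8; // value width rounded up to whole bytes
    }

    /** Returns the next decoded value, loading a new run when the current one is used up. */
    int readInt() {
        while (remainingInRun == 0) {
            readNextRun();
        }
        remainingInRun--;
        return currentValue;
    }

    private void readNextRun() {
        if (in.available() <= 0) {
            // The caller asked for more values than were encoded.
            throw new IllegalArgumentException("Reading past RLE/BitPacking stream.");
        }
        int header = readUnsignedVarInt();
        if ((header & 1) == 1) {
            throw new UnsupportedOperationException("bit-packed runs omitted from this sketch");
        }
        remainingInRun = header >>> 1;
        currentValue = 0;
        for (int i = 0; i < bytesPerValue; i++) {
            int b = in.read();
            if (b < 0) {
                throw new IllegalArgumentException("truncated run value");
            }
            currentValue |= b << (8 * i); // little-endian
        }
    }

    private int readUnsignedVarInt() {
        int value = 0;
        int shift = 0;
        while (true) {
            int b = in.read();
            if (b < 0) {
                throw new IllegalArgumentException("truncated run header");
            }
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return value;
            }
            shift += 7;
        }
    }

    public static void main(String[] args) {
        // Made-up input: a run of three 5s (header 0x06), then a run of two 9s (header 0x04).
        byte[] encoded = {0x06, 0x05, 0x04, 0x09};
        RleRunSketch decoder = new RleRunSketch(4, encoded);
        for (int i = 0; i < 5; i++) {
            System.out.print(decoder.readInt() + " ");
        }
        System.out.println();
        try {
            decoder.readInt(); // sixth read: nothing left in the stream
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model the first five reads succeed and the sixth fails, because the
stream encodes exactly five values. Any caller that consumes more values from
such a stream than the writer produced would hit the same check.
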
>
>
>
>
>
> Does anyone know what this could be related to? What could I be doing
> wrong?
>
>
>
>
>
> Thanks,
>
> ~Pratik