Posted to user@hive.apache.org by "Ravuri, Venkata Puneet" <vr...@ea.com> on 2014/08/25 03:13:14 UTC

Hive 0.13 count(*) query issue for S3 data storage

Hello,

I am using a Hadoop 2.5 and Hive 0.13 setup.
I have an external partitioned Hive table whose files are stored in S3 in RCFile format.
When I run a 'select *', the rows are returned correctly, but aggregation queries such as count(*) fail with the following exception:

Caused by: java.io.EOFException: Attempted to seek or read past the end of the file
                    at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.processException(Jets3tNativeFileSystemStore.java:462)
                    at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.handleException(Jets3tNativeFileSystemStore.java:411)
                    at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieve(Jets3tNativeFileSystemStore.java:234)
                    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
                    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                    at java.lang.reflect.Method.invoke(Method.java:601)
                    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
                    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
                    at org.apache.hadoop.fs.s3native.$Proxy17.retrieve(Unknown Source)
                    at org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsInputStream.seek(NativeS3FileSystem.java:205)
                    at org.apache.hadoop.fs.BufferedFSInputStream.seek(BufferedFSInputStream.java:96)
                    at org.apache.hadoop.fs.BufferedFSInputStream.skip(BufferedFSInputStream.java:67)
                    at java.io.DataInputStream.skipBytes(DataInputStream.java:220)
                    at org.apache.hadoop.hive.ql.io.RCFile$ValueBuffer.readFields(RCFile.java:739)
                    at org.apache.hadoop.hive.ql.io.RCFile$Reader.currentValueBuffer(RCFile.java:1720)
                    at org.apache.hadoop.hive.ql.io.RCFile$Reader.getCurrentRow(RCFile.java:1898)
                    at org.apache.hadoop.hive.ql.io.RCFileRecordReader.next(RCFileRecordReader.java:149)
                    at org.apache.hadoop.hive.ql.io.RCFileRecordReader.next(RCFileRecordReader.java:44)
                    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:339)
                    ... 15 more

The same issue occurred with Hive 0.12, but disabling column pruning by setting the property 'hive.optimize.cp' to false resolved it.
In Hive 0.13 this property was removed (HIVE-4113<https://issues.apache.org/jira/browse/HIVE-4113>).
Is there any configuration that needs to be changed for accessing RCFiles from S3 through Hive?
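
For reference, a minimal sketch of the Hive 0.12 workaround and the kind of query that fails. The table name my_rcfile_table is a placeholder, not the actual table:

```sql
-- Hive 0.12 only: disable column pruning for the session.
-- The property no longer exists in Hive 0.13 (see HIVE-4113).
SET hive.optimize.cp=false;

-- Works: plain row retrieval.
SELECT * FROM my_rcfile_table;

-- Fails on Hive 0.13 with the EOFException shown above.
SELECT count(*) FROM my_rcfile_table;
```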


Thanks and Regards,
Puneet