Posted to user@hive.apache.org by Edward Capriolo <ed...@gmail.com> on 2010/04/16 20:15:38 UTC
Working with my gzipped sequence file
at org.apache.hadoop.mapred.SequenceFileAsTextInputFormat.getRecordReader(SequenceFileAsTextInputFormat.java:43)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:296)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:311)
... 21 more
The compression being used here - gzip - does not support splitting of the
input files. That could be why you are seeing this exception.
Can you try a different compression scheme such as bzip2, or perhaps
not compressing the files at all?
1) Can I just set the split size VERY high, so that Hive never splits
these files? My files were produced by a MapReduce program, so they are
already split very small. I really do not want to force a change
upstream.
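For what it's worth, the split size can be raised per session in Hive; here is a sketch, assuming an older mapred-style Hadoop where this property name applies (check the docs for your version):

```sql
-- Session-level sketch; property names vary across Hadoop/Hive versions.
-- Setting the minimum split size to Long.MAX_VALUE should yield one split per file.
SET mapred.min.split.size=9223372036854775807;
```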
2) From the other post, the key/value types of the sequence file should be
ByteWritable/Text. Currently my key/value types are Text/Text, and my data is
in the key... so
I have already written my own SequenceRecordReader, but it is not working yet.
In it I am swapping the key and the value. So I am thinking:
1. For the key, emit a dummy ByteWritable, maybe 'A'
2. Write the original key to the value
Will this work? Are there other gotchas here?
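To see the swap in isolation, here is a minimal, self-contained Java sketch of the idea. The class and method names are hypothetical, and plain Strings stand in for the Writables; a real implementation would wrap a SequenceFileRecordReader and use ByteWritable/Text:

```java
// Hypothetical sketch of the key/value swap, outside Hadoop for clarity.
// In a real RecordReader, 'key' would be a ByteWritable and 'value' a Text.
public class KeySwapSketch {

    // Simulates next(key, value): the underlying reader hands back (k, v),
    // but since the data lives in the key, we emit a dummy key and expose
    // the original key as the value.
    public static String[] next(String underlyingKey, String underlyingValue) {
        String dummyKey = "A";               // stand-in for a dummy ByteWritable
        String exposedValue = underlyingKey; // the real data moves to the value slot
        return new String[] { dummyKey, exposedValue };
    }

    public static void main(String[] args) {
        String[] record = next("the-actual-data", "ignored");
        System.out.println(record[0] + "\t" + record[1]);
    }
}
```

The original value is simply dropped here; if you need to keep it, you would have to concatenate it into the exposed value instead.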
Thank you,
Edward
Re: Working with my gzipped sequence file
Posted by Edward Capriolo <ed...@gmail.com>.
On Fri, Apr 16, 2010 at 2:15 PM, Edward Capriolo <ed...@gmail.com> wrote:
> [original message quoted in full; snipped]
FYI, the problem here is that Hadoop NEEDS the native libraries to work with
gzip block-compressed sequence files. For whatever reason the dfs -text tool
can open them but MapReduce can't. Upstream error reporting like:
Trying to load native libs...
can't do it, falling back to...
should be replaced with:
trying to load native libs...
FALLING BACK TO JAVA LIBS THAT WON'T WORK ANYWAY!!!