Posted to user@hadoop.apache.org by Xuri Nagarin <se...@gmail.com> on 2013/10/08 19:52:08 UTC
Modifying Grep to read Sequence/Snappy files
Hi,
I am trying to get the Grep example bundled with CDH to read
Sequence/Snappy files.
By default, the program throws errors trying to read Sequence/Snappy files:
java.io.EOFException: Unexpected end of block in input stream
at org.apache.hadoop.io.compress.BlockDecompressorStream.getCompressedData(BlockDecompressorStream.java:121)
at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:95)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:83)
at java.io.InputStream.read(InputStream.java:82)
So I edited the code to read Sequence files.
Changed:
FileInputFormat.setInputPaths(grepJob, args[0]);
To:
FileInputFormat.setInputPaths(grepJob, args[0]);
grepJob.setInputFormatClass(SequenceFileAsTextInputFormat.class);
But I still get the same error.
1) Do I need to manually set the input compression codec? I thought the
SequenceFile reader automatically detects compression.
2) If I do need to set the codec manually, is that done through
"setInputFormatClass", or is it a property I set on the "conf" object?
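One check that is independent of the job itself: as far as I understand, a SequenceFile begins with the three bytes 'S', 'E', 'Q' followed by a version byte, and the header also records the compression codec, which is why the reader should not need to be told about it. The sketch below only tests for that magic on a local copy of one input file (for example one fetched with "hadoop fs -get"). The class and method names are hypothetical, it uses plain Java with no Hadoop classes, and it does not read the rest of the header.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not part of Hadoop or the bundled Grep example.
class SeqMagicCheck {

    // Returns true when the first three bytes are 'S', 'E', 'Q'.
    public static boolean looksLikeSequenceFile(byte[] header) {
        return header != null
                && header.length >= 3
                && header[0] == 'S'
                && header[1] == 'E'
                && header[2] == 'Q';
    }

    // Reads up to 'count' bytes from the start of a local file.
    static byte[] readHead(String path, int count) throws IOException {
        byte[] buf = new byte[count];
        int filled = 0;
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            while (filled < count) {
                int n = in.read(buf, filled, count - filled);
                if (n < 0) {
                    break; // file is shorter than 'count' bytes
                }
                filled += n;
            }
        }
        return Arrays.copyOf(buf, filled);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java SeqMagicCheck <local-file>");
            return;
        }
        byte[] head = readHead(args[0], 4);
        if (looksLikeSequenceFile(head)) {
            String version = head.length > 3 ? ", version byte " + head[3] : "";
            System.out.println("SEQ magic found" + version);
        } else {
            System.out.println("No SEQ magic: this does not look like a SequenceFile");
        }
    }
}
```

If the magic is missing, the files are presumably raw Snappy-compressed data rather than SequenceFiles, and changing the input format alone would not be expected to help.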
TIA,
Xuri