Posted to user@hadoop.apache.org by Xuri Nagarin <se...@gmail.com> on 2013/10/08 19:52:08 UTC

Modifying Grep to read Sequence/Snappy files

Hi,

I am trying to get the Grep example bundled with CDH to read
Sequence/Snappy files.

By default, the program throws errors trying to read Sequence/Snappy files:
java.io.EOFException: Unexpected end of block in input stream
    at org.apache.hadoop.io.compress.BlockDecompressorStream.getCompressedData(BlockDecompressorStream.java:121)
    at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:95)
    at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:83)
    at java.io.InputStream.read(InputStream.java:82)


So I edited the code to read Sequence files.

Changed:
FileInputFormat.setInputPaths(grepJob, args[0]);

To:
FileInputFormat.setInputPaths(grepJob, args[0]);
grepJob.setInputFormatClass(SequenceFileAsTextInputFormat.class);

But I still get the same error.

1) Do I need to set the input compression codec manually? I thought the
SequenceFile reader detected compression automatically.
2) If I do need to set compression manually, is that done through
"setInputFormatClass", or is it something I set on the "conf" object?
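A sanity check that may help narrow this down: a SequenceFile begins with the
three bytes "SEQ" followed by a one-byte format version, so looking at the
first four bytes of one input file shows whether it is a SequenceFile at all
(as opposed to, say, raw Snappy-compressed text). The sketch below is plain
Java with no Hadoop dependency. The class and method names are hypothetical
and not part of Hadoop or CDH, and it assumes one input file has been copied
to local disk first (for example with "hadoop fs -get").

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

/** Hypothetical diagnostic helper; not part of Hadoop or CDH. */
final class SeqHeaderCheck {

    /**
     * Returns the SequenceFile version byte if the data starts with the
     * "SEQ" magic, or -1 if it does not (or is shorter than 4 bytes).
     */
    static int sequenceFileVersion(byte[] head) {
        if (head == null || head.length < 4) {
            return -1;
        }
        if (head[0] != 'S' || head[1] != 'E' || head[2] != 'Q') {
            return -1;
        }
        return head[3] & 0xff;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: SeqHeaderCheck <local-copy-of-input-file>");
            return;
        }
        byte[] head = new byte[4];
        int n = 0;
        // Read up to 4 bytes; a single read() call may return fewer.
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            while (n < head.length) {
                int r = in.read(head, n, head.length - n);
                if (r < 0) {
                    break;
                }
                n += r;
            }
        }
        int version = sequenceFileVersion(Arrays.copyOf(head, n));
        System.out.println(version < 0
                ? "no SEQ magic: this does not look like a SequenceFile"
                : "SequenceFile header found, format version " + version);
    }
}
```

If the magic is missing, the files are probably not SequenceFiles and
changing the input format alone would not be expected to help. If it is
present, the codec class name is recorded in the file header, so the
remaining question is why the job still goes through the same read path.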

TIA,

Xuri