Posted to dev@crunch.apache.org by Rahul <rs...@xebia.com> on 2012/07/09 10:55:25 UTC

SeqFileReaderFactory gives exception

Guys,

I have a SequenceFile with LongWritable keys and Text values. I am 
using SequenceFileSource with MRPipeline, and that works. But when I use 
MemPipeline, it throws the following exception.

3503 [main] INFO  com.cloudera.crunch.io.seq.SeqFileReaderFactory  - Error reading from path: file:/home/rahul/software/crunch/sampleFile
java.io.IOException: wrong key class: org.apache.hadoop.io.ObjectWritable is not class org.apache.hadoop.io.LongWritable
     at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1895)
     at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1947)
     at com.cloudera.crunch.io.seq.SeqFileReaderFactory$1.hasNext(SeqFileReaderFactory.java:68)
     at com.cloudera.crunch.io.CompositePathIterable$2.hasNext(CompositePathIterable.java:81)

This happens because the file contains LongWritable keys but the reader 
is using a NullWritable to read them. The error occurs in MemPipeline 
only; it works in MRPipeline because the key class is passed there via 
Hadoop's MapContext and is therefore the correct one. I modified 
SeqFileReaderFactory to pass the key class as well, but is this the 
correct way of doing it?
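
To make the failure mode concrete, here is a minimal sketch of the check 
described above. It is a simplified model only: Key, LongKey, NullKey, 
ModelReader and KeyClassSketch are hypothetical stand-ins written for this 
illustration, not the real Hadoop or Crunch classes. The assumption is that 
the file records its key class and the reader rejects a reusable key object 
of any other class.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for Writable key types (not the Hadoop classes).
abstract class Key {
  long value;
}

final class LongKey extends Key {}

final class NullKey extends Key {}

// Simplified model of a sequence file reader: the "file" records its key
// class, and next() refuses to fill a key object of a different class.
final class ModelReader {
  private final Class<? extends Key> keyClass;
  private final long[] storedKeys;
  private int pos = 0;

  ModelReader(Class<? extends Key> keyClass, long... storedKeys) {
    this.keyClass = keyClass;
    this.storedKeys = storedKeys;
  }

  boolean next(Key key) throws IOException {
    // The class of the caller's key object must match the recorded one.
    if (key.getClass() != keyClass) {
      throw new IOException("wrong key class: " + key.getClass().getName()
          + " is not " + keyClass);
    }
    if (pos >= storedKeys.length) {
      return false;
    }
    key.value = storedKeys[pos++];
    return true;
  }
}

final class KeyClassSketch {
  // Reads every key, reusing the key object supplied by the caller.
  static List<Long> readKeys(ModelReader reader, Key reusableKey) throws IOException {
    List<Long> keys = new ArrayList<>();
    while (reader.next(reusableKey)) {
      keys.add(reusableKey.value);
    }
    return keys;
  }

  public static void main(String[] args) throws IOException {
    // A key object of the recorded class reads cleanly.
    System.out.println(readKeys(new ModelReader(LongKey.class, 1L, 2L), new LongKey()));
    try {
      // A placeholder key of another class is rejected on the first read.
      readKeys(new ModelReader(LongKey.class, 1L, 2L), new NullKey());
    } catch (IOException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

In this model the reading side has to know the key class up front, which is 
the information MRPipeline gets from the MapContext and that the MemPipeline 
path is missing.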

regards
Rahul

Re: SeqFileReaderFactory gives exception

Posted by Josh Wills <jo...@gmail.com>.
SequenceFileTableSource will let you read the file as a PTable,
which is probably the quickest way to get what you want.
