Posted to common-user@hadoop.apache.org by Joydeep Sen Sarma <js...@facebook.com> on 2007/10/26 09:29:58 UTC

problems reading compressed sequencefiles in streaming (0.13.1)

I was hoping to use -inputformat SequenceFileAsTextInputFormat to process compressed SequenceFiles in streaming jobs.

However, with a Python mapper that just echoes each line it receives, and the number of reduce tasks set to 0, here's what the streaming job output looks like:

SEQ^F org.apache.hadoop.io.IntWritable^Yorg.apache.hadoop.io.Text^A^A'org.apache.hadoop.io.compress.GzipCodec^@^@^@^@Z+r[unprintable bytes]

So it seems the input file was not treated as a SequenceFile; the raw file header came through unparsed.
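
As a rough check that the output above really is an unparsed SequenceFile header, the sketch below decodes the same bytes by hand. The byte values are reconstructed from the caret notation in the output (^F is 0x06, ^Y is 0x19, ^A is 0x01, ^@ is 0x00). The field order is an assumption based on the version 6 header layout: magic, version, key class, value class, two compression flags, codec class, metadata count. The function and variable names are made up for illustration and are not part of any Hadoop API.

```python
import struct

# Header bytes reconstructed from the caret notation in the job output.
SAMPLE = (
    b"SEQ\x06"                                      # magic + version (^F)
    b"\x20org.apache.hadoop.io.IntWritable"         # key class, length 32 (space)
    b"\x19org.apache.hadoop.io.Text"                # value class, length 25 (^Y)
    b"\x01\x01"                                     # compressed, block-compressed (^A^A)
    b"\x27org.apache.hadoop.io.compress.GzipCodec"  # codec class, length 39 (')
    b"\x00\x00\x00\x00"                             # metadata entry count (^@^@^@^@)
)


def read_short_string(buf, pos):
    """Read a string prefixed by a single length byte (only lengths < 128)."""
    length = buf[pos]
    if length > 127:
        raise ValueError("multi-byte length prefixes are not handled in this sketch")
    start = pos + 1
    return buf[start:start + length].decode("utf-8"), start + length


def parse_sequencefile_header(buf):
    """Decode the leading fields of a SequenceFile header (assumed v6 layout)."""
    if buf[:3] != b"SEQ":
        raise ValueError("not a SequenceFile: missing SEQ magic")
    version = buf[3]
    pos = 4
    key_class, pos = read_short_string(buf, pos)
    value_class, pos = read_short_string(buf, pos)
    compressed = buf[pos] != 0
    block_compressed = buf[pos + 1] != 0
    pos += 2
    codec = None
    if compressed:
        codec, pos = read_short_string(buf, pos)
    # Big-endian 32-bit count of metadata key/value pairs.
    (metadata_entries,) = struct.unpack(">i", buf[pos:pos + 4])
    return {
        "version": version,
        "key_class": key_class,
        "value_class": value_class,
        "compressed": compressed,
        "block_compressed": block_compressed,
        "codec": codec,
        "metadata_entries": metadata_entries,
    }


if __name__ == "__main__":
    for field, value in parse_sequencefile_header(SAMPLE).items():
        print(field, "=", value)
```

If the input format had been picked up, the mapper would have received decoded records rather than these header bytes.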

 

I must be missing some arguments, but I don't see which ones. Help appreciated.

Thx,

Joydeep

RE: problems reading compressed sequencefiles in streaming (0.13.1)

Posted by Joydeep Sen Sarma <js...@facebook.com>.

Thanks!

A gigantic Duh moment.

-----Original Message-----
From: Runping Qi [mailto:runping@yahoo-inc.com] 
Sent: Friday, October 26, 2007 9:59 AM
To: hadoop-user@lucene.apache.org
Subject: RE: problems reading compressed sequencefiles in streaming (0.13.1)


Try adding the package name too:

org.apache.hadoop.mapred.SequenceFileAsTextInputFormat

Runping



RE: problems reading compressed sequencefiles in streaming (0.13.1)

Posted by Runping Qi <ru...@yahoo-inc.com>.

Try adding the package name too:

org.apache.hadoop.mapred.SequenceFileAsTextInputFormat

Runping
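
For illustration, a streaming invocation that passes the fully qualified class name might be assembled as in the sketch below. Everything except the -inputformat value is a placeholder: the bin/hadoop launcher path, the streaming jar location, the input and output paths, and the mapper name are hypothetical. Passing mapred.reduce.tasks=0 through -jobconf is an assumption about how the reduce count was set in the original job. Shipping the mapper script to the cluster is left out.

```python
import shlex

# Fully qualified name, as suggested in the reply above.
INPUT_FORMAT = "org.apache.hadoop.mapred.SequenceFileAsTextInputFormat"


def build_streaming_command(streaming_jar, input_path, output_path, mapper):
    """Return the argument list for a map-only streaming job.

    All four parameters are placeholders supplied by the caller; none of
    them come from the original thread.
    """
    return [
        "bin/hadoop", "jar", streaming_jar,
        "-input", input_path,
        "-output", output_path,
        "-mapper", mapper,
        "-inputformat", INPUT_FORMAT,
        # Assumed way of requesting zero reduce tasks.
        "-jobconf", "mapred.reduce.tasks=0",
    ]


if __name__ == "__main__":
    cmd = build_streaming_command(
        "path/to/hadoop-streaming.jar",  # hypothetical jar location
        "/data/seqfiles",                # hypothetical input path
        "/data/echo-output",             # hypothetical output path
        "echo_mapper.py",                # hypothetical mapper script
    )
    # Print a shell-safe version of the command for copy and paste.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```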

