Posted to common-user@hadoop.apache.org by Lukas Vlcek <lu...@gmail.com> on 2008/02/14 23:05:13 UTC

Can reduce output in two different output formats?

Hi,

Is it possible to have a Reducer output its data in two different formats at
the same time?
For example, one output in SequenceFileOutputFormat for further processing by
a subsequent M/R job, and a second output in TextOutputFormat for later human
review. Isn't this use case ruled out by the fact that there is no way
to set two output paths in JobConf?

I think it should be possible to create a TextOutputFormat file from
a SequenceFileOutputFormat input file, but this requires running an extra M/R
job.

Regards,
Lukas

-- 
http://blog.lukas-vlcek.com/

Re: Can reduce output in two different output formats?

Posted by Arun C Murthy <ac...@yahoo-inc.com>.
On Feb 14, 2008, at 2:09 PM, Jason Venner wrote:

> We write a separate file in many of our mappers and/or reducers.
> We are somewhat concerned about speculative execution and what
> happens to the output files of killed jobs, but it seems to work fine.
> We build the output files by passing in a *tag*, which is used as
> the last part of the file name:
>
> Path destinationFile = new Path( conf.get( "mapred.output.dir" ), tag );
>

The _blessed_ way to create tasks' side-effect files:

http://hadoop.apache.org/core/docs/r0.16.0/mapred_tutorial.html#Task+Side-Effect+Files
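
As far as I understand that section of the tutorial, the idea is that each
task attempt writes into its own private sub-directory of the output
directory, and only the files of the attempt that succeeds are promoted into
the real output directory. Below is a minimal sketch of that idea using plain
JDK file operations only. It is not Hadoop code: the class SideEffectFiles,
its method names, and the "_" + attempt id directory naming are assumptions
made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustration only: plain JDK, no Hadoop classes. All names are hypothetical.
class SideEffectFiles {

    // Create the private scratch directory of one task attempt.
    static Path attemptDir(Path outputDir, String attemptId) {
        try {
            return Files.createDirectories(outputDir.resolve("_" + attemptId));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Write a side file into the attempt's scratch directory.
    static void writeSideFile(Path attemptDir, String name, String content) {
        try {
            Files.write(attemptDir.resolve(name), content.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Successful attempt: move its files up into the real output directory.
    static void commit(Path attemptDir) {
        try {
            for (Path file : list(attemptDir)) {
                Files.move(file, attemptDir.getParent().resolve(file.getFileName()));
            }
            Files.delete(attemptDir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Killed or failed attempt: throw away everything it wrote.
    static void discard(Path attemptDir) {
        try {
            for (Path file : list(attemptDir)) {
                Files.delete(file);
            }
            Files.delete(attemptDir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Sorted snapshot of the entries in a directory.
    static List<Path> list(Path dir) throws IOException {
        List<Path> entries = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                entries.add(p);
            }
        }
        Collections.sort(entries);
        return entries;
    }

    // Two attempts of the same task write the same file name; only one is committed.
    static List<String> demo() {
        try {
            Path out = Files.createTempDirectory("job-output");
            Path winner = attemptDir(out, "attempt_0");
            Path loser = attemptDir(out, "attempt_1"); // e.g. a speculative duplicate
            writeSideFile(winner, "report.txt", "from attempt_0\n");
            writeSideFile(loser, "report.txt", "partial");
            commit(winner);
            discard(loser);
            List<String> names = new ArrayList<>();
            for (Path p : list(out)) {
                names.add(p.getFileName().toString());
            }
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [report.txt]
    }
}
```

Because the two attempts never write to the same path, a killed attempt cannot
leave a half-written file next to the committed one.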

Arun



RE: Can reduce output in two different output formats?

Posted by Joydeep Sen Sarma <js...@facebook.com>.
Well, fortunately, since the killed jobs don't close their output
files, there's no side-effect.

Or at least that was the case up to 0.14, when files did not show up until
they were closed. I have heard that they now start showing up on create, but
I'm not sure what the exact semantics are.

-----Original Message-----
From: Jason Venner [mailto:jason@attributor.com] 
Sent: Thursday, February 14, 2008 2:10 PM
To: core-user@hadoop.apache.org
Subject: Re: Can reduce output in two different output formats?

We write a separate file in many of our mappers and/or reducers. We are
somewhat concerned about speculative execution and what happens to the
output files of killed jobs, but it seems to work fine.
We build the output files by passing in a *tag*, which is used as the
last part of the file name:

Path destinationFile = new Path( conf.get( "mapred.output.dir" ), tag );


Re: Can reduce output in two different output formats?

Posted by Jason Venner <ja...@attributor.com>.
We write a separate file in many of our mappers and/or reducers. We are
somewhat concerned about speculative execution and what happens to the
output files of killed jobs, but it seems to work fine.
We build the output files by passing in a *tag*, which is used as the
last part of the file name:

Path destinationFile = new Path( conf.get( "mapred.output.dir" ), tag );


Re: Can reduce output in two different output formats?

Posted by Jason Venner <ja...@attributor.com>.
We open our files in the configure method, and close them in the close 
method of the Map or Reduce class.
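
A rough sketch of that lifecycle, applied to the original question of writing
the same records in two formats. This is plain JDK code, not the Hadoop API:
the class TwoFormatReducer is hypothetical, its configure/reduce/close methods
only mirror the shape of the Map or Reduce class, the ".txt" and ".bin" file
names are made up, and the DataOutputStream file is merely a stand-in for a
SequenceFile.

```java
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustration only: plain JDK, no Hadoop classes. All names are hypothetical.
class TwoFormatReducer {
    private BufferedWriter text;      // human-readable copy
    private DataOutputStream binary;  // compact copy for a follow-up job

    // Called once before any records: open both files, named after the tag.
    void configure(Path outputDir, String tag) {
        try {
            text = Files.newBufferedWriter(outputDir.resolve(tag + ".txt"), StandardCharsets.UTF_8);
            binary = new DataOutputStream(Files.newOutputStream(outputDir.resolve(tag + ".bin")));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Called once per key: write the same record in both formats.
    void reduce(String key, long count) {
        try {
            text.write(key + "\t" + count + "\n");
            binary.writeUTF(key);
            binary.writeLong(count);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Called once at the end: close both files so buffered data is flushed.
    void close() {
        try {
            try {
                text.close();
            } finally {
                binary.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Write two records, then read back the text file and the first binary record.
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("reduce-out");
            TwoFormatReducer reducer = new TwoFormatReducer();
            reducer.configure(dir, "part-00000");
            reducer.reduce("apple", 3);
            reducer.reduce("pear", 5);
            reducer.close();

            String asText = new String(
                    Files.readAllBytes(dir.resolve("part-00000.txt")), StandardCharsets.UTF_8);
            try (DataInputStream in = new DataInputStream(
                    Files.newInputStream(dir.resolve("part-00000.bin")))) {
                String firstKey = in.readUTF();
                long firstCount = in.readLong();
                return asText + firstKey + "=" + firstCount;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```

Opening in configure and closing in close means each file is opened once per
task rather than once per record.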
