Posted to user@hadoop.apache.org by "Francis.Hu" <fr...@reachjunction.com> on 2013/09/02 11:08:54 UTC

Is there any way to set Reducer to output to multi-places?

Hi all,

Is there any way to make a reducer write its output to multiple places? For
example, can a reducer's result be written to HDFS and a database concurrently?

Thanks,

Francis.Hu


Reply: Is there any way to set Reducer to output to multi-places?

Posted by "Francis.Hu" <fr...@reachjunction.com>.
Thanks, Binglin

I found the class below that can do it :).

http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html

From: Binglin Chang [mailto:decstery@gmail.com]
Sent: Monday, September 02, 2013 17:37
To: user@hadoop.apache.org
Subject: Re: Is there any way to set Reducer to output to multi-places?

MultipleOutputFormat lets you write multiple files from one reducer, but it can't write output to HDFS and a database concurrently. It is, however, a good example of how you could write a customized OutputFormat to achieve this.

Please note that, for fault tolerance, a reducer may run multiple times, which may generate redundant data. Hadoop handles this for files using FileOutputCommitter, but you need to handle the database case yourself (e.g. insert a record only if it doesn't already exist).

On Mon, Sep 2, 2013 at 5:11 PM, Rahul Bhattacharjee <ra...@gmail.com> wrote:

This might help

http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/lib/MultipleOutputFormat.html

Thanks,
Rahul

On Mon, Sep 2, 2013 at 2:38 PM, Francis.Hu <fr...@reachjunction.com> wrote:

hi, All

Is there any way to set Reducer to output to multi-places ?  For example: a reducer's result can be output to HDFS and Database concurrently.

Thanks,

Francis.Hu


Re: Is there any way to set Reducer to output to multi-places?

Posted by Binglin Chang <de...@gmail.com>.
MultipleOutputFormat lets you write multiple files from one reducer, but it
can't write output to HDFS and a database concurrently. It is, however, a
good example of how you could write a customized OutputFormat to achieve this.
Please note that, for fault tolerance, a reducer may run multiple times,
which may generate redundant data. Hadoop handles this for files using
FileOutputCommitter, but you need to handle the database case yourself
(e.g. insert a record only if it doesn't already exist).
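
To make the two points above concrete, here is a minimal plain-Java sketch,
not Hadoop code. It assumes a hypothetical RecordSink interface in place of a
real RecordWriter, an in-memory list in place of an HDFS file, and an
in-memory map in place of a database table. All class and method names are
made up for illustration. It shows one writer fanning records out to two
sinks, with the database side made safe against a re-run of the reducer by
inserting only when the key is absent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model only: none of these types are Hadoop classes.
interface RecordSink {
    void write(String key, String value);
}

// Stands in for file output, which Hadoop commits once per successful attempt.
class FileLikeSink implements RecordSink {
    final List<String> lines = new ArrayList<>();

    public void write(String key, String value) {
        lines.add(key + "\t" + value);
    }
}

// Stands in for a database table keyed by record key.
class IdempotentDbSink implements RecordSink {
    final Map<String, String> table = new LinkedHashMap<>();

    public void write(String key, String value) {
        // Insert only if the record does not exist, so a retried
        // reducer attempt cannot create duplicate rows.
        table.putIfAbsent(key, value);
    }
}

// Fans each record out to every configured sink, the way a custom
// OutputFormat's writer could forward to two underlying writers.
class TeeSink implements RecordSink {
    private final List<RecordSink> sinks;

    TeeSink(List<RecordSink> sinks) {
        this.sinks = sinks;
    }

    public void write(String key, String value) {
        for (RecordSink sink : sinks) {
            sink.write(key, value);
        }
    }
}

class TeeOutputSketch {
    // Simulates one reducer attempt emitting two records.
    static void runAttempt(RecordSink out) {
        out.write("a", "1");
        out.write("b", "2");
    }

    public static void main(String[] args) {
        IdempotentDbSink db = new IdempotentDbSink();
        // Each attempt gets a fresh file sink but shares the same database.
        runAttempt(new TeeSink(Arrays.<RecordSink>asList(new FileLikeSink(), db)));
        runAttempt(new TeeSink(Arrays.<RecordSink>asList(new FileLikeSink(), db)));
        System.out.println(db.table); // prints {a=1, b=2}
    }
}
```

With a real database the same effect would usually come from a primary key
plus an insert-if-absent or upsert statement, rather than from a map.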


On Mon, Sep 2, 2013 at 5:11 PM, Rahul Bhattacharjee <rahul.rec.dgp@gmail.com> wrote:

> This might help
>
> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/lib/MultipleOutputFormat.html
>
> Thanks,
> Rahul
>
> On Mon, Sep 2, 2013 at 2:38 PM, Francis.Hu <fr...@reachjunction.com> wrote:
>
>> hi, All
>>
>> Is there any way to set Reducer to output to multi-places ?  For example:
>> a reducer's result can be output to HDFS and Database concurrently.
>>
>> Thanks,
>>
>> Francis.Hu

Reply: Is there any way to set Reducer to output to multi-places?

Posted by "Francis.Hu" <fr...@reachjunction.com>.
Hi, Rahul

I found the class MultipleOutputs, which does what I want. It lets reducers write to additional outputs (a file and a database in my environment) other than the job's default output.

http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html

The class comment below describes what I want to implement in my project:

 * <p>
 * Case one: writing to additional outputs other than the job default output.
 *
 * Each additional output, or named output, may be configured with its own
 * <code>OutputFormat</code>, with its own key class and with its own value
 * class.
 * </p>
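
As a rough illustration of the "named output" idea in that comment, here is a
small plain-Java model. It does not use or reproduce Hadoop's classes: the
NamedOutputsSketch class, its methods, and the "db" output name are all
hypothetical, and in-memory lists stand in for real outputs. In the real
class, as far as I recall, named outputs are registered with addNamedOutput
at job setup and written through getCollector in the reducer, but please
check the Javadoc linked above rather than relying on this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of "named outputs"; not Hadoop's MultipleOutputs.
class NamedOutputsSketch {
    // Each named output has its own collector, here an in-memory list.
    private final Map<String, List<String>> named = new HashMap<>();
    // The job's default output, written regardless of any named outputs.
    final List<String> defaultOutput = new ArrayList<>();

    // Register an additional output under a name (done once, at setup time).
    void addNamedOutput(String name) {
        named.put(name, new ArrayList<>());
    }

    // Look up the collector for a registered name; unknown names fail fast.
    List<String> collector(String name) {
        List<String> c = named.get(name);
        if (c == null) {
            throw new IllegalArgumentException("undefined named output: " + name);
        }
        return c;
    }

    // Sums the values, writes the total to the default output, and also
    // writes a differently formatted record to the "db" named output.
    void reduce(String key, int[] values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        defaultOutput.add(key + "\t" + sum);
        collector("db").add("INSERT " + key + "=" + sum);
    }

    public static void main(String[] args) {
        NamedOutputsSketch job = new NamedOutputsSketch();
        job.addNamedOutput("db");
        job.reduce("clicks", new int[] {1, 2, 3});
        System.out.println(job.defaultOutput);
        System.out.println(job.collector("db")); // prints [INSERT clicks=6]
    }
}
```

The point of the model is only that the default output and each named output
are separate destinations, each free to use its own record format.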

 

Anyway, thanks, Rahul! Your link led me to the class MultipleOutputs :).

Thanks,

Francis.Hu.

From: Rahul Bhattacharjee [mailto:rahul.rec.dgp@gmail.com]
Sent: Monday, September 02, 2013 17:12
To: user@hadoop.apache.org
Subject: Re: Is there any way to set Reducer to output to multi-places?

This might help

http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/lib/MultipleOutputFormat.html

Thanks,
Rahul

On Mon, Sep 2, 2013 at 2:38 PM, Francis.Hu <fr...@reachjunction.com> wrote:

hi, All

Is there any way to set Reducer to output to multi-places ?  For example: a reducer's result can be output to HDFS and Database concurrently.

Thanks,

Francis.Hu


Re: Is there any way to set Reducer to output to multi-places?

Posted by Rahul Bhattacharjee <ra...@gmail.com>.
This might help

http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/lib/MultipleOutputFormat.html

Thanks,
Rahul


On Mon, Sep 2, 2013 at 2:38 PM, Francis.Hu <fr...@reachjunction.com> wrote:

> hi, All
>
> Is there any way to set Reducer to output to multi-places ?  For example:
> a reducer's result can be output to HDFS and Database concurrently.
>
> Thanks,
>
> Francis.Hu
