Posted to common-user@hadoop.apache.org by 晋光峰 <ji...@gmail.com> on 2008/07/28 12:12:05 UTC

How to control the map and reduce step sequentially

Dear All,

When using Hadoop, I noticed that the reduce step starts immediately,
while the mappers are still running. My project requires that the
reduce step not start until all the mappers have finished executing.
Does anybody know how to achieve this with the Hadoop API, so that the
reducer starts only after all the mappers have finished?

Thanks
-- 
Guangfeng Jin

Re: How to control the map and reduce step sequentially

Posted by rae l <cr...@gmail.com>.

2008/7/29 Xuebing Yan <ya...@alibaba-inc.com>:
>
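> The Alibaba Search Technology R&D Center is already in talks with the Hadoop PMC about a Hadoop Chinese community,
> and Chinese documentation for Hadoop 0.17 may be released soon.
Good.

http://www.hadoop.org.cn/
This appears to be a blog set up by one individual. The lookup result is:

www.hadoop.org.cn >> 218.240.14.21

    * Primary data for this site: Beijing, Zhongguancun Information Engineering Co., Ltd.
    * Lookup result 2: Beijing, Zhongguancun Information Engineering Co., Ltd.
    * Lookup result 3: Beijing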

--
程任全

Re: How to control the map and reduce step sequentially

Posted by Daniel Yu <d4...@gmail.com>.

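I am currently studying abroad, and my graduation project happens to use Hadoop and HBase. Having a Chinese community would be a very good thing.
I hope the relevant documentation will be kept up to date.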
2008/7/29 Xuebing Yan <ya...@alibaba-inc.com>

>
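> The Alibaba Search Technology R&D Center is already in talks with the Hadoop PMC about a Hadoop Chinese community,
> and Chinese documentation for Hadoop 0.17 may be released soon.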
>
> －闫雪冰
>
> On Tue, 2008-07-29 at 16:25 +0800, rae l wrote:
> > 2008/7/29 晋光峰 <ji...@gmail.com>:
> > > I got it. Thanks!
> > >
> > > 2008/7/28 Shengkai Zhu <ge...@gmail.com>
> > >
> > >> The real reduce logic is actually started when all map tasks are finished.
> > >>
> > >> Is it still unexpected?
> > >>
> > >> 朱盛凯
> > >>
> > >> Jash Zhu
> > >>
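> > >> Software School, Fudan University
> >
> > Based on my experience using Hadoop and reading the Hadoop code, the Reducer does not run before the Mapper; sometimes a mapper can be observed starting first, but that does not affect the program's results either.
> >
> > BTW: It turns out that many people in China are working on Hadoop. I also started studying and deploying Hadoop a few months ago for a task at my company. Given that, how about setting up a Chinese-language Hadoop discussion forum? Or does anyone know of one that already exists? According to the PoweredBy page, Koubei in China is already using it: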
> > http://wiki.apache.org/hadoop/PoweredBy
> >
> > --
> > 程任全
>
>

Re: How to control the map and reduce step sequentially

Posted by Xuebing Yan <ya...@alibaba-inc.com>.

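The Alibaba Search Technology R&D Center is already in talks with the Hadoop PMC about a Hadoop Chinese community,
and Chinese documentation for Hadoop 0.17 may be released soon.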

－闫雪冰

On Tue, 2008-07-29 at 16:25 +0800, rae l wrote:
> 2008/7/29 晋光峰 <ji...@gmail.com>:
> > I got it. Thanks!
> >
> > 2008/7/28 Shengkai Zhu <ge...@gmail.com>
> >
> >> The real reduce logic is actually started when all map tasks are finished.
> >>
> >> Is it still unexpected?
> >>
> >> 朱盛凯
> >>
> >> Jash Zhu
> >>
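> >> Software School, Fudan University
> Based on my experience using Hadoop and reading the Hadoop code, the Reducer does not run before the Mapper; sometimes a mapper can be observed starting first, but that does not affect the program's results either.
> 
> BTW: It turns out that many people in China are working on Hadoop. I also started studying and deploying Hadoop a few months ago for a task at my company. Given that, how about setting up a Chinese-language Hadoop discussion forum? Or does anyone know of one that already exists? According to the PoweredBy page, Koubei in China is already using it: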
> http://wiki.apache.org/hadoop/PoweredBy
> 
> --
> 程任全

Re: How to control the map and reduce step sequentially

Posted by rae l <cr...@gmail.com>.

2008/7/29 晋光峰 <ji...@gmail.com>:
> I got it. Thanks!
>
> 2008/7/28 Shengkai Zhu <ge...@gmail.com>
>
>> The real reduce logic is actually started when all map tasks are finished.
>>
>> Is it still unexpected?
>>
>> 朱盛凯
>>
>> Jash Zhu
>>
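>> Software School, Fudan University
Based on my experience using Hadoop and reading the Hadoop code, the Reducer does not run before the Mapper; sometimes a mapper can be observed starting first, but that does not affect the program's results either.

BTW: It turns out that many people in China are working on Hadoop. I also started studying and deploying Hadoop a few months ago for a task at my company. Given that, how about setting up a Chinese-language Hadoop discussion forum? Or does anyone know of one that already exists? According to the PoweredBy page, Koubei in China is already using it: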
http://wiki.apache.org/hadoop/PoweredBy

--
程任全

Re: How to control the map and reduce step sequentially

Posted by 晋光峰 <ji...@gmail.com>.

I got it. Thanks!

2008/7/28 Shengkai Zhu <ge...@gmail.com>

> The real reduce logic is actually started when all map tasks are finished.
>
> Is it still unexpected?
>
>
> On 7/28/08, 晋光峰 <ji...@gmail.com> wrote:
> >
> > Dear All,
> >
> > When using Hadoop, I noticed that the reduce step starts immediately,
> > while the mappers are still running. My project requires that the
> > reduce step not start until all the mappers have finished executing.
> > Does anybody know how to achieve this with the Hadoop API, so that the
> > reducer starts only after all the mappers have finished?
> >
> > Thanks
> > --
> > Guangfeng Jin
> >
>
>
>
> --
>
> 朱盛凯
>
> Jash Zhu
>
> 复旦大学软件学院
>
> Software School, Fudan University
>



-- 
Guangfeng Jin

Re: How to control the map and reduce step sequentially

Posted by Shengkai Zhu <ge...@gmail.com>.

The real reduce logic is actually started when all map tasks are finished.

Is it still unexpected?


On 7/28/08, 晋光峰 <ji...@gmail.com> wrote:
>
> Dear All,
>
> When using Hadoop, I noticed that the reduce step starts immediately,
> while the mappers are still running. My project requires that the
> reduce step not start until all the mappers have finished executing.
> Does anybody know how to achieve this with the Hadoop API, so that the
> reducer starts only after all the mappers have finished?
>
> Thanks
> --
> Guangfeng Jin
>



-- 

朱盛凯

Jash Zhu

复旦大学软件学院

Software School, Fudan University

Re: How to control the map and reduce step sequentially

Posted by Deyaa Adranale <de...@iais.fraunhofer.de>.

As far as I know, the reducer has three tasks: fetching the mappers'
results, sorting the results, and calling the reduce function.
When some mappers finish their execution, the reducer starts fetching
their results to save time.
Neither sorting nor calling the reduce function can start before all
the mappers have finished and all their results are available locally.

I don't know whether you can prevent the copying of mapper results before
all mappers finish. In any case, it would be pointless.
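
To make the three phases concrete, here is a minimal sketch in Python. It is
a toy simulation, not Hadoop code: the function run_job, its event strings,
and the slowstart parameter are all hypothetical names made up for this
illustration, and mappers are assumed to finish one after another in list
order. It shows that fetching may overlap with running mappers, while sort
and reduce only happen once every mapper's output has been copied.

```python
from collections import defaultdict


def run_job(map_outputs, slowstart=0.0):
    """Simulate one reducer over mappers that finish in list order.

    map_outputs: one list of (key, value) pairs per mapper.
    slowstart:   hypothetical knob, the fraction of mappers that must have
                 finished before the reducer begins fetching.
    Returns (events, results).
    """
    threshold = min(max(slowstart, 0.0), 1.0)  # keep the knob within [0, 1]
    events = []
    fetched = []       # pairs copied to the reducer so far
    next_to_fetch = 0  # first finished mapper whose output is not yet copied
    total = len(map_outputs)

    for done in range(1, total + 1):
        events.append(f"map {done - 1} finished")
        # Phase 1 (fetch): may overlap with mappers that are still running.
        if done / total >= threshold:
            while next_to_fetch < done:
                fetched.extend(map_outputs[next_to_fetch])
                events.append(f"fetch map {next_to_fetch}")
                next_to_fetch += 1

    # Phase 2 (sort): needs the output of every mapper to be local.
    events.append("sort")
    fetched.sort(key=lambda pair: pair[0])

    # Phase 3 (reduce): group by key and apply the reduce function (a sum).
    events.append("reduce")
    results = defaultdict(int)
    for key, value in fetched:
        results[key] += value
    return events, dict(results)


outputs = [[("a", 1), ("b", 1)], [("a", 1)], [("b", 1), ("c", 1)]]
early_events, early_results = run_job(outputs)                 # fetch overlaps maps
late_events, late_results = run_job(outputs, slowstart=1.0)    # fetch waits for all maps
```

Both calls produce the same reduce results; only the position of the fetch
events in the log changes, which is why delaying the copy does not change
what the reduce function sees. Later Hadoop releases (not the 0.17-era
version discussed in this thread) expose a comparable setting,
mapreduce.job.reduce.slowstart.completedmaps, which controls when reducers
are scheduled relative to map completion.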

Hope that helped.

Deyaa

晋光峰 wrote:
> Dear All,
>
> When using Hadoop, I noticed that the reduce step starts immediately,
> while the mappers are still running. My project requires that the
> reduce step not start until all the mappers have finished executing.
> Does anybody know how to achieve this with the Hadoop API, so that the
> reducer starts only after all the mappers have finished?
>
> Thanks
>