Posted to common-user@hadoop.apache.org by bharath v <bh...@gmail.com> on 2010/01/03 05:04:22 UTC

Small doubt in MR

Hi,

I want a particular "section of code" to run in only ONE of the
mappers, so I used the following approach.

class MainClass
{
        // Shared flag, intended to let only one mapper run the section.
        public boolean flag = true;

        class MapClass
        {
                void map()
                {
                        if (flag)
                        {
                                flag = false;
                                /* section of code */
                        }
                }
        }
}

I am running this code in pseudo-distributed mode and it is working fine.
I doubt that it will run correctly in distributed mode, because mappers on
other machines would have to be notified of the changed "flag". Any
comments? If this is wrong, any suggestions on what method I should follow
to achieve this functionality in distributed mode?

Thanks

Re: Small doubt in MR

Posted by Mridul Muralidharan <mr...@yahoo-inc.com>.
Off the top of my head, you could set the flag to true based on some
globally unique condition, for example the input being a specific file
at start offset 0, such as part-00000 at offset 0 (the actual file name
could be a jobconf param).

Note that the condition should be repeatable, since tasks can get
re-executed.


Regards,
Mridul
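
A minimal sketch of the condition described above, assuming the mapper can
learn the file name and start offset of its own input split (in Hadoop this
would typically come from a FileSplit via getPath() and getStart()). The
class ChosenSplit and the method isChosen are hypothetical names used only
for illustration, and plain strings stand in for Hadoop types.

```java
// Hypothetical helper, not part of Hadoop: decides whether the task
// processing a given input split is the "chosen" one.
final class ChosenSplit {
    private ChosenSplit() {}

    /**
     * @param splitFileName  name of the file the split belongs to
     * @param splitStart     byte offset at which the split starts
     * @param chosenFileName file name that marks the chosen split
     *                       (could be passed in as a job parameter)
     */
    static boolean isChosen(String splitFileName, long splitStart,
                            String chosenFileName) {
        // At most one split of a given file starts at offset 0, and the
        // answer is the same every time the task is re-executed.
        return splitStart == 0L && splitFileName.equals(chosenFileName);
    }
}
```

Because the check depends only on the split itself, a retried task attempt
reaches the same decision, which is the repeatability mentioned above.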





Re: Small doubt in MR

Posted by brien colwell <xc...@gmail.com>.
Another approach would be to use a custom InputFormat implementation,
with the flag as a property of the input split. Consider wrapping your
InputFormat with something like 'InputFormatWithFlag', which returns
splits that combine the wrapped InputFormat's splits with your flag.
Since InputFormat#getSplits runs in a single process, your custom
InputFormat can safely ensure that the flag is set on only one split.


Brien
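
The wrapping idea above could look roughly like the following sketch. It
uses plain Java generics rather than Hadoop's InputFormat and InputSplit
types, so FlaggedSplit and SplitFlagger are hypothetical stand-ins; a real
implementation would also need the wrapped split to be serializable by
Hadoop.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an input split that carries an extra flag.
final class FlaggedSplit<S> {
    final S wrapped;
    final boolean chosen;

    FlaggedSplit(S wrapped, boolean chosen) {
        this.wrapped = wrapped;
        this.chosen = chosen;
    }
}

// Hypothetical stand-in for the getSplits step of a wrapping InputFormat.
final class SplitFlagger {
    private SplitFlagger() {}

    static <S> List<FlaggedSplit<S>> wrap(List<S> splits) {
        List<FlaggedSplit<S>> result = new ArrayList<>(splits.size());
        for (int i = 0; i < splits.size(); i++) {
            // Only the first split gets the flag. This loop runs in one
            // process, so no coordination between tasks is needed.
            result.add(new FlaggedSplit<>(splits.get(i), i == 0));
        }
        return result;
    }
}
```

Each mapper would then check the flag on its own split. The flag travels
with the split, so a re-executed task attempt sees the same value.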




On 1/2/2010 11:08 PM, Mark Kerzner wrote:
> I think you need some kind of semaphore that the first task to reach it
> can turn on. For example, creating a file in HDFS would work, provided you
> can guarantee that it is an atomic operation (create-if-does-not-exist).
>
> Mark


Re: Small doubt in MR

Posted by Matei Zaharia <ma...@eecs.berkeley.edu>.
If you want the code to run on only one machine, why not run it in your driver program that submits the MapReduce job?

You could also create a special input record that tells the mapper that receives it that it is the chosen one. However, note that this mapper may be run multiple times due to hardware failures.

Matei
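
A rough sketch of the special-record idea, written as a plain loop rather
than against Hadoop's Mapper API. The class SentinelMapper, the method
mapAll and the marker value "__RUN_ONCE__" are all hypothetical; the marker
is assumed never to occur in real data.

```java
import java.util.List;

// Hypothetical sketch of a mapper loop, not Hadoop's Mapper API.
final class SentinelMapper {
    // Hypothetical marker value, assumed to be absent from real input.
    static final String SENTINEL = "__RUN_ONCE__";

    private SentinelMapper() {}

    /**
     * Processes records, running oneTimeSection when the sentinel is seen.
     * Returns the number of ordinary records processed.
     */
    static int mapAll(List<String> records, Runnable oneTimeSection) {
        int processed = 0;
        for (String record : records) {
            if (SENTINEL.equals(record)) {
                // Only the mapper whose input contains the sentinel gets
                // here. A retried task attempt will execute this again,
                // so the section should be safe to repeat.
                oneTimeSection.run();
            } else {
                processed++; // ordinary map logic would go here
            }
        }
        return processed;
    }
}
```

Since the chosen mapper may run more than once, the one-time section is
best written so that repeating it has no further effect.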

On Jan 2, 2010, at 11:08 PM, Mark Kerzner wrote:

> I think you need some kind of semaphore that the first task to reach it
> can turn on. For example, creating a file in HDFS would work, provided you
> can guarantee that it is an atomic operation (create-if-does-not-exist).
> 
> Mark


Re: Small doubt in MR

Posted by Mark Kerzner <ma...@gmail.com>.
I think you need some kind of semaphore that the first task to reach it
can turn on. For example, creating a file in HDFS would work, provided you
can guarantee that it is an atomic operation (create-if-does-not-exist).

Mark
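
To illustrate the create-if-does-not-exist idea, here is a small sketch that
uses the local file system through java.nio as a stand-in for HDFS. The
class MarkerFileLock and the method tryAcquire are hypothetical names. On
HDFS the analogous call would be something like FileSystem.create with
overwrite set to false, and whether that gives the atomicity needed here is
an assumption worth verifying for your Hadoop version.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch using the local file system as a stand-in for HDFS.
final class MarkerFileLock {
    private MarkerFileLock() {}

    /**
     * Tries to create the marker file. Returns true for the single caller
     * that created it, false for every caller that found it already there.
     */
    static boolean tryAcquire(Path marker) throws IOException {
        try {
            // Files.createFile checks for existence and creates the file
            // as one atomic operation, failing if the file already exists.
            Files.createFile(marker);
            return true;
        } catch (FileAlreadyExistsException alreadyTaken) {
            return false;
        }
    }
}
```

One caveat with this approach: if the task that created the marker fails
before finishing the section, a retried attempt will find the marker in
place and skip the section, so some cleanup or retry policy is needed.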
