Posted to hdfs-user@hadoop.apache.org by "Y. Dong" <tq...@gmail.com> on 2013/08/20 12:06:55 UTC

MapReduce code location

Hi All, 

I'm a MapReduce newbie. What I want to know is this: say I have a mapper class:

public class Map implements Mapper {
	
	public List A;
	public static List B; 

	public Map(){	//class constructor
		System.out.println("Im initializing");
	}

	@Override
	protected void map(………){
		System.out.println("Im inside a mapper");
		…….
	}

}

When I run this mapper on a multi-machine Hadoop configuration, will Hadoop instantiate
multiple instances of this class and then transmit them to every remote machine? On a remote
machine, will the map(…) method be able to access List A and List B locally from its own memory?
If yes, what happens if I call System.out.println in the map method: will the printed message be
shown only on the remote machine and not on the machine where I started the whole MapReduce job?

Thanks. 

Eason

Re: MapReduce code location

Posted by Kun Ling <lk...@gmail.com>.

Hi Y. Dong,
    Here are answers to your questions:

1. Will Hadoop instantiate multiple instances of this class and then
transmit them to every remote machine?
ANSWER: No objects are transmitted. Each TaskTracker in the Hadoop cluster
creates its own instance of your Map class locally, and the transfer of the
data is handled by other parts of the framework.

    Each TaskTracker starts a JVM for a map task. That JVM creates an object
of your Map class and feeds the key-value pairs of your input data to your
map method. The shuffle phase then passes the map output to the reduce
method.

2. On a remote machine, will the map(…) method be able to access List A and
List B locally from its own memory?

ANSWER: Yes. Because each task JVM has its own Map object, List A and List B
exist only in that JVM's local memory. They are not shared between machines.
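To make the two answers above concrete, here is a minimal sketch in plain Java. It does not use the Hadoop API; TaskSimulation, SimpleMapper, and runTask are hypothetical names made up for this illustration. It only mimics the pattern described above: a task runner builds a fresh mapper object by reflection and feeds it records. All simulated tasks here share one JVM, so the static list is shared between them. On a real cluster, tasks on different machines run in different JVMs, so each JVM would have its own copy of the static list as well.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TaskSimulation {

    // Hypothetical stand-in for the Map class in the question (not a Hadoop class).
    public static class SimpleMapper {
        public List<String> listA = new ArrayList<>();        // like "A": one list per object
        public static List<String> listB = new ArrayList<>(); // like "B": one list per JVM

        public SimpleMapper() {
            System.out.println("Im initializing");
        }

        public void map(String record) {
            listA.add(record);
            listB.add(record);
        }
    }

    // Mimics a task runner: create a new mapper by reflection, then feed it one split.
    public static SimpleMapper runTask(List<String> split) {
        try {
            SimpleMapper mapper = SimpleMapper.class.getDeclaredConstructor().newInstance();
            for (String record : split) {
                mapper.map(record);
            }
            return mapper;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not create mapper", e);
        }
    }

    public static void main(String[] args) {
        SimpleMapper first = runTask(Arrays.asList("x", "y"));
        SimpleMapper second = runTask(Arrays.asList("z"));

        // Each simulated task got its own object, so the instance lists differ.
        System.out.println("first.listA  = " + first.listA);
        System.out.println("second.listA = " + second.listA);
        // The static list is shared only because both tasks ran in this one JVM.
        System.out.println("listB        = " + SimpleMapper.listB);
    }
}
```

The constructor runs once per simulated task, which corresponds to the local instantiation described in answer 1. Nothing is serialized or sent anywhere.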



I hope the above answers help.


yours,

Kun Ling



On Tue, Aug 20, 2013 at 6:06 PM, Y. Dong <tq...@gmail.com> wrote:

> Hi All,
>
> I'm a Mapreduce newbie, what I want to know is that,  say I have a mapper
> class:
>
> public Class Map implements Mapper {
>
>         public List A;
>         public static List B;
>
>         public Map(){   //class constructor
>                 System.out.println("Im initializing");
>         }
>
>         @Override
>         protected void map(………){
>                 System.out.println("Im inside a mapper");
>                 …….
>         }
>
> }
>
> when I run this mapper on a multi-machine hadoop configuration, will
> hadoop instantiate
> multiple instances of this class then transmit them to every remote
> machine?  So in a remote
> machine will the map(…) method be able to access List A and List B locally
> from its own memory?
> If yes, in the map method, what if I run System.out.println, will printed
> message be only shown on
> the remote machine but not the machine I start the whole map reduce job?
>
> Thanks.
>
> Eason




-- 
http://www.lingcc.com
