Posted to hdfs-user@hadoop.apache.org by "Datta, Saurav" <sd...@paypal.com> on 2015/10/13 08:14:27 UTC

Passing instance of a class to Mapper

Hello,

I am trying to pass an instance of a class to a Mapper. However, I understand Hadoop does not allow this.
Is there any workaround to make this happen?

Regards,
Saurav Datta

Data Engineer| Desk - (408)967-7360| Cell - (408)666-1722

Re: Passing instance of a class to Mapper

Posted by "Datta, Saurav" <sd...@paypal.com>.
Thanks very much, Chris!
--


From: Chris Nauroth
Reply-To: user@hadoop.apache.org
Date: Tuesday, October 13, 2015 at 1:42 PM
To: user@hadoop.apache.org
Subject: Re: Passing instance of a class to Mapper

Yes, the grep example job from the Hadoop codebase is a good demo.

Here we can see the grep job setting up Configuration with the regex to match:

https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/Grep.java

The RegexMapper class then consumes this by overriding setup to read the regex back out of Configuration:

https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/map/RegexMapper.java

--Chris Nauroth

From: "Datta, Saurav" <sd...@paypal.com>
Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
Date: Tuesday, October 13, 2015 at 1:04 PM
To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
Subject: Re: Passing instance of a class to Mapper

Thanks very much, Chris!
I will try it out. Do you have any examples showing this?

--


From: Chris Nauroth
Reply-To: user@hadoop.apache.org
Date: Tuesday, October 13, 2015 at 12:07 PM
To: user@hadoop.apache.org
Subject: Re: Passing instance of a class to Mapper

Hello Saurav,

You are correct that it generally is not possible to pass an instance of a class directly to a mapper (or reducer).  This is because the mapper tasks execute on arbitrary nodes in the Hadoop cluster, running in different JVM processes from the JVM running the client that submits the job.

A typical solution is for the client to populate the Configuration object with the relevant values, expressed as simple types such as strings and numbers.

http://hadoop.apache.org/docs/r2.7.1/api/org/apache/hadoop/conf/Configuration.html

This configuration propagates to all map and reduce tasks of the job.  Your Mapper subclass can override the setup method to do one-time initialization at the start of the task.

http://hadoop.apache.org/docs/r2.7.1/api/org/apache/hadoop/mapreduce/Mapper.html#setup(org.apache.hadoop.mapreduce.Mapper.Context)

As part of this one-time initialization, you can read the values back out of the Configuration.  As I said earlier, these will be only simple values such as String or int (Configuration stores each value as a string).  If it's helpful, your setup method can use the values read from configuration to reconstruct an instance of any class that you want.
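
To make the round trip concrete, here is a minimal sketch that runs without a cluster. It is an illustration under stated assumptions, not code from Hadoop: java.util.Properties stands in for Hadoop's Configuration, and the Threshold class and the "example.threshold.*" key names are hypothetical. In a real job the write side would use calls such as Configuration.set and Configuration.setInt in the client, and the read side would run inside Mapper.setup using context.getConfiguration().

```java
import java.util.Properties;

public class ConfigRoundTrip {

    // Hypothetical class whose instance we would like the mapper to have.
    static final class Threshold {
        final String field;
        final int limit;

        Threshold(String field, int limit) {
            this.field = field;
            this.limit = limit;
        }

        // Client side: flatten the instance into string key/value pairs.
        void writeTo(Properties conf) {
            conf.setProperty("example.threshold.field", field);
            conf.setProperty("example.threshold.limit", Integer.toString(limit));
        }

        // Task side: rebuild an equivalent instance from the stored values.
        static Threshold readFrom(Properties conf) {
            String field = conf.getProperty("example.threshold.field", "");
            int limit = Integer.parseInt(conf.getProperty("example.threshold.limit", "0"));
            return new Threshold(field, limit);
        }
    }

    public static void main(String[] args) {
        // Stand-in for Hadoop's Configuration (an assumption of this sketch).
        Properties conf = new Properties();
        new Threshold("price", 100).writeTo(conf);

        // In a real job this step would happen in setup(), in a different JVM.
        Threshold rebuilt = Threshold.readFrom(conf);
        System.out.println(rebuilt.field + " " + rebuilt.limit);
    }
}
```

The point of the design is that only strings cross the process boundary; the object itself is never shipped, only enough data to build an equivalent one on the task side.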

I hope this helps.

--Chris Nauroth

From: "Datta, Saurav" <sd...@paypal.com>
Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
Date: Monday, October 12, 2015 at 11:14 PM
To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
Subject: Passing instance of a class to Mapper

Hello,

I am trying to pass an instance of a class to a Mapper. However, I understand Hadoop does not allow this.
Is there any workaround to make this happen?

Regards,
Saurav Datta

Data Engineer| Desk - (408)967-7360| Cell - (408)666-1722

Re: Passing instance of a class to Mapper

Posted by Chris Nauroth <cn...@hortonworks.com>.
Yes, the grep example job from the Hadoop codebase is a good demo.

Here we can see the grep job setting up Configuration with the regex to match:

https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/Grep.java

The RegexMapper class then consumes this by overriding setup to read the regex back out of Configuration:

https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/map/RegexMapper.java
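
For readers who do not want to open the links, the shape of that pattern can be sketched in plain Java. This is not the actual Grep or RegexMapper source: a Map<String, String> stands in for Configuration, the key names are made up for the example, and the setup and map methods below only mimic the roles of the Hadoop methods with the same names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexSetupSketch {

    // Hypothetical key names, chosen for this sketch only.
    static final String PATTERN_KEY = "example.regex.pattern";
    static final String GROUP_KEY = "example.regex.group";

    private Pattern pattern;
    private int group;

    // Plays the role of setup(): runs once per task, before any record.
    void setup(Map<String, String> conf) {
        pattern = Pattern.compile(conf.get(PATTERN_KEY));
        group = Integer.parseInt(conf.getOrDefault(GROUP_KEY, "0"));
    }

    // Plays the role of map(): runs once per line and reuses the compiled pattern.
    List<String> map(String line) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = pattern.matcher(line);
        while (matcher.find()) {
            matches.add(matcher.group(group));
        }
        return matches;
    }

    public static void main(String[] args) {
        // Client side: only strings go into the configuration stand-in.
        Map<String, String> conf = new HashMap<>();
        conf.put(PATTERN_KEY, "err(or)?");
        conf.put(GROUP_KEY, "0");

        RegexSetupSketch mapper = new RegexSetupSketch();
        mapper.setup(conf);
        System.out.println(mapper.map("error: disk full, err 5"));
    }
}
```

Compiling the pattern once in setup, rather than on every call to map, is what makes the one-time initialization step worthwhile: the regex travels as a string and becomes a Pattern object only inside the task.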

--Chris Nauroth

Re: Passing instance of a class to Mapper

Posted by "Datta, Saurav" <sd...@paypal.com>.
Thanks very much, Chris!
I will try it out. Do you have any examples showing this?


Re: Passing instance of a class to Mapper

Posted by "Datta, Saurav" <sd...@paypal.com>.
Thanks very much Chris!
I will try it out. Do you have any examples showing this  ?

--


From: Chris Nauroth
Reply-To: "user@hadoop.apache.org<ma...@hadoop.apache.org>"
Date: Tuesday, October 13, 2015 at 12:07 PM
To: "user@hadoop.apache.org<ma...@hadoop.apache.org>"
Subject: Re: Passing instance of a class to Mapper

Hello Saurav,

You are correct that it generally is not possible to pass an instance of a class directly to a mapper (or reducer).  This is because the mapper tasks execute on arbitrary nodes in the Hadoop cluster, running in different JVM processes from the JVM running the client that submits the job.

A typical solution is for the client to populate the Configuration object with relevant primitive data type values.

http://hadoop.apache.org/docs/r2.7.1/api/org/apache/hadoop/conf/Configuration.html

This configuration propagates to all map and reduce tasks of the job.  The Mapper can override the setup function to do one-time initialization at the start of the task.

http://hadoop.apache.org/docs/r2.7.1/api/org/apache/hadoop/mapreduce/Mapper.html#setup(org.apache.hadoop.mapreduce.Mapper.Context)

As part of this one-time initialization, you can read the values back out of the Configuration.  As I said earlier, these will be only primitive types like String or int.  If it's helpful, your setup method can use the primitive values read from configuration to reconstruct an instance of any class that you want.

I hope this helps.

--Chris Nauroth

From: <Datta>, Saurav <sd...@paypal.com>>
Reply-To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Date: Monday, October 12, 2015 at 11:14 PM
To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Subject: Passing instance of a class to Mapper

Hello,

I am trying to pass an instance of a class to a Mapper. However, I understand Hadoop does not allow this.
Any workaround to make this happen ?

Regards,
Saurav Datta

Data Engineer| Desk - (408)967-7360| Cell - (408)666-1722

Re: Passing instance of a class to Mapper

Posted by "Datta, Saurav" <sd...@paypal.com>.
Thanks very much Chris!
I will try it out. Do you have any examples showing this  ?

--


From: Chris Nauroth
Reply-To: "user@hadoop.apache.org<ma...@hadoop.apache.org>"
Date: Tuesday, October 13, 2015 at 12:07 PM
To: "user@hadoop.apache.org<ma...@hadoop.apache.org>"
Subject: Re: Passing instance of a class to Mapper

Hello Saurav,

You are correct that it generally is not possible to pass an instance of a class directly to a mapper (or reducer).  This is because the mapper tasks execute on arbitrary nodes in the Hadoop cluster, running in different JVM processes from the JVM running the client that submits the job.

A typical solution is for the client to populate the Configuration object with relevant primitive data type values.

http://hadoop.apache.org/docs/r2.7.1/api/org/apache/hadoop/conf/Configuration.html

This configuration propagates to all map and reduce tasks of the job.  The Mapper can override the setup function to do one-time initialization at the start of the task.

http://hadoop.apache.org/docs/r2.7.1/api/org/apache/hadoop/mapreduce/Mapper.html#setup(org.apache.hadoop.mapreduce.Mapper.Context)

As part of this one-time initialization, you can read the values back out of the Configuration.  As I said earlier, these will be only primitive types like String or int.  If it's helpful, your setup method can use the primitive values read from configuration to reconstruct an instance of any class that you want.

I hope this helps.

--Chris Nauroth

From: <Datta>, Saurav <sd...@paypal.com>>
Reply-To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Date: Monday, October 12, 2015 at 11:14 PM
To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Subject: Passing instance of a class to Mapper

Hello,

I am trying to pass an instance of a class to a Mapper. However, I understand Hadoop does not allow this.
Any workaround to make this happen ?

Regards,
Saurav Datta

Data Engineer| Desk - (408)967-7360| Cell - (408)666-1722

Re: Passing instance of a class to Mapper

Posted by Chris Nauroth <cn...@hortonworks.com>.
Hello Saurav,

You are correct that it generally is not possible to pass an instance of a class directly to a mapper (or reducer).  This is because the mapper tasks execute on arbitrary nodes in the Hadoop cluster, running in different JVM processes from the JVM running the client that submits the job.

A typical solution is for the client to populate the Configuration object with relevant primitive data type values.

http://hadoop.apache.org/docs/r2.7.1/api/org/apache/hadoop/conf/Configuration.html

This configuration propagates to all map and reduce tasks of the job.  Your Mapper subclass can override the setup method to do one-time initialization at the start of each task.

http://hadoop.apache.org/docs/r2.7.1/api/org/apache/hadoop/mapreduce/Mapper.html#setup(org.apache.hadoop.mapreduce.Mapper.Context)

As part of this one-time initialization, you can read the values back out of the Configuration.  As I said earlier, these will be only simple values such as String or int (Configuration stores everything as strings and converts on read).  If it's helpful, your setup method can use the values read from configuration to reconstruct an instance of any class that you want.
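
To make the round trip concrete, here is a minimal sketch of the pattern, not code from any real job. It makes several assumptions: java.util.Properties stands in for Hadoop's Configuration so the sketch runs without Hadoop on the classpath, and the Threshold class, the ThresholdConfig helper, and the key names are all hypothetical. In a real job the client would call conf.set(...) and conf.setInt(...) before submitting, and the Mapper would call context.getConfiguration() inside setup and then conf.get(...) and conf.getInt(...).

```java
import java.util.Properties;

// Hypothetical class whose instance we would like the mapper to have.
final class Threshold {
    final String field;
    final int limit;

    Threshold(String field, int limit) {
        this.field = field;
        this.limit = limit;
    }

    boolean exceededBy(int value) {
        return value > limit;
    }
}

// Flattens a Threshold into string key/value pairs and rebuilds it again.
// Properties is only a stand-in for org.apache.hadoop.conf.Configuration.
final class ThresholdConfig {
    // Hypothetical key names; any unique strings would do.
    static final String FIELD_KEY = "example.threshold.field";
    static final String LIMIT_KEY = "example.threshold.limit";

    // Client side, before job submission.
    // With Hadoop: conf.set(FIELD_KEY, t.field); conf.setInt(LIMIT_KEY, t.limit);
    static void store(Properties conf, Threshold t) {
        conf.setProperty(FIELD_KEY, t.field);
        conf.setProperty(LIMIT_KEY, Integer.toString(t.limit));
    }

    // Task side, called once from the mapper's setup method.
    // With Hadoop: conf.get(FIELD_KEY, ""); conf.getInt(LIMIT_KEY, 0);
    static Threshold load(Properties conf) {
        String field = conf.getProperty(FIELD_KEY, "");
        int limit = Integer.parseInt(conf.getProperty(LIMIT_KEY, "0"));
        return new Threshold(field, limit);
    }
}

final class ThresholdDemo {
    public static void main(String[] args) {
        Properties conf = new Properties();

        // "Client": describe the object as simple values.
        ThresholdConfig.store(conf, new Threshold("latency", 250));

        // "Mapper setup": rebuild an equivalent object from those values.
        Threshold rebuilt = ThresholdConfig.load(conf);
        System.out.println(rebuilt.field + " " + rebuilt.limit); // prints "latency 250"
    }
}
```

The rebuilt object is a new instance in the task's JVM, equal in content to the client's but not the same object, so this only works for state that can be expressed as a handful of simple values.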

I hope this helps.

--Chris Nauroth

From: <Datta>, Saurav <sd...@paypal.com>>
Reply-To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Date: Monday, October 12, 2015 at 11:14 PM
To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Subject: Passing instance of a class to Mapper

Hello,

I am trying to pass an instance of a class to a Mapper. However, I understand Hadoop does not allow this.
Any workaround to make this happen?

Regards,
Saurav Datta

Data Engineer| Desk - (408)967-7360| Cell - (408)666-1722
