Posted to common-user@hadoop.apache.org by Amandeep Khurana <am...@gmail.com> on 2009/05/03 07:59:40 UTC

Global object for a map task

How can I create a global variable for each node running my map task? For
example, a common ArrayList that my map function can access for every k,v
pair it works on. It doesn't really need to create the ArrayList every time.

If I create it in the main function of the job, the map function gets a null
pointer exception. Where else can this be created?

Amandeep


Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz

Re: Global object for a map task

Posted by jason hadoop <ja...@gmail.com>.
Define a key of your own, say "my.app.array", and
serialize the object to a string, say myObjectString.

Then conf.set("my.app.array", myObjectString).

Then in your Mapper.configure() method:
String myObjectString = conf.get("my.app.array");
and deserialize your object.

A Google search for "Java object serialization to string" will provide you
with many examples, such as
http://www.velocityreviews.com/forums/showpost.php?p=3185744&postcount=10.
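
For concreteness, here is a minimal sketch of that round trip, assuming the
object is Serializable and using the commons-codec Base64 class (usually on
Hadoop's classpath already); the key name "my.app.array" and the
ArrayList<String> type are just for illustration:

import java.io.*;
import java.util.ArrayList;
import org.apache.commons.codec.binary.Base64;
import org.apache.hadoop.mapred.JobConf;

// Driver side: serialize the object and stash it in the job configuration.
ByteArrayOutputStream buf = new ByteArrayOutputStream();
ObjectOutputStream out = new ObjectOutputStream(buf);
out.writeObject(myList);                     // myList implements Serializable
out.close();
conf.set("my.app.array",
    new String(Base64.encodeBase64(buf.toByteArray()), "UTF-8"));

// Mapper side, in configure(JobConf job): reverse the encoding.
byte[] raw = Base64.decodeBase64(job.get("my.app.array").getBytes("UTF-8"));
ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw));
ArrayList<String> myList = (ArrayList<String>) in.readObject();
in.close();
// Checked exceptions (IOException, ClassNotFoundException) are elided here;
// wrap them in a RuntimeException or declare them as needed.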



On Sat, May 2, 2009 at 11:56 PM, Amandeep Khurana <am...@gmail.com> wrote:

> Thanks Jason.
>
> My object is relatively small. But how do I pass it via the JobConf object?
> Can you elaborate a bit...
>
>
>
> Amandeep Khurana
> Computer Science Graduate Student
> University of California, Santa Cruz
>
>
> On Sat, May 2, 2009 at 11:53 PM, jason hadoop <jason.hadoop@gmail.com
> >wrote:
>
> > If it is relatively small, you can pass it via the JobConf object,
> > storing a serialized version of your dataset.
> > If it is larger, you can pass a serialized version via the distributed
> > cache.
> > Your map task will need to deserialize the object in the configure
> > method.
> >
> > None of the above methods gives you an object that is write-shared
> > between map tasks.
> >
> > Please remember that the map tasks execute in separate JVMs on distinct
> > machines in the normal MapReduce environment.
> >
> >
> >
> > On Sat, May 2, 2009 at 10:59 PM, Amandeep Khurana <am...@gmail.com>
> > wrote:
> >
> > > How can I create a global variable for each node running my map task?
> > > For example, a common ArrayList that my map function can access for
> > > every k,v pair it works on. It doesn't really need to create the
> > > ArrayList every time.
> > >
> > > If I create it in the main function of the job, the map function gets
> > > a null pointer exception. Where else can this be created?
> > >
> > > Amandeep
> > >
> > >
> > > Amandeep Khurana
> > > Computer Science Graduate Student
> > > University of California, Santa Cruz
> > >
> >
> >
> >
> > --
> > Alpha Chapters of my book on Hadoop are available
> > http://www.apress.com/book/view/9781430219422
> >
>



-- 
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422

Re: Global object for a map task

Posted by Amandeep Khurana <am...@gmail.com>.
Thanks Jason.

My object is relatively small. But how do I pass it via the JobConf object?
Can you elaborate a bit...



Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz


On Sat, May 2, 2009 at 11:53 PM, jason hadoop <ja...@gmail.com>wrote:

> If it is relatively small, you can pass it via the JobConf object, storing
> a serialized version of your dataset.
> If it is larger, you can pass a serialized version via the distributed
> cache.
> Your map task will need to deserialize the object in the configure method.
>
> None of the above methods gives you an object that is write-shared between
> map tasks.
>
> Please remember that the map tasks execute in separate JVMs on distinct
> machines in the normal MapReduce environment.
>
>
>
> On Sat, May 2, 2009 at 10:59 PM, Amandeep Khurana <am...@gmail.com>
> wrote:
>
> > How can I create a global variable for each node running my map task?
> > For example, a common ArrayList that my map function can access for
> > every k,v pair it works on. It doesn't really need to create the
> > ArrayList every time.
> >
> > If I create it in the main function of the job, the map function gets a
> > null pointer exception. Where else can this be created?
> >
> > Amandeep
> >
> >
> > Amandeep Khurana
> > Computer Science Graduate Student
> > University of California, Santa Cruz
> >
>
>
>
> --
> Alpha Chapters of my book on Hadoop are available
> http://www.apress.com/book/view/9781430219422
>

Re: Global object for a map task

Posted by jason hadoop <ja...@gmail.com>.
If it is relatively small, you can pass it via the JobConf object, storing a
serialized version of your dataset.
If it is larger, you can pass a serialized version via the distributed cache.
Your map task will need to deserialize the object in the configure method.

None of the above methods gives you an object that is write-shared between
map tasks.

Please remember that the map tasks execute in separate JVMs on distinct
machines in the normal MapReduce environment.
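
If you go the distributed cache route instead, here is a rough sketch with
the old-style mapred API; the HDFS path /user/me/mydata.ser is just a
placeholder for a serialized file you have already written:

import java.io.FileInputStream;
import java.io.ObjectInputStream;
import java.net.URI;
import java.util.ArrayList;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

// Driver side: register a file already on HDFS with the cache.
DistributedCache.addCacheFile(new URI("/user/me/mydata.ser"), conf);

// Mapper side, in configure(JobConf job): the framework has copied the
// file to the task's local disk before the map task starts.
Path[] cached = DistributedCache.getLocalCacheFiles(job);
ObjectInputStream in =
    new ObjectInputStream(new FileInputStream(cached[0].toString()));
ArrayList<String> myList = (ArrayList<String>) in.readObject();
in.close();
// As in the JobConf sketch, checked exceptions are elided for brevity.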



On Sat, May 2, 2009 at 10:59 PM, Amandeep Khurana <am...@gmail.com> wrote:

> How can I create a global variable for each node running my map task? For
> example, a common ArrayList that my map function can access for every k,v
> pair it works on. It doesn't really need to create the ArrayList every time.
>
> If I create it in the main function of the job, the map function gets a
> null pointer exception. Where else can this be created?
>
> Amandeep
>
>
> Amandeep Khurana
> Computer Science Graduate Student
> University of California, Santa Cruz
>



-- 
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422