Posted to common-user@hadoop.apache.org by Tarandeep Singh <ta...@gmail.com> on 2009/06/03 19:59:15 UTC

Sharing object between mappers on same node (reuse.jvm ?)

Hi,

I want to share an object (a Lucene IndexWriter instance) between the mappers
of a single job (not across multiple jobs) that run on the same node. Please
correct me if I am wrong:

If I set the property mapred.job.reuse.jvm.num.tasks to -1, then all
mappers of one job will be executed in the same JVM, and in that case, if I
create a static Lucene IndexWriter instance in my mapper class, all mappers
running on the same node will be able to use it.
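The static-instance idea in the question looks roughly like the minimal Java sketch below. It assumes nothing about Hadoop or Lucene: `SharedWriter` is a hypothetical in-memory stand-in for Lucene's IndexWriter, and `WriterHolder` and `FakeMapTask` are illustrative names, not Hadoop APIs. A static field exists once per class loader (here, once per JVM), so in a Hadoop task it is shared only by tasks that run in that same JVM one after another when JVM reuse is on. It is not shared with other task JVMs running alongside it on the node.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a Lucene IndexWriter: keeps "documents" in memory.
final class SharedWriter {
    private final List<String> docs = Collections.synchronizedList(new ArrayList<String>());

    void addDocument(String doc) {
        docs.add(doc);
    }

    int numDocs() {
        return docs.size();
    }
}

// Holds one SharedWriter per class loader, i.e. per task JVM here. With
// mapred.job.reuse.jvm.num.tasks = -1, tasks of the same job that run one
// after another in this JVM get the same instance; task JVMs running
// alongside it on the node each hold a separate one.
final class WriterHolder {
    private WriterHolder() {}

    // Initialization-on-demand holder idiom: created on first use,
    // thread-safe through class initialization, no explicit locking.
    private static final class Lazy {
        static final SharedWriter INSTANCE = new SharedWriter();
    }

    static SharedWriter get() {
        return Lazy.INSTANCE;
    }
}

// Hypothetical task body; a real mapper would do this from map().
final class FakeMapTask {
    private FakeMapTask() {}

    static void run(String... records) {
        SharedWriter writer = WriterHolder.get();
        for (String record : records) {
            writer.addDocument(record);
        }
    }
}
```

Calling `FakeMapTask.run("a", "b")` and then `FakeMapTask.run("c")` in one JVM leaves three documents in the same writer; a task running in a different JVM would start with its own, empty writer.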

Thanks,
Tarandeep

Re: Sharing object between mappers on same node (reuse.jvm ?)

Posted by Tarandeep Singh <ta...@gmail.com>.
Thanks, Kevin, for the clarification. I ran a couple of tests as well, and the
system behaved exactly as you said.

So now the question is: how can I achieve what I want to do, i.e. share an
object (a Lucene IndexWriter instance) between mappers running on the same
node? I thought of running the IndexWriter separately, outside of Hadoop, and
using RMI/sockets etc. to communicate with it, but I am hoping there is a
simpler way than this. Any thoughts?
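The socket-based fallback (one process owns the writer, and mappers send it documents) could look roughly like the Java sketch below. Everything here is an assumption for illustration: `IndexService` and `IndexClient` are hypothetical names, the line-based protocol with an "OK" acknowledgement is made up, and an in-memory list stands in for Lucene's IndexWriter.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-writer service: one process on the node owns the index
// and every mapper JVM sends documents to it over TCP. Assumed protocol: one
// document per line (documents must not contain newlines), one "OK" per line.
final class IndexService implements AutoCloseable {
    private final ServerSocket server;
    // In-memory stand-in for the single Lucene IndexWriter.
    private final List<String> index = new ArrayList<>();

    IndexService(int port) throws IOException {
        server = new ServerSocket(port); // port 0 picks a free port
        Thread acceptor = new Thread(this::acceptLoop, "index-service");
        acceptor.setDaemon(true);
        acceptor.start();
    }

    int port() {
        return server.getLocalPort();
    }

    synchronized int numDocs() {
        return index.size();
    }

    // All writes funnel through this one synchronized method.
    private synchronized void add(String doc) {
        index.add(doc);
    }

    private void acceptLoop() {
        while (!server.isClosed()) {
            try {
                Socket client = server.accept();
                Thread worker = new Thread(() -> serve(client));
                worker.setDaemon(true);
                worker.start();
            } catch (IOException e) {
                return; // server socket was closed
            }
        }
    }

    private void serve(Socket client) {
        try (Socket s = client;
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true)) {
            String line;
            while ((line = in.readLine()) != null) {
                add(line);
                out.println("OK"); // ack only after the document was added
            }
        } catch (IOException e) {
            // Client disconnected abruptly; this sketch just drops the connection.
        }
    }

    @Override
    public void close() throws IOException {
        server.close();
    }
}

// What a mapper would do instead of opening its own writer.
final class IndexClient {
    private IndexClient() {}

    static void send(int port, String... docs) throws IOException {
        try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port);
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8))) {
            for (String doc : docs) {
                out.println(doc);
                if (!"OK".equals(in.readLine())) {
                    throw new IOException("no acknowledgement for: " + doc);
                }
            }
        }
    }
}
```

A real service would also have to decide when to commit and close the index, and because Hadoop may run a task more than once (after a failure, or speculatively), the same documents can arrive twice unless the service de-duplicates them.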

Also, what if I modify Hadoop's default behaviour so that it runs all mappers
on a node in one JVM? (Not sure if that is possible in the first place; just a
thought.)

-Tarandeep

On Thu, Jun 4, 2009 at 12:49 AM, Kevin Peterson <kp...@biz360.com> wrote:

> Not quite. JVM reuse controls whether the JVM is terminated after a single
> mapper run and a new one created for the next. It doesn't influence how many
> JVMs run at the same time: you will still get one JVM per concurrently
> running mapper or reducer.
>
> IIRC, there is (or was, or maybe a patch enables) something that does what
> you are asking for.
>

Re: Sharing object between mappers on same node (reuse.jvm ?)

Posted by Kevin Peterson <kp...@biz360.com>.
On Wed, Jun 3, 2009 at 10:59 AM, Tarandeep Singh <ta...@gmail.com> wrote:

> I want to share an object (a Lucene IndexWriter instance) between the mappers
> of a single job (not across multiple jobs) that run on the same node. Please
> correct me if I am wrong:
>
> If I set the property mapred.job.reuse.jvm.num.tasks to -1, then all
> mappers of one job will be executed in the same JVM, and in that case, if I
> create a static Lucene IndexWriter instance in my mapper class, all mappers
> running on the same node will be able to use it.
>

Not quite. JVM reuse controls whether the JVM is terminated after a single
mapper run and a new one created for the next. It doesn't influence how many
JVMs run at the same time: you will still get one JVM per concurrently running
mapper or reducer.

IIRC, there is (or was, or maybe a patch enables) something that does what you
are asking for.
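To make the numbers in this explanation concrete, here is a toy Java model. It is a deliberate simplification, not Hadoop's actual scheduler: it assumes the job's map tasks are spread round-robin over a fixed number of map slots on one node, that each slot runs its tasks sequentially, and that a slot's JVM is replaced after `reuse` tasks, with -1 meaning never (mirroring mapred.job.reuse.jvm.num.tasks). `JvmReuseModel` is a hypothetical name.

```java
// Toy model only, not Hadoop's scheduler. Assumptions: the job's map tasks are
// spread round-robin over a fixed number of map slots on one node, each slot
// runs its tasks one after another, and a slot's JVM is replaced after
// `reuse` tasks (-1 = never), mirroring mapred.job.reuse.jvm.num.tasks.
final class JvmReuseModel {
    private JvmReuseModel() {}

    // Total JVMs launched on the node over the job's lifetime.
    static int jvmsLaunched(int tasks, int slots, int reuse) {
        if (tasks < 0 || slots < 1 || reuse == 0 || reuse < -1) {
            throw new IllegalArgumentException("need tasks >= 0, slots >= 1, reuse == -1 or >= 1");
        }
        int total = 0;
        for (int slot = 0; slot < slots; slot++) {
            // Number of tasks that land on this slot under round-robin assignment.
            int k = tasks / slots + (slot < tasks % slots ? 1 : 0);
            if (k == 0) {
                continue; // idle slot: no JVM started
            }
            total += (reuse == -1) ? 1 : (k + reuse - 1) / reuse; // ceil(k / reuse)
        }
        return total;
    }

    // Upper bound on task JVMs alive at the same time. Each has its own copy
    // of any static field, so this also bounds the number of "shared" statics.
    static int concurrentJvms(int tasks, int slots) {
        return Math.min(tasks, slots);
    }
}
```

In this model, 10 map tasks on a node with 2 map slots launch 10 JVMs with reuse set to 1 and only 2 with reuse set to -1. Reuse cuts JVM start-ups, but there are still two separate static IndexWriter instances on the node, not one.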