Posted to common-user@hadoop.apache.org by Michael Segel <mi...@hotmail.com> on 2012/08/01 01:26:49 UTC

Re: task jvm bootstrapping via distributed cache

Hi Stan,

If I understood your question... you want to ship a jar to the nodes where the task will run prior to the start of the task? 

Not sure what it is you're trying to do...
Your example isn't really clear.

See: http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/filecache/DistributedCache.html

When you pull stuff out of the cache you get the path to the jar. 
Or you should be able to get it. 

I'm assuming you're doing this in your setup() method? 

Can you give a better example? There may be a different way to handle this...

On Jul 31, 2012, at 3:50 PM, Stan Rosenberg <st...@gmail.com> wrote:

> Forwarding to common-user to hopefully get more exposure...
> 
> 
> ---------- Forwarded message ----------
> From: Stan Rosenberg <st...@gmail.com>
> Date: Tue, Jul 31, 2012 at 11:55 AM
> Subject: Re: task jvm bootstrapping via distributed cache
> To: mapreduce-user@hadoop.apache.org
> 
> 
> I am guessing this is either a well-known problem or an edge case.  In
> any case, would it be a bad idea to designate predetermined output
> paths?
> E.g., DistributedCache.addCacheFileInto(uri, conf, outputPath) would
> attempt to copy the cached file into the specified path resolving to a
> task's local filesystem.
> 
> Thanks,
> 
> stan
> 
> On Mon, Jul 30, 2012 at 6:23 PM, Stan Rosenberg
> <st...@gmail.com> wrote:
>> Hi,
>> 
>> I am seeking a way to leverage hadoop's distributed cache in order to
>> ship jars that are required to bootstrap a task's jvm, i.e., before a
>> map/reduce task is launched.
>> As a concrete example, let's say that I need to launch with
>> '-javaagent:/path/profiler.jar'.  In theory, the task tracker is
>> responsible for downloading cached files onto its local filesystem.
>> However, the absolute path to a given cached file is not known a
>> priori; however, we need the path in order to configure '-javaagent'.
>> 
>> Is this currently possible with the distributed cache? If not, is the
>> use case appealing enough to open a jira ticket?
>> 
>> Thanks,
>> 
>> stan
> 


Re: task jvm bootstrapping via distributed cache

Posted by Stan Rosenberg <st...@gmail.com>.
On Tue, Jul 31, 2012 at 7:26 PM, Michael Segel
<mi...@hotmail.com> wrote:
> Hi Stan,
>
> If I understood your question... you want to ship a jar to the nodes where the task will run prior to the start of the task?
>
> Not sure what it is you're trying to do...
> Your example isn't  really clear.

Correct.  I want to ship a jar to the task, but I need to know its
absolute path before the task JVM is launched.
As an example, the -javaagent JVM option expects a jar path.

>
> See: http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/filecache/DistributedCache.html
>
> When you pull stuff out of the cache you get the path to the jar.
> Or you should be able to get it.
>

It would be too late at that point; the task tracker controls the
launching of the JVM.  The path of the shipped jar needs to be
available before the task is launched.

> Can you give a better example, there may be a different way to handle this...
>
Does the example above make sense?
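
To make the example concrete, here is a minimal sketch of one possible
workaround: give the cached jar a fixed symlink name and refer to it by a
path relative to the task's working directory, so the absolute path never
has to be known. This rests on an assumption that nothing in this thread
confirms, namely that the symlink already exists in the working directory
when the task JVM starts; that would need to be checked against the Hadoop
version in use. The class AgentJarOpts, the HDFS location, and the link
name are all hypothetical. The Hadoop calls appear only in comments so the
snippet compiles with the JDK alone.

```java
import java.net.URI;

// Hypothetical helper, not part of Hadoop: it only builds the two values
// a job would hand to the distributed cache and to the child JVM options.
class AgentJarOpts {

    // Append "#linkName" so the cache entry carries a fixed symlink name.
    // Assumes jarLocation does not already contain a fragment.
    static URI cacheUri(String jarLocation, String linkName) {
        if (linkName.isEmpty() || linkName.contains("/") || linkName.contains("#")) {
            throw new IllegalArgumentException("link name must be a plain file name: " + linkName);
        }
        return URI.create(jarLocation + "#" + linkName);
    }

    // Add a -javaagent flag that points at the symlink by relative path.
    static String childOpts(String baseOpts, String linkName) {
        String agent = "-javaagent:./" + linkName;
        if (baseOpts == null || baseOpts.trim().isEmpty()) {
            return agent;
        }
        return baseOpts.trim() + " " + agent;
    }

    public static void main(String[] args) {
        URI uri = cacheUri("hdfs://namenode/libs/profiler.jar", "profiler.jar");
        String opts = childOpts("-Xmx200m", "profiler.jar");

        // In a real job these values would be passed to Hadoop, for example:
        //   DistributedCache.addCacheFile(uri, conf);
        //   DistributedCache.createSymlink(conf);
        //   conf.set("mapred.child.java.opts", opts);
        System.out.println(uri);
        System.out.println(opts);
    }
}
```

If the symlink turns out not to be in place before the JVM is launched,
the relative path will not resolve and the agent will fail to load, which
is exactly the timing problem described above.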