Posted to user@hadoop.apache.org by Ilya Kirnos <il...@cardspring.com> on 2012/12/22 04:20:03 UTC

Is it possible to run from localized directory instead of jar?

When running Hadoop locally, RunJar will unjar the job jar and use the
localized directory as the classpath to run the job.  When running
distributed, the localized directory is created, but the jar itself is
used for the classpath, and the expanded directory is ignored for
classpath purposes.  Is it possible to configure Hadoop to use the
unjarred directory instead?  (I have some relative paths that work on a
real filesystem, but not when running from a jar.)

This is the directory I'm talking about:

http://hadoop.apache.org/docs/r0.20.2/mapred_tutorial.html:

   - ${mapred.local.dir}/taskTracker/jobcache/$jobid/jars/ : The jars
   directory, which has the job jar file and expanded jar. The job.jar is
   the application's jar file that is automatically distributed to each
   machine. It is expanded in jars directory before the tasks for the job
   start. The job.jar location is accessible to the application through the
   api JobConf.getJar()
<http://hadoop.apache.org/docs/r0.20.2/api/org/apache/hadoop/mapred/JobConf.html#getJar()>.
   To access the unjarred directory, JobConf.getJar().getParent() can be
   called.
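To make the tutorial's suggestion concrete, the lookup could be sketched
like this (old mapred API; this helper is hypothetical, not something from
the docs, and it only *locates* the directory, it does not put it on the
classpath):

```java
import java.io.File;
import org.apache.hadoop.mapred.JobConf;

// Hypothetical helper illustrating the tutorial text above: the unjarred
// directory is the parent of the path returned by JobConf.getJar().
public class UnjarredDir {
    public static File locate(JobConf job) {
        // e.g. ${mapred.local.dir}/taskTracker/jobcache/<jobid>/jars/job.jar
        String jarPath = job.getJar();
        return new File(jarPath).getParentFile();
    }
}
```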


Thanks.

-- 
-ilya

Re: Is it possible to run from localized directory instead of jar?

Posted by Harsh J <ha...@cloudera.com>.
Are you looking for the DistributedCache's archives feature? If you
add an 'archive' type entry to the cache, it is automatically extracted
into the task's current working directory.

See http://hadoop.apache.org/docs/current/api/org/apache/hadoop/filecache/DistributedCache.html

"Archives (zip, tar and tgz/tar.gz files) are un-archived at the slave
nodes. Jars may be optionally added to the classpath of the tasks, a
rudimentary software distribution mechanism."

API call: DistributedCache.addCacheArchive(…)
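A minimal driver-side sketch of that call (the HDFS path and class name
here are hypothetical):

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;

public class ArchiveCacheExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Archive assumed to be already uploaded to HDFS. The '#mydata'
        // fragment names the symlink created in each task's working
        // directory, so tasks can read relative paths like "mydata/foo.txt".
        DistributedCache.addCacheArchive(
            new URI("/user/me/mydata.tgz#mydata"), conf);
        // Older releases require this for the '#' symlink to be created.
        DistributedCache.createSymlink(conf);
        // ...then configure and submit the job as usual.
    }
}
```

If the driver goes through Tool/ToolRunner, the generic `-archives`
command-line option achieves the same effect without code changes.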

On Sat, Dec 22, 2012 at 8:50 AM, Ilya Kirnos <il...@cardspring.com> wrote:
> When running hadoop locally, RunJar will unjar the job jar and use the
> localized directory as the classpath to run the job.  When running
> distributed, it seems the localized directory is created, but the jar is
> used for the classpath instead, and the localized directory is ignored for
> classpath purposes.  Is it possible to configure hadoop to use the unjarred
> directory instead?  (I have some relative paths that work on a real
> filesystem, but not when running from a jar.)
>
> This is the directory I'm talking about:
>
> http://hadoop.apache.org/docs/r0.20.2/mapred_tutorial.html:
>
> ${mapred.local.dir}/taskTracker/jobcache/$jobid/jars/ : The jars directory,
> which has the job jar file and expanded jar. The job.jar is the
> application's jar file that is automatically distributed to each machine. It
> is expanded in jars directory before the tasks for the job start. The
> job.jar location is accessible to the application through the api
> JobConf.getJar() . To access the unjarred directory,
> JobConf.getJar().getParent() can be called.
>
>
> Thanks.
>
> --
> -ilya



-- 
Harsh J
