
Posted to common-dev@hadoop.apache.org by "Martin Eckert (JIRA)" <ji...@apache.org> on 2008/10/24 04:27:44 UTC

[jira] Created: (HADOOP-4511) Support for Manifest file inside Distributed Caches (Archives)

Support for Manifest file inside Distributed Caches (Archives)
--------------------------------------------------------------

                 Key: HADOOP-4511
                 URL: https://issues.apache.org/jira/browse/HADOOP-4511
             Project: Hadoop Core
          Issue Type: Improvement
    Affects Versions: 0.17.2
            Reporter: Martin Eckert
            Priority: Minor


I'm in a situation where I'm using the DistributedCache API to add a library package to my Hadoop job. The library bundle consists of a JAR file, native library files and data files. Currently it is cumbersome to set up the job so that the library can be used from within the map/reduce tasks.

The best way I could come up with was to keep the <lib>.jar file outside of the archive and pass it with the -libjars argument. The archive itself is submitted using DistributedCache.setCacheArchives() and DistributedCache.createSymlink().
To add the native library files to the library path, I append -Djava.library.path=./symlink/lib to the mapred.child.java.opts JobConf option. To reference the config file inside the archive, I use its path relative to the symlink (e.g. ./symlink/conf/config.txt).
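A minimal sketch of the option-string part of this workaround, in Java. It assumes the archive was registered with the symlink name "symlink", as in the paths above. A plain HashMap stands in for JobConf so the example runs without Hadoop on the classpath. In a real job the value would be read with JobConf.get() and written back with JobConf.set(). The class and helper names (ChildOptsWorkaround, appendOpt, inArchive) and the starting value -Xmx200m are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

public class ChildOptsWorkaround {
    // Option key named in the issue; in a real job it lives in the JobConf.
    static final String CHILD_OPTS = "mapred.child.java.opts";

    /** Appends a JVM option to an existing, possibly unset, option string. */
    static String appendOpt(String existing, String opt) {
        if (existing == null || existing.trim().isEmpty()) {
            return opt;
        }
        return existing.trim() + " " + opt;
    }

    /** Path of a file inside the unpacked archive, relative to the task's working directory. */
    static String inArchive(String symlink, String relativePath) {
        return "./" + symlink + "/" + relativePath;
    }

    public static void main(String[] args) {
        // A plain map stands in for JobConf so the sketch runs without Hadoop.
        Map<String, String> conf = new HashMap<>();
        conf.put(CHILD_OPTS, "-Xmx200m"); // illustrative pre-existing value

        String symlink = "symlink";
        String libOpt = "-Djava.library.path=" + inArchive(symlink, "lib");
        conf.put(CHILD_OPTS, appendOpt(conf.get(CHILD_OPTS), libOpt));

        System.out.println(conf.get(CHILD_OPTS));
        System.out.println(inArchive(symlink, "conf/config.txt"));
    }
}
```

Appending rather than overwriting keeps any options (such as the heap size) that are already configured for the child JVMs.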

It would be very helpful if these settings could largely be encapsulated inside the archive itself in the form of a manifest file. The manifest file could define the relative paths to the JAR file(s) and native library path(s). These would be read automatically and added to the job's classpath and library path.

A config file could also be referenced and assigned a name inside the manifest, so that the code could obtain its path through the JobConf.get() method and use it where needed.
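To make the proposal more concrete, here is a sketch of how a framework might read such a manifest. The format is entirely hypothetical. Hadoop defines no such file, and the keys jars, libpath and config.<name>, as well as the class ArchiveManifestSketch, are invented for illustration. The manifest is parsed as java.util.Properties, and every entry is resolved against the archive's symlink directory. The results therefore correspond to the manual settings described above.

```java
import java.io.File;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class ArchiveManifestSketch {
    // Hypothetical manifest keys: Hadoop defines no such format.
    static final String JARS_KEY = "jars";
    static final String LIBPATH_KEY = "libpath";
    static final String CONFIG_PREFIX = "config.";

    final List<String> classpath = new ArrayList<>();
    final List<String> libraryPath = new ArrayList<>();
    final Map<String, String> configFiles = new LinkedHashMap<>();

    /** Parses the manifest text and resolves every entry against the archive's symlink. */
    static ArchiveManifestSketch parse(String symlink, String manifestText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(manifestText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        ArchiveManifestSketch m = new ArchiveManifestSketch();
        m.classpath.addAll(resolveList(symlink, props.getProperty(JARS_KEY, "")));
        m.libraryPath.addAll(resolveList(symlink, props.getProperty(LIBPATH_KEY, "")));
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(CONFIG_PREFIX)) {
                // config.<name> = <relative path>: expose the file under <name>.
                String name = key.substring(CONFIG_PREFIX.length());
                m.configFiles.put(name, resolve(symlink, props.getProperty(key).trim()));
            }
        }
        return m;
    }

    /** Prefixes an archive-relative path with the symlink in the task's working directory. */
    static String resolve(String symlink, String relativePath) {
        return "./" + symlink + "/" + relativePath;
    }

    /** Splits a comma-separated list of archive-relative paths and resolves each one. */
    static List<String> resolveList(String symlink, String value) {
        List<String> out = new ArrayList<>();
        for (String part : value.split(",")) {
            String p = part.trim();
            if (!p.isEmpty()) {
                out.add(resolve(symlink, p));
            }
        }
        return out;
    }

    /** JVM option that would be appended to mapred.child.java.opts. */
    String libraryPathOpt() {
        return "-Djava.library.path=" + String.join(File.pathSeparator, libraryPath);
    }

    public static void main(String[] args) {
        String manifest = String.join("\n",
                "jars = lib/mylib.jar",
                "libpath = lib",
                "config.mylib.config = conf/config.txt");
        ArchiveManifestSketch m = parse("symlink", manifest);
        System.out.println("classpath:    " + m.classpath);
        System.out.println("child option: " + m.libraryPathOpt());
        System.out.println("config files: " + m.configFiles);
    }
}
```

In a real implementation, the framework would perform three steps. It would add the resolved JARs to the task classpath, append libraryPathOpt() to mapred.child.java.opts, and store each config entry in the JobConf under its name so that JobConf.get() can return it. The sketch only computes those values.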

This approach would open up other opportunities as well, but mainly it would make deploying and distributing archived packages for Hadoop much easier.
