Posted to common-dev@hadoop.apache.org by "Ahad Rana (JIRA)" <ji...@apache.org> on 2008/11/03 19:42:44 UTC

[jira] Created: (HADOOP-4577) Add Jar "lib" directory to TaskRunner's library.path setting to allow JNI libraries to be deployed via JAR file

Add Jar "lib" directory to TaskRunner's library.path setting to allow JNI libraries to be deployed via JAR file
-----------------------------------------------------------------------------------------------------------------

Key: HADOOP-4577
URL: https://issues.apache.org/jira/browse/HADOOP-4577
Project: Hadoop Core
Issue Type: Improvement
Components: mapred
Affects Versions: 0.18.1
Environment: Hadoop 0.18.1 cluster with custom JNI shared libraries deployed in the lib directory of the deployment JAR.
Reporter: Ahad Rana
Assignee: Ahad Rana
Fix For: 0.18.3

It is extremely convenient to be able to deploy the JNI libraries used by a custom map-reduce job via the job's JAR file. The TaskRunner already establishes a precedent by automatically adding any jar files contained in the "lib" directory of the job jar to the child map/reduce process's classpath. Following this convention, it should also be possible to deploy custom JNI libraries in the same lib directory. This involves adding the path to the job jar's lib directory to the child VM's java.library.path setting (after the jar has been expanded in the job cache directory). This does not eliminate the need to add dependent shared libraries that may be referenced by the JNI libraries to the system's LD_LIBRARY_PATH variable. In our deployment configuration, we usually pre-install third-party shared libraries across the cluster and only deploy our custom JNI libraries via the job jar.
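
For illustration only, the idea described above could look roughly like the following. This is a minimal sketch and not the attached patch: the class name, the method name, and the example job cache path are all hypothetical, and the real TaskRunner builds the child JVM arguments differently.

```java
import java.io.File;

public class LibraryPathSketch {

    /**
     * Returns parentLibraryPath with the "lib" directory of the expanded
     * job jar appended. jobJarDir is a hypothetical parameter standing in
     * for wherever the job jar was unpacked in the job cache.
     */
    public static String withJobLibDir(String parentLibraryPath, File jobJarDir) {
        File libDir = new File(jobJarDir, "lib");
        if (parentLibraryPath == null || parentLibraryPath.isEmpty()) {
            return libDir.getPath();
        }
        // java.library.path entries are separated by the platform path separator.
        return parentLibraryPath + File.pathSeparator + libDir.getPath();
    }

    public static void main(String[] args) {
        // The directory below is made up for the example.
        String path = withJobLibDir(System.getProperty("java.library.path"),
                                    new File("/tmp/jobcache/job_0001/jars"));
        System.out.println("-Djava.library.path=" + path);
    }
}
```

The resulting string would be passed to the child JVM as a -Djava.library.path argument, so that System.loadLibrary() in the task can find libraries shipped in the jar's lib directory.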

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-4577) Add Jar "lib" directory to TaskRunner's library.path setting to allow JNI libraries to be deployed via JAR file

Posted by "Edward J. Yoon (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12645223#action_12645223 ] 

Edward J. Yoon commented on HADOOP-4577:
----------------------------------------

+1. Simply a great patch. I also think the environment variables must be configured.

[jira] Updated: (HADOOP-4577) Add Jar "lib" directory to TaskRunner's library.path setting to allow JNI libraries to be deployed via JAR file

Posted by "Ahad Rana (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ahad Rana updated HADOOP-4577:
------------------------------

    Attachment: HADOOP-4577-v1.patch

Adds the job jar's lib directory to the library.path setting.

[jira] Commented: (HADOOP-4577) Add Jar "lib" directory to TaskRunner's library.path setting to allow JNI libraries to be deployed via JAR file

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644914#action_12644914 ] 

Hadoop QA commented on HADOOP-4577:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12393259/HADOOP-4577-v1.patch
  against trunk revision 709609.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3523/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3523/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3523/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3523/console

This message is automatically generated.

[jira] Updated: (HADOOP-4577) Add Jar "lib" directory to TaskRunner's library.path setting to allow JNI libraries to be deployed via JAR file

Posted by "Ahad Rana (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ahad Rana updated HADOOP-4577:
------------------------------

    Status: Open  (was: Patch Available)

[jira] Commented: (HADOOP-4577) Add Jar "lib" directory to TaskRunner's library.path setting to allow JNI libraries to be deployed via JAR file

Posted by "Steve Loughran (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12645459#action_12645459 ] 

Steve Loughran commented on HADOOP-4577:
----------------------------------------

Ahad, I don't plan on having a heterogeneous cluster, but I can imagine creating JARs that don't care which x86 system they run on. You could have a build that had both 32- and 64-bit images in there (possibly kept under SCM to avoid recompile times) and they just get JARred and uploaded.

[jira] Updated: (HADOOP-4577) Add Jar "lib" directory to TaskRunner's library.path setting to allow JNI libraries to be deployed via JAR file

Posted by "Ahad Rana (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ahad Rana updated HADOOP-4577:
------------------------------

    Status: Patch Available  (was: Open)

[jira] Updated: (HADOOP-4577) Add Jar "lib" directory to TaskRunner's library.path setting to allow JNI libraries to be deployed via JAR file

Posted by "Johan Oskarsson (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Johan Oskarsson updated HADOOP-4577:
------------------------------------

    Fix Version/s:     (was: 0.18.3)
                   0.20.0

Moving this to 0.20.0 since it's a new feature and not a blocking bug.

[jira] Commented: (HADOOP-4577) Add Jar "lib" directory to TaskRunner's library.path setting to allow JNI libraries to be deployed via JAR file

Posted by "Steve Loughran (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12645217#action_12645217 ] 

Steve Loughran commented on HADOOP-4577:
----------------------------------------

1. It would be good for the env variable to get set up too, so that you can do transitive libraries. Otherwise a lot of the benefits get lost.

2. It would also be good if the lib/ dir could have subdirs for OS and arch, so that I could have a jar with lib/linux/x8632/something.so and lib/linux/amd64/something.so; the runtime would select the right shared lib for the platform. This would also make the feature something that could be tested with a JAR containing shared libraries for a set of platforms.
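
As a rough sketch of suggestion 2, the runtime could derive a per-platform subdirectory from the os.name and os.arch system properties. Nothing here is from the patch: the class, the method names, and the normalization rule (lower-casing, spaces to underscores) are assumptions made for the example, and the directory names a real implementation would use were never specified in this thread.

```java
import java.io.File;
import java.util.Locale;

public class PlatformLibDir {

    /**
     * Builds a relative "os/arch" path from os.name / os.arch style values.
     * The normalization used here is an assumption, not a Hadoop convention.
     */
    public static String platformSubdir(String osName, String osArch) {
        String os = osName.trim().toLowerCase(Locale.ROOT).replace(' ', '_');
        String arch = osArch.trim().toLowerCase(Locale.ROOT);
        return os + "/" + arch;
    }

    /**
     * Returns lib/<os>/<arch> if that directory exists, otherwise falls
     * back to the plain lib directory.
     */
    public static File resolve(File libDir, String osName, String osArch) {
        File candidate = new File(libDir, platformSubdir(osName, osArch));
        return candidate.isDirectory() ? candidate : libDir;
    }

    public static void main(String[] args) {
        File chosen = resolve(new File("lib"),
                              System.getProperty("os.name"),
                              System.getProperty("os.arch"));
        System.out.println("Native library directory: " + chosen.getPath());
    }
}
```

Falling back to the plain lib directory keeps jars that ship a single platform's libraries working unchanged.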

[jira] Updated: (HADOOP-4577) Add Jar "lib" directory to TaskRunner's library.path setting to allow JNI libraries to be deployed via JAR file

Posted by "Ahad Rana (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ahad Rana updated HADOOP-4577:
------------------------------

    Status: Patch Available  (was: Open)

[jira] Commented: (HADOOP-4577) Add Jar "lib" directory to TaskRunner's library.path setting to allow JNI libraries to be deployed via JAR file

Posted by "Ahad Rana (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654877#action_12654877 ] 

Ahad Rana commented on HADOOP-4577:
-----------------------------------

Hi Arun,

Sorry, I have been tied up with other stuff and have not been able to bring this issue to closure. Will try to submit a patch based on previous suggestions shortly. 

As for the DistributedCache vs. distributing via the jar: in my experience, both methods have their valid uses. In our use case at CommonCrawl, we deploy a single jar that contains all of our map-reduce jobs to a master server, which then executes specific jobs on demand. We have various utility classes that are wrappers around native C/C++ libraries. We build these JNI wrappers and JNI libraries via the same build script that builds the jar. It is super convenient to be able to include and deploy the related JNI libraries within the jar (and thus have them available at each mapper/reducer node). This way, all of our various jobs can use these classes seamlessly without relying on any special job-specific setup (such as adding the appropriate JNI libraries to the DistributedCache).

So, in conclusion: the DistributedCache is good for cases where library availability is determined by job config; mapred.child.java.opts is convenient for scenarios where a set of (relatively static) libraries is part of the standard cluster config; and the third method I am proposing, deployment via jar, is convenient for scenarios where a deployment jar contains more than one job and library availability is desired across all jobs. Sound right?


Ahad.

[jira] Commented: (HADOOP-4577) Add Jar "lib" directory to TaskRunner's library.path setting to allow JNI libraries to be deployed via JAR file

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654866#action_12654866 ] 

Arun C Murthy commented on HADOOP-4577:
---------------------------------------

Sorry for being late on this one, Ahad... the patch looks fine. However, rather than distributing libraries via job.jar it would be more efficient to distribute them via the DistributedCache. This paradigm is already supported by adding the task's cwd to its LD_LIBRARY_PATH. Given that, do you still need this feature?

Oh, and you can specify more directories to be added to the java.library.path via mapred.child.java.opts - just use -Djava.library.path=<path> in your JobConf.
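
To make the mapred.child.java.opts option concrete, here is a small sketch that builds such an option string. The helper class and method are hypothetical; in a real job the resulting string would be set on the JobConf under the mapred.child.java.opts key. The "-Xmx200m" value and the directories are only examples, and paths containing spaces are not handled.

```java
import java.io.File;

public class ChildJavaOpts {

    /**
     * Appends a -Djava.library.path=<dirs> flag to an existing JVM option
     * string. Hypothetical helper for illustration; not a Hadoop API.
     */
    public static String withLibraryPath(String existingOpts, String... dirs) {
        StringBuilder path = new StringBuilder();
        for (String dir : dirs) {
            if (path.length() > 0) {
                path.append(File.pathSeparator);
            }
            path.append(dir);
        }
        String flag = "-Djava.library.path=" + path;
        if (existingOpts == null || existingOpts.trim().isEmpty()) {
            return flag;
        }
        return existingOpts.trim() + " " + flag;
    }

    public static void main(String[] args) {
        // With Hadoop on the classpath, this value would be passed to
        // jobConf.set("mapred.child.java.opts", opts).
        String opts = withLibraryPath("-Xmx200m", "/usr/local/lib", "/opt/native");
        System.out.println(opts);
    }
}
```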

[jira] Updated: (HADOOP-4577) Add Jar "lib" directory to TaskRunner's library.path setting to allow JNI libraries to be deployed via JAR file

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-4577:
----------------------------------

    Status: Open  (was: Patch Available)

Fair enough. Like I said, the patch looks good - do you mind fixing the documentation in http://hadoop.apache.org/core/docs/current/mapred_tutorial.html and the comment in TaskRunner.java too?

{noformat}
      // Add java.library.path; necessary for loading native libraries.
      //
      // 1. To support native-hadoop library i.e. libhadoop.so, we add the
      //    parent process's java.library.path to the child.
      // 2. We also add the 'cwd' of the task to its java.library.path to help
      //    users distribute native libraries via the DistributedCache.
      // 3. The user can also specify extra paths to be added to the 
      //    java.library.path via mapred.child.java.opts.
      //
{noformat}

If you are currently too tied up to fix mapred_tutorial.html, please go ahead and file a new jira to fix the documentation alone... thanks!

> Add Jar "lib" directory to TaskRunner's library.path setting to allow JNI libraries to be deployed via JAR file  
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4577
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4577
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.18.1
>         Environment: Hadoop 18.1 Cluster with custom JNI shared libraries deployed in lib directory of deployment JAR.
>            Reporter: Ahad Rana
>            Assignee: Ahad Rana
>             Fix For: 0.20.0
>
>         Attachments: HADOOP-4577-v1.patch
>
>
> It is extremely convenient to be able to deploy JNI libraries utilized in a custom map-reduce job via the job's JAR file. The TaskRunner already establishes a precedent by automatically adding any jar files contained in the "lib" directory of the job jar to the child map/reduce process's classpath. Following this convention, it should also be possible to deploy custom JNI libraries in the same lib directory. This involves adding the path to the job jar's lib directory to the VM's library.path setting (after the jar has been expanded in the job cache directory). This does not eliminate the need to add dependent shared libraries that may be referenced by the JNI libraries to the system's LD_LIBRARY_PATH variable. In our deployment configuration, we usually pre-install third party shared libraries across the cluster and only deploy our custom JNI libraries via the job jar.


[jira] Commented: (HADOOP-4577) Add Jar "lib" directory to TaskRunner's library.path setting to allow JNI libraries to be deployed via JAR file

Posted by "Ahad Rana (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12645408#action_12645408 ] 

Ahad Rana commented on HADOOP-4577:
-----------------------------------

Hi Steve / Edward, 

I will look into the environment variable issue, although I deliberately left it out because the name of the dynamic loader search path variable differs based on operating system, and it thus potentially injects operating system awareness into the code (vs. at the script level, where it is potentially more acceptable). In our usage scenario, we assume that any dependencies that the deployed JNI lib may have are usually more stable in nature and thus can be pre-installed/pre-deployed on the cluster into a directory already in the LD_LIBRARY_PATH, such as /usr/local/lib for example.
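
To make the operating system dependence concrete, here is a rough sketch of the kind of mapping the code would need if it set the loader search path itself. It is an illustration, not part of the patch: the class and method names are hypothetical, only three common cases are covered, and treating everything else as LD_LIBRARY_PATH is an assumption (some Unix variants use other names).

```java
import java.util.Locale;

// Hypothetical helper, not part of Hadoop: picks the environment variable
// that the native dynamic loader consults, based on the os.name property.
public class LoaderPathVariable {

    public static String forOs(String osName) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.contains("windows")) {
            return "PATH";               // DLLs are searched for on PATH
        }
        if (os.contains("mac") || os.contains("darwin")) {
            return "DYLD_LIBRARY_PATH";  // used by dyld on Mac OS X
        }
        // Assumption: fall back to the Linux/Solaris convention.
        return "LD_LIBRARY_PATH";
    }

    public static void main(String[] args) {
        System.out.println(forOs(System.getProperty("os.name")));
    }
}
```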

Adding the platform identifier to the lib path is a valid suggestion. I am curious: do you envision a cluster deployment with a mixed set of operating system configurations? In that scenario, you would definitely need a JAR with multiple operating-system-specific versions of the JNI libraries. In our deployment, we always build our jar on the cluster (since we are deployed in a data center, and transferring source code is far faster than transferring jar files across DSL lines), so our build script identifies the host system and builds the appropriate JNI library for the platform. But if you feel that qualifying the JNI access path by OS type is important, I will look into using the PlatformName utility under hadoop util to produce an appropriate path name.
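
As a sketch of what a platform-qualified path could look like, the snippet below appends an identifier derived from the os.name and os.arch system properties to the jar's lib directory. The naming scheme, class name, and method names are hypothetical; the string that Hadoop's PlatformName utility actually produces may differ, and it would be the one to use in a real patch.

```java
import java.io.File;

// Hypothetical helper, not part of Hadoop: resolves a per-platform
// subdirectory such as lib/Linux-amd64 under the job jar's lib directory.
public class PlatformLibDir {

    // Builds an identifier like "Linux-amd64"; spaces become underscores
    // so the result is safe to use as a directory name.
    public static String platformId(String osName, String osArch) {
        return (osName + "-" + osArch).replace(' ', '_');
    }

    public static File resolve(File jarLibDir, String osName, String osArch) {
        return new File(jarLibDir, platformId(osName, osArch));
    }

    public static void main(String[] args) {
        File dir = resolve(new File("lib"),
                           System.getProperty("os.name"),
                           System.getProperty("os.arch"));
        System.out.println(dir.getPath());
    }
}
```

With a layout like this, a single jar could carry one build of each JNI library per platform, and each task would add only the directory matching its own host to the library path.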

> Add Jar "lib" directory to TaskRunner's library.path setting to allow JNI libraries to be deployed via JAR file  
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4577
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4577
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.18.1
>         Environment: Hadoop 18.1 Cluster with custom JNI shared libraries deployed in lib directory of deployment JAR.
>            Reporter: Ahad Rana
>            Assignee: Ahad Rana
>             Fix For: 0.18.3
>
>         Attachments: HADOOP-4577-v1.patch
>
>
> It is extremely convenient to be able to deploy JNI libraries utilized in a custom map-reduce job via the job's JAR file. The TaskRunner already establishes a precedent by automatically adding any jar files contained in the "lib" directory of the job jar to the child map/reduce process's classpath. Following this convention, it should also be possible to deploy custom JNI libraries in the same lib directory. This involves adding the path to the job jar's lib directory to the VM's library.path setting (after the jar has been expanded in the job cache directory). This does not eliminate the need to add dependent shared libraries that may be referenced by the JNI libraries to the system's LD_LIBRARY_PATH variable. In our deployment configuration, we usually pre-install third party shared libraries across the cluster and only deploy our custom JNI libraries via the job jar.
