Posted to common-dev@hadoop.apache.org by "Subramaniam Krishnan (JIRA)" <ji...@apache.org> on 2008/04/22 14:48:38 UTC

[jira] Created: (HADOOP-3298) Add support for transitive native libraries to DistributedCache

Add support for transitive native libraries to DistributedCache 
----------------------------------------------------------------

                 Key: HADOOP-3298
                 URL: https://issues.apache.org/jira/browse/HADOOP-3298
             Project: Hadoop Core
          Issue Type: Improvement
          Components: mapred
         Environment: unix (different handling would be required for windows)
            Reporter: Subramaniam Krishnan
            Assignee: Arun C Murthy
             Fix For: 0.16.0


Currently, if an M/R job depends on a JNI-based component, the dynamic library must be available on all the task nodes. This is not possible, especially when you have no control over the cluster machines and are just using the cluster as a service.

It should be possible to specify, using the DistributedCache, which native libraries a job needs.

For example, via a new method 'public void addLibrary(Path libraryPath, JobConf conf)'.

The added libraries would make it to the local FS of the task nodes (the same way as cached resources), but instead of being part of the classpath they would be copied to a lib directory, and that lib directory would be added to the LD_LIBRARY_PATH of the task JVM.
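
The copy step described above could look roughly like the following. This is only a sketch under stated assumptions: the class and method names (NativeLibStaging, stage) are hypothetical, it uses plain java.nio.file rather than any Hadoop API, and it assumes the libraries have already been localized to the task node.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

// Hypothetical helper, not part of Hadoop.
class NativeLibStaging {
    // Copies each already-localized library file into libDir (created if
    // missing) and returns libDir, so the caller can later add it to the
    // LD_LIBRARY_PATH of the task JVM.
    static Path stage(List<Path> localizedLibs, Path libDir) throws IOException {
        Files.createDirectories(libDir);
        for (Path lib : localizedLibs) {
            Files.copy(lib, libDir.resolve(lib.getFileName()),
                       StandardCopyOption.REPLACE_EXISTING);
        }
        return libDir;
    }
}
```

Keeping the libraries in a dedicated directory, rather than in the classpath directories, means only that one directory has to be exposed to the dynamic linker.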

An alternative would be to set the '-Djava.library.path=' task JVM parameter to the lib directory above. However, this would break for libraries that depend on other libraries: the library being depended on would not be in the LD_LIBRARY_PATH, and the OS would fail to find it, because it is the OS, not the JVM, that loads the dependency.
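
To make the distinction concrete, here is a small sketch that checks whether a directory is listed in each of the two search paths. The class and method names are hypothetical. It compares entries as plain strings, and it deliberately ignores the other places a real dynamic linker may look (system default directories, the linker cache, paths embedded in the library), so it only illustrates the LD_LIBRARY_PATH part of the lookup.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hadoop or the JDK.
class LibraryPathCheck {
    // True if dir is one of the entries of a path-separator-delimited list.
    static boolean listed(String dir, String pathList) {
        if (pathList == null || pathList.isEmpty()) {
            return false;
        }
        String[] entries = pathList.split(Pattern.quote(File.pathSeparator));
        return Arrays.asList(entries).contains(dir);
    }

    // The path the JVM consults for System.loadLibrary(...).
    static boolean visibleToJvm(String dir) {
        return listed(dir, System.getProperty("java.library.path"));
    }

    // The path the OS consults when a loaded library needs another library.
    static boolean visibleToLinker(String dir, Map<String, String> env) {
        return listed(dir, env.get("LD_LIBRARY_PATH"));
    }
}
```

A lib directory passed only through '-Djava.library.path=' satisfies the first check but not the second, which is the failure case described above.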

For uncached usage of native libraries, a special directory in the JAR could be used for native libraries. But I'd argue that the DistributedCache enhancement would be enough, and if somebody wants to use a native library, s/he should use the DistributedCache, or a JobConf addLibrary method that uses the DistributedCache under the hood at submission time.



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (HADOOP-3298) Add support for transitive native libraries to DistributedCache

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das resolved HADOOP-3298.
---------------------------------

    Resolution: Fixed

This is a duplicate of HADOOP-2867




[jira] Updated: (HADOOP-3298) Add support for transitive native library dependencies in DistributedCache

Posted by "Subramaniam Krishnan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Subramaniam Krishnan updated HADOOP-3298:
-----------------------------------------

      Description: 

[Hadoop 1660] adds the task's working directory to its JAVA_LIBRARY_PATH. This allows JNI-based native libs to be distributed with the DistributedCache by symbolically linking them into the task's working directory.
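
As an illustration of the symlinking step, the following sketch links a localized library into a working directory. The names (WorkDirLinker, link) are hypothetical, and it uses the java.nio.file API for brevity; it is not a description of how Hadoop itself creates the links.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Hadoop.
class WorkDirLinker {
    // Creates workDir/<library file name> as a symbolic link to the
    // localized library, replacing any existing entry of that name,
    // and returns the path of the link.
    static Path link(Path cachedLib, Path workDir) throws IOException {
        Path link = workDir.resolve(cachedLib.getFileName());
        Files.deleteIfExists(link);
        return Files.createSymbolicLink(link, cachedLib.toAbsolutePath());
    }
}
```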

There are many instances where the JNI components have transitive dependencies on other native libraries. Even if we have symlinks for these in the task's working directory, they will not be resolved, because transitive dependencies of native libraries are looked up in the shell's LD_LIBRARY_PATH and not in the JAVA_LIBRARY_PATH. This can be tackled by adding the CWD to JAVA_LIBRARY_PATH in *hadoop-env.sh*.
A better way to do this would be to add the CWD only in the environment of the Task Tracker, as only it needs it.
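
One way to scope such a change to a single process, rather than to every Hadoop daemon via hadoop-env.sh, is to set the variable in the environment of the process being launched. The sketch below does this with java.lang.ProcessBuilder. The names (TaskEnv, prependDir) are hypothetical, and which variable to pass (for example LD_LIBRARY_PATH) is left to the caller as an assumption; this is not the Task Tracker's actual launch code.

```java
import java.io.File;
import java.util.Map;

// Hypothetical helper, not part of Hadoop.
class TaskEnv {
    // Prepends dir to the path-style variable varName in the environment
    // of the process that pb will start. The environment of the current
    // process is not modified.
    static ProcessBuilder prependDir(ProcessBuilder pb, String varName, File dir) {
        Map<String, String> env = pb.environment();
        String entry = dir.getAbsolutePath();
        String old = env.get(varName);
        env.put(varName,
                (old == null || old.isEmpty()) ? entry : entry + File.pathSeparator + old);
        return pb;
    }
}
```

A caller would apply this to the ProcessBuilder for the child JVM, passing the task's working directory, before calling start().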


  was:
Currently, if an M/R job depends on a JNI-based component, the dynamic library must be available on all the task nodes. This is not possible, especially when you have no control over the cluster machines and are just using the cluster as a service.

It should be possible to specify, using the DistributedCache, which native libraries a job needs.

For example, via a new method 'public void addLibrary(Path libraryPath, JobConf conf)'.

The added libraries would make it to the local FS of the task nodes (the same way as cached resources), but instead of being part of the classpath they would be copied to a lib directory, and that lib directory would be added to the LD_LIBRARY_PATH of the task JVM.

An alternative would be to set the '-Djava.library.path=' task JVM parameter to the lib directory above. However, this would break for libraries that depend on other libraries: the library being depended on would not be in the LD_LIBRARY_PATH, and the OS would fail to find it, because it is the OS, not the JVM, that loads the dependency.

For uncached usage of native libraries, a special directory in the JAR could be used for native libraries. But I'd argue that the DistributedCache enhancement would be enough, and if somebody wants to use a native library, s/he should use the DistributedCache, or a JobConf addLibrary method that uses the DistributedCache under the hood at submission time.



    Fix Version/s:     (was: 0.16.0)
                   0.18.0
         Assignee:     (was: Arun C Murthy)
          Summary: Add support for transitive native library dependencies in DistributedCache   (was: Add support for transitive native libraries to DistributedCache )




[jira] Reopened: (HADOOP-3298) Add support for transitive native libraries to DistributedCache

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das reopened HADOOP-3298:
---------------------------------





[jira] Resolved: (HADOOP-3298) Add support for transitive native libraries to DistributedCache

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das resolved HADOOP-3298.
---------------------------------

    Resolution: Duplicate




[jira] Updated: (HADOOP-3298) Add support for transitive native library dependencies in DistributedCache

Posted by "Nigel Daley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nigel Daley updated HADOOP-3298:
--------------------------------

      Description: 
[Hadoop 1660] adds the task's working directory to its JAVA_LIBRARY_PATH. This allows JNI-based native libs to be distributed with the DistributedCache by symbolically linking them into the task's working directory.

There are many instances where the JNI components have transitive dependencies on other native libraries. Even if we have symlinks for these in the task's working directory, they will not be resolved, because transitive dependencies of native libraries are looked up in the shell's LD_LIBRARY_PATH and not in the JAVA_LIBRARY_PATH. This can be tackled by adding the CWD to JAVA_LIBRARY_PATH in *hadoop-env.sh*.
A better way to do this would be to add the CWD only in the environment of the Task Tracker, as only it needs it.


  was:

[Hadoop 1660] adds the task's working directory to its JAVA_LIBRARY_PATH. This allows JNI-based native libs to be distributed with the DistributedCache by symbolically linking them into the task's working directory.

There are many instances where the JNI components have transitive dependencies on other native libraries. Even if we have symlinks for these in the task's working directory, they will not be resolved, because transitive dependencies of native libraries are looked up in the shell's LD_LIBRARY_PATH and not in the JAVA_LIBRARY_PATH. This can be tackled by adding the CWD to JAVA_LIBRARY_PATH in *hadoop-env.sh*.
A better way to do this would be to add the CWD only in the environment of the Task Tracker, as only it needs it.


    Fix Version/s:     (was: 0.18.0)

