Posted to dev@oozie.apache.org by "Andras Piros (JIRA)" <ji...@apache.org> on 2018/05/07 09:34:00 UTC

[jira] [Commented] (OOZIE-3227) Eliminate duplicated dependencies from distributed cache

    [ https://issues.apache.org/jira/browse/OOZIE-3227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16465677#comment-16465677 ] 

Andras Piros commented on OOZIE-3227:
-------------------------------------

[~dionusos] I cannot make the Oozie code compile with Hadoop 3, and hence cannot reproduce the issue. If you have a patch for that, could you please update OOZIE-3219? Thanks!

> Eliminate duplicated dependencies from distributed cache
> --------------------------------------------------------
>
>                 Key: OOZIE-3227
>                 URL: https://issues.apache.org/jira/browse/OOZIE-3227
>             Project: Oozie
>          Issue Type: Sub-task
>          Components: core
>            Reporter: Denes Bodo
>            Assignee: Denes Bodo
>            Priority: Major
>
> Hadoop 3 does not allow multiple dependencies with the same file name on the *mapreduce.job.cache.files* list.
> The issue occurs when the same file name appears in multiple sharelib folders and/or the application's lib folder. This can be avoided, but not easily in all cases.
> I suggest removing the duplicates from this list.
> A quick workaround in JavaActionExecutor looks like this:
> {code}
>             removeDuplicatedDependencies(launcherJobConf, "mapreduce.job.cache.files");
>             removeDuplicatedDependencies(launcherJobConf, "mapreduce.job.cache.archives");
> ......
>     private void removeDuplicatedDependencies(JobConf conf, String key) {
>         final String value = conf.get(key);
>         if (value == null || value.isEmpty()) {
>             return;
>         }
>         // Keep the first path seen for each file name, preserving insertion order
>         final Map<String, String> nameToPath = new LinkedHashMap<>();
>         for (final String dependency : value.split(",")) {
>             final String[] segments = dependency.split("/");
>             final String dependencyName = segments[segments.length - 1];
>             if (nameToPath.containsKey(dependencyName)) {
>                 LOG.warn(dependencyName + " [" + dependency + "] is already defined in " + key + ". Skipping...");
>             } else {
>                 nameToPath.put(dependencyName, dependency);
>             }
>         }
>         conf.set(key, String.join(",", nameToPath.values()));
>     }
> {code}
> Another approach is to eliminate the deprecated *org.apache.hadoop.filecache.DistributedCache*.
> I am going to look deeper into how we should use the distributed cache; all comments are welcome.
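The dedup-by-file-name idea in the quoted snippet can be sketched as a standalone method, free of Hadoop dependencies, so the logic is easy to test in isolation. This is an illustrative sketch only; the class and method names are hypothetical, not part of Oozie.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch: drop entries whose file name (last path segment)
// was already seen, keeping the first occurrence and the original order.
public class DedupSketch {

    static String dedupeByFileName(String commaSeparatedPaths) {
        if (commaSeparatedPaths == null || commaSeparatedPaths.isEmpty()) {
            return commaSeparatedPaths;
        }
        // LinkedHashMap preserves insertion order of the surviving paths
        final Map<String, String> nameToPath = new LinkedHashMap<>();
        for (final String path : commaSeparatedPaths.split(",")) {
            final String[] segments = path.split("/");
            final String fileName = segments[segments.length - 1];
            nameToPath.putIfAbsent(fileName, path);
        }
        return String.join(",", nameToPath.values());
    }

    public static void main(String[] args) {
        System.out.println(dedupeByFileName(
                "/share/lib/a.jar,/app/lib/a.jar,/app/lib/b.jar"));
        // prints: /share/lib/a.jar,/app/lib/b.jar
    }
}
```

Note that this keys purely on the file name, so two genuinely different jars that happen to share a name would also be collapsed; that trade-off is inherent to the workaround described above.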



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)