Posted to dev@oozie.apache.org by "Purshotam Shah (JIRA)" <ji...@apache.org> on 2015/10/27 22:09:27 UTC

[jira] [Commented] (OOZIE-2347) Remove unnecessary new Configuration()/new jobConf() calls from oozie

    [ https://issues.apache.org/jira/browse/OOZIE-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977171#comment-14977171 ] 

Purshotam Shah commented on OOZIE-2347:
---------------------------------------

We noticed an issue with this patch in production.
It happens when {{oozie.service.HadoopAccessorService.action.configurations.load.default.resources}} is set to true (the default is false): some files are added to the distributed cache with a conf that doesn't contain the default confs.

This happens only for Pig and Hive jobs, as they override the creation of the jobconf.

At runtime we see ErrorCode [JA009], Message [JA009: java.lang.IllegalArgumentException: Failed to specify server's Kerberos principal name].
I am doing a quick fix to resolve the issue. We also noticed that the JavaActionExecutor sharelib code is convoluted: it repeats a lot of code and does some unnecessary, expensive work, so it needs to be rewritten.
I will create a new JIRA for that.


> Remove unnecessary new Configuration()/new jobConf() calls from oozie
> ---------------------------------------------------------------------
>
>                 Key: OOZIE-2347
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2347
>             Project: Oozie
>          Issue Type: Bug
>            Reporter: Purshotam Shah
>            Assignee: Purshotam Shah
>             Fix For: trunk
>
>         Attachments: OOZIE-2347-V1.patch, OOZIE-2347-V2.patch, amend-OOZIE-2347-V1.patch
>
>
> We noticed that setting up the job sharelib was slow, and one prime reason was that a lot of threads were blocked on "java.util.zip.ZipFile.getEntry":
> <0x00000005c0afda68> (a java.util.jar.JarFile): 0 Thread(s) sleeping, 178 Thread(s) waiting, 1 Thread(s) locking
> There are a lot of places where we call new Configuration()/new jobConf() unnecessarily. These calls can easily be removed to improve performance.
> 1. Configuration defaultConf = new Configuration(); is called for every file we add to the classpath.
> {code}
> public static void addFileToClassPath(Path file, Configuration conf, FileSystem fs) throws IOException {
>       Configuration defaultConf = new Configuration();
>       XConfiguration.copy(conf, defaultConf);
>       if (fs == null) {
>         // it fails with conf, therefore we pass defaultConf instead
>         fs = file.getFileSystem(defaultConf);
>       }
>       // Hadoop 0.20/1.x.
>       if (defaultConf.get("yarn.resourcemanager.webapp.address") == null) {
>           // Duplicate hadoop 1.x code to workaround MAPREDUCE-2361 in Hadoop 0.20
>           // Refer OOZIE-1806.
>           String filepath = file.toUri().getPath();
>           String classpath = conf.get("mapred.job.classpath.files");
>           conf.set("mapred.job.classpath.files", classpath == null
>               ? filepath
>               : classpath + System.getProperty("path.separator") + filepath);
>           URI uri = fs.makeQualified(file).toUri();
>           DistributedCache.addCacheFile(uri, conf);
>       }
>       else { // Hadoop 0.23/2.x
>           DistributedCache.addFileToClassPath(file, conf, fs);
>       }
>     }
> {code}
> 2. The sharelib setup also calls new Configuration(), which is not needed.
> {code}
> public Configuration getShareLibConf(String inputKey, Path path) {
>         Configuration conf = new Configuration();
>         if (shareLibConfigMap.containsKey(inputKey)) {
>             conf = shareLibConfigMap.get(inputKey).get(path);
>         }
>         return conf;
>     }
> {code}
> 3. CoordActionInputCheckXCommand.checkPath also creates a jobConf every time.
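
Items 1 and 2 in the description above share one pattern: an object that is costly to construct is allocated on every call, even when a single shared instance or a cached one would do. The following is a minimal Java sketch of that pattern and one way to avoid it. It deliberately uses no Hadoop or Oozie classes. StubConf is a hypothetical stand-in for Configuration, and the method names, the single-level cache map, and the "classpath.files" key are assumptions made for illustration, not the actual patch.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ConfReuseSketch {

    /** Hypothetical stand-in for a configuration object that is expensive to construct. */
    public static final class StubConf {
        public static int constructed = 0;   // counts constructor calls for the demo
        public final Map<String, String> props = new HashMap<>();

        public StubConf() {
            constructed++;
        }
    }

    /** Appends one file to a classpath-style property; lookupConf is only read. */
    static void addToClasspath(String file, StubConf jobConf, StubConf lookupConf) {
        String sep = lookupConf.props.getOrDefault("path.separator", ":");
        String current = jobConf.props.get("classpath.files");
        jobConf.props.put("classpath.files", current == null ? file : current + sep + file);
    }

    /** Shape of item 1: a fresh copy of the conf is built for every file. */
    public static void addFilesAllocatingEachTime(List<String> files, StubConf jobConf) {
        for (String file : files) {
            StubConf defaultConf = new StubConf();        // one allocation per file
            defaultConf.props.putAll(jobConf.props);
            addToClasspath(file, jobConf, defaultConf);
        }
    }

    /** Alternative: build the copy once and share it across all files. */
    public static void addFilesReusingConf(List<String> files, StubConf jobConf) {
        StubConf defaultConf = new StubConf();            // single allocation
        defaultConf.props.putAll(jobConf.props);
        for (String file : files) {
            addToClasspath(file, jobConf, defaultConf);
        }
    }

    /** Shape of item 2: the new object is discarded whenever the key is cached. */
    public static StubConf lookupAllocating(Map<String, StubConf> cache, String key) {
        StubConf conf = new StubConf();
        if (cache.containsKey(key)) {
            conf = cache.get(key);
        }
        return conf;
    }

    /** Alternative: allocate only on a cache miss. */
    public static StubConf lookupLazy(Map<String, StubConf> cache, String key) {
        StubConf conf = cache.get(key);
        return conf != null ? conf : new StubConf();
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("a.jar", "b.jar", "c.jar");

        StubConf.constructed = 0;
        addFilesAllocatingEachTime(files, new StubConf());
        System.out.println("per-file copies: " + StubConf.constructed + " constructions");

        StubConf.constructed = 0;
        addFilesReusingConf(files, new StubConf());
        System.out.println("shared copy:     " + StubConf.constructed + " constructions");
    }
}
```

In the first variant the number of constructions grows with the number of files, while in the second it stays constant. Whether sharing is safe in the real code depends on whether the callee mutates the shared object, which this sketch does not model.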


