Posted to dev@oozie.apache.org by "Purshotam Shah (JIRA)" <ji...@apache.org> on 2015/08/31 19:27:45 UTC

[jira] [Updated] (OOZIE-2347) Remove unnecessary new Configuration()/new jobConf() calls from oozie

     [ https://issues.apache.org/jira/browse/OOZIE-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Purshotam Shah updated OOZIE-2347:
----------------------------------
    Description: 
We noticed that setting up the job sharelib was slow. One prime reason was that many threads were blocked on "java.util.zip.ZipFile.getEntry":
<0x00000005c0afda68> (a java.util.jar.JarFile): 0 Thread(s) sleeping, 178 Thread(s) waiting, 1 Thread(s) locking

There are many places where we call new Configuration()/new JobConf() unnecessarily. These calls can easily be removed to improve performance.

1.
Configuration defaultConf = new Configuration(); is called for every file we add to the classpath.

{code}
public static void addFileToClassPath(Path file, Configuration conf, FileSystem fs) throws IOException {
    Configuration defaultConf = new Configuration();
    XConfiguration.copy(conf, defaultConf);
    if (fs == null) {
        // it fails with conf, therefore we pass defaultConf instead
        fs = file.getFileSystem(defaultConf);
    }
    // Hadoop 0.20/1.x.
    if (defaultConf.get("yarn.resourcemanager.webapp.address") == null) {
        // Duplicate hadoop 1.x code to workaround MAPREDUCE-2361 in Hadoop 0.20
        // Refer OOZIE-1806.
        String filepath = file.toUri().getPath();
        String classpath = conf.get("mapred.job.classpath.files");
        conf.set("mapred.job.classpath.files", classpath == null
            ? filepath
            : classpath + System.getProperty("path.separator") + filepath);
        URI uri = fs.makeQualified(file).toUri();
        DistributedCache.addCacheFile(uri, conf);
    }
    else { // Hadoop 0.23/2.x
        DistributedCache.addFileToClassPath(file, conf, fs);
    }
}
{code}

2.
Sharelib setup also calls new Configuration(), which is not needed: the new object is discarded whenever inputKey is present in shareLibConfigMap.
{code}
public Configuration getShareLibConf(String inputKey, Path path) {
    Configuration conf = new Configuration();
    if (shareLibConfigMap.containsKey(inputKey)) {
        conf = shareLibConfigMap.get(inputKey).get(path);
    }

    return conf;
}
{code}	
	
	
3.
CoordActionInputCheckXCommand.checkPath also creates a new JobConf every time.
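
One possible shape for the first two fixes, as a minimal sketch rather than the actual patch: build the defaults once and share them read-only, and create the fallback configuration only when the sharelib map has no entry for the key. To keep the sketch runnable without Hadoop on the classpath, java.util.Properties stands in for Configuration. ConfReuseSketch, loadDefaults, DefaultsHolder, isYarn, SHARE_LIB_CONF, DEFAULT_LOADS and the String path parameter are hypothetical names, and the property value is a placeholder.

```java
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: Properties stands in for Hadoop's Configuration.
class ConfReuseSketch {
    static final String YARN_KEY = "yarn.resourcemanager.webapp.address";

    // Counts how often the "expensive" defaults are built (illustration only).
    static final AtomicInteger DEFAULT_LOADS = new AtomicInteger();

    // Stand-in for `new Configuration()`; assumed to be the costly call.
    static Properties loadDefaults() {
        DEFAULT_LOADS.incrementAndGet();
        Properties defaults = new Properties();
        defaults.setProperty(YARN_KEY, "0.0.0.0:8088"); // placeholder value
        return defaults;
    }

    // Initialization-on-demand holder: the defaults are built once, on first
    // use, and must be treated as read-only by every caller.
    private static final class DefaultsHolder {
        static final Properties DEFAULTS = loadDefaults();
    }

    // Item 1: consult the job conf first, then the shared defaults, instead
    // of copying the job conf into a fresh defaults object on every call.
    static boolean isYarn(Properties jobConf) {
        return jobConf.getProperty(YARN_KEY) != null
                || DefaultsHolder.DEFAULTS.getProperty(YARN_KEY) != null;
    }

    // Item 2: sharelib configs keyed by input key, then by path.
    static final Map<String, Map<String, Properties>> SHARE_LIB_CONF =
            new ConcurrentHashMap<>();

    static Properties getShareLibConf(String inputKey, String path) {
        Map<String, Properties> byPath = SHARE_LIB_CONF.get(inputKey);
        if (byPath != null) {
            // Same result as the original: may be null for an unknown path.
            return byPath.get(path);
        }
        // Only pay for a fresh object when the key is unknown.
        return loadDefaults();
    }

    public static void main(String[] args) {
        Properties jobConf = new Properties();
        for (int i = 0; i < 1000; i++) {
            isYarn(jobConf);
        }
        System.out.println("default loads after 1000 checks: " + DEFAULT_LOADS.get());
    }
}
```

With a real Configuration the shared instance would have to stay unmodified after construction, since callers on different threads would read it concurrently.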


> Remove unnecessary new Configuration()/new jobConf() calls from oozie
> ---------------------------------------------------------------------
>
>                 Key: OOZIE-2347
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2347
>             Project: Oozie
>          Issue Type: Bug
>            Reporter: Purshotam Shah


