Posted to dev@oozie.apache.org by "Sergey Zhemzhitsky (JIRA)" <ji...@apache.org> on 2017/09/07 12:36:00 UTC

[jira] [Comment Edited] (OOZIE-2547) Add mapreduce.job.cache.files to spark action

    [ https://issues.apache.org/jira/browse/OOZIE-2547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156886#comment-16156886 ] 

Sergey Zhemzhitsky edited comment on OOZIE-2547 at 9/7/17 12:35 PM:
--------------------------------------------------------------------

Hello [~rkanter], [~rohini], [~satishsaley] 

I've noticed that the patch from this issue removes the {{determineSparkJarsAndClasspath}} method introduced in OOZIE-2277 by [~rkanter].

Currently we are migrating our jobs from [CDH 5.7|http://archive.cloudera.com/cdh5/cdh/5/oozie-4.1.0-cdh5.7.0.releasenotes.html], which does not include this patch, to CDH 5.12, which has included it since [CDH 5.10|http://archive.cloudera.com/cdh5/cdh/5/oozie-4.1.0-cdh5.10.0.releasenotes.html]. There seems to be a regression: all of our jobs that use the HDFS API internally started failing with the following error in the Oozie launcher logs
{code}
Log Type: stderr
Log Upload Time: Thu Sep 07 11:43:40 +0300 2017
Log Length: 938
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
	at java.lang.Class.getDeclaredMethods0(Native Method)
	at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
	at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
	at java.lang.Class.getMethod0(Class.java:3018)
	at java.lang.Class.getMethod(Class.java:1784)
	at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
	at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 7 more
Log Type: stdout
Log Upload Time: Thu Sep 07 11:43:40 +0300 2017
Log Length: 0
{code}
So it seems that this patch prevents Oozie from populating the Spark classpath correctly with the Hadoop libraries.
Could you please suggest how to provide the Spark job with the jar containing {{org.apache.hadoop.conf.Configuration}}? Should it and all the necessary dependencies be placed within the lib directory of the workflow?
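For reference, a minimal {{job.properties}} sketch of the workaround we could apply on our side, making the sharelib and an additional HDFS lib directory available to the action (host names and paths below are hypothetical; the property names are standard Oozie configuration):

{code}
# job.properties (sketch; cluster host and paths are hypothetical)
nameNode=hdfs://namenode:8020
jobTracker=resourcemanager:8032

# Pull in the Oozie sharelib (including the spark sharelib jars)
oozie.use.system.libpath=true
oozie.action.sharelib.for.spark=spark

# Additional HDFS directory of jars to add to the action classpath
oozie.libpath=${nameNode}/user/oozie/share/extra-libs
{code}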



> Add mapreduce.job.cache.files to spark action
> ---------------------------------------------
>
>                 Key: OOZIE-2547
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2547
>             Project: Oozie
>          Issue Type: Bug
>            Reporter: Satish Subhashrao Saley
>            Assignee: Satish Subhashrao Saley
>            Priority: Minor
>             Fix For: 4.3.0
>
>         Attachments: OOZIE-2547-1.patch, OOZIE-2547-4.patch, OOZIE-2547-5.patch, yarn-cluster_launcher.txt
>
>
> Currently, we pass jars using the --jars option while submitting a Spark job. Also, we add the spark.yarn.dist.files option in case of yarn-client mode. 
> Instead of that, we can have only the --files option and pass on the files which are present in mapreduce.job.cache.files. While doing so, we make sure that Spark won't make another copy of files that already exist on HDFS. We saw issues when files were copied multiple times, causing exceptions such as:
> {code}
> Diagnostics: Resource hdfs://localhost/user/saley/.sparkStaging/application_1234_123/oozie-examples.jar changed on src filesystem
> {code}
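The option change described in the quoted issue can be sketched as follows (an illustration only; the dependency and application jar names are hypothetical):

{code}
# Before: dependencies via --jars, plus spark.yarn.dist.files in yarn-client mode
spark-submit --master yarn --deploy-mode client \
  --jars hdfs://nn/libs/dep1.jar,hdfs://nn/libs/dep2.jar \
  --conf spark.yarn.dist.files=hdfs://nn/app/oozie-examples.jar \
  ...

# After: a single --files list built from mapreduce.job.cache.files, so files
# already present on HDFS are referenced in place instead of re-uploaded
spark-submit --master yarn \
  --files hdfs://nn/libs/dep1.jar,hdfs://nn/libs/dep2.jar,hdfs://nn/app/oozie-examples.jar \
  ...
{code}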



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)