Posted to dev@oozie.apache.org by "Satish Subhashrao Saley (JIRA)" <ji...@apache.org> on 2016/06/06 23:36:21 UTC

[jira] [Comment Edited] (OOZIE-2547) Add mapreduce.job.cache.files to spark action

    [ https://issues.apache.org/jira/browse/OOZIE-2547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15317486#comment-15317486 ] 

Satish Subhashrao Saley edited comment on OOZIE-2547 at 6/6/16 11:36 PM:
-------------------------------------------------------------------------

Hello [~rkanter] and [~rohini], could you please review the patch?
I have removed the logic that populated {{spark.executor.extraClassPath}}, {{spark.driver.extraClassPath}}, {{--jars}} and {{spark.yarn.dist.files}}. Instead, the distributed cache files are now passed via {{--files}}. While doing so, I also make sure that the HDFS paths to those files are formed such that Spark won't make another copy.

I have tested the patch locally as well as on clusters; it seems to work fine with {{--master}} set to local, yarn-client and yarn-cluster.


was (Author: satishsaley):
Hello [~rkanter], could you please review the patch?
I have removed the logic that populated {{spark.executor.extraClassPath}}, {{spark.driver.extraClassPath}}, {{--jars}} and {{spark.yarn.dist.files}}. Instead, the distributed cache files are now passed via {{--files}}. While doing so, I also make sure that the HDFS paths to those files are formed such that Spark won't make another copy.

I have tested the patch locally as well as on clusters; it seems to work fine with {{--master}} set to local, yarn-client and yarn-cluster.

> Add mapreduce.job.cache.files to spark action
> ---------------------------------------------
>
>                 Key: OOZIE-2547
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2547
>             Project: Oozie
>          Issue Type: Bug
>            Reporter: Satish Subhashrao Saley
>            Assignee: Satish Subhashrao Saley
>            Priority: Minor
>         Attachments: OOZIE-2547-1.patch
>
>
> Currently, we pass jars using the --jars option when submitting a Spark job. We also add the spark.yarn.dist.files option in yarn-client mode.
> Instead, we can use only the --files option and pass the files listed in mapreduce.job.cache.files. While doing so, we make sure that Spark won't make another copy of files that already exist on HDFS. We have seen issues where files were copied multiple times, causing exceptions such as:
> {code}
> Diagnostics: Resource hdfs://localhost/user/saley/.sparkStaging/application_1234_123/oozie-examples.jar changed on src filesystem
> {code}
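
To illustrate the idea described above, here is a minimal sketch of turning a comma-separated mapreduce.job.cache.files value into a single --files argument with fully qualified paths. It is not taken from OOZIE-2547-1.patch: the class SparkFilesArg, the methods qualify and buildFilesArg, and the sample paths are all hypothetical. It assumes that cache entries are either already qualified URIs or absolute paths, and that a fully qualified hdfs:// URI is what lets Spark treat a file as already present on the cluster filesystem rather than uploading a new copy.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not the code from the patch.
class SparkFilesArg {

    // Qualify a scheme-less absolute path with the default filesystem's
    // scheme and authority. Entries that already carry a scheme, and
    // relative entries, are returned unchanged.
    static String qualify(String path, URI defaultFs) {
        URI uri = URI.create(path);
        if (uri.getScheme() != null || !path.startsWith("/")) {
            return path;
        }
        return defaultFs.getScheme() + "://" + defaultFs.getAuthority() + path;
    }

    // Build the value for --files from a comma-separated cache files string.
    static String buildFilesArg(String cacheFiles, String defaultFsUri) {
        URI defaultFs = URI.create(defaultFsUri);
        List<String> qualified = new ArrayList<>();
        for (String entry : cacheFiles.split(",")) {
            String trimmed = entry.trim();
            if (!trimmed.isEmpty()) {
                qualified.add(qualify(trimmed, defaultFs));
            }
        }
        return String.join(",", qualified);
    }

    public static void main(String[] args) {
        // Sample values are assumptions made up for this sketch.
        String cacheFiles =
            "/user/saley/lib/oozie-examples.jar,hdfs://localhost/user/saley/lib/other.jar";
        System.out.println("--files " + buildFilesArg(cacheFiles, "hdfs://localhost"));
    }
}
```

In a real launcher the default filesystem would presumably come from the job configuration rather than a string literal, and path qualification would more likely go through the Hadoop FileSystem API; plain java.net.URI is used here only to keep the sketch self-contained.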


