Posted to issues@livy.apache.org by "Swethak Yadav Chandrashekar (JIRA)" <ji...@apache.org> on 2019/01/31 16:46:00 UTC
[jira] [Commented] (LIVY-222) files with local file path fails the job
[ https://issues.apache.org/jira/browse/LIVY-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16757480#comment-16757480 ]
Swethak Yadav Chandrashekar commented on LIVY-222:
--------------------------------------------------
Was this fixed in any subsequent version? We tried 0.5 and the issue still exists.
> files with local file path fails the job
> ----------------------------------------
>
> Key: LIVY-222
> URL: https://issues.apache.org/jira/browse/LIVY-222
> Project: Livy
> Issue Type: Bug
> Components: Server
> Affects Versions: 0.2, 0.3
> Reporter: Lin Chan
> Priority: Major
> Fix For: 0.3
>
>
> To repro the problem:
> # Whitelist some local path using livy.file.local-dir-whitelist.
> # Use yarn-cluster mode
> # Submit a job through livy with files parameter referencing a local file that exists only locally but not on worker nodes.
> The job will fail because SparkContext tries to find the local file on the driver node, not on the node running spark-submit.
> Error:
> {noformat}
> java.io.FileNotFoundException: Added file file:/tmp/a does not exist.
> at org.apache.spark.SparkContext.addFile(SparkContext.scala:1388)
> at org.apache.spark.SparkContext.addFile(SparkContext.scala:1364)
> at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:491)
> at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:491)
> at scala.collection.immutable.List.foreach(List.scala:318)
> at org.apache.spark.SparkContext.<init>(SparkContext.scala:491)
> at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2305)
> at com.cloudera.livy.repl.SparkInterpreter$$anonfun$start$1.apply(SparkInterpreter.scala:123)
> at com.cloudera.livy.repl.SparkInterpreter$$anonfun$start$1.apply(SparkInterpreter.scala:87)
> at com.cloudera.livy.repl.SparkInterpreter.restoreContextClassLoader(SparkInterpreter.scala:369)
> at com.cloudera.livy.repl.SparkInterpreter.start(SparkInterpreter.scala:87)
> at com.cloudera.livy.repl.Session$$anonfun$1.apply(Session.scala:63)
> at com.cloudera.livy.repl.Session$$anonfun$1.apply(Session.scala:61)
> at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
> at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
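
For concreteness, the repro steps above amount to a batch submission along these lines. This is only a sketch: the Livy address, jar path, and class name are hypothetical placeholders, and the whitelist line assumes it was added to livy.conf as in step 1.

```shell
# Sketch of the repro; LIVY_URL, app.jar, and com.example.App are placeholders.
# Prerequisite (livy.conf): livy.file.local-dir-whitelist = /tmp
LIVY_URL="http://localhost:8998"   # assumed Livy server address

# The "files" entry references a path that exists only on the machine
# running the Livy server, not on the YARN worker nodes -- this is the
# trigger for the FileNotFoundException shown above.
PAYLOAD='{"file": "local:/path/to/app.jar", "className": "com.example.App", "files": ["file:/tmp/a"]}'

echo "$PAYLOAD"
# Actual submission (requires a running Livy server in yarn-cluster mode):
# curl -s -X POST -H "Content-Type: application/json" \
#      -d "$PAYLOAD" "$LIVY_URL/batches"
```

With the server configured as above, this request reproduces the failure, because the file is resolved by SparkContext on the driver node rather than where the request originated.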
> We didn't see this problem with Livy 0.1. We found that in 0.2, the files parameter isn't mapped to {{--files}} in spark-submit but to the SparkConf property {{spark.files}}. spark-submit handles local files specified in {{--files}} on the spark-submit node, whereas {{spark.files}} is resolved on the driver node. Hence the difference.
> I did the following experiment to confirm the difference between {{--files}} and {{spark.files}}.
> First, do a spark-submit directly using --files to reference that additional file. This works fine.
> Then, do a spark-submit using --conf "spark.files=xxx" with the same referenced file. This fails with the same error message seen in Livy.
> The problem seems to be that --conf "spark.files=xxx" is not equivalent to --files in spark-submit, and when users use the files parameter in Livy, they expect it to behave like --files in spark-submit. This needs to be fixed.
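
The two invocations from the experiment above can be written out side by side as follows. This is a sketch only; app.jar is a placeholder, and file:/tmp/a stands for the locally-existing file from the error log.

```shell
# Side-by-side sketch of the experiment; app.jar is a placeholder.
# 1) --files: the file is handled on the spark-submit node, which has
#    /tmp/a locally -- this invocation succeeds.
WORKS='spark-submit --master yarn --deploy-mode cluster --files file:/tmp/a app.jar'
# 2) spark.files: the path is resolved later by SparkContext on the driver
#    node, which does not have /tmp/a -- this invocation fails with the
#    FileNotFoundException shown above.
FAILS='spark-submit --master yarn --deploy-mode cluster --conf spark.files=file:/tmp/a app.jar'
echo "$WORKS"
echo "$FAILS"
```

The only difference between the two commands is where the file path is interpreted, which matches the behavior gap between Livy 0.1 and 0.2 described above.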
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)