Posted to issues@livy.apache.org by "Swethak Yadav Chandrashekar (JIRA)" <ji...@apache.org> on 2019/01/31 16:46:00 UTC

[jira] [Commented] (LIVY-222) files with local file path fails the job

    [ https://issues.apache.org/jira/browse/LIVY-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16757480#comment-16757480 ] 

Swethak Yadav Chandrashekar commented on LIVY-222:
--------------------------------------------------

Was this fixed in any subsequent version? We tried out 0.5 and the issue still exists.

> files with local file path fails the job
> ----------------------------------------
>
>                 Key: LIVY-222
>                 URL: https://issues.apache.org/jira/browse/LIVY-222
>             Project: Livy
>          Issue Type: Bug
>          Components: Server
>    Affects Versions: 0.2, 0.3
>            Reporter: Lin Chan
>            Priority: Major
>             Fix For: 0.3
>
>
> To repro the problem:
> # Whitelist some local path using livy.file.local-dir-whitelist.
> # Use yarn-cluster mode
> # Submit a job through Livy with the {{files}} parameter referencing a local file that exists on the Livy server node but not on the worker nodes.
> The job will fail. This is because SparkContext tries to find the local file on the driver node, not on the node that runs spark-submit (a concrete request sketch follows the error output below).
> Error: 
> {noformat}
> java.io.FileNotFoundException: Added file file:/tmp/a does not exist.
>     at org.apache.spark.SparkContext.addFile(SparkContext.scala:1388)
>     at org.apache.spark.SparkContext.addFile(SparkContext.scala:1364)
>     at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:491)
>     at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:491)
>     at scala.collection.immutable.List.foreach(List.scala:318)
>     at org.apache.spark.SparkContext.<init>(SparkContext.scala:491)
>     at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2305)
>     at com.cloudera.livy.repl.SparkInterpreter$$anonfun$start$1.apply(SparkInterpreter.scala:123)
>     at com.cloudera.livy.repl.SparkInterpreter$$anonfun$start$1.apply(SparkInterpreter.scala:87)
>     at com.cloudera.livy.repl.SparkInterpreter.restoreContextClassLoader(SparkInterpreter.scala:369)
>     at com.cloudera.livy.repl.SparkInterpreter.start(SparkInterpreter.scala:87)
>     at com.cloudera.livy.repl.Session$$anonfun$1.apply(Session.scala:63)
>     at com.cloudera.livy.repl.Session$$anonfun$1.apply(Session.scala:61)
>     at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>     at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> {noformat}
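> To make the repro concrete, here is a minimal sketch of such a submission against Livy's REST {{/batches}} endpoint; the host, the application jar path, and the {{/tmp/a}} whitelist entry are placeholder assumptions:
> {noformat}
> # livy.conf (hypothetical whitelist entry):
> #   livy.file.local-dir-whitelist = /tmp
> curl -X POST -H "Content-Type: application/json" \
>      -d '{"file": "hdfs:///apps/app.jar", "files": ["file:/tmp/a"]}' \
>      http://livy-server:8998/batches
> {noformat}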
> We didn't see this problem with Livy 0.1. We found that in 0.2, the files parameter isn't mapped to {{--files}} in spark-submit but to the SparkConf entry {{spark.files}}. spark-submit resolves local files specified in {{--files}} on the spark-submit node, whereas {{spark.files}} is resolved on the driver node. Hence the difference.
> I did the following experiment to confirm the difference between {{--files}} and {{spark.files}}.
> First, do a spark-submit directly using {{--files}} to reference the additional file. This works fine.
> Then, do a spark-submit using {{--conf "spark.files=xxx"}} with the same referenced file. This fails with the same error message as in Livy.
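> A sketch of the two invocations (the class name and jar are hypothetical placeholders):
> {noformat}
> # Works: spark-submit itself localizes --files on the submitting node
> spark-submit --master yarn --deploy-mode cluster \
>     --files /tmp/a \
>     --class com.example.App app.jar
>
> # Fails: spark.files is only resolved later, by SparkContext on the
> # driver node, where /tmp/a does not exist in yarn-cluster mode
> spark-submit --master yarn --deploy-mode cluster \
>     --conf "spark.files=/tmp/a" \
>     --class com.example.App app.jar
> {noformat}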
> The problem seems to be that {{--conf "spark.files=xxx"}} is not equivalent to {{--files}} in spark-submit, and when users pass the files parameter to Livy, they expect it to behave like {{--files}} in spark-submit. This needs to be fixed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)