Posted to user@spark.apache.org by Yunjie Ji <jy...@163.com> on 2017/02/28 01:18:33 UTC

Run spark machine learning example on Yarn failed

After starting HDFS, YARN, and Spark, I ran this command from the root
directory of Spark on my master host:
`MASTER=yarn ./bin/run-example ml.LogisticRegressionExample data/mllib/sample_libsvm_data.txt`

I got this command from Spark's README. The source code of
LogisticRegressionExample is on GitHub:
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/ml/LogisticRegressionExample.scala

Then this error occurred:
`Exception in thread "main" org.apache.spark.sql.AnalysisException: Path does not exist: hdfs://master:9000/user/root/data/mllib/sample_libsvm_data.txt;`

First, I don't understand why the path starts with `hdfs://master:9000/user/root`.
I did set the NameNode address to `hdfs://master:9000`, but why did Spark
choose the directory `/user/root`?
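A likely explanation, offered as a sketch: when a path has no scheme and does not start with `/`, Hadoop resolves it against the file system's working directory, which on HDFS defaults to the user's home directory `/user/<username>`. Since the Hadoop user here is evidently `root`, `data/mllib/...` became `/user/root/data/mllib/...` on `hdfs://master:9000`. The shell function `qualify_path` below is hypothetical; it only imitates that rule (ignoring edge cases such as `file:/` URIs with a single slash) and is not part of Hadoop or Spark.

```shell
# Hypothetical helper that imitates Hadoop's path qualification rule.
# Usage: qualify_path <defaultFS> <user> <path>
qualify_path() {
  local default_fs="$1" user="$2" path="$3"
  case "$path" in
    *://*)
      # Already a full URI (hdfs://, file://, s3a://, ...): used as-is.
      printf '%s\n' "$path" ;;
    /*)
      # Absolute path without a scheme: only fs.defaultFS is prepended.
      printf '%s%s\n' "$default_fs" "$path" ;;
    *)
      # Relative path: resolved against the HDFS home directory /user/<user>.
      printf '%s/user/%s/%s\n' "$default_fs" "$user" "$path" ;;
  esac
}
```

For example, `qualify_path hdfs://master:9000 root data/mllib/sample_libsvm_data.txt` prints `hdfs://master:9000/user/root/data/mllib/sample_libsvm_data.txt`, which matches the path in the error.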

Next, I created the directory `/user/root/data/mllib/sample_libsvm_data.txt` on
every host of the cluster, hoping Spark could find the file there. But the same
error occurred again.
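Creating `/user/root/...` with `mkdir` on each host changes only the local disks, while the `hdfs://` path in the error refers to HDFS, which is a separate namespace. A hypothetical helper to check both places might look like this; it assumes the `hadoop` client is on the `PATH` with `fs.defaultFS` set to `hdfs://master:9000`, so a scheme-less absolute path is looked up on HDFS.

```shell
# Hypothetical helper: report whether a path exists on this host's local
# disk and, separately, on HDFS. Running mkdir on every node affects only
# the local result. Assumes the hadoop client is on PATH and its
# fs.defaultFS points at the cluster (here hdfs://master:9000).
check_both_file_systems() {
  local p="${1:-/user/root/data/mllib/sample_libsvm_data.txt}"
  if [ -e "$p" ]; then
    echo "local: present"
  else
    echo "local: missing"
  fi
  # 'hadoop fs -test -e' exits 0 if the path exists on the default FS;
  # any non-zero status (including a missing client) is reported as missing.
  if hadoop fs -test -e "$p"; then
    echo "hdfs: present"
  else
    echo "hdfs: missing"
  fi
}
```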



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Run-spark-machine-learning-example-on-Yarn-failed-tp28435.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org

Re: Run spark machine learning example on Yarn failed

Posted by Femi Anthony <fe...@gmail.com>.

Have you tried specifying an absolute path instead of a relative one?
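A sketch of that suggestion, assuming it is run from the Spark root directory: pass a fully qualified URI so the result no longer depends on the HDFS working directory. The wrapper name `run_lr_example` is hypothetical. The default `file://` form only works on YARN if the same file exists at the same path on every node (for example, an identical Spark installation everywhere), because the driver and the executors each read `file://` paths from their own local disk.

```shell
# Hypothetical wrapper: run the example with a fully qualified input URI.
# Assumes it is called from the Spark root directory. The default file://
# URI only works on YARN if the same file exists at the same path on every
# node, because the driver and executors each read file:// paths locally.
run_lr_example() {
  local input="${1:-file://$PWD/data/mllib/sample_libsvm_data.txt}"
  MASTER=yarn ./bin/run-example ml.LogisticRegressionExample "$input"
}
```

Called without arguments it uses the local copy; once the file is on HDFS, `run_lr_example hdfs://master:9000/user/root/data/mllib/sample_libsvm_data.txt` points at it explicitly.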

Femi

Re: Run spark machine learning example on Yarn failed

Posted by Marco Mistroni <mm...@gmail.com>.

Or place the file in S3 and provide the S3 path.
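A hedged sketch of the S3 route: copy the file with the AWS CLI, then pass an `s3a://` URI to the example. It assumes the AWS CLI is installed and configured, that the bucket (the first argument, a placeholder) already exists, and that the cluster has the Hadoop S3A connector (`hadoop-aws`) and credentials set up; the function name is hypothetical.

```shell
# Hypothetical helper: upload the sample to S3, then run the example on it.
# Assumes the AWS CLI is configured, the bucket already exists, and the
# cluster has the S3A connector (hadoop-aws) plus credentials configured.
run_lr_example_from_s3() {
  local bucket="$1"   # placeholder bucket name, e.g. my-bucket
  local key="mllib/sample_libsvm_data.txt"
  [ -n "$bucket" ] || { echo "usage: run_lr_example_from_s3 <bucket>" >&2; return 1; }
  # Stop if the upload fails, so the job is not started on a missing file.
  aws s3 cp data/mllib/sample_libsvm_data.txt "s3://$bucket/$key" || return
  # The S3A connector uses the s3a:// scheme.
  MASTER=yarn ./bin/run-example ml.LogisticRegressionExample "s3a://$bucket/$key"
}
```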
Kr

Re: Run spark machine learning example on Yarn failed

Posted by Jörn Franke <jo...@gmail.com>.

You do not need to place it in a local directory on every node. Just use `hadoop fs -put` to put it on HDFS. Alternatively, as others suggested, use S3.
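Following that advice, a minimal sketch: copy the bundled sample into the HDFS directory that the relative argument resolves to (`/user/<user>/data/mllib`), after which the original command should find it unchanged. The function name is hypothetical, and it assumes the `hadoop` client is on the `PATH`, the current directory is the Spark root, and the HDFS user (default `root`, as in the error message) may create that directory.

```shell
# Hypothetical helper: copy the bundled sample file into HDFS at the
# location the relative argument resolves to, /user/<user>/data/mllib.
# Assumes the hadoop client is on PATH and the current directory is the
# Spark root (where data/mllib/sample_libsvm_data.txt ships).
stage_sample_on_hdfs() {
  local user="${1:-root}"   # HDFS user; the error message shows 'root'
  local dest="/user/$user/data/mllib"
  # Create the target directory and any missing parents.
  hadoop fs -mkdir -p "$dest" || return
  # Upload the file; -f overwrites an existing copy.
  hadoop fs -put -f data/mllib/sample_libsvm_data.txt "$dest/" || return
  # List the directory to confirm the upload.
  hadoop fs -ls "$dest"
}
```

After `stage_sample_on_hdfs root`, rerunning `MASTER=yarn ./bin/run-example ml.LogisticRegressionExample data/mllib/sample_libsvm_data.txt` should resolve to the uploaded file.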
