Posted to user@spark.apache.org by Dana Tontea <dt...@cylex.ro> on 2014/01/24 11:53:32 UTC

What am I missing from the configuration?

I am completely new to Spark.
I want to run the examples from the "A Standalone App in Scala" section of
the quick start guide:
https://spark.incubator.apache.org/docs/0.8.1/quick-start.html
When I run locally with the local scheduler:

    val sc = new SparkContext("local[2]", "Simple App",
      "/home/spark-0.8.1-incubating-bin-cdh4",
      List("target/scala-2.9.3/simple-project_2.9.3-1.0.jar"))
I get the correct result. But when I replace the master with the URL from the
web UI (spark://192.168.6.66:7077):

    val sc = new SparkContext("spark://192.168.6.66:7077", "Simple App",
      "/home/spark-0.8.1-incubating-bin-cdh4",
      List("target/scala-2.9.3/simple-project_2.9.3-1.0.jar"))
I get a long error:
Starting task 0.0:1 as TID 6 on executor 0: ro-mysql5.cylex.local
(PROCESS_LOCAL)
14/01/23 17:02:48 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:1
as 1801 bytes in 1 ms
14/01/23 17:02:48 WARN cluster.ClusterTaskSetManager: Lost TID 5 (task
0.0:0)
14/01/23 17:02:48 INFO cluster.ClusterTaskSetManager: Loss was due to
java.lang.OutOfMemoryError: Java heap space [duplicate 5]
The entire error log can be found in the attached file:
http://apache-spark-user-list.1001560.n3.nabble.com/file/n878/Error

Can somebody explain what I am missing, and what the difference is between
these two schedulers, local[2] and spark://192.168.6.66:7077? Why can I not
see the job in the web UI (http://localhost:8080/) when it runs with local[2]?
Here are the Scala code and the sbt build file:
SimpleJob.scala: http://apache-spark-user-list.1001560.n3.nabble.com/file/n878/SimpleJob.scala
simple.sbt: http://apache-spark-user-list.1001560.n3.nabble.com/file/n878/simple.sbt
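
For context, the code in SimpleJob.scala follows the quick-start standalone
app fairly closely; roughly this (the log file path below is only a
placeholder for the file I am actually counting):

// Sketch of SimpleJob.scala, modelled on the 0.8.1 quick-start example;
// the logFile path is a placeholder
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

object SimpleJob {
  def main(args: Array[String]) {
    val logFile = "/home/spark-0.8.1-incubating-bin-cdh4/README.md"
    val sc = new SparkContext("local[2]", "Simple App",
      "/home/spark-0.8.1-incubating-bin-cdh4",
      List("target/scala-2.9.3/simple-project_2.9.3-1.0.jar"))
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}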

And can somebody please show me where I can find a step-by-step tutorial or
course on how to set up a cluster correctly and how to access it from an IDE
(IntelliJ IDEA)?
Thanks in advance!



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/What-I-am-missing-from-configuration-tp878.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: What am I missing from the configuration?

Posted by Dana Tontea <dt...@cylex.ro>.
    Hello Mark,
Firstly, sorry for the poor details about my version of Spark. In my cluster
I have CDH 4.5 (Hadoop 2.0.0) installed, and I have now successfully
installed the latest Spark release, spark-0.9.0-incubating:
http://d3kbcqa49mib13.cloudfront.net/spark-0.9.0-incubating.tgz

You are right that I have to build Spark locally, on each node of the
cluster, against my exact Hadoop version:

SPARK_HADOOP_VERSION=2.0.0-cdh4.5.0 sbt/sbt assembly publish-local
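
For reference, the simple.sbt I pair with that build is roughly the one from
the quick-start guide (a sketch; the Spark version string has to match what
publish-local installed):

// simple.sbt (sketch) -- assumes the locally published build reports version 0.9.0-incubating
name := "Simple Project"

version := "1.0"

scalaVersion := "2.10.3"

libraryDependencies += "org.apache.spark" %% "spark-core" % "0.9.0-incubating"

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"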

Thanks for your help,



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/What-I-am-missing-from-configuration-tp878p1387.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: What am I missing from the configuration?

Posted by Mark Hamstra <ma...@clearstorydata.com>.
What do you mean by "the last version of spark-0.9.0"?  To be precise,
there isn't anything known as spark-0.9.0.  What was released recently is
spark-0.9.0-incubating, and there is and only ever will be one version of
that.  If you're talking about a 0.9.0-incubating-SNAPSHOT built locally,
then you're going to have to specify a commit number for us to know just
what you've built -- that's the basic, floating nature of SNAPSHOTs, and it
is even more true right now because the master branch of Spark currently
says that it is building 0.9.0-incubating-SNAPSHOT when it should be
1.0.0-incubating-SNAPSHOT.

If you're not building Spark locally, then it is a matter of getting the
right resolver set in simple.sbt.  If you are re-building Spark (e.g. to
change the Hadoop version), then make sure that you are doing `sbt/sbt
publish-local` after your build to put your newly-built artifacts into your
.ivy2 cache where other sbt projects can find them.
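
For example, a rough sketch of the application-side dependency in that second
case (assuming your checkout publishes itself as 0.9.0-incubating-SNAPSHOT --
check what the Spark build actually reports):

// In the application's simple.sbt: pick up the locally published artifact from
// the local ivy cache. Sketch only -- the version must match what `sbt/sbt
// publish-local` actually installed.
libraryDependencies += "org.apache.spark" %% "spark-core" % "0.9.0-incubating-SNAPSHOT"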


On Wed, Feb 5, 2014 at 10:40 AM, Dana Tontea <dt...@cylex.ro> wrote:

>    Hi Matei,
>
> Firstly, thank you a lot for the answer. You are right, I am missing the
> hadoop-client dependency locally.
> But on my cluster I deployed the last version of spark-0.9.0, and now with
> the same code I get the following error from sbt package:
>
> [warn]  ::::::::::::::::::::::::::::::::::::::::::::::
> [warn]  ::          UNRESOLVED DEPENDENCIES         ::
> [warn]  ::::::::::::::::::::::::::::::::::::::::::::::
> [warn]  :: org.apache.spark#spark-core_2.10.3;0.9.0-incubating: not found
> [warn]  ::::::::::::::::::::::::::::::::::::::::::::::
> [error]
>
> {file:/root/workspace_Spark/scala%20standalone%20app/}default-2327b2/*:update:
> sbt.ResolveException: unresolved dependency:
> org.apache.spark#spark-core_2.10.3;0.9.0-incubating: not found
> [error] Total time: 12 s, completed Feb 5, 2014 8:12:25 PM
> I don't know what I am missing this time...
> My scala -version output is:
> Scala code runner version 2.10.3 -- Copyright 2002-2013, LAMP/EPFL
>
> Thanks in advance!
>
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/What-I-am-missing-from-configuration-tp878p1246.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>

Re: What am I missing from the configuration?

Posted by Andrew Ash <an...@andrewash.com>.
Try depending on spark-core_2.10 rather than spark-core_2.10.3 -- the third
digit was dropped in the Maven artifact name, and I hit this just yesterday
as well.
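
Concretely, something along these lines in simple.sbt should resolve (a
sketch; with sbt's %% operator the Scala binary version 2.10 is appended, so
the artifact requested becomes spark-core_2.10):

// simple.sbt (sketch): %% appends the Scala binary version, so this resolves
// spark-core_2.10, not spark-core_2.10.3
scalaVersion := "2.10.3"

libraryDependencies += "org.apache.spark" %% "spark-core" % "0.9.0-incubating"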

Sent from my mobile phone
On Feb 5, 2014 10:41 AM, "Dana Tontea" <dt...@cylex.ro> wrote:

>    Hi Matei,
>
> Firstly, thank you a lot for the answer. You are right, I am missing the
> hadoop-client dependency locally.
> But on my cluster I deployed the last version of spark-0.9.0, and now with
> the same code I get the following error from sbt package:
>
> [warn]  ::::::::::::::::::::::::::::::::::::::::::::::
> [warn]  ::          UNRESOLVED DEPENDENCIES         ::
> [warn]  ::::::::::::::::::::::::::::::::::::::::::::::
> [warn]  :: org.apache.spark#spark-core_2.10.3;0.9.0-incubating: not found
> [warn]  ::::::::::::::::::::::::::::::::::::::::::::::
> [error]
>
> {file:/root/workspace_Spark/scala%20standalone%20app/}default-2327b2/*:update:
> sbt.ResolveException: unresolved dependency:
> org.apache.spark#spark-core_2.10.3;0.9.0-incubating: not found
> [error] Total time: 12 s, completed Feb 5, 2014 8:12:25 PM
> I don't know what I am missing this time...
> My scala -version output is:
> Scala code runner version 2.10.3 -- Copyright 2002-2013, LAMP/EPFL
>
> Thanks in advance!
>
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/What-I-am-missing-from-configuration-tp878p1246.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>

Re: What am I missing from the configuration?

Posted by Dana Tontea <dt...@cylex.ro>.
   Hi Matei,

Firstly, thank you a lot for the answer. You are right, I am missing the
hadoop-client dependency locally.
But on my cluster I deployed the last version of spark-0.9.0, and now with the
same code I get the following error from sbt package:

[warn]  ::::::::::::::::::::::::::::::::::::::::::::::
[warn]  ::          UNRESOLVED DEPENDENCIES         ::
[warn]  ::::::::::::::::::::::::::::::::::::::::::::::
[warn]  :: org.apache.spark#spark-core_2.10.3;0.9.0-incubating: not found
[warn]  ::::::::::::::::::::::::::::::::::::::::::::::
[error]
{file:/root/workspace_Spark/scala%20standalone%20app/}default-2327b2/*:update:
sbt.ResolveException: unresolved dependency:
org.apache.spark#spark-core_2.10.3;0.9.0-incubating: not found
[error] Total time: 12 s, completed Feb 5, 2014 8:12:25 PM
I don't know what I am missing this time...
My scala -version output is:
Scala code runner version 2.10.3 -- Copyright 2002-2013, LAMP/EPFL

Thanks in advance!




--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/What-I-am-missing-from-configuration-tp878p1246.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: What am I missing from the configuration?

Posted by Matei Zaharia <ma...@gmail.com>.
Hi Dana,

I think the problem is that your simple.sbt does not add a dependency on hadoop-client for CDH4, so you get a different version of the Hadoop library in your driver application compared to the cluster. Try adding a dependency on hadoop-client version 2.0.0-mr1-cdh4.X.X for your version of CDH4, as well as the following line to add the resolver:

resolvers += "Cloudera Repository" at "https://repository.cloudera.com/artifactory/cloudera-repos/"
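
Put together, the relevant additions to simple.sbt would look roughly like
this (a sketch -- 2.0.0-mr1-cdh4.5.0 is just an example; use the artifact
version that matches your cluster's CDH4 release):

// Sketch of the simple.sbt additions: hadoop-client matching the cluster,
// plus the Cloudera repository that hosts the CDH artifacts
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.0.0-mr1-cdh4.5.0"

resolvers += "Cloudera Repository" at "https://repository.cloudera.com/artifactory/cloudera-repos/"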

Matei

On Jan 24, 2014, at 2:53 AM, Dana Tontea <dt...@cylex.ro> wrote:

> I am completely new to Spark.
> I want to run the examples from the "A Standalone App in Scala" section of
> the quick start guide:
> https://spark.incubator.apache.org/docs/0.8.1/quick-start.html
> When I run locally with the local scheduler:
>
>     val sc = new SparkContext("local[2]", "Simple App",
>       "/home/spark-0.8.1-incubating-bin-cdh4",
>       List("target/scala-2.9.3/simple-project_2.9.3-1.0.jar"))
>
> I get the correct result. But when I replace the master with the URL from
> the web UI (spark://192.168.6.66:7077):
>
>     val sc = new SparkContext("spark://192.168.6.66:7077", "Simple App",
>       "/home/spark-0.8.1-incubating-bin-cdh4",
>       List("target/scala-2.9.3/simple-project_2.9.3-1.0.jar"))
>
> I get a long error:
> Starting task 0.0:1 as TID 6 on executor 0: ro-mysql5.cylex.local
> (PROCESS_LOCAL)
> 14/01/23 17:02:48 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:1
> as 1801 bytes in 1 ms
> 14/01/23 17:02:48 WARN cluster.ClusterTaskSetManager: Lost TID 5 (task
> 0.0:0)
> 14/01/23 17:02:48 INFO cluster.ClusterTaskSetManager: Loss was due to
> java.lang.OutOfMemoryError: Java heap space [duplicate 5]
> The entire error log can be found in the attached file:
> http://apache-spark-user-list.1001560.n3.nabble.com/file/n878/Error
> 
> Can somebody explain what I am missing, and what the difference is between
> these two schedulers, local[2] and spark://192.168.6.66:7077? Why can I not
> see the job in the web UI (http://localhost:8080/) when it runs with
> local[2]?
> Here are the Scala code and the sbt build file:
> SimpleJob.scala: http://apache-spark-user-list.1001560.n3.nabble.com/file/n878/SimpleJob.scala
> simple.sbt: http://apache-spark-user-list.1001560.n3.nabble.com/file/n878/simple.sbt
> 
> And can somebody please show me where I can find a step-by-step tutorial
> or course on how to set up a cluster correctly and how to access it from an
> IDE (IntelliJ IDEA)?
> Thanks in advance!
> 
> 
> 
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/What-I-am-missing-from-configuration-tp878.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.