Posted to dev@spark.apache.org by Evan Chan <ve...@gmail.com> on 2016/04/17 02:47:17 UTC

Using local-cluster mode for testing Spark-related projects

Hey folks,

I'd like to use local-cluster mode in my Spark-related projects to
test Spark functionality in an automated way against a simulated local
cluster. The idea is to test multi-process behavior in a much easier
fashion than setting up a real cluster. However, getting this up and
running in a separate project (I'm using Scala 2.10 and ScalaTest) is
nontrivial. Does anyone have suggestions for getting up and running?

This is what I've observed so far (I'm testing against 1.5.1, but I
suspect this applies equally to 1.6.x):

- One needs a real Spark distribution and must point to it via SPARK_HOME.
- SPARK_SCALA_VERSION needs to be set.
- One needs to manually pass jar paths (for example, an assembly jar of
all your dependencies); otherwise dependencies are missing at runtime.
Plain Java class directory hierarchies don't seem to work with setJars(...).
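
Roughly, the setup I've been trying looks like the sketch below (the
jar path and worker sizing are just placeholders, and SPARK_HOME /
SPARK_SCALA_VERSION still have to be set in the environment):

  import org.apache.spark.{SparkConf, SparkContext}

  // Rough sketch only: the assembly jar path and worker sizing below
  // are placeholders, not the real values from my project.
  val conf = new SparkConf()
    // 2 workers, 1 core each, 1024 MB per worker
    .setMaster("local-cluster[2, 1, 1024]")
    .setAppName("local-cluster-test")
    // Executors run in separate JVMs, so ship an assembly jar of the
    // test classes plus their dependencies; bare class directories
    // don't seem to get picked up.
    .setJars(Seq("target/scala-2.10/my-project-assembly-0.1.jar"))
  val sc = new SparkContext(conf)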

How do Spark's internal scripts make it possible to run
local-cluster mode and set up all the class paths correctly? And is
it possible to mimic this setup in external Spark projects?

thanks,
Evan



Re: Using local-cluster mode for testing Spark-related projects

Posted by Evan Chan <ve...@gmail.com>.
Jon, thanks. I think I've figured it out, actually. It's really
simple: one just needs to set spark.executor.extraClassPath to the
current value of the Java class path (the java.class.path system
property), and avoid HiveContext, which gives errors about
initializing a Derby database multiple times.
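
In other words, something along these lines (a sketch; the master
string and app name are placeholders):

  import org.apache.spark.{SparkConf, SparkContext}

  val conf = new SparkConf()
    .setMaster("local-cluster[2, 1, 1024]")
    .setAppName("local-cluster-test")
    // The executors are separate JVMs on the same machine, so the
    // driver's own class path is valid for them too.
    .set("spark.executor.extraClassPath", sys.props("java.class.path"))
  val sc = new SparkContext(conf)
  // Stick to a plain SQLContext in tests; HiveContext hits the Derby
  // "already initialized" errors mentioned above.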



Re: Using local-cluster mode for testing Spark-related projects

Posted by Jon Maurer <tr...@gmail.com>.
Take a look at spark-testing-base:
https://github.com/holdenk/spark-testing-base/blob/master/README.md
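
For example, a test using its SharedSparkContext trait looks roughly
like this (untested sketch typed from memory; check the README for the
right artifact and matching ScalaTest version):

  import com.holdenkarau.spark.testing.SharedSparkContext
  import org.scalatest.FunSuite

  // Untested sketch: SharedSparkContext supplies the `sc` field and
  // starts/stops the context around the suite.
  class WordCountSpec extends FunSuite with SharedSparkContext {
    test("count words with the shared context") {
      val counts = sc.parallelize(Seq("a", "b", "a")).countByValue()
      assert(counts("a") === 2L)
    }
  }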

Re: Using local-cluster mode for testing Spark-related projects

Posted by Evan Chan <ve...@gmail.com>.
What I want to find out is how to run local-cluster tests the way that
suite does, but in your own project. Has anyone done this?


Re: Using local-cluster mode for testing Spark-related projects

Posted by Takeshi Yamamuro <li...@gmail.com>.
Hi,
Would it be a bad idea to create a `SparkContext` with a `local-cluster`
master yourself, as in
'https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/ShuffleSuite.scala#L55'?
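
The relevant part of that suite is roughly this (paraphrased from
memory, so treat it as a sketch rather than the exact code):

  import org.apache.spark.{SparkConf, SparkContext}

  // 2 workers, 1 core per worker, 1024 MB per worker
  val conf = new SparkConf()
  val sc = new SparkContext("local-cluster[2, 1, 1024]", "test", conf)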

// maropu



-- 
---
Takeshi Yamamuro