Posted to user@spark.apache.org by Marcin Kuthan <ma...@gmail.com> on 2015/03/01 10:13:49 UTC

Spark Streaming testing strategies

I have started using Spark and Spark Streaming, and I'm wondering how you
test your applications, especially Spark Streaming applications with
window-based transformations.

After some digging I found the ManualClock class, which gives full control
over stream processing. Unfortunately, the class is not accessible outside
the spark.streaming package. Are you going to expose it to other
developers as well? For now I have to use a custom wrapper placed in the
spark.streaming package.

My Spark and Spark Streaming unit testing strategies are documented here:
http://mkuthan.github.io/blog/2015/03/01/spark-unit-testing/

Your feedback is more than appreciated.

Marcin

Re: Spark Streaming testing strategies

Posted by Marcin Kuthan <ma...@gmail.com>.
>>
>> I would expect a base trait for testing purposes in the Spark distribution.
>> ManualClock should be exposed as well, along with some documentation on how
>> to configure SBT to avoid problems with multiple Spark contexts. I'm
>> going to create an improvement proposal about it on the Spark issue tracker.
>
> Right now I think a package is probably a good place for this to live since
> the internal Spark testing code is changing/evolving rapidly, but I think
> once we have the trait fleshed out a bit more we could see if there is
> enough interest to try and merge it in (just my personal thoughts).
>

You are right, let's wait for community feedback.

I'm quite new to Spark, but it seems strange to me that good support for
isolated testing is not a first-class citizen. In the Spark Programming
Guide, the chapter on unit testing is no more than three sentences
:-(

The Spring Framework taught me how important it is to be able to run tests
from your IDE, and how important test execution performance is (there
are also problems with parallelism within a single JVM).

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: Spark Streaming testing strategies

Posted by Holden Karau <ho...@pigscanfly.ca>.
On Tue, Mar 10, 2015 at 1:18 PM, Marcin Kuthan <ma...@gmail.com>
wrote:

> Hi Holden
>
> Thanks, Holden, for pointing me to the package. Indeed, the StreamingSuiteBase
> trait hides a lot, especially regarding clock manipulation. Did you
> encounter problems with concurrent test execution from SBT
> (SPARK-2243)? I had to disable parallel execution and configure SBT to
> use a separate JVM for test execution (fork).
>
Yeah, I haven't used parallel execution with this testing trait; I can look
into it some more.

>
> BTW, I added samples for SparkSQL as well.
>
Oh awesome :)

>
> I would expect a base trait for testing purposes in the Spark distribution.
> ManualClock should be exposed as well, along with some documentation on how
> to configure SBT to avoid problems with multiple Spark contexts. I'm
> going to create an improvement proposal about it on the Spark issue tracker.
>
Right now I think a package is probably a good place for this to live since
the internal Spark testing code is changing/evolving rapidly, but I think
once we have the trait fleshed out a bit more we could see if there is
enough interest to try and merge it in (just my personal thoughts).






Re: Spark Streaming testing strategies

Posted by Marcin Kuthan <ma...@gmail.com>.
Hi Holden

Thanks, Holden, for pointing me to the package. Indeed, the StreamingSuiteBase
trait hides a lot, especially regarding clock manipulation. Did you
encounter problems with concurrent test execution from SBT
(SPARK-2243)? I had to disable parallel execution and configure SBT to
use a separate JVM for test execution (fork).
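
For reference, the two SBT settings described above might look roughly like
the sketch below. It assumes sbt 0.13-style syntax in build.sbt and is only a
starting point, not a verified fix for SPARK-2243; adapt it to your build.

```scala
// build.sbt (sketch, sbt 0.13-style syntax; newer sbt versions spell these
// Test / parallelExecution and Test / fork)

// Run test suites one at a time, so that two suites never try to
// create a SparkContext in the same JVM at the same moment.
parallelExecution in Test := false

// Run tests in a separate (forked) JVM instead of inside the sbt JVM.
fork in Test := true
```

With both settings in place, suites run sequentially in a JVM of their own,
at the cost of slower test runs.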

BTW, I added samples for SparkSQL as well.

I would expect a base trait for testing purposes in the Spark distribution.
ManualClock should be exposed as well, along with some documentation on how
to configure SBT to avoid problems with multiple Spark contexts. I'm
going to create an improvement proposal about it on the Spark issue tracker.



On 1 March 2015 at 18:49, Holden Karau <ho...@pigscanfly.ca> wrote:
>
> There is also the Spark Testing Base package which is on spark-packages.org and hides the ugly bits (it's based on the existing streaming test code but I cleaned it up a bit to try and limit the number of internals it was touching).


Re: Spark Streaming testing strategies

Posted by Holden Karau <ho...@pigscanfly.ca>.
There is also the Spark Testing Base package which is on spark-packages.org
and hides the ugly bits (it's based on the existing streaming test code but
I cleaned it up a bit to try and limit the number of internals it was
touching).

