Posted to user@spark.apache.org by kant kodali <ka...@gmail.com> on 2017/03/07 12:04:51 UTC

How to unit test spark streaming?

Hi All,

How do I unit test Spark Streaming, or Spark in general? How do I test the
results of my transformations? Also, more importantly, don't we need to
spawn master and worker JVMs, either on one node or on multiple nodes?

Thanks!
kant

Re: How to unit test spark streaming?

Posted by kant kodali <ka...@gmail.com>.
Agreed with the statement in quotes below; whether one wants to do unit
tests or not, it is good practice to write code that way. But I think the
more painful and tedious task is to mock/emulate all the nodes, such as the
Spark workers/master/HDFS/input source stream and all that. I wish there
were something really simple. Perhaps the simplest thing is just to do
integration tests, which also exercise the transformations/business logic.
That way I can spawn a small cluster, run my tests, and bring the cluster
down when I am done. And sure, if the cluster isn't available then I can't
run the tests, but some node should be available even to run a single
process. I somehow feel like we may be doing too much work to fit into the
archaic definition of unit tests.

 "Basically you abstract your transformations to take in a dataframe and
return one, then you assert on the returned df " this

On Tue, Mar 7, 2017 at 11:14 AM, Michael Armbrust <mi...@databricks.com>
wrote:

> Basically you abstract your transformations to take in a dataframe and
>> return one, then you assert on the returned df
>>
>
> +1 to this suggestion.  This is why we wanted streaming and batch
> dataframes to share the same API.
>

Re: How to unit test spark streaming?

Posted by Michael Armbrust <mi...@databricks.com>.
>
> Basically you abstract your transformations to take in a dataframe and
> return one, then you assert on the returned df
>

+1 to this suggestion.  This is why we wanted streaming and batch
dataframes to share the same API.

Re: How to unit test spark streaming?

Posted by Sam Elamin <hu...@gmail.com>.
Hey kant

You can use Holden Karau's spark-testing-base.

Have a look at some of the specs I wrote here to give you an idea

https://github.com/samelamin/spark-bigquery/blob/master/src/test/scala/com/samelamin/spark/bigquery/BigQuerySchemaSpecs.scala

Basically you abstract your transformations to take in a dataframe and
return one, then you assert on the returned df
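
A minimal sketch of that pattern (assuming spark-testing-base is on the
test classpath; the transformation and column names here are made up for
illustration):

import com.holdenkarau.spark.testing.DataFrameSuiteBase
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, upper}
import org.scalatest.FunSuite

// Code under test: a transformation that takes a DataFrame and returns one.
object Transforms {
  def upperNames(df: DataFrame): DataFrame =
    df.withColumn("name", upper(col("name")))
}

class UpperNamesSpec extends FunSuite with DataFrameSuiteBase {
  test("upperNames upper-cases the name column") {
    val sqlCtx = sqlContext        // local Spark provided by the test base
    import sqlCtx.implicits._
    val input    = Seq("alice", "bob").toDF("name")
    val expected = Seq("ALICE", "BOB").toDF("name")
    // assertDataFrameEquals checks both schema and row contents.
    assertDataFrameEquals(expected, Transforms.upperNames(input))
  }
}

No master or worker JVMs to spawn yourself: the test base gives the suite a
local Spark context.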

Regards
Sam
On Tue, 7 Mar 2017 at 12:05, kant kodali <ka...@gmail.com> wrote:

> Hi All,
>
> How do I unit test Spark Streaming, or Spark in general? How do I test the
> results of my transformations? Also, more importantly, don't we need to
> spawn master and worker JVMs, either on one node or on multiple nodes?
>
> Thanks!
> kant
>

Re: How to unit test spark streaming?

Posted by Jörn Franke <jo...@gmail.com>.
This depends on your target setup! For my open source libraries, for example, I run Spark integration tests (in a dedicated folder alongside the unit tests) against a local Spark master, but I also use a MiniDFSCluster (to simulate HDFS on a node) and sometimes a MiniYARNCluster (see https://wiki.apache.org/hadoop/HowToDevelopUnitTests).
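
A rough sketch of such a setup (my assumptions: ScalaTest and the
hadoop-minicluster test artifact are on the test classpath; the path and
names are only illustrative):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hdfs.MiniDFSCluster
import org.apache.spark.sql.SparkSession
import org.scalatest.{BeforeAndAfterAll, FunSuite}

class HdfsIntegrationSpec extends FunSuite with BeforeAndAfterAll {
  private var hdfs: MiniDFSCluster = _
  private var spark: SparkSession = _

  override def beforeAll(): Unit = {
    // Simulated HDFS running inside the test JVM.
    hdfs = new MiniDFSCluster.Builder(new Configuration()).build()
    // Local master: no external cluster needed.
    spark = SparkSession.builder()
      .master("local[2]")
      .appName("integration-test")
      .getOrCreate()
  }

  override def afterAll(): Unit = {
    spark.stop()
    hdfs.shutdown()
  }

  test("reads back what it writes to the mini HDFS") {
    val uri = s"hdfs://localhost:${hdfs.getNameNodePort}/data"
    import spark.implicits._
    Seq(1, 2, 3).toDF("n").write.parquet(uri)
    assert(spark.read.parquet(uri).count() == 3)
  }
}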

An example can be found here:
https://github.com/ZuInnoTe/hadoopcryptoledger/tree/master/examples/spark-bitcoinblock

or, if you need Scala:
https://github.com/ZuInnoTe/hadoopcryptoledger/tree/master/examples/scala-spark-bitcoinblock

In both cases the tests live in the integration-tests (Java) or it (Scala) folder.

Spark Streaming: I have no open source example at hand, but basically you need to simulate the source; the rest is as above.

I will eventually write a blog post about this with more details.
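
In the meantime, a sketch of simulating the source with Structured
Streaming's MemoryStream (note it lives in an internal package,
org.apache.spark.sql.execution.streaming, so treat it as a test-only
convenience; the column and query names are made up):

import org.apache.spark.sql.{SQLContext, SparkSession}
import org.apache.spark.sql.execution.streaming.MemoryStream

val spark = SparkSession.builder()
  .master("local[2]")               // everything runs in this one JVM
  .appName("streaming-test")
  .getOrCreate()
import spark.implicits._
implicit val sqlCtx: SQLContext = spark.sqlContext

val source = MemoryStream[Int]      // the simulated input source
// The logic under test: the same DataFrame code you would run in batch.
val doubled = source.toDF().selectExpr("value * 2 AS doubled")

val query = doubled.writeStream
  .format("memory")                 // sink results into an in-memory table
  .queryName("out")
  .outputMode("append")
  .start()

source.addData(1, 2, 3)
query.processAllAvailable()         // block until the micro-batch is done
assert(spark.sql("SELECT doubled FROM out").count() == 3)
query.stop()
spark.stop()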

> On 7 Mar 2017, at 13:04, kant kodali <ka...@gmail.com> wrote:
> 
> Hi All,
> 
> How do I unit test Spark Streaming, or Spark in general? How do I test the results of my transformations? Also, more importantly, don't we need to spawn master and worker JVMs, either on one node or on multiple nodes?
> 
> Thanks!
> kant
