Posted to user@spark.apache.org by kant kodali <ka...@gmail.com> on 2017/03/07 12:04:51 UTC
How to unit test spark streaming?
Hi All,
How do I unit test Spark Streaming, or Spark in general? How do I test the
results of my transformations? Also, more importantly, don't we need to
spawn master and worker JVMs, either on one node or on multiple nodes?
Thanks!
kant
Re: How to unit test spark streaming?
Posted by kant kodali <ka...@gmail.com>.
Agreed with the statement quoted below: whether or not one wants to write
unit tests, it is good practice to structure code that way. But I think the
more painful and tedious task is to mock/emulate all the nodes, such as the
Spark workers/master, HDFS, the input source stream, and all that. I wish there
were something really simple. Perhaps the simplest thing to do is just to write
integration tests, which also exercise the transformations/business logic. That
way I can spawn a small cluster, run my tests, and bring my cluster down
when I am done. And sure, if the cluster isn't available then I can't run
the tests, but some node should be available even to run a single
process. I somehow feel we may be doing too much work to fit into the
archaic definition of unit tests.
"Basically you abstract your transformations to take in a dataframe and
return one, then you assert on the returned df"
On Tue, Mar 7, 2017 at 11:14 AM, Michael Armbrust <mi...@databricks.com>
wrote:
> Basically you abstract your transformations to take in a dataframe and
>> return one, then you assert on the returned df
>>
>
> +1 to this suggestion. This is why we wanted streaming and batch
> dataframes to share the same API.
>
Re: How to unit test spark streaming?
Posted by Michael Armbrust <mi...@databricks.com>.
>
> Basically you abstract your transformations to take in a dataframe and
> return one, then you assert on the returned df
>
+1 to this suggestion. This is why we wanted streaming and batch
dataframes to share the same API.
Re: How to unit test spark streaming?
Posted by Sam Elamin <hu...@gmail.com>.
Hey kant
You can use Holden's spark-testing-base.
Have a look at some of the specs I wrote here to give you an idea:
https://github.com/samelamin/spark-bigquery/blob/master/src/test/scala/com/samelamin/spark/bigquery/BigQuerySchemaSpecs.scala
Basically you abstract your transformations to take in a dataframe and
return one, then you assert on the returned df
Regards
Sam
On Tue, 7 Mar 2017 at 12:05, kant kodali <ka...@gmail.com> wrote:
> Hi All,
>
> How do I unit test Spark Streaming, or Spark in general? How do I test the
> results of my transformations? Also, more importantly, don't we need to
> spawn master and worker JVMs, either on one node or on multiple nodes?
>
> Thanks!
> kant
>
Re: How to unit test spark streaming?
Posted by Jörn Franke <jo...@gmail.com>.
This depends on your target setup! For my open source libraries, for example, I run Spark integration tests (in a dedicated folder alongside the unit tests) against a local Spark master, but I also use a MiniDFSCluster (to simulate HDFS on a node) and sometimes a MiniYARNCluster (see https://wiki.apache.org/hadoop/HowToDevelopUnitTests).
An example can be found here: https://github.com/ZuInnoTe/hadoopcryptoledger/tree/master/examples/spark-bitcoinblock
or - if you need Scala -
https://github.com/ZuInnoTe/hadoopcryptoledger/tree/master/examples/scala-spark-bitcoinblock
In both cases the tests live in the integration-tests folder (Java) or the it folder (Scala).
For Spark Streaming I have no open source example at hand, but basically you need to simulate the source; the rest is as above.
I will eventually write a blog post about this with more details.
> On 7 Mar 2017, at 13:04, kant kodali <ka...@gmail.com> wrote:
>
> Hi All,
>
> How do I unit test Spark Streaming, or Spark in general? How do I test the results of my transformations? Also, more importantly, don't we need to spawn master and worker JVMs, either on one node or on multiple nodes?
>
> Thanks!
> kant
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org