Posted to user@spark.apache.org by Matt Hagy <ma...@liveramp.com> on 2018/10/26 13:31:34 UTC

[PySpark] Sharing testing library and requesting feedback

We recently open sourced mockrdd, a library for testing PySpark code.
github.com/LiveRamp/mockrdd

The mockrdd.MockRDD class offers behavior similar to pyspark.RDD, with the
following extra benefits:
* Extensive sanity checks to identify invalid inputs
* More meaningful error messages for debugging issues
* Straightforward to run within pdb
* Removes Spark dependencies from development and testing environments
* No Spark overhead when running through a large test suite
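
To make the idea concrete, here is a minimal sketch of the general approach:
a pure-Python stand-in that mimics a few RDD methods and validates its inputs
eagerly. Note that TinyMockRDD and word_count are hypothetical names made up
for this illustration. This is not the mockrdd API or implementation; see the
repository for the real interface.

```python
class TinyMockRDD:
    """Hypothetical in-memory stand-in for a small subset of pyspark.RDD.

    Illustration only; this is NOT the mockrdd API.
    """

    def __init__(self, items):
        # Materialize eagerly so errors surface at the call that caused them.
        self._items = list(items)

    def map(self, f):
        if not callable(f):
            raise TypeError(f"map() expects a callable, got {type(f).__name__}")
        return TinyMockRDD(f(x) for x in self._items)

    def filter(self, f):
        if not callable(f):
            raise TypeError(f"filter() expects a callable, got {type(f).__name__}")
        return TinyMockRDD(x for x in self._items if f(x))

    def reduceByKey(self, f):
        # Sanity check: every element must be a (key, value) pair.
        for x in self._items:
            if not (isinstance(x, tuple) and len(x) == 2):
                raise ValueError(
                    f"reduceByKey() expects (key, value) tuples, got {x!r}"
                )
        merged = {}
        for k, v in self._items:
            merged[k] = f(merged[k], v) if k in merged else v
        return TinyMockRDD(merged.items())

    def collect(self):
        return list(self._items)


def word_count(rdd):
    """Job under test; accepts any object exposing the RDD methods it uses."""
    return rdd.map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)


# Runs with no Spark installation and no SparkContext startup cost.
result = sorted(word_count(TinyMockRDD(["a", "b", "a"])).collect())
```

Because everything runs eagerly in a single process, a breakpoint set inside
word_count is hit directly under pdb, and passing non-pair data to
reduceByKey fails immediately with a message that names the offending
element.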

More details in this blog post:
liveramp.com/engineering/introducing-mockrdd-for-testing-pyspark-code

Would anyone find this useful? What other features would make this more
useful? Are there benefits to using PySpark in local mode for testing that
we're not considering?

Thanks!