Posted to user@spark.apache.org by Rahul Nandi <ra...@gmail.com> on 2019/09/03 15:04:29 UTC

Unit testing PySpark Code and doing assertion

Hi,
I'm trying to unit test my PySpark DataFrame code. My goal is to assert
on both the schema and the data of the DataFrames. I'm looking for any
known libraries that I can use for this. Any library that can handle
10-15 records in a DataFrame is good enough for me.
As of now I'm using the unittest library and its *assertCountEqual*
method to do the assertion. This works reasonably well, but it does not
do any schema-level validation, and the failure messages are not easy
to understand.
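For context, here is roughly what my current test looks like. It is a
stripped-down sketch; the schema, the sample rows and the DataFrame
construction are just placeholders for my real transformation:

    import unittest

    from pyspark.sql import SparkSession
    from pyspark.sql.types import (IntegerType, StringType, StructField,
                                   StructType)


    class TransformTest(unittest.TestCase):

        @classmethod
        def setUpClass(cls):
            # Small local Spark session just for the test run
            cls.spark = (SparkSession.builder
                         .master("local[2]")
                         .appName("unit-test")
                         .getOrCreate())

        @classmethod
        def tearDownClass(cls):
            cls.spark.stop()

        def test_rows_and_schema(self):
            expected_schema = StructType([
                StructField("name", StringType(), True),
                StructField("age", IntegerType(), True),
            ])
            expected_rows = [("alice", 30), ("bob", 25)]

            # Stand-in for the output of the transformation under test
            actual_df = self.spark.createDataFrame(expected_rows,
                                                   expected_schema)

            # Data check: order-insensitive comparison of collected rows
            self.assertCountEqual(
                [tuple(r) for r in actual_df.collect()], expected_rows)

            # Schema check I would also like to do, but a plain equality
            # assert gives a hard-to-read failure message
            self.assertEqual(actual_df.schema, expected_schema)


    if __name__ == "__main__":
        unittest.main()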

If any of you are using any special techniques, let me know. Thanks
in advance.

Regards,
Rahul