Posted to issues@spark.apache.org by "Enrico Minack (Jira)" <ji...@apache.org> on 2019/12/18 12:22:00 UTC

[jira] [Created] (SPARK-30296) Dataset diffing transformation

Enrico Minack created SPARK-30296:
-------------------------------------

             Summary: Dataset diffing transformation
                 Key: SPARK-30296
                 URL: https://issues.apache.org/jira/browse/SPARK-30296
             Project: Spark
          Issue Type: New Feature
          Components: SQL
    Affects Versions: 2.4.4
            Reporter: Enrico Minack
             Fix For: 3.0.0


Evolving Spark code needs frequent regression testing to prove that it still produces identical results or, where changes are expected, to investigate those changes. Diffing the Datasets produced by the two code paths provides that confidence.

Diffing Datasets with small schemata is easy, but with a wide schema the Spark query becomes laborious and error-prone. A single proven and tested method makes diffing easier and more reliable. As a Dataset transformation, the operation is available directly through the Dataset API.

This has proven useful for interactive Spark use as well as for deployed production code.
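
To illustrate what a row-level diff might report, here is a minimal sketch in plain Python over lists of dicts. It is not Spark code and not the implementation proposed in this issue. The function name diff_rows, the output field names, and the labels N (no change), C (changed), D (deleted) and I (inserted) are all assumptions made for this example, as is the premise that the id columns identify each row uniquely.

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def diff_rows(left: List[Row], right: List[Row], id_columns: Sequence[str]) -> List[Row]:
    """Compare two row sets by id and label each id N, C, D or I.

    Hypothetical helper for illustration only; assumes the id columns
    are unique within each input.
    """

    def key(row: Row) -> Tuple[Any, ...]:
        # The id of a row is the tuple of its id column values.
        return tuple(row[c] for c in id_columns)

    left_by_id = {key(r): r for r in left}
    right_by_id = {key(r): r for r in right}

    # Visit ids in left order first, then ids that exist only on the right.
    ids = list(left_by_id) + [k for k in right_by_id if k not in left_by_id]

    result = []
    for k in ids:
        l, r = left_by_id.get(k), right_by_id.get(k)
        if l is None:
            action = "I"  # present only on the right
        elif r is None:
            action = "D"  # present only on the left
        elif l == r:
            action = "N"  # every column is identical
        else:
            action = "C"  # same id, at least one column differs
        result.append({"diff": action, "id": k, "left": l, "right": r})
    return result


left = [{"id": 1, "value": "one"}, {"id": 2, "value": "two"}, {"id": 3, "value": "three"}]
right = [{"id": 1, "value": "one"}, {"id": 2, "value": "Two"}, {"id": 4, "value": "four"}]

for row in diff_rows(left, right, ["id"]):
    print(row["diff"], row["id"], row["left"], row["right"])
```

With a wide schema, the appeal of a single tested method is that the comparison covers every column without a hand-written per-column condition. In Spark, the same idea would presumably be expressed as a full outer join on the id columns, which is exactly the query that becomes laborious to write by hand.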



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
