Posted to issues@spark.apache.org by "Dongjoon Hyun (Jira)" <ji...@apache.org> on 2020/01/10 03:52:00 UTC

[jira] [Updated] (SPARK-30296) Dataset diffing transformation

     [ https://issues.apache.org/jira/browse/SPARK-30296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-30296:
----------------------------------
    Fix Version/s:     (was: 3.0.0)

> Dataset diffing transformation
> ------------------------------
>
>                 Key: SPARK-30296
>                 URL: https://issues.apache.org/jira/browse/SPARK-30296
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 2.4.4
>            Reporter: Enrico Minack
>            Priority: Major
>
> Evolving Spark code needs frequent regression testing to prove that it still produces identical results or, if changes are expected, to investigate those changes. Diffing the Datasets produced by two code paths provides that confidence.
> Diffing Datasets with a small schema is easy, but with a wide schema the Spark query becomes laborious and error-prone. A single proven and tested method makes diffing easier and more reliable. As a Dataset transformation, the operation is available directly through the Dataset API.
> This has proven useful in interactive Spark sessions as well as in deployed production code.
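
To make the idea of a keyed diff concrete, here is a minimal sketch in plain Python. It is not Spark code and not the API proposed in this issue: the function name `diff_rows`, the `id_columns` parameter and the single-letter labels (N, C, D, I) are assumptions chosen for illustration only. It assumes that the id columns identify each row uniquely and that their values are sortable.

```python
from typing import Any, Dict, Iterable, List, Tuple

Row = Dict[str, Any]


def diff_rows(
    left: Iterable[Row], right: Iterable[Row], id_columns: List[str]
) -> List[Tuple[str, Row]]:
    """Compare two row collections by key (hypothetical helper).

    Labels, chosen for this sketch only:
      N = no change, C = changed, D = deleted (left only), I = inserted (right only).
    """

    def key(row: Row) -> Tuple[Any, ...]:
        return tuple(row[c] for c in id_columns)

    # Index both sides by key; assumes keys are unique on each side.
    left_by_key = {key(r): r for r in left}
    right_by_key = {key(r): r for r in right}

    result: List[Tuple[str, Row]] = []
    # Walk the union of keys, which mirrors a full outer join on the id columns.
    for k in sorted(left_by_key.keys() | right_by_key.keys()):
        l = left_by_key.get(k)
        r = right_by_key.get(k)
        if l is None:
            result.append(("I", r))
        elif r is None:
            result.append(("D", l))
        elif l == r:
            result.append(("N", l))
        else:
            result.append(("C", r))
    return result


before = [
    {"id": 1, "value": "one"},
    {"id": 2, "value": "two"},
    {"id": 3, "value": "three"},
]
after = [
    {"id": 1, "value": "one"},
    {"id": 2, "value": "Two"},
    {"id": 4, "value": "four"},
]

result = diff_rows(before, after, ["id"])
for label, row in result:
    print(label, row)
```

With only one value column the comparison is trivial. The effort described above comes from repeating the same comparison, by hand, for every column of a wide schema.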


