Posted to issues@spark.apache.org by "Enrico Minack (Jira)" <ji...@apache.org> on 2019/12/18 12:22:00 UTC
[jira] [Created] (SPARK-30296) Dataset diffing transformation
Enrico Minack created SPARK-30296:
-------------------------------------
Summary: Dataset diffing transformation
Key: SPARK-30296
URL: https://issues.apache.org/jira/browse/SPARK-30296
Project: Spark
Issue Type: New Feature
Components: SQL
Affects Versions: 2.4.4
Reporter: Enrico Minack
Fix For: 3.0.0
Evolving Spark code needs frequent regression testing to prove that it still produces identical results or, if changes are expected, to investigate those changes. Diffing the Datasets produced by the two code paths provides that confidence.
Diffing Datasets with a small schema is easy, but with a wide schema the Spark query becomes laborious and error-prone to write. A single proven and tested method makes diffing an easier and more reliable operation. As a Dataset transformation, the operation is available directly on the Dataset API.
This has proven useful in interactive Spark sessions as well as in deployed production code.
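To illustrate what such a diff has to compute, here is a minimal, Spark-free sketch in plain Python. It is not the proposed API, and the issue does not spell out the semantics. The function name diff_rows, the id column used to match rows, and the labels N (unchanged), C (changed), D (deleted) and I (inserted) are all assumptions made for this example.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def diff_rows(left: Iterable[Row], right: Iterable[Row], id_column: str) -> List[Row]:
    """Label each row as unchanged (N), changed (C), deleted (D) or inserted (I).

    Hypothetical helper: rows are matched on `id_column`, and the labels
    are assumptions for this sketch, not taken from the issue.
    """
    left_by_id = {row[id_column]: row for row in left}
    right_by_id = {row[id_column]: row for row in right}

    result = []
    # Walk the union of ids, which mirrors a full outer join on the id column.
    for key in sorted(left_by_id.keys() | right_by_id.keys()):
        old, new = left_by_id.get(key), right_by_id.get(key)
        if new is None:
            label = "D"  # present only on the left side
        elif old is None:
            label = "I"  # present only on the right side
        elif old == new:
            label = "N"  # every column is equal
        else:
            label = "C"  # at least one column differs
        result.append({"diff": label, id_column: key, "left": old, "right": new})
    return result


before = [{"id": 1, "value": "one"}, {"id": 2, "value": "two"}, {"id": 3, "value": "three"}]
after = [{"id": 1, "value": "one"}, {"id": 2, "value": "Two"}, {"id": 4, "value": "four"}]

for row in diff_rows(before, after, "id"):
    print(row["diff"], row["id"])
```

In a hand-written Spark query the same logic would typically need a full outer join plus a null-safe comparison for every non-id column, which is the part that grows laborious and error-prone as the schema widens.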