Posted to issues@spark.apache.org by "Jamie Hutton (JIRA)" <ji...@apache.org> on 2016/10/11 12:46:20 UTC
[jira] [Created] (SPARK-17871) Dataset joinWith syntax should support specifying the condition in a compile-time safe way
Jamie Hutton created SPARK-17871:
------------------------------------
Summary: Dataset joinwith syntax should support specifying the condition in a compile-time safe way
Key: SPARK-17871
URL: https://issues.apache.org/jira/browse/SPARK-17871
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 2.0.1
Reporter: Jamie Hutton
One of the great things about Datasets is the way they enable compile-time type safety. However, the joinWith method currently only accepts a condition specified as a Column, meaning we have to reference the columns by name. This removes compile-time checking and leads to errors.
It would be great if Spark could support a join method that allowed compile-time checking of the condition. Something like:
leftDS.joinWith(rightDS, (l, r) => l.id == r.id)
This would have the added benefit of solving a serialization issue which stops joinWith working when using Kryo (because with Kryo we have just one binary column called "value" representing the entire object, and we need to be able to join on fields within the object - more info here: http://stackoverflow.com/questions/36648128/how-to-store-custom-objects-in-a-dataset).
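To make the proposal concrete, a sketch of what such an overload might look like on Dataset[T] - this signature and the case classes are purely illustrative, not an existing Spark API:

```scala
// Illustrative sketch only: this overload does not exist in Spark.
// A joinWith variant that takes a typed predicate instead of a Column,
// so the join condition is checked by the Scala compiler rather than
// resolved from column names at runtime.
def joinWith[U](other: Dataset[U])(condition: (T, U) => Boolean): Dataset[(T, U)]

// Hypothetical usage: the field references are checked at compile time
// against the case classes backing each Dataset, so a typo such as
// l.idd fails to compile instead of failing at runtime.
case class Customer(id: Long, name: String)
case class Order(id: Long, customerId: Long, total: Double)

val joined: Dataset[(Customer, Order)] =
  customerDS.joinWith(orderDS)((c, o) => c.id == o.customerId)
```

Because the predicate closes over the deserialized objects themselves, this form would also sidestep the Kryo limitation described above: the condition never needs to name a Catalyst column, so a single binary "value" column is not a problem.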
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org