Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:35:08 UTC

[jira] [Resolved] (SPARK-17871) Dataset joinwith syntax should support specifying the condition in a compile-time safe way

     [ https://issues.apache.org/jira/browse/SPARK-17871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-17871.
----------------------------------
    Resolution: Incomplete

> Dataset joinwith syntax should support specifying the condition in a compile-time safe way
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-17871
>                 URL: https://issues.apache.org/jira/browse/SPARK-17871
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.0.1
>            Reporter: Jamie Hutton
>            Priority: Major
>              Labels: bulk-closed
>
> One of the great things about Datasets is the way they enable compile-time type safety. However, the joinWith method currently only supports a "condition" specified as a Column, meaning we have to reference the columns by name, which removes compile-time checking and leads to errors.
> It would be great if Spark could support a join method that allowed compile-time checking of the condition, something like the following (a fuller sketch follows below):
> leftDS.joinWith(rightDS, { case (l, r) => l.id == r.id })
> This would have the added benefit of solving a serialization issue that stops joinWith from working with Kryo (because with Kryo we have just one binary column, called "value", representing the entire object, and we need to be able to join on fields within the object - more info here: http://stackoverflow.com/questions/36648128/how-to-store-custom-objects-in-a-dataset).
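
For context, here is a minimal, self-contained Scala sketch contrasting today's Column-based joinWith condition with the kind of typed condition the report asks for. The case classes, field names, and the commented-out typed overload are invented for illustration; only the Column-based call is real Spark API.

import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical case classes, invented only for this illustration.
case class Customer(id: Long, name: String)
case class Order(customerId: Long, amount: Double)

object JoinWithSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("joinWith-sketch").getOrCreate()
    import spark.implicits._

    val customers: Dataset[Customer] = Seq(Customer(1L, "alice"), Customer(2L, "bob")).toDS()
    val orders: Dataset[Order]       = Seq(Order(1L, 9.99)).toDS()

    // Today: joinWith takes a Column condition, so the field names are plain strings
    // and a typo such as "idd" only fails at runtime, during analysis.
    val joined: Dataset[(Customer, Order)] =
      customers.joinWith(orders, customers("id") === orders("customerId"))
    joined.show(false)

    // Proposed (hypothetical overload, not part of Spark's API): the condition is a
    // typed function, so l.id and r.customerId are checked by the compiler.
    // val joinedTyped = customers.joinWith(orders) { (l: Customer, r: Order) => l.id == r.customerId }

    spark.stop()
  }
}

With a function-based condition, a misspelled or wrongly typed field would be rejected at compile time rather than surfacing as an AnalysisException at runtime, which is the improvement the report is asking for.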



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org