Posted to issues@spark.apache.org by "Josh Rosen (JIRA)" <ji...@apache.org> on 2019/05/21 00:40:00 UTC
[jira] [Created] (SPARK-27785) Introduce .joinWith() overload for
inner join of 3 or more tables
Josh Rosen created SPARK-27785:
----------------------------------
Summary: Introduce .joinWith() overload for inner join of 3 or more tables
Key: SPARK-27785
URL: https://issues.apache.org/jira/browse/SPARK-27785
Project: Spark
Issue Type: New Feature
Components: SQL
Affects Versions: 3.0.0
Reporter: Josh Rosen
Today it's rather painful to do a typed dataset join of more than two tables: {{Dataset[A].joinWith(Dataset[B])}} returns {{Dataset[(A, B)]}}, so chaining on a third inner join requires users to specify a complicated join condition (referencing {{_1}}), resulting in a doubly-nested schema like {{Dataset[((A, B), C)]}}. Things become even more painful if you want to layer on a fourth join. Using {{.map()}} to flatten the data into {{Dataset[(A, B, C)]}} has a performance penalty, too.
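To make the pain point concrete, here is a minimal sketch of what a three-way typed join looks like with the existing two-dataset {{joinWith}}. The case classes {{A}}, {{B}}, {{C}}, their fields, and the sample rows are assumptions made up for illustration; only {{joinWith}}, {{map}}, and the {{_1}} reference come from the description above.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical record types, invented for this illustration only.
case class A(id: Long, name: String)
case class B(aId: Long, value: Int)
case class C(aId: Long, tag: String)

object NestedJoinWithExample {
  def flatJoin(spark: SparkSession): Dataset[(A, B, C)] = {
    import spark.implicits._

    val as = Seq(A(1L, "x"), A(2L, "y")).toDS()
    val bs = Seq(B(1L, 10), B(2L, 20)).toDS()
    val cs = Seq(C(1L, "p"), C(2L, "q")).toDS()

    // First inner join: the result is a Dataset of pairs.
    val ab: Dataset[(A, B)] = as.joinWith(bs, $"id" === $"aId")

    // Second inner join: the condition has to reach through `_1`
    // to get at A's fields, and the result type is doubly nested.
    val abc: Dataset[((A, B), C)] = ab.joinWith(cs, $"_1.id" === $"aId")

    // Flattening to a 3-tuple requires an extra typed map step.
    abc.map { case ((a, b), c) => (a, b, c) }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("nested-joinWith")
      .getOrCreate()
    flatJoin(spark).show(truncate = false)
    spark.stop()
  }
}
```

Each additional table adds another level of tuple nesting and another {{_1}} prefix in the join condition.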
To simplify this use case, I propose to introduce a new set of overloads of {{.joinWith}}, supporting joins of {{N > 2}} tables for {{N}} up to some reasonable number (say, 6). For example:
{code:scala}
Dataset[T].joinWith[T1, T2](
ds1: Dataset[T1],
ds2: Dataset[T2]
): Dataset[(T, T1, T2)]
Dataset[T].joinWith[T1, T2, T3](
ds1: Dataset[T1],
ds2: Dataset[T2],
ds3: Dataset[T3]
): Dataset[(T, T1, T2, T3)]{code}
I propose to do this only for inner joins (consistent with the default join type for {{joinWith}} when no join type is specified).
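For comparison, the flat {{Dataset[(T, T1, T2)]}} result type can be approximated in user code today. The sketch below is not the proposed implementation and not part of Spark: {{joinWith3}}, its explicit join-condition parameters, and the {{Person}}/{{Order}}/{{Address}} types are all hypothetical names chosen for illustration. It simply chains two inner {{joinWith}} calls and flattens with {{.map()}}, so it still pays the flattening cost that a built-in overload would presumably avoid.

```scala
import org.apache.spark.sql.{Column, Dataset, Encoder, SparkSession}

// Hypothetical record types, invented for this illustration only.
case class Person(id: Long, name: String)
case class Order(personId: Long, amount: Int)
case class Address(personId: Long, city: String)

object JoinWith3Sketch {
  // Hypothetical user-side helper; NOT part of Spark's Dataset API.
  implicit class JoinWith3Ops[T](private val ds: Dataset[T]) extends AnyVal {
    def joinWith3[T1, T2](
        ds1: Dataset[T1], cond1: Column,
        ds2: Dataset[T2], cond2: Column
    )(implicit enc: Encoder[(T, T1, T2)]): Dataset[(T, T1, T2)] =
      ds.joinWith(ds1, cond1)   // Dataset[(T, T1)], inner join by default
        .joinWith(ds2, cond2)   // Dataset[((T, T1), T2)]
        .map { case ((t, t1), t2) => (t, t1, t2) } // flatten; extra map step
  }

  def demo(spark: SparkSession): Dataset[(Person, Order, Address)] = {
    import spark.implicits._

    val people = Seq(Person(1L, "ann"), Person(2L, "bob")).toDS()
    val orders = Seq(Order(1L, 5), Order(2L, 7)).toDS()
    val addresses = Seq(Address(1L, "Oslo"), Address(2L, "Rome")).toDS()

    // The second condition still has to reference `_1`, because the helper
    // only hides the nesting in the result type, not in the intermediate join.
    people.joinWith3(
      orders, $"id" === $"personId",
      addresses, $"_1.id" === $"personId"
    )
  }
}
```

Because the helper is built on the existing API, callers still write the second condition against the intermediate {{(T, T1)}} schema; hiding that detail is part of what a native overload would need to design.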
I haven't thought about this too much yet and am not committed to the API proposed above (it's just my initial idea), so I'm open to suggestions for alternative typed APIs for this.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)