Posted to user@spark.apache.org by Yang <te...@gmail.com> on 2016/10/17 16:53:11 UTC

question on the structured DataSet API join

I'm trying to use the joinWith() method instead of join(), since the former
returns a type-checked result while the latter returns a plain DataFrame.


The signature is: def joinWith[U](other: Dataset[U], condition: Column): Dataset[(T, U)]



Here the second argument, the Column, is normally obtained from
other.col("col_name"). But once a string is used to specify the column,
there are no compile-time type checks on the validity of the join
condition. For example, you could end up specifying
other.col("a_string_col") === this_ds.col("a_double_col").
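
To make the concern concrete, here is a minimal sketch, assuming Spark 2.x running in local mode. The case classes Reading and Label, their fields, and the sample rows are hypothetical and exist only for illustration. It shows that the result type of joinWith() is checked by the compiler while the join condition is not.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical record types, not from any real schema.
case class Reading(id: Long, a_double_col: Double)
case class Label(id: Long, a_string_col: String)

object JoinWithSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("joinWith-sketch")
      .getOrCreate()
    import spark.implicits._

    val thisDs: Dataset[Reading] = Seq(Reading(1L, 1.0), Reading(2L, 2.0)).toDS()
    val other: Dataset[Label] = Seq(Label(1L, "1.0"), Label(3L, "x")).toDS()

    // The result type is checked at compile time: Dataset[(Reading, Label)].
    val joined: Dataset[(Reading, Label)] =
      thisDs.joinWith(other, thisDs.col("id") === other.col("id"))
    joined.show()

    // This compiles just as well, although it compares a Double column with a
    // String column. The compiler only sees two untyped Column values.
    val suspicious: Dataset[(Reading, Label)] =
      thisDs.joinWith(other, thisDs.col("a_double_col") === other.col("a_string_col"))
    suspicious.show()

    spark.stop()
  }
}
```

Both joins have the same static type, so nothing at compile time distinguishes the sensible condition from the questionable one. Similarly, a misspelled name passed to col() is only reported when the program runs.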

I checked the Dataset API docs, and it seems this col() method is the only
way to produce a Column.

So is there a type-checked way to provide the join condition?


thanks