Posted to user@spark.apache.org by Rohit Verma <ro...@rokittech.com> on 2016/10/27 03:22:02 UTC
Cogrouping or joining datasets by rownum
Has anyone tried to cogroup or join datasets by row number?
e.g.
DS1
43  AA
44  BB
45  CB
DS2
IN  india
AU  australia
I want to get
rownum  ds1.1  ds1.2  ds2.1  ds2.2
1       43     AA     IN     india
2       44     BB     AU     australia
3       45     CB     null   null
I don’t expect complete code; some pointers on how to do this would be sufficient.
I tried the row_number function to start:
spark.range(100,200).withColumn("id",row_number()).show();
but it throws an error:
java.lang.UnsupportedOperationException: Cannot evaluate expression: rownumber()
Thanks
Rohit
Re: Cogrouping or joining datasets by rownum
Posted by Rohit Verma <ro...@rokittech.com>.
The formatting of my message got garbled, so I am sending it again.
On Oct 27, 2016, at 8:52 AM, Rohit Verma <ro...@rokittech.com> wrote:
Has anyone tried to cogroup or join datasets by row number?
DS1
d1  d2
40  AA
41  BB
42  CC
43  DD
DS2
s1  s2
IN  INDIA
AU  Australia
joined
rowNum  d1  d2  s1             s2
1       40  AA  IN             INDIA
2       41  BB  AU             Australia
3       42  CC  null or empty  null or empty
4       43  DD  null or empty  null or empty
I don’t expect complete code; some pointers on how to do this would be sufficient.
I tried the row_number function to start:
spark.range(100,200).withColumn("id",row_number()).show();
but it throws an error:
java.lang.UnsupportedOperationException: Cannot evaluate expression: rownumber()
Thanks
Rohit
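To make the requested result concrete, here is a minimal plain-Java sketch of the row-number join described above. It does not use Spark; it only models the semantics: number each row by position, then pair rows with the same number and pad the shorter side with nulls. The class name RowNumJoin and the method joinByRowNum are hypothetical names chosen for illustration. In Spark itself, one possible direction (an assumption, not something confirmed in this thread) is to attach an index to each dataset, for example with RDD.zipWithIndex, and then do a full outer join on that index. Note also that row_number() is a window function, so it is normally used as row_number().over(...) with a window specification, which may be why the bare call above cannot be evaluated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class RowNumJoin {
    // Pairs rows of ds1 and ds2 by 1-based position. When one side has no
    // row at a given position, its columns are filled with nulls, which
    // mirrors a full outer join on the row number.
    static List<List<String>> joinByRowNum(List<List<String>> ds1, int width1,
                                           List<List<String>> ds2, int width2) {
        int n = Math.max(ds1.size(), ds2.size());
        List<List<String>> joined = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            List<String> row = new ArrayList<>();
            row.add(String.valueOf(i + 1)); // row number column
            row.addAll(i < ds1.size() ? ds1.get(i)
                    : Collections.nCopies(width1, (String) null));
            row.addAll(i < ds2.size() ? ds2.get(i)
                    : Collections.nCopies(width2, (String) null));
            joined.add(row);
        }
        return joined;
    }

    public static void main(String[] args) {
        // Sample data taken from the first message in this thread.
        List<List<String>> ds1 = Arrays.asList(
                Arrays.asList("43", "AA"),
                Arrays.asList("44", "BB"),
                Arrays.asList("45", "CB"));
        List<List<String>> ds2 = Arrays.asList(
                Arrays.asList("IN", "india"),
                Arrays.asList("AU", "australia"));
        for (List<String> row : joinByRowNum(ds1, 2, ds2, 2)) {
            System.out.println(row);
        }
    }
}
```

The sketch assumes both inputs already have a stable row order. A distributed dataset gives no such guarantee unless an explicit ordering is defined, so any Spark version of this would need to decide what "row number" means first.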