Posted to user@spark.apache.org by Rohit Verma <ro...@rokittech.com> on 2016/10/27 03:22:02 UTC

Cogrouping or joining datasets by rownum

Has anyone tried to cogroup or join datasets by row number?

e.g.
DS1

43 AA
44 BB
45 CB

DS2

IN india
AU australia


I want to get

rownum  ds1.1  ds1.2  ds2.1  ds2.2

1       43     AA     IN     india
2       44     BB     AU     australia
3       45     CB     null   null

I don’t expect complete code; some pointers on how to do this would be sufficient.

I tried the row_number function as a starting point:

spark.range(100,200).withColumn("id",row_number()).show();

but it throws an error:

java.lang.UnsupportedOperationException: Cannot evaluate expression: rownumber()

Thanks
Rohit

Re: Cogrouping or joining datasets by rownum

Posted by Rohit Verma <ro...@rokittech.com>.
The formatting of the message got disturbed, so I am sending it again.


On Oct 27, 2016, at 8:52 AM, Rohit Verma <ro...@rokittech.com> wrote:

Has anyone tried to cogroup or join datasets by row number?


DS1

d1  d2
40  AA
41  BB
42  CC
43  DD

DS2

s1  s2
IN  INDIA
AU  Australia

joined

rowNum  d1  d2  s1             s2
1       40  AA  IN             INDIA
2       41  BB  AU             Australia
3       42  CC  null or empty  null or empty
4       43  DD  null or empty  null or empty

I don’t expect complete code; some pointers on how to do this would be sufficient.

I tried the row_number function as a starting point:

spark.range(100,200).withColumn("id",row_number()).show();

but it throws an error:

java.lang.UnsupportedOperationException: Cannot evaluate expression: rownumber()

Thanks
Rohit
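
As a pointer only, the result described above amounts to two steps: give every row of each dataset a row number, then do a full outer join on that number. The sketch below shows the logic in plain Python rather than Spark, and the names index_rows and join_by_rownum are hypothetical helpers made up for illustration. In Spark, RDD.zipWithIndex (which counts from 0) is one way to attach such an index. Note also that row_number() is a window function, so it probably needs .over(...) with an ordered window before it can be evaluated, which may explain the error from the bare call.

```python
def index_rows(rows):
    # Key each row by its 1-based position in the input order.
    return {n: row for n, row in enumerate(rows, start=1)}


def join_by_rownum(ds1, ds2):
    # Full outer join of two row lists on their row numbers.
    left, right = index_rows(ds1), index_rows(ds2)
    width1 = len(ds1[0]) if ds1 else 0
    width2 = len(ds2[0]) if ds2 else 0
    joined = []
    for n in sorted(left.keys() | right.keys()):
        # Pad the shorter side with None, matching "null or empty" above.
        l = left.get(n, (None,) * width1)
        r = right.get(n, (None,) * width2)
        joined.append((n, *l, *r))
    return joined


# Sample data taken from the DS1 and DS2 tables in the message.
ds1 = [(40, "AA"), (41, "BB"), (42, "CC"), (43, "DD")]
ds2 = [("IN", "INDIA"), ("AU", "Australia")]

for row in join_by_rownum(ds1, ds2):
    print(row)
```

The row number is only meaningful if each dataset has a well-defined order; on a distributed dataset that order has to come from somewhere (for example, a sort or the order in which the source was read), otherwise the pairing of rows is arbitrary.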