Posted to user@spark.apache.org by "Kali.tummala@gmail.com" <Ka...@gmail.com> on 2015/10/24 22:31:51 UTC
spark inner join
Hi All,
In SQL, say for example I have table1 (movieid) and table2 (movieid, moviename).
In SQL we write something like:

select moviename, movieid, count(1) from table2 inner join table1 on table1.movieid = table2.movieid group by ...

Here table1 has only one column while table2 has two, yet the join still works.
In the same way, can Spark join on keys from both RDDs?

When I tried to join two RDDs in Spark, both RDDs had to be pair RDDs with the
same shape, so I had to add a dummy value 0 to the single-column RDD. Is there
another way around this, or am I doing it completely wrong?
val lines = sc.textFile("C:\\Users\\kalit_000\\Desktop\\udemy_spark\\ml-100k\\u.data")
val movienamesfile = sc.textFile("C:\\Users\\kalit_000\\Desktop\\udemy_spark\\ml-100k\\u.item")

// u.data is tab-delimited; key by the movie id column, with 0 as a dummy value
val moviesid = lines.map(x => x.split("\t")).map(x => (x(1), 0))
val test = moviesid.map(x => x._1)

// u.item is pipe-delimited: movieid | moviename | ...
val movienames = movienamesfile.map(x => x.split("\\|")).map(x => (x(0), x(1)))

val movienamejoined = moviesid.join(movienames).distinct()
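To illustrate what the dummy-value pattern does, here is a minimal sketch of RDD-style inner-join semantics on plain Scala collections (no SparkContext needed). The `innerJoin` helper and the sample movie rows are hypothetical, made up for illustration; `RDD.join` behaves the same way: it matches keys present on both sides and pairs up the values, after which the dummy 0 can be dropped with a `map`.

```scala
// Inner-join semantics on plain pairs, mirroring what RDD.join does:
// for every key present on BOTH sides, emit (key, (leftValue, rightValue)).
def innerJoin[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)]): Seq[(K, (V, W))] =
  for {
    (k, v)  <- left
    (k2, w) <- right
    if k == k2
  } yield (k, (v, w))

// table1 side: only a movie id, keyed with a dummy 0 (as in the code above)
val moviesid = Seq(("50", 0), ("172", 0))

// table2 side: (movieid, moviename); "181" has no match and is dropped by the join
val movienames = Seq(
  ("50",  "Star Wars (1977)"),
  ("172", "The Empire Strikes Back (1980)"),
  ("181", "Return of the Jedi (1983)")
)

val joined = innerJoin(moviesid, movienames)

// Drop the dummy 0 after the join, keeping just (movieid, moviename)
val result = joined.map { case (id, (_, name)) => (id, name) }
```

The same post-join `map` works on the Spark result, so the dummy value never has to appear in the final output.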
Thanks
Sri