Posted to user@spark.apache.org by "Kali.tummala@gmail.com" <Ka...@gmail.com> on 2015/10/24 22:31:51 UTC
spark inner join
Hi All,
In SQL, say for example I have table1 (movieid) and table2 (movieid, moviename).
In SQL we write something like:

select moviename, movieid, count(1) from table2 inner join table1 on table1.movieid = table2.movieid group by ...

Here table1 has only one column while table2 has two, yet the join still works.
In the same way, can Spark join on keys from both RDDs?

When I tried to join two RDDs in Spark, both RDDs had to be pair RDDs with the
same shape, so I had to add a dummy value 0 to the single-column RDD. Is there
another way around this, or am I doing it completely wrong?
val lines = sc.textFile("C:\\Users\\kalit_000\\Desktop\\udemy_spark\\ml-100k\\u.data")
val movienamesfile = sc.textFile("C:\\Users\\kalit_000\\Desktop\\udemy_spark\\ml-100k\\u.item")

// u.data is tab-delimited; key by the movie id column, with 0 as a dummy value
val moviesid = lines.map(x => x.split("\t")).map(x => (x(1), 0))
val test = moviesid.map(x => x._1)

// u.item is pipe-delimited: movieid | moviename | ...
val movienames = movienamesfile.map(x => x.split("\\|")).map(x => (x(0), x(1)))

val movienamejoined = moviesid.join(movienames).distinct()
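To illustrate what the dummy-value pattern does, here is a minimal sketch of RDD-style inner-join semantics on plain Scala collections (no SparkContext needed). The `innerJoin` helper and the sample movie rows are hypothetical, made up for illustration; `RDD.join` behaves the same way: it matches keys present on both sides and pairs up the values, after which the dummy 0 can be dropped with a `map`.

```scala
// Inner-join semantics on plain pairs, mirroring what RDD.join does:
// for every key present on BOTH sides, emit (key, (leftValue, rightValue)).
def innerJoin[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)]): Seq[(K, (V, W))] =
  for {
    (k, v)  <- left
    (k2, w) <- right
    if k == k2
  } yield (k, (v, w))

// table1 side: only a movie id, keyed with a dummy 0 (as in the code above)
val moviesid = Seq(("50", 0), ("172", 0))

// table2 side: (movieid, moviename); "181" has no match and is dropped by the join
val movienames = Seq(
  ("50",  "Star Wars (1977)"),
  ("172", "The Empire Strikes Back (1980)"),
  ("181", "Return of the Jedi (1983)")
)

val joined = innerJoin(moviesid, movienames)

// Drop the dummy 0 after the join, keeping just (movieid, moviename)
val result = joined.map { case (id, (_, name)) => (id, name) }
```

The same post-join `map` works on the Spark result, so the dummy value never has to appear in the final output.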
Thanks
Sri