Posted to user@spark.apache.org by Yogesh Vyas <in...@gmail.com> on 2016/10/03 07:08:23 UTC

Filtering in SparkR

Hi,

I have two SparkDataFrames, df1 and df2.
Their schemas are as follows:
df1=>SparkDataFrame[id:double, c1:string, c2:string]
df2=>SparkDataFrame[id:double, c3:string, c4:string]

I want to filter df1 down to the rows whose id does not appear in df2$id.

I tried the expression filter(df1, !(df1$id %in% df2$id)), but it does not
work.

Could anybody please provide a solution?

Regards,
Yogesh
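
SparkR's %in% tests a column against a local R vector of values, not against
a column of another SparkDataFrame, so the expression above cannot express
this cross-frame filter. A minimal sketch of one alternative, assuming Spark
2.0 or later (where createOrReplaceTempView() and LEFT ANTI JOIN are
available):

# Register both frames as temporary views, then keep the rows of df1
# whose id has no match in df2 (a left anti join).
createOrReplaceTempView(df1, "t1")
createOrReplaceTempView(df2, "t2")
result <- sql("SELECT * FROM t1 LEFT ANTI JOIN t2 ON t1.id = t2.id")
head(result)

A left anti join returns only the left side's columns, so the result keeps
the df1 schema.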

Re: Filtering in SparkR

Posted by Deepak Sharma <de...@gmail.com>.
Hi Yogesh
You can try registering these two DFs as temporary tables and then running a
SQL query:
df1.registerTempTable("df1")
df2.registerTempTable("df2")

val rs = sqlContext.sql("SELECT a.* FROM df1 a, df2 b WHERE a.id != b.id")

Thanks
Deepak
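
Note that as written this query cross-joins df1 with df2 and keeps every
pairing whose ids differ, so it returns (many copies of) nearly every row of
df1 rather than the rows whose id is absent from df2. A sketch of a corrected
query in SparkR, assuming Spark 2.0 or later (where correlated NOT EXISTS
subqueries are supported):

# Register the frames, then anti-join with a NOT EXISTS subquery.
createOrReplaceTempView(df1, "df1")
createOrReplaceTempView(df2, "df2")
rs <- sql("SELECT * FROM df1 a WHERE NOT EXISTS (SELECT 1 FROM df2 b WHERE a.id = b.id)")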
