Posted to user@spark.apache.org by nayan sharma <na...@gmail.com> on 2017/04/17 14:35:19 UTC

isin query

I have a DataFrame (df) with a String column msrid that holds the values m_123, m_111, m_145, m_098 and m_666.

I want to keep only the rows whose msrid is m_123, m_111 or m_145.

df.filter($"msrid".isin("m_123","m_111","m_145")).count
count = 0
while
df.filter($"msrid".isin("m_123")).count
count = 121212

I have also tried passing the values as a list:
df.filter($"msrid" isin (List("m_123","m_111","m_145"):_*)).count
count = 0
but
df.filter($"msrid" isin (List("m_123"):_*)).count
count = 121212

Any suggestions would be a great help.

Thanks,
Nayan

Re: isin query

Posted by Koert Kuipers <ko...@tresata.com>.
I don't see this behavior in the current Spark master:

scala> val df = Seq("m_123", "m_111", "m_145", "m_098", "m_666").toDF("msrid")
df: org.apache.spark.sql.DataFrame = [msrid: string]

scala> df.filter($"msrid".isin("m_123")).count
res0: Long = 1

scala> df.filter($"msrid".isin("m_123","m_111","m_145")).count
res1: Long = 3
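
Since the in-memory reproduction above behaves correctly, one way to narrow the problem down is to compare how Spark plans the single-value and multi-value filters. The following is only a sketch: the object name IsinPlans and the helper counts are made up for illustration, and it runs against the same five-row stand-in data rather than the original poster's data source. The thread does not establish a cause, so running the same two explain calls against the real DataFrame is a suggestion, not a confirmed fix.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Hypothetical helper object, not part of Spark or of the original thread.
object IsinPlans {
  // Returns (single-value count, multi-value count) for the stand-in data.
  def counts(spark: SparkSession): (Long, Long) = {
    import spark.implicits._

    // Stand-in data taken from the values listed in the question.
    val df = Seq("m_123", "m_111", "m_145", "m_098", "m_666").toDF("msrid")

    val single = df.filter(col("msrid").isin("m_123"))
    val multi  = df.filter(col("msrid").isin("m_123", "m_111", "m_145"))

    // Print the parsed, analyzed, optimized and physical plans so the
    // two predicates can be compared side by side.
    single.explain(true)
    multi.explain(true)

    (single.count(), multi.count())
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("isin-plans")
      .getOrCreate()
    try {
      val (one, many) = counts(spark)
      println(s"single=$one multi=$many")
    } finally {
      spark.stop()
    }
  }
}
```

If the plans for the real DataFrame differ in how the predicate reaches the data source, that difference is a reasonable place to look next.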



On Mon, Apr 17, 2017 at 10:50 AM, nayan sharma <na...@gmail.com>
wrote:

> Thanks for responding.
> df.filter($"msrid" === "m_123" || $"msrid" === "m_111")
>
> There are lots of workarounds for my question, but can you let me know what's
> wrong with the "isin" query?
>
> Regards,
> Nayan

Fwd: isin query

Posted by nayan sharma <na...@gmail.com>.
Thanks for responding.
df.filter($"msrid" === "m_123" || $"msrid" === "m_111")

There are lots of workarounds for my question, but can you let me know what's wrong with the "isin" query?
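
To check whether the OR workaround and isin really disagree on a given DataFrame, the two filters can be built from the same list of values and their counts compared. This is a minimal sketch under stated assumptions: the object IsinVsOr and the helper anyOf are hypothetical names, and the data is the five-value stand-in from the question, not the real table.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical helper object, not part of Spark or of the original thread.
object IsinVsOr {
  // Builds column === v1 || column === v2 || ... from a list of values.
  // The list must be non-empty, because reduce throws on an empty Seq.
  def anyOf(column: String, values: Seq[String]): Column =
    values.map(v => col(column) === v).reduce(_ || _)

  // Returns (count via isin, count via OR chain) for the stand-in data.
  def compare(spark: SparkSession, wanted: Seq[String]): (Long, Long) = {
    import spark.implicits._
    val df = Seq("m_123", "m_111", "m_145", "m_098", "m_666").toDF("msrid")

    val viaIsin = df.filter(col("msrid").isin(wanted: _*)).count()
    val viaOr   = df.filter(anyOf("msrid", wanted)).count()
    (viaIsin, viaOr)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("isin-vs-or")
      .getOrCreate()
    try {
      val (viaIsin, viaOr) = compare(spark, Seq("m_123", "m_111", "m_145"))
      println(s"isin=$viaIsin or=$viaOr")
    } finally {
      spark.stop()
    }
  }
}
```

On the stand-in data both counts agree. If they differ on the real DataFrame, that points at how the data is read or filtered rather than at the isin call syntax.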

Regards,
Nayan

> Begin forwarded message:
> 
> From: ayan guha <gu...@gmail.com>
> Subject: Re: isin query
> Date: 17 April 2017 at 8:13:24 PM IST
> To: nayan sharma <na...@gmail.com>, user@spark.apache.org
> 
> How about using OR operator in filter? 
> -- 
> Best Regards,
> Ayan Guha


Re: isin query

Posted by ayan guha <gu...@gmail.com>.
How about using OR operator in filter?

-- 
Best Regards,
Ayan Guha