Posted to user@spark.apache.org by Franc Carter <fr...@gmail.com> on 2016/02/21 12:41:10 UTC

filter by dict() key in pySpark

I have a DataFrame that has a Python dict() as one of the columns. I'd like
to filter the DataFrame for those Rows where the dict() contains a
specific key, e.g. something like this:

    DF2 = DF1.filter('name' in DF1.params)

but that gives me this error:

ValueError: Cannot convert column into bool: please use '&' for 'and', '|'
for 'or', '~' for 'not' when building DataFrame boolean expressions.

How do I express this correctly?

thanks

-- 
Franc

Re: filter by dict() key in pySpark

Posted by Davies Liu <da...@databricks.com>.
Another solution could be using left-semi join:

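# One-column DataFrame of the dict's keys; each key is wrapped in a 1-tuple
keys = sqlContext.createDataFrame([(k,) for k in dict.keys()], ["k"])
# '==' builds the Column join condition ('=' is a syntax error here)
DF2 = DF1.join(keys, DF1.a == keys.k, "leftsemi")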

On Wed, Feb 24, 2016 at 2:14 AM, Franc Carter <fr...@gmail.com> wrote:
>
> A colleague found how to do this, the approach was to use a udf()
>
> cheers


Re: filter by dict() key in pySpark

Posted by Franc Carter <fr...@gmail.com>.
A colleague found how to do this; the approach was to use a udf().
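
The colleague's code is not shown in this thread, so the following is only a minimal sketch of what such a udf() filter could look like. It assumes the params column is a map column that arrives in Python as a dict (or None); the helper names has_key and filter_by_key are hypothetical and not part of PySpark.

```python
def has_key(params, key):
    """Return True when the dict-like `params` contains `key` (None-safe)."""
    return params is not None and key in params


def filter_by_key(df, column, key):
    """Keep only the rows of `df` whose map column `column` contains `key`."""
    # Imported here so has_key() can be tested without a Spark installation.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import BooleanType

    # Wrap the plain Python check in a udf that returns a boolean Column.
    contains_key = udf(lambda params: has_key(params, key), BooleanType())
    return df.filter(contains_key(df[column]))


# Usage with the names from the original question:
# DF2 = filter_by_key(DF1, "params", "name")
```

The udf returns a Column, so filter() receives a DataFrame expression rather than a Python bool, which is what the ValueError was complaining about.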

cheers

-- 
Franc