Posted to user@spark.apache.org by Franc Carter <fr...@gmail.com> on 2016/02/21 12:41:10 UTC
filter by dict() key in pySpark
I have a DataFrame that has a Python dict() as one of the columns. I'd like
to filter the DataFrame for those Rows where the dict() contains a
specific key, e.g. something like this:
DF2 = DF1.filter('name' in DF1.params)
but that gives me this error
ValueError: Cannot convert column into bool: please use '&' for 'and', '|'
for 'or', '~' for 'not' when building DataFrame boolean expressions.
How do I express this correctly?
thanks
--
Franc
Re: filter by dict() key in pySpark
Posted by Davies Liu <da...@databricks.com>.
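Another solution could be using a left-semi join:
# d is the Python dict whose keys you want to match; each key becomes a one-column row
keys = sqlContext.createDataFrame([(k,) for k in d.keys()], ["k"])
DF2 = DF1.join(keys, DF1.a == keys.k, "leftsemi")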
On Wed, Feb 24, 2016 at 2:14 AM, Franc Carter <fr...@gmail.com> wrote:
>
> A colleague found out how to do this; the approach was to use a udf().
>
> cheers
>
Re: filter by dict() key in pySpark
Posted by Franc Carter <fr...@gmail.com>.
A colleague found out how to do this; the approach was to use a udf().
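The thread does not include the colleague's code, so the following is only a minimal sketch of what a udf()-based filter might look like, not the actual solution. It assumes the params column is a MapType column (which arrives in a Python udf as a dict, or None for null rows). The names has_key and filter_by_key are hypothetical helpers introduced here for illustration; DF1, params and 'name' come from the original question.

```python
def has_key(params, key):
    """Return True when the dict `params` contains `key`.

    Hypothetical helper: null map values reach a Python udf as None,
    so that case is treated as "key not present".
    """
    return params is not None and key in params


def filter_by_key(df, column, key):
    """Keep only the rows of `df` whose map column `column` contains `key`.

    Hypothetical helper; assumes `column` is a MapType column.
    """
    # Imported here so has_key() can be exercised without a Spark install.
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import BooleanType

    # Wrap the plain-Python predicate so Spark evaluates it row by row.
    contains_key = udf(lambda m: has_key(m, key), BooleanType())
    return df.filter(contains_key(col(column)))
```

Under those assumptions, the filter from the question would be written as DF2 = filter_by_key(DF1, "params", "name"). The udf returns a boolean Column, so filter() receives a column expression rather than the Python bool that the 'in' operator tries to produce.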
cheers
--
Franc