Posted to user@spark.apache.org by Mehdi Ben Haj Abbes <me...@gmail.com> on 2016/02/18 16:20:36 UTC

equalTo isin not working as expected with a constructed column with DataFrames

Hi folks,

I have a DataFrame with, let's say, this schema:
-dealId,
-ptf,
-ts
From it I derive another DataFrame (let's call it df) to which I add an
extra column (withColumn) which is the concatenation of the 3 existing
columns, and I call the new column "theone".

When I print the schema for the new DataFrame, the "theone" column has a String
type. But when I run
df.where(df.col("theone").equalTo("nonExistantValue")).toJavaRDD.count
I get the initial size of df, as if the filtering did not work. If I run
the same query but filter on one of the original columns, I get the
expected count, which is 0.

The same goes for isin.

Any help will be more than appreciated.

Best regards,


-- 
Mehdi BEN HAJ ABBES

Re: equalTo isin not working as expected with a constructed column with DataFrames

Posted by Michael Armbrust <mi...@databricks.com>.
Can you include the output of explain(true) on the DataFrame in question?
It would also be really helpful to see a small code fragment that
reproduces the issue.
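
For reference, a reproduction of the kind requested here might look like the
sketch below. It is not the original poster's code. It assumes Spark 1.5.x
with the Scala API and a local master; the object name TheOneRepro, the helper
buildDf, and the sample rows are hypothetical. It also assumes the column was
built with concat from org.apache.spark.sql.functions, which the thread does
not confirm.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.functions.{col, concat}

object TheOneRepro {
  // Builds df with the extra "theone" column.
  // The rows are made-up sample data; the real data is not shown in the thread.
  def buildDf(sqlContext: SQLContext): DataFrame = {
    import sqlContext.implicits._
    val base = Seq(
      ("d1", "ptfA", "2016-02-18"),
      ("d2", "ptfB", "2016-02-19")
    ).toDF("dealId", "ptf", "ts")
    // Assumption: the concatenation is done with functions.concat.
    base.withColumn("theone", concat(col("dealId"), col("ptf"), col("ts")))
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("theone-repro").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    val df = buildDf(sqlContext)
    df.printSchema()

    // Filter on the derived column, on an original column, and with isin.
    val byDerived = df.where(df.col("theone").equalTo("nonExistantValue"))
    val byOriginal = df.where(df.col("dealId").equalTo("nonExistantValue"))
    val byIsin = df.where(df.col("theone").isin("nonExistantValue", "anotherMissingValue"))

    // Prints the logical and physical plans, as requested above.
    byDerived.explain(true)

    println(s"derived=${byDerived.count()} original=${byOriginal.count()} isin=${byIsin.count()}")
    sc.stop()
  }
}
```

Comparing the three counts and the explain(true) output for the derived-column
filter should show whether the predicate on "theone" survives into the physical
plan. Since concat returns null when any of its inputs is null, it may also be
worth checking the source columns for nulls.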

Re: equalTo isin not working as expected with a constructed column with DataFrames

Posted by Mehdi Ben Haj Abbes <me...@gmail.com>.
Hi,
I forgot to mention that I'm using the 1.5.1 version.
Regards,


-- 
Mehdi BEN HAJ ABBES