You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@spark.apache.org by Maheshakya Wijewardena <ma...@wso2.com> on 2015/10/08 15:13:22 UTC

Using a variable (a column name) in an IF statement in Spark SQL

Hi,

Suppose there is data frame called goods with columns "barcode" and
"items". Some of the values in the column "items" can be null.

I want to the barcode and the respective items from the table adhering the
following rules:

   - If "items" is null -> output 0
   - else -> output "items" ( the actual value in the column)

I would write a query like:

*SELECT barcode, IF(items is null, 0, items) FROM goods*

But this query fails with the error:

*unresolved operator 'Project [if (IS NULL items#1) 0 else items#1 AS
c0#132]; *

It seems I can only use numerical values inside this IF statement, but when
a column name is used, it fails.

Is there any workaround to do this?

Best regards.
-- 
Pruthuvi Maheshakya Wijewardena
Software Engineer
WSO2 : http://wso2.com/
Email: maheshakya@wso2.com
Mobile: +94711228855

Re: Using a variable (a column name) in an IF statement in Spark SQL

Posted by Michael Armbrust <mi...@databricks.com>.

I'm thinking there must be a typo somewhere else as this works for me on
Spark 1.4:

Seq(("1231234", 1)).toDF("barcode", "items").registerTempTable("goods")

sql("SELECT barcode, IF(items IS NULL, 0, items) FROM goods").collect()

res1: Array[org.apache.spark.sql.Row] = Array([1231234,1])

I'll also note that you are essentially doing a coalesce here (i.e.
coalesce(items,
0))

Spark 1.5 improved error message here a bunch, you might try upgrading to
see what is wrong.

On Thu, Oct 8, 2015 at 7:28 PM, Maheshakya Wijewardena <ma...@wso2.com>
wrote:

> Spark version: 1.4.1
> The schema is "barcode STRING, items INT"
>
> On Thu, Oct 8, 2015 at 10:48 PM, Michael Armbrust <mi...@databricks.com>
> wrote:
>
>> Hmm, that looks like it should work to me.  What version of Spark?  What
>> is the schema of goods?
>>
>> On Thu, Oct 8, 2015 at 6:13 AM, Maheshakya Wijewardena <
>> maheshakya@wso2.com> wrote:
>>
>>> Hi,
>>>
>>> Suppose there is data frame called goods with columns "barcode" and
>>> "items". Some of the values in the column "items" can be null.
>>>
>>> I want to the barcode and the respective items from the table adhering
>>> the following rules:
>>>
>>>    - If "items" is null -> output 0
>>>    - else -> output "items" ( the actual value in the column)
>>>
>>> I would write a query like:
>>>
>>> *SELECT barcode, IF(items is null, 0, items) FROM goods*
>>>
>>> But this query fails with the error:
>>>
>>> *unresolved operator 'Project [if (IS NULL items#1) 0 else items#1 AS
>>> c0#132]; *
>>>
>>> It seems I can only use numerical values inside this IF statement, but
>>> when a column name is used, it fails.
>>>
>>> Is there any workaround to do this?
>>>
>>> Best regards.
>>> --
>>> Pruthuvi Maheshakya Wijewardena
>>> Software Engineer
>>> WSO2 : http://wso2.com/
>>> Email: maheshakya@wso2.com
>>> Mobile: +94711228855
>>>
>>>
>>>
>>
>
>
> --
> Pruthuvi Maheshakya Wijewardena
> Software Engineer
> WSO2 : http://wso2.com/
> Email: maheshakya@wso2.com
> Mobile: +94711228855
>
>
>

Re: Using a variable (a column name) in an IF statement in Spark SQL

Posted by Maheshakya Wijewardena <ma...@wso2.com>.

Spark version: 1.4.1
The schema is "barcode STRING, items INT"

On Thu, Oct 8, 2015 at 10:48 PM, Michael Armbrust <mi...@databricks.com>
wrote:

> Hmm, that looks like it should work to me.  What version of Spark?  What
> is the schema of goods?
>
> On Thu, Oct 8, 2015 at 6:13 AM, Maheshakya Wijewardena <
> maheshakya@wso2.com> wrote:
>
>> Hi,
>>
>> Suppose there is data frame called goods with columns "barcode" and
>> "items". Some of the values in the column "items" can be null.
>>
>> I want to the barcode and the respective items from the table adhering
>> the following rules:
>>
>>    - If "items" is null -> output 0
>>    - else -> output "items" ( the actual value in the column)
>>
>> I would write a query like:
>>
>> *SELECT barcode, IF(items is null, 0, items) FROM goods*
>>
>> But this query fails with the error:
>>
>> *unresolved operator 'Project [if (IS NULL items#1) 0 else items#1 AS
>> c0#132]; *
>>
>> It seems I can only use numerical values inside this IF statement, but
>> when a column name is used, it fails.
>>
>> Is there any workaround to do this?
>>
>> Best regards.
>> --
>> Pruthuvi Maheshakya Wijewardena
>> Software Engineer
>> WSO2 : http://wso2.com/
>> Email: maheshakya@wso2.com
>> Mobile: +94711228855
>>
>>
>>
>


-- 
Pruthuvi Maheshakya Wijewardena
Software Engineer
WSO2 : http://wso2.com/
Email: maheshakya@wso2.com
Mobile: +94711228855

Re: Using a variable (a column name) in an IF statement in Spark SQL

Posted by Michael Armbrust <mi...@databricks.com>.

Hmm, that looks like it should work to me.  What version of Spark?  What is
the schema of goods?

On Thu, Oct 8, 2015 at 6:13 AM, Maheshakya Wijewardena <ma...@wso2.com>
wrote:

> Hi,
>
> Suppose there is data frame called goods with columns "barcode" and
> "items". Some of the values in the column "items" can be null.
>
> I want to the barcode and the respective items from the table adhering the
> following rules:
>
>    - If "items" is null -> output 0
>    - else -> output "items" ( the actual value in the column)
>
> I would write a query like:
>
> *SELECT barcode, IF(items is null, 0, items) FROM goods*
>
> But this query fails with the error:
>
> *unresolved operator 'Project [if (IS NULL items#1) 0 else items#1 AS
> c0#132]; *
>
> It seems I can only use numerical values inside this IF statement, but
> when a column name is used, it fails.
>
> Is there any workaround to do this?
>
> Best regards.
> --
> Pruthuvi Maheshakya Wijewardena
> Software Engineer
> WSO2 : http://wso2.com/
> Email: maheshakya@wso2.com
> Mobile: +94711228855
>
>
>